- Step-by-Step Guide for Benchmark
For any cloud vector database, the testing process follows the flowchart below:
Below are the specific testing processes for each cloud vector database.
Go to the MyScale official website and create a cluster. In the cluster console, record the cluster connection information: `host`, `port`, `username`, and `password`.
We have provided two configuration files for testing MyScale:
You need to write the cluster connection information obtained in Step 1 into the configuration files. To modify the configuration files for testing, open each file and locate the `connection_params` section. Update the values for `host`, `port`, `user`, and `password` with the appropriate cluster connection information obtained in Step 1. Finally, move the modified configuration file into the `experiments/configurations` directory.
Here is an example of how the modified section may look:
"connection_params": {
"host": "your_host.aws.dev.myscale.cloud",
"port": 8443,
"http_type": "http",
"user": "your_username",
"password": "your_password"
},
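If you prefer not to edit the files by hand, the `connection_params` section can be patched with a short script. Below is a minimal sketch using only the Python standard library; the demo file name and all credential values are placeholders, and in practice you would pass the path of each provided MyScale configuration file:

```python
import json
import pathlib
import tempfile

def patch_connection_params(config_path, **params):
    """Merge the given keys into the connection_params section of a JSON config."""
    path = pathlib.Path(config_path)
    config = json.loads(path.read_text())
    config.setdefault("connection_params", {}).update(params)
    path.write_text(json.dumps(config, indent=2))

# Demo on a throwaway file; point this at the real config files instead.
demo = pathlib.Path(tempfile.mkdtemp()) / "myscale_demo.json"
demo.write_text(json.dumps({"connection_params": {"host": "", "port": 0}}))
patch_connection_params(
    demo,
    host="your_host.aws.dev.myscale.cloud",
    port=8443,
    user="your_username",
    password="your_password",
)
```

The same approach works for every database below; only the key names inside `connection_params` change.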
```shell
python3 run.py --engines "*myscale*"
cd results
grep -E 'rps|mean_precision' $(ls -t)
```
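If you would rather collect the metrics programmatically than grep the raw files, the sketch below walks whatever JSON files `run.py` left in `results/` and picks out any `rps` and `mean_precision` values it finds. This assumes the result files are JSON; the exact layout may differ between benchmark versions, so the walker searches recursively instead of relying on a fixed schema:

```python
import json
import pathlib

def extract_metrics(results_dir="results"):
    """Collect any 'rps' / 'mean_precision' values found in result JSON files."""
    metrics = {}
    for path in sorted(pathlib.Path(results_dir).glob("*.json")):
        found = {}

        def walk(node):
            # Search recursively so the exact result-file schema does not matter.
            if isinstance(node, dict):
                for key, value in node.items():
                    if key in ("rps", "mean_precision"):
                        found[key] = value
                    walk(value)
            elif isinstance(node, list):
                for item in node:
                    walk(item)

        walk(json.loads(path.read_text()))
        if found:
            metrics[path.name] = found
    return metrics
```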
Register with Pinecone and obtain the cluster connection information: the `Environment` and the API key `Value`.
We have provided two configuration files for testing Pinecone:
You need to write the cluster connection information obtained in Step 1 into the configuration files. Modify the `connection_params` section of the files and update the values for `environment` and `api-key`. Finally, move the modified configuration file into the `experiments/configurations` directory. Here is an example of how the modified section may look:
"connection_params": {
"api-key": "your_api_key",
"environment": "your_environment"
},
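Since the `api-key` is a secret, you may not want it sitting in a tracked file longer than necessary. One option is to inject it from environment variables just before running. This is a sketch; the `PINECONE_API_KEY` / `PINECONE_ENVIRONMENT` variable names are my own convention, not something the benchmark itself reads:

```python
import json
import os
import pathlib

def inject_pinecone_credentials(config_path):
    """Copy api-key / environment from environment variables into a config file."""
    path = pathlib.Path(config_path)
    config = json.loads(path.read_text())
    config["connection_params"]["api-key"] = os.environ["PINECONE_API_KEY"]
    config["connection_params"]["environment"] = os.environ["PINECONE_ENVIRONMENT"]
    path.write_text(json.dumps(config, indent=2))
```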
```shell
python3 run.py --engines "*pinecone*"
cd results
grep -E 'rps|mean_precision' $(ls -t)
```
You need to find the cluster connection information, including `end_point`, `user`, and `password`, in the Zilliz Cloud console. The `user` and `password` are the credentials you specified when creating the cluster.
We have provided two configuration files for testing Zilliz:
- zilliz_cloud_1cu_storage_optimized_laion-768-5m-ip.json
- zilliz_cloud_1cu_storage_optimized_arxiv-titles-384-angular.json
You need to write the cluster connection information obtained in Step 1 into the configuration files. To modify the configuration files for testing, open each file and locate the `connection_params` section. Update the values for `end_point`, `cloud_user`, and `cloud_password` with the appropriate cluster connection information obtained in Step 1. Finally, move the modified configuration file into the `experiments/configurations` directory.
Here is an example of how the modified section may look:
"connection_params": {
"cloud_mode": true,
"host": "127.0.0.1",
"port": 19530,
"user": "",
"password": "",
"end_point": "https://your_host.zillizcloud.com:19538",
"cloud_user": "your_user",
"cloud_password": "your_password",
"cloud_secure": true
},
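A malformed `end_point` (missing scheme or port) is an easy mistake here, since most other engines take a bare host. A quick standard-library sanity check, shown with the placeholder value from the example above:

```python
from urllib.parse import urlparse

def check_end_point(end_point):
    """Verify a Zilliz end_point carries an https scheme, a hostname, and a port."""
    parts = urlparse(end_point)
    assert parts.scheme == "https", "end_point must start with https://"
    assert parts.hostname, "end_point is missing a hostname"
    assert parts.port, "end_point must include the port, e.g. :19538"
    return parts

parts = check_end_point("https://your_host.zillizcloud.com:19538")
```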
```shell
python3 run.py --engines "*zilliz*"
cd results
grep -E 'rps|mean_precision' $(ls -t)
```
Register with Weaviate Cloud and create a cluster. Record the cluster connection information: the cluster URL and the `Authentication` API key.
We have provided two configuration files for testing Weaviate Cloud:
You need to write the cluster connection information obtained in Step 1 into the configuration files. Modify the `connection_params` section of the files and update the values for `host` and `api_key`. The `host` corresponds to the cluster URL, and the `api_key` is the `Authentication` API key. Finally, move the modified configuration file into the `experiments/configurations` directory.
Here is an example of how the modified section may look:
"connection_params": {
"host": "https://your_host.weaviate.cloud",
"port": 8090,
"timeout_config": 2000,
"api_key": "your_api_key"
},
```shell
python3 run.py --engines "*weaviate*"
cd results
grep -E 'rps|mean_precision' $(ls -t)
```
Register with Qdrant Cloud and create a cluster. Record the cluster connection information: the URL and the API key.
We have provided three configuration files for testing Qdrant:
- qdrant_cloud_hnsw_2c16g_storage_optimized_laion-768-5m-ip.json
- qdrant_cloud_quantization_2c16g_storage_optimized_laion-768-5m-ip.json
- qdrant_cloud_hnsw_2c16g_storage_optimized_arxiv-titles-384-angular.json
You need to write the cluster connection information obtained in Step 1 into the configuration files. Modify the `connection_params` section of the files and update the values for `host` and `api_key`. Please note that in the `connection_params` section, you need to remove the port from the end of the `host` string. Finally, move the modified configuration file into the `experiments/configurations` directory.
Here is an example of how the modified section may look:
"connection_params": {
"host": "https://your_host.aws.cloud.qdrant.io",
"port": 6333,
"grpc_port": 6334,
"prefer_grpc": false,
"api_key": "your_api_key"
},
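The port-stripping step can be automated with the standard library rather than done by hand. A sketch (it assumes the value copied from the console is a full URL with a scheme, as in the placeholder above):

```python
from urllib.parse import urlparse

def strip_port(url):
    """Return the URL without its :port suffix; assumes a full URL with a scheme."""
    parts = urlparse(url)
    return f"{parts.scheme}://{parts.hostname}"

host = strip_port("https://your_host.aws.cloud.qdrant.io:6333")
```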
```shell
python3 run.py --engines "*qdrant*"
cd results
grep -E 'rps|mean_precision' $(ls -t)
```
Create an OpenSearch domain in your AWS console.
When filling in the Fine-Grained Access Control information, select "Set IAM ARN as Master User" for the Master User and enter your ARN information.
Record the cluster domain endpoint (without the `https://` prefix).
We have provided two configuration files for testing OpenSearch:
You need to write the cluster connection information obtained in Step 1 into the configuration files. Modify the `connection_params` section of the files and update the values for `host`, `aws_access_key_id`, and `aws_secret_access_key`. Finally, move the modified configuration file into the `experiments/configurations` directory.
Here is an example of how the modified section may look:
"connection_params": {
"host": "your opensearch cluster domain endpoint",
"port": 443,
"user": "elastic",
"password": "passwd",
"aws_access_key_id": "your aws access key id",
"aws_secret_access_key": "your aws secret access key",
"region": "us-east-2",
"service": "es"
},
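As noted in Step 1, the `host` must be the bare domain endpoint without the `https://` prefix. A small helper can normalize whatever was copied from the console; the example endpoint below is a made-up placeholder:

```python
def normalize_endpoint(endpoint):
    """Drop an http(s):// prefix and any trailing slash from a domain endpoint."""
    for prefix in ("https://", "http://"):
        if endpoint.startswith(prefix):
            endpoint = endpoint[len(prefix):]
    return endpoint.rstrip("/")

host = normalize_endpoint("https://search-mydomain.us-east-2.es.amazonaws.com/")
```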
```shell
python3 run.py --engines "*opensearch*"
cd results
grep -E 'rps|mean_precision' $(ls -t)
```
For deploying Postgres with pgvector (a Postgres extension written in C), you can use Docker. Below is an example `docker-compose.yaml` configuration:
```yaml
version: '3'
services:
  pgvector:
    image: ankane/pgvector:latest
    container_name: pgvector
    environment:
      - POSTGRES_USER=root
      - POSTGRES_PASSWORD=123456
    ports:
      - "5432:5432"
```
Similarly, for Postgres with pgvecto.rs (a Postgres extension written in Rust), Docker can also be used. Here's a corresponding `docker-compose.yaml` example:
```yaml
version: '3'
services:
  pgvector:
    image: tensorchord/pgvecto-rs:latest
    container_name: pgvector
    environment:
      - POSTGRES_USER=root
      - POSTGRES_PASSWORD=123456
    ports:
      - "5432:5432"
```
We have provided four configuration files for testing.
For pgvector:
- pgvector_c_HNSW_single_node_laion-768-5m-ip.json
- pgvector_c_HNSW_single_node_laion-768-5m-probability-ip.json
For pgvecto.rs:
- pgvector_rust_HNSW_single_node_laion-768-5m-ip.json
- pgvector_rust_HNSW_single_node_laion-768-5m-probability-ip.json
After deploying your own Postgres service, you need to modify the `connection_params` fields in the configuration file. Additionally, you can append custom `search_params` for testing purposes. You can also alter `upload_params` to modify the parameters for index creation.
"connection_params": {
"host": "127.0.0.1",
"port": 5432,
"user": "root",
"password": "123456"
}
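Before starting a run, it can save time to confirm that Postgres is actually reachable on the configured host and port. The sketch below only checks TCP connectivity, not credentials or the extension itself:

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# With the docker-compose service above running: port_open("127.0.0.1", 5432)
```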
```shell
python3 run.py --engines "*pgvector*"
cd results
grep -E 'rps|mean_precision' $(ls -t)
```