- Step-by-Step Guide for Benchmark
For any cloud vector database, the testing process follows the flowchart below:
Below are the specific testing processes for each cloud vector database.
Go to the MyScale official website and create a cluster. In the cluster console, record the cluster connection information: `host`, `port`, `username`, and `password`.
We have provided two configuration files for testing MyScale:
You need to write the cluster connection information obtained in Step 1 into the configuration files. To modify the configuration files for testing, open each file and locate the `connection_params` section. Update the values for `host`, `port`, `user`, and `password` with the appropriate cluster connection information obtained in Step 1. Finally, move the modified configuration file into the `experiments/configurations` directory.
Here is an example of how the modified section may look:
"connection_params": {
"host": "your_host.aws.dev.myscale.cloud",
"port": 8443,
"http_type": "http",
"user": "your_username",
"password": "your_password"
},
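If you prefer not to edit the files by hand, the `connection_params` section can be patched with a short script. Below is a minimal sketch using only the Python standard library; the demo file name and all credential values are placeholders, and in practice you would pass the path of each provided MyScale configuration file:

```python
import json
import pathlib
import tempfile

def patch_connection_params(config_path, **params):
    """Merge the given keys into the connection_params section of a JSON config."""
    path = pathlib.Path(config_path)
    config = json.loads(path.read_text())
    config.setdefault("connection_params", {}).update(params)
    path.write_text(json.dumps(config, indent=2))

# Demo on a throwaway file; point this at the real config files instead.
demo = pathlib.Path(tempfile.mkdtemp()) / "myscale_demo.json"
demo.write_text(json.dumps({"connection_params": {"host": "", "port": 0}}))
patch_connection_params(
    demo,
    host="your_host.aws.dev.myscale.cloud",
    port=8443,
    user="your_username",
    password="your_password",
)
```

The same approach works for every database below; only the key names inside `connection_params` change.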
```shell
python3 run.py --engines "*myscale*"
cd results
grep -E 'rps|mean_precision' $(ls -t)
```
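If you would rather collect the metrics programmatically than grep the raw files, the sketch below walks whatever JSON files `run.py` left in `results/` and picks out any `rps` and `mean_precision` values it finds. This assumes the result files are JSON; the exact layout may differ between benchmark versions, so the walker searches recursively instead of relying on a fixed schema:

```python
import json
import pathlib

def extract_metrics(results_dir="results"):
    """Collect any 'rps' / 'mean_precision' values found in result JSON files."""
    metrics = {}
    for path in sorted(pathlib.Path(results_dir).glob("*.json")):
        found = {}

        def walk(node):
            # Search recursively so the exact result-file schema does not matter.
            if isinstance(node, dict):
                for key, value in node.items():
                    if key in ("rps", "mean_precision"):
                        found[key] = value
                    walk(value)
            elif isinstance(node, list):
                for item in node:
                    walk(item)

        walk(json.loads(path.read_text()))
        if found:
            metrics[path.name] = found
    return metrics
```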
Register with Pinecone and obtain the cluster connection information: the `Environment` and the API key `Value`.
We have provided two configuration files for testing Pinecone:
You need to write the cluster connection information obtained in Step 1 into the configuration files. Modify the `connection_params` section of the files and update the values for `environment` and `api-key`. Finally, move the modified configuration file into the `experiments/configurations` directory. Here is an example of how the modified section may look:
"connection_params": {
"api-key": "your_api_key",
"environment": "your_environment"
},
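Since the `api-key` is a secret, you may not want it sitting in a tracked file longer than necessary. One option is to inject it from environment variables just before running. This is a sketch; the `PINECONE_API_KEY` / `PINECONE_ENVIRONMENT` variable names are my own convention, not something the benchmark itself reads:

```python
import json
import os
import pathlib

def inject_pinecone_credentials(config_path):
    """Copy api-key / environment from environment variables into a config file."""
    path = pathlib.Path(config_path)
    config = json.loads(path.read_text())
    config["connection_params"]["api-key"] = os.environ["PINECONE_API_KEY"]
    config["connection_params"]["environment"] = os.environ["PINECONE_ENVIRONMENT"]
    path.write_text(json.dumps(config, indent=2))
```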
```shell
python3 run.py --engines "*pinecone*"
cd results
grep -E 'rps|mean_precision' $(ls -t)
```
You need to find the cluster connection information, including `end_point`, `user`, and `password`, in the Zilliz Cloud console. The `user` and `password` are the credentials you specified when creating the cluster.
We have provided two configuration files for testing Zilliz:
- zilliz_cloud_1cu_storage_optimized_laion-768-5m-ip.json
- zilliz_cloud_1cu_storage_optimized_arxiv-titles-384-angular.json
You need to write the cluster connection information obtained in Step 1 into the configuration files. To modify the configuration files for testing, open each file and locate the `connection_params` section. Update the values for `end_point`, `cloud_user`, and `cloud_password` with the appropriate cluster connection information obtained in Step 1. Finally, move the modified configuration file into the `experiments/configurations` directory.
Here is an example of how the modified section may look:
"connection_params": {
"cloud_mode": true,
"host": "127.0.0.1",
"port": 19530,
"user": "",
"password": "",
"end_point": "https://your_host.zillizcloud.com:19538",
"cloud_user": "your_user",
"cloud_password": "your_password",
"cloud_secure": true
},
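A malformed `end_point` (missing scheme or port) is an easy mistake here, since most other engines take a bare host. A quick standard-library sanity check, shown with the placeholder value from the example above:

```python
from urllib.parse import urlparse

def check_end_point(end_point):
    """Verify a Zilliz end_point carries an https scheme, a hostname, and a port."""
    parts = urlparse(end_point)
    assert parts.scheme == "https", "end_point must start with https://"
    assert parts.hostname, "end_point is missing a hostname"
    assert parts.port, "end_point must include the port, e.g. :19538"
    return parts

parts = check_end_point("https://your_host.zillizcloud.com:19538")
```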
```shell
python3 run.py --engines "*zilliz*"
cd results
grep -E 'rps|mean_precision' $(ls -t)
```
Register with Weaviate Cloud and create a cluster. Record the cluster connection information: the cluster URL and the `Authentication` API key.
We have provided two configuration files for testing Weaviate Cloud:
You need to write the cluster connection information obtained in Step 1 into the configuration files. Modify the `connection_params` section of the files and update the values for `host` and `api_key`. The `host` corresponds to the cluster URL, and the `api_key` is the `Authentication` API key. Finally, move the modified configuration file into the `experiments/configurations` directory.
Here is an example of how the modified section may look:
"connection_params": {
"host": "https://your_host.weaviate.cloud",
"port": 8090,
"timeout_config": 2000,
"api_key": "your_api_key"
},
```shell
python3 run.py --engines "*weaviate*"
cd results
grep -E 'rps|mean_precision' $(ls -t)
```
Register with Qdrant Cloud and create a cluster. Record the cluster connection information: the URL and the API key.
We have provided three configuration files for testing Qdrant:
- qdrant_cloud_hnsw_2c16g_storage_optimized_laion-768-5m-ip.json
- qdrant_cloud_quantization_2c16g_storage_optimized_laion-768-5m-ip.json
- qdrant_cloud_hnsw_2c16g_storage_optimized_arxiv-titles-384-angular.json
You need to write the cluster connection information obtained in Step 1 into the configuration files. Modify the `connection_params` section of the files and update the values for `host` and `api_key`. Please note that in the `connection_params` section, you need to remove the port from the end of the `host` string. Finally, move the modified configuration file into the `experiments/configurations` directory.
Here is an example of how the modified section may look:
"connection_params": {
"host": "https://your_host.aws.cloud.qdrant.io",
"port": 6333,
"grpc_port": 6334,
"prefer_grpc": false,
"api_key": "your_api_key"
},
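The port-stripping step can be automated with the standard library rather than done by hand. A sketch (it assumes the value copied from the console is a full URL with a scheme, as in the placeholder above):

```python
from urllib.parse import urlparse

def strip_port(url):
    """Return the URL without its :port suffix; assumes a full URL with a scheme."""
    parts = urlparse(url)
    return f"{parts.scheme}://{parts.hostname}"

host = strip_port("https://your_host.aws.cloud.qdrant.io:6333")
```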
```shell
python3 run.py --engines "*qdrant*"
cd results
grep -E 'rps|mean_precision' $(ls -t)
```
Create an OpenSearch domain in your AWS console.
When filling in the Fine-Grained Access Control information, select "Set IAM ARN as Master User" for the Master User and enter your ARN information.
Record the cluster domain endpoint (without the `https://` prefix).
We have provided two configuration files for testing OpenSearch:
You need to write the cluster connection information obtained in Step 1 into the configuration files. Modify the `connection_params` section of the files and update the values for `host`, `aws_access_key_id`, and `aws_secret_access_key`. Finally, move the modified configuration file into the `experiments/configurations` directory.
Here is an example of how the modified section may look:
"connection_params": {
"host": "your opensearch cluster domain endpoint",
"port": 443,
"user": "elastic",
"password": "passwd",
"aws_access_key_id": "your aws access key id",
"aws_secret_access_key": "your aws secret access key",
"region": "us-east-2",
"service": "es"
},
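As noted in Step 1, the `host` must be the bare domain endpoint without the `https://` prefix. A small helper can normalize whatever was copied from the console; the example endpoint below is a made-up placeholder:

```python
def normalize_endpoint(endpoint):
    """Drop an http(s):// prefix and any trailing slash from a domain endpoint."""
    for prefix in ("https://", "http://"):
        if endpoint.startswith(prefix):
            endpoint = endpoint[len(prefix):]
    return endpoint.rstrip("/")

host = normalize_endpoint("https://search-mydomain.us-east-2.es.amazonaws.com/")
```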
```shell
python3 run.py --engines "*opensearch*"
cd results
grep -E 'rps|mean_precision' $(ls -t)
```
For deploying Postgres with pgvector (a Postgres extension written in C), you can use Docker. Below is an example `docker-compose.yaml` configuration:
```yaml
version: '3'
services:
  pgvector:
    image: ankane/pgvector:latest
    container_name: pgvector
    environment:
      - POSTGRES_USER=root
      - POSTGRES_PASSWORD=123456
    ports:
      - "5432:5432"
```
Similarly, for Postgres with pgvecto.rs (a Postgres extension written in Rust), Docker can also be used. Here's a corresponding `docker-compose.yaml` example:
```yaml
version: '3'
services:
  pgvector:
    image: tensorchord/pgvecto-rs:latest
    container_name: pgvector
    environment:
      - POSTGRES_USER=root
      - POSTGRES_PASSWORD=123456
    ports:
      - "5432:5432"
```
We have provided four configuration files for testing.
For pgvector:
- pgvector_c_HNSW_single_node_laion-768-5m-ip.json
- pgvector_c_HNSW_single_node_laion-768-5m-probability-ip.json
For pgvecto.rs:
- pgvector_rust_HNSW_single_node_laion-768-5m-ip.json
- pgvector_rust_HNSW_single_node_laion-768-5m-probability-ip.json
After deploying your own Postgres service, you need to modify the `connection_params` fields in the configuration file. Additionally, you can append custom `search_params` for testing purposes. You can also alter `upload_params` to modify the parameters for index creation.
"connection_params": {
"host": "127.0.0.1",
"port": 5432,
"user": "root",
"password": "123456"
}
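Before starting a run, it can save time to confirm that Postgres is actually reachable on the configured host and port. The sketch below only checks TCP connectivity, not credentials or the extension itself:

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# With the docker-compose service above running: port_open("127.0.0.1", 5432)
```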
```shell
python3 run.py --engines "*pgvector*"
cd results
grep -E 'rps|mean_precision' $(ls -t)
```