Update README.md

AllenNeuralDynamics · Oct 16, 2024 · a36b5ca
1 parent 6930e9f
commit a36b5ca
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/README.md b/README.md
@@ -45,7 +45,7 @@ The current chat bot model uses Anthropic's Claude Sonnet 3 hosted on AWS' Bedro
 
 ### Vector Embeddings
 
-To improve retrieval accuracy and decrease hallucinations, we use vector embeddings to access relevant chunks of information found across the database. This process starts with accessing assets, and chunking each json file to chunks of 1000 tokens -- each chunk preserves the hierarchy found in json files. These chunks are converted to vector arrays of size 1024, through an embedding model (Amazon's Titan 2.0 Embedding). The user's query is converted to a vector and projected onto the latent space. The chunks that contain the most relevant information will be accessed through a cosine similarity search.
+To improve retrieval accuracy and decrease hallucinations, we use vector embeddings to access relevant chunks of information found across the database. This process starts with accessing assets, and chunking each json file to chunks of around 8000 tokens (10 chunks per file)-- each chunk preserves the hierarchy found in json files. These chunks are converted to vector arrays of size 1024, through an embedding model (Amazon's Titan 2.0 Embedding). The user's query is converted to a vector and projected onto the latent space. The chunks that contain the most relevant information will be accessed through a cosine similarity search.
 
 ### AIND-data-schema-access REST API
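
The retrieval step described in the changed Vector Embeddings paragraph (embed the query, then rank the stored chunks by cosine similarity) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the repository's code: the function names `cosine_similarity` and `top_k_chunks` are hypothetical, and the toy 3-dimensional vectors stand in for the 1024-dimensional Titan embeddings the README mentions. A real deployment would likely delegate this ranking to a vector index rather than a linear scan.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as having no similarity to anything.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Rank chunk embeddings against the query; return (index, score) pairs."""
    scored = [
        (i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(chunk_vecs)
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy stand-ins for chunk embeddings and an embedded user query (assumed values).
chunks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
query = [1.0, 0.1, 0.0]
results = top_k_chunks(query, chunks, k=2)
```

Here `results` holds the indices and scores of the two chunks closest in direction to the query; the text of those chunks would then be passed to the chat model as context.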