Do you need to file an issue?
I have searched the existing issues and this bug is not already filed.
My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and do not file a new one here.
I believe this is a legitimate bug and not just a question. If this is a question, please use the Discussions area.
Describe the issue
Issue: When running a local query in GraphRAG, I get a 404 error for the nomic_embed_text embedding model. The error says the model is not found, even though it appears in the list of available models. Here are the details:
python -m graphrag.query --root ./ragtest --method local "Who are the main demons Krishna defeated during his childhood?"
Error embedding chunk {'OpenAIEmbedding': 'Error code: 404 - {"error": {"message": "model 'nomic_embed_text' not found, try pulling it first", "type": "api_error", "param": null, "code": null}}'}
ZeroDivisionError: Weights sum to zero, can't be normalized
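For context on the ZeroDivisionError: it looks like a downstream symptom rather than a separate bug. If every embedding request 404s, there are no successfully embedded chunks left to combine, and averaging with all-zero weights produces exactly this numpy message. A minimal sketch of that failure mode (my reconstruction, not GraphRAG's actual code):

import numpy as np

# Placeholder vectors for two chunks whose embedding calls failed;
# with no successful embeddings, every chunk weight stays zero.
chunk_embeddings = [np.zeros(768), np.zeros(768)]
chunk_weights = [0, 0]

# Raises: ZeroDivisionError: Weights sum to zero, can't be normalized
np.average(chunk_embeddings, axis=0, weights=chunk_weights)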
Steps to reproduce
1. The nomic_embed_text model is available and listed when I run ollama list.
2. I verified the embedding endpoint with a curl command, which successfully returned embeddings (a Python equivalent is sketched after these steps):
curl -X POST http://localhost:11434/v1/embeddings -H "Content-Type: application/json" -d '{"model": "nomic_embed_text", "input": "Test embedding generation with nomic model"}'
3. I've verified that the embedding API is correctly set in settings.yaml:
embeddings:
  llm:
    model: nomic_embed_text
    api_base: http://localhost:11434/v1
Global queries work fine, and embedding generation is successful in the global method.
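Since curl succeeds but GraphRAG's request 404s against the same endpoint, it may help to reproduce the call through the OpenAI Python client, which the 'OpenAIEmbedding' key in the error suggests GraphRAG uses internally. A minimal sketch, assuming the same settings.yaml values (the api_key is a dummy placeholder; Ollama ignores it):

from openai import OpenAI

# Point the OpenAI client at Ollama's OpenAI-compatible endpoint,
# mirroring api_base and model from settings.yaml.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.embeddings.create(
    model="nomic_embed_text",
    input="Test embedding generation with nomic model",
)
print(len(response.data[0].embedding))  # length of the returned embedding vector

If this snippet also returns a 404, the mismatch is in how the OpenAI-compatible route resolves the model name rather than in GraphRAG itself.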
GraphRAG Config Used
# Paste your config here
encoding_model: cl100k_base
skip_workflows: []

llm:
  #api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  #model: gpt-4-turbo-preview
  model: mistral
  model_supports_json: true # recommended if this is available for your model.
  #max_tokens: 4000
  # request_timeout: 180.0
  api_base: http://localhost:11434/v1 #https://<instance>.openai.azure.com
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  #max_retries: 1
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  #concurrent_requests: 1 # the number of parallel inflight requests that may be made
  # temperature: 0 # temperature for sampling
  # top_p: 1 # top-p sampling
  # n: 1 # Number of completions to generate

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  # target: required # or all
  # batch_size: 16 # the number of documents to send in a single request
  # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    #model: text-embedding-3-small
    model: nomic_embed_text
    api_base: http://localhost:11434/v1 #https://<instance>.openai.azure.com
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## strategy: fully override the entity extraction strategy.
  ##   type: one of graph_intelligence, graph_intelligence_json and nltk
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization, person, geo, event]
  max_gleanings: 1

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000

global_search:
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32
Logs and screenshots
No response
Additional Information
GraphRAG Version:
Operating System: Linux
Python Version: 3.10
Related Issues: