
[Issue]: Error: Model 'nomic_embed_text' Not Found During Local Embedding (--method local) in GraphRAG Query #1234

Closed
dipakmeher opened this issue Sep 30, 2024 · 1 comment
Labels
community_support Issue handled by community members

Comments

@dipakmeher

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the issue

Issue: When running a local query in GraphRAG, I got an error related to the nomic_embed_text model. The error message indicates that the model is not found, even though it appears in the list of available models. Here are the details:

python -m graphrag.query --root ./ragtest --method local "Who are the main demons Krishna defeated during his childhood?"

Error embedding chunk {'OpenAIEmbedding': 'Error code: 404 - {"error": {"message": "model 'nomic_embed_text' not found, try pulling it first", "type": "api_error", "param": null, "code": null}}'}
ZeroDivisionError: Weights sum to zero, can't be normalized
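
From the traceback, the two errors look connected: the 404 makes every chunk embedding fail, so the downstream similarity weights are all zero, and the ZeroDivisionError message is exactly what numpy.average raises when asked to normalize all-zero weights. A minimal sketch of that secondary failure, assuming the weighted average is computed with numpy:

    import numpy as np

    # If every embedding call 404s, the similarity weights end up all zero;
    # numpy.average then raises this exact error while normalizing them.
    np.average([1.0, 2.0, 3.0], weights=[0.0, 0.0, 0.0])
    # ZeroDivisionError: Weights sum to zero, can't be normalized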

Steps to reproduce

  1. The nomic_embed_text model is available and listed when I run the command ollama list.

  2. I verified the embedding model API with a curl command, which successfully returned embeddings (the sketch after this list runs the same check through the OpenAI client):

     curl -X POST http://localhost:11434/v1/embeddings -H "Content-Type: application/json" -d '{"model": "nomic_embed_text", "input": "Test embedding generation with nomic model"}'

  3. I've verified that the embedding API is correctly set in settings.yaml:

     embeddings:
       llm:
         model: nomic_embed_text
         api_base: http://localhost:11434/v1

  4. Global queries work fine, and embedding generation is successful in the global method.
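
As a closer reproduction than curl, here is a minimal sketch of the same check through the openai Python client (the error dict names OpenAIEmbedding, so this is presumably the path GraphRAG takes); the base_url, model name, and placeholder "ollama" key are taken from my settings.yaml, not from GraphRAG itself. It also lists the names Ollama has actually registered, since the Ollama library publishes this model as nomic-embed-text (with hyphens) and a name mismatch would explain the 404:

    import json
    import urllib.request

    from openai import OpenAI

    # Call the Ollama OpenAI-compatible endpoint the way the failing query
    # path presumably does. Ollama ignores the API key, but the openai
    # client requires a non-empty value.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    resp = client.embeddings.create(
        model="nomic_embed_text",
        input="Test embedding generation with nomic model",
    )
    print(len(resp.data[0].embedding))  # embedding dimension, e.g. 768

    # List the model names Ollama has registered; the name in settings.yaml
    # must match one of these exactly (hyphens vs. underscores matter).
    with urllib.request.urlopen("http://localhost:11434/api/tags") as r:
        print([m["name"] for m in json.load(r)["models"]])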

GraphRAG Config Used


encoding_model: cl100k_base
skip_workflows: []
llm:
  # api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  #model: gpt-4-turbo-preview
  model: mistral
  model_supports_json: true # recommended if this is available for your model.
  #max_tokens: 4000
  # request_timeout: 180.0
  api_base: http://localhost:11434/v1 #https://<instance>.openai.azure.com
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  #max_retries: 1
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  #concurrent_requests: 1 # the number of parallel inflight requests that may be made
  # temperature: 0 # temperature for sampling
  # top_p: 1 # top-p sampling
  # n: 1 # Number of completions to generate

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  # target: required # or all
  # batch_size: 16 # the number of documents to send in a single request
  # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    #model: text-embedding-3-small
    model: nomic_embed_text
    api_base: http://localhost:11434/v1 #https://<instance>.openai.azure.com
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## strategy: fully override the entity extraction strategy.
  ##   type: one of graph_intelligence, graph_intelligence_json and nltk
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000
cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000

global_search:
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32

Logs and screenshots

No response

Additional Information

  • GraphRAG Version:
  • Operating System: Linux
  • Python Version: 3.10
  • Related Issues:
@dipakmeher added the triage label Sep 30, 2024
@natoverse added the community_support label and removed the triage label Oct 1, 2024
@natoverse
Collaborator

Routing this to #657

@natoverse closed this as not planned Oct 1, 2024