
In-progress data is not cached. #57

Open · tbtommyb opened this issue Jan 6, 2025 · 11 comments

Labels
enhancement (New feature or request) · good first issue (Good for newcomers) · help wanted (Extra attention is needed)

Comments

tbtommyb commented Jan 6, 2025

Describe the bug
The README says "The next time you initialize fast-graphrag from the same working directory, it will retain all the knowledge automatically". However, this does not seem to work.

To Reproduce
Steps to reproduce the behavior:

  1. In pyproject.toml add:
[tool.poetry.scripts]
test-app = "fast_graphrag.test:start"
  2. In fast_graphrag/test.py add:
import instructor
from dotenv import load_dotenv

from fast_graphrag import GraphRAG
from fast_graphrag._llm import OpenAIEmbeddingService, OpenAILLMService

load_dotenv()  # load environment variables (e.g. credentials) from .env

DOMAIN = "Analyze this story and identify the characters. Focus on how they interact with each other, the locations they explore, and their relationships."

EXAMPLE_QUERIES = [
    "What is the significance of Christmas Eve in A Christmas Carol?",
    "How does the setting of Victorian London contribute to the story's themes?",
    "Describe the chain of events that leads to Scrooge's transformation.",
    "How does Dickens use the different spirits (Past, Present, and Future) to guide Scrooge?",
    'Why does Dickens choose to divide the story into "staves" rather than chapters?',
]

ENTITY_TYPES = ["Character", "Animal", "Place", "Object", "Activity", "Event"]

def start():
    grag = GraphRAG(
        working_dir="./book_example",
        domain=DOMAIN,
        example_queries="\n".join(EXAMPLE_QUERIES),
        entity_types=ENTITY_TYPES,
        config=GraphRAG.Config(
            llm_service=OpenAILLMService(
                api_key="bedrock",
                model="anthropic.claude-3-5-sonnet-20241022-v2:0",
                base_url="http://localhost:8008/api/v1",
                mode=instructor.Mode.JSON,
            ),
            embedding_service=OpenAIEmbeddingService(
                model="cohere.embed-english-v3",
                base_url="http://localhost:8008/api/v1",
                api_key="bedrock",
                embedding_dim=1024,  # the output embedding dim of the chosen model
            ),
        ),
    )

    with open("./book.txt") as f:
        grag.insert(f.read())

    print(grag.query("Who is Scrooge?").response)
  3. Run poetry install
  4. Run poetry run test-app
  5. After a while, kill the running extraction/embedding computation, ideally once the extraction has completed.
  6. Run again. Extraction restarts from the beginning.

Expected behaviour

If extraction is killed mid progress, I would expect it to restart roughly where it left off. If the extraction completes and the app hangs during embedding computation, I would expect it to skip extraction and restart embedding computation.

Example app run

In this test run I left it running and it froze during the embedding computation. I killed the process and restarted, but I can see from the network traffic that it is redoing the entire extraction. The book_example directory has pickle files that are only a few hundred bytes in size.

user % poetry run test-app
Extracting data: 100%|████████████████████████████████████████████████████████████████████| 1/1 [52:14<00:00, 3134.86s/it]
Building... [computing embeddings]:  43%|█████████████████████▊                             | 3/7 [01:24<01:42, 25.53s/it]^CTraceback (most recent call last):
  File "<string>", line 1, in <module>
    import sys; from importlib import import_module; sys.argv = ['/Users/user/Library/Caches/pypoetry/virtualenvs/fast-graphrag-x9wMucwt-py3.13/bin/test-app']; sys.exit(import_module('fast_graphrag.test').start())
                                                                                                                                                                             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/Volumes/user/fast-graphrag/fast_graphrag/test.py", line 45, in start
    grag.insert(f.read())
    ~~~~~~~~~~~^^^^^^^^^^
  File "/Volumes/user/fast-graphrag/fast_graphrag/_graphrag.py", line 75, in insert
    return get_event_loop().run_until_complete(self.async_insert(content, metadata, params, show_progress))
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.13.1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/asyncio/base_events.py", line 707, in run_until_complete
    self.run_forever()
    ~~~~~~~~~~~~~~~~^^
  File "/opt/homebrew/Cellar/[email protected]/3.13.1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/asyncio/base_events.py", line 678, in run_forever
    self._run_once()
    ~~~~~~~~~~~~~~^^
  File "/opt/homebrew/Cellar/[email protected]/3.13.1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/asyncio/base_events.py", line 2033, in _run_once
    handle._run()
    ~~~~~~~~~~~^^
  File "/opt/homebrew/Cellar/[email protected]/3.13.1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/asyncio/events.py", line 89, in _run
    self._context.run(self._callback, *self._args)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/user/fast-graphrag/fast_graphrag/_graphrag.py", line 128, in async_insert
    await self.state_manager.upsert(
        llm=self.llm_service, subgraphs=subgraphs, documents=new_chunks_per_data, show_progress=show_progress
    )
  File "/Volumes/user/fast-graphrag/fast_graphrag/_services/_state_manager.py", line 128, in upsert
    await self.entity_storage.upsert(ids=(i for i, _ in upserted_nodes), embeddings=embeddings)
  File "/Volumes/user/fast-graphrag/fast_graphrag/_storage/_vdb_hnswlib.py", line 58, in upsert
    while self.size + len(embeddings) >= new_size:
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt

user % poetry run test-app
Extracting data:   0%|

Note that I am using Bedrock Access Gateway to proxy requests to AWS Bedrock. This doesn't seem to be related, but I include it for completeness.

Note that the same occurs if my LLM starts returning 429 or 500 errors. I would expect the already-completed requests to be cached so that the run can be resumed incrementally if I kill and restart the process.

liukidar (Contributor) commented Jan 6, 2025

You are correct: at the moment we do not support any LLM caching. It is something we should definitely implement.

liukidar added the enhancement, good first issue, and help wanted labels on Jan 6, 2025
tbtommyb (Author) commented Jan 6, 2025

I see. What does the README mean when it says "The next time you initialize fast-graphrag from the same working directory, it will retain all the knowledge automatically"?

liukidar (Contributor) commented Jan 6, 2025

It means that the graph you create is saved to disk in the working directory: once you insert document A, that knowledge is persistently stored in the graph, so if you query it on a later occasion the knowledge is retained. (But, as you pointed out, if something goes wrong during insertion you need to start over for that document, since we do not cache any LLM computation.) Hope that answers your question :)
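
In code, the intended flow looks something like this (a minimal sketch based on the README claim, reusing the DOMAIN, EXAMPLE_QUERIES, and ENTITY_TYPES constants from the reproduction script above; in practice you would pass the same llm/embedding config as the first run):

from fast_graphrag import GraphRAG

# Second run: point at the same working_dir as the original insert.
grag = GraphRAG(
    working_dir="./book_example",
    domain=DOMAIN,
    example_queries="\n".join(EXAMPLE_QUERIES),
    entity_types=ENTITY_TYPES,
)

# No insert() call: the graph persisted by the earlier, successful run is
# loaded from ./book_example automatically, so querying works immediately.
print(grag.query("Who is Scrooge?").response)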

tbtommyb (Author) commented Jan 6, 2025

Yes, thank you! I would like to try to fix this, as the tool seems very useful, but I haven't yet been able to get it to run successfully. I usually start getting error responses from my LLM due to rate limiting, and I don't think they are retried. Then the embedding computation keeps hanging. It seems to get stuck, so I would like to fix that and make the process more robust to failures and retries.

Could you point to roughly where you'd want this functionality implemented? I will try to take a look when I get time.

liukidar (Contributor) commented Jan 6, 2025

Sure, I can draft something. In the meantime, can you elaborate on the LLM/embedding problems? What services are you using?
In theory retries should be enabled, so I'd like to know more about this (also, you can change the maximum number of concurrent LLM requests via export CONCURRENT_TASK_LIMIT=n).

liukidar (Contributor) commented Jan 6, 2025

> Could you point to roughly where you'd want this functionality implemented? I will try to take a look when I get time.

Ideally, the BaseLLMService and BaseEmbeddingService should implement a "cache" function within send_message and encode that works according to the following pseudo-code:

if cache exists:
    check if params are in cache; if yes, return the associated value (similar to how an LRU cache works)
otherwise compute the result and cache it

Then the state_manager should do the following:

.... compute all the graph insertion stuff ....
If all is good, empty the cache; if something breaks, save the cache to disk

So at the end of state_manager._insert_done there should be a "clear cache files" step, and the whole insertion process should be wrapped in a try/except that on failure does something like llm.save_cache(); embedding.save_cache().
For caching/data storage of key-value pairs we are using normal files, but maybe there are better options, such as https://github.com/grantjenks/python-diskcache? Not sure if it's suitable for this, though, but it seems a good candidate.
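
A minimal Python sketch of that pseudo-code (hypothetical names throughout; this is not the actual fast-graphrag API): a dict-backed cache keyed on the call parameters, persisted on failure and cleared on success.

import hashlib
import json
import pickle
from pathlib import Path
from typing import Any


class LLMResponseCache:
    """Hypothetical helper for the pseudo-code above; not the real fast-graphrag API."""

    def __init__(self, path: Path):
        self.path = path
        # Reload whatever a previously failed insertion left behind on disk.
        self.entries: dict[str, Any] = (
            pickle.loads(self.path.read_bytes()) if self.path.exists() else {}
        )

    def _key(self, **params: Any) -> str:
        # Stable hash of the call parameters, in the spirit of an LRU cache key.
        blob = json.dumps(params, sort_keys=True, default=str).encode()
        return hashlib.sha256(blob).hexdigest()

    def get(self, **params: Any) -> Any | None:
        return self.entries.get(self._key(**params))

    def put(self, value: Any, **params: Any) -> None:
        self.entries[self._key(**params)] = value

    def save(self) -> None:
        # On failure: persist so a restarted insertion can skip completed calls.
        self.path.write_bytes(pickle.dumps(self.entries))

    def clear(self) -> None:
        # On success, e.g. at the end of state_manager._insert_done.
        self.entries = {}
        self.path.unlink(missing_ok=True)

send_message and encode would consult get() before calling the provider and put() the response afterwards, while the insertion process sits in a try/except that calls save() on failure and clear() on success.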

tbtommyb (Author) commented Jan 6, 2025

> In the meantime, can you elaborate on the LLM/embedding problems? What services are you using?
> In theory retries should be enabled, so I'd like to know more about this (also, you can change the maximum number of concurrent LLM requests via export CONCURRENT_TASK_LIMIT=n).

I am using AWS Bedrock, but the models I have access to are severely rate-limited for non-production use, so I have set the concurrent limit to 2. What happens is that I get around halfway through the extraction phase and start getting lots of 500s (which are actually rate limits; I don't know why they aren't 429s). fast-graphrag seems to skip past the failed requests and then starts computing the embeddings. These consistently get to 43% and then hang. Perhaps it is getting stuck trying to embed the failed extraction requests? I have left it running overnight and it didn't progress, so I know it is hanging and not just very slow.

I will try with concurrent requests = 1 and see how that works. I may also try asynchronous batch requests (see the AWS documentation here, and OpenAI's here), as they seem a good fit for this work.

liukidar (Contributor) commented Jan 6, 2025

> > In the meantime, can you elaborate on the LLM/embedding problems? What services are you using?
> > In theory retries should be enabled, so I'd like to know more about this (also, you can change the maximum number of concurrent LLM requests via export CONCURRENT_TASK_LIMIT=n).
>
> I am using AWS Bedrock, but the models I have access to are severely rate-limited for non-production use, so I have set the concurrent limit to 2. What happens is that I get around halfway through the extraction phase and start getting lots of 500s (which are actually rate limits; I don't know why they aren't 429s). fast-graphrag seems to skip past the failed requests and then starts computing the embeddings. These consistently get to 43% and then hang. Perhaps it is getting stuck trying to embed the failed extraction requests? I have left it running overnight and it didn't progress, so I know it is hanging and not just very slow.
>
> I will try with concurrent requests = 1 and see how that works. I may also try asynchronous batch requests (see the AWS documentation here, and OpenAI's here), as they seem a good fit for this work.

It could be that instructor, the library we use for LLM queries, doesn't retry on 500s but only on 429s. I would check its documentation for that.
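
For reference, there are two retry layers worth checking (a hedged sketch mirroring the proxy settings from the reproduction script, not fast-graphrag's actual wiring): the OpenAI SDK client retries 429 and 5xx responses at the transport level via its max_retries option, while instructor's max_retries re-asks the model when response validation fails.

import instructor
import openai
from pydantic import BaseModel


class Answer(BaseModel):
    text: str


client = instructor.from_openai(
    openai.OpenAI(
        base_url="http://localhost:8008/api/v1",  # Bedrock Access Gateway proxy
        api_key="bedrock",
        max_retries=5,  # SDK-level retries (with backoff) for 429/5xx responses
    ),
    mode=instructor.Mode.JSON,
)

answer = client.chat.completions.create(
    model="anthropic.claude-3-5-sonnet-20241022-v2:0",
    response_model=Answer,
    messages=[{"role": "user", "content": "Who is Scrooge?"}],
    max_retries=3,  # instructor-level re-asks when validation of Answer fails
)

The SDK also retries 5xx responses by default, but having the proxy return proper 429s keeps the backoff semantics correct.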

Another user reported that the process gets stuck at 43%; I will investigate this further.

tbtommyb (Author) commented Jan 6, 2025

The 500 errors were coming from the Bedrock proxy I am using, and updating it to return 429s seems to fix the retry issue.

Regarding getting stuck at 43%, the issue is that here self.size and new_size are both 0, so doubling new_size does nothing and the code is stuck in an infinite loop. This fixes it:

diff --git a/fast_graphrag/_storage/_vdb_hnswlib.py b/fast_graphrag/_storage/_vdb_hnswlib.py
index 012903d..ba8dcaa 100644
--- a/fast_graphrag/_storage/_vdb_hnswlib.py
+++ b/fast_graphrag/_storage/_vdb_hnswlib.py
@@ -37,7 +37,7 @@ class HNSWVectorStorage(BaseVectorStorage[GTId, GTEmbedding]):

     @property
     def max_size(self) -> int:
-        return self._index.get_max_elements()
+        return self._index.get_max_elements() or 1

And I get a response from my test above!
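
For context, here is a standalone demonstration of the hang (a reconstruction based on the traceback and the description above, not the verbatim fast-graphrag source):

# A fresh index: hnswlib's get_max_elements() returns 0 before the fix.
size, max_size = 0, 0
embeddings = [[0.1], [0.2], [0.3]]  # pretend batch of vectors to insert

new_size = max_size
for _ in range(10):  # stands in for the real `while` loop so this snippet halts
    if size + len(embeddings) < new_size:
        break        # enough capacity found; the real code resizes the index here
    new_size *= 2    # 0 * 2 == 0, so the real loop never terminates
print(new_size)      # 0 without the fix; starting from 1 it would reach 4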

liukidar (Contributor) commented Jan 6, 2025

I see, nice catch! I'd appreciate it if you could open a pull request for that since you spotted it (if you do, can you make the 1 a named constant, INITIAL_MAX_ELEMENTS?). Otherwise I can take care of it. Let me know.
I wonder why max_size is 0, though. The default value should be in the 100000s. Did you change any configuration settings?
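
For anyone picking this up, the requested shape of the fix would be something like this (INITIAL_MAX_ELEMENTS is the name proposed above; the class body is illustrative, not the full HNSWVectorStorage):

INITIAL_MAX_ELEMENTS = 1  # fallback so the resize loop's doubling can get started


class HNSWVectorStorage:
    def __init__(self, index):
        self._index = index

    @property
    def max_size(self) -> int:
        # hnswlib reports 0 for a brand-new index; fall back to the named
        # constant instead of a bare literal 1.
        return self._index.get_max_elements() or INITIAL_MAX_ELEMENTS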

tbtommyb (Author) commented Jan 6, 2025

I can't do a PR, sorry, as this is my work machine.

I haven't changed any config apart from the concurrency settings.
