
[Bug]: delete index in graphrag #1637

Open
ajain85 opened this issue Jan 20, 2025 · 2 comments
Labels: bug, triage

Comments

ajain85 commented Jan 20, 2025

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

I am facing an index deletion issue in Azure AI Search and locally. I deleted a file in the blob input folder, but after running the update command the corresponding indexes are not deleted in Azure AI Search. I am running the command below.

CLI: python -m graphrag update --config .\cli_graphrag\settings.yaml

Steps to reproduce

No response

Expected Behavior

No response

GraphRAG Config Used

# Paste your config here

Logs and screenshots

No response

Additional Information

  • GraphRAG Version:
  • Operating System:
  • Python Version:
  • Related Issues:
natoverse (Collaborator) commented:

This is correct - the update command appends new data, but does not remove data. That's a much more complicated task because the summarization process is lossy.

If you retain your cache, subsequent runs can avoid re-invoking the LLM for things like graph extraction. This makes it possible to re-run indexing with different mixes of document content. So you could remove the documents in question and then re-run the regular indexing. Graph extraction should be "free" in that it just uses the cache for all the existing text units. Depending on how the removed documents affect the community structure, community report generation could be the same cost as a usual run, or cheaper if some do not change and therefore also use the cache.
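The workaround described above can be sketched as a shell sequence. This is a minimal illustration, not an official procedure: the file name is a hypothetical placeholder, and it assumes the standard `index` subcommand and that the `cache` directory from the previous run is left in place so unchanged text units reuse cached LLM results.

```shell
# 1. Remove the unwanted document from the input folder
#    (or delete the corresponding blob in your storage container).
rm input/obsolete-doc.txt   # hypothetical file name

# 2. Re-run a full index instead of `update`. With the cache retained,
#    graph extraction for the remaining text units should hit the cache,
#    and the regenerated embeddings overwrite the vector store index,
#    dropping entries for the removed document.
python -m graphrag index --config ./cli_graphrag/settings.yaml
```

Community report generation may still cost as much as a normal run if the removed document changes the community structure, as noted above.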

natoverse (Collaborator) commented:

I should mention: when the embeddings are generated during normal indexing, they overwrite the existing vector store index, so your old entries should go away. Because the update command only adds new content, the old entries would still be there.
