Ideas for features #52

Open
PavelAgurov opened this issue Dec 19, 2024 · 7 comments

Comments

@PavelAgurov

  1. It would be very useful to have a count of tokens used, plus the cost and timing of indexing and querying.
  2. Ability to get a list of UNKNOWN entities from the graph, with their descriptions.
  3. Ability to get unconnected nodes from the graph, with their descriptions.
  4. A clear way to load a previously created graph from disk (currently I have to provide all parameters in the constructor even if they are not needed).
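
To make points 2 and 3 concrete, here is a minimal sketch of the two lookups. It does not use the library's graph storage: the `Node` class, the edge tuples, and both helper functions are hypothetical stand-ins, and the sample data is made up.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Node:
    # Hypothetical node record; the real library may store entities differently.
    name: str
    type: str
    description: str


def nodes_of_type(nodes: Iterable[Node], wanted_type: str = "UNKNOWN") -> List[Tuple[str, str]]:
    """Return (name, description) pairs for every node of the given type."""
    return [(n.name, n.description) for n in nodes if n.type == wanted_type]


def isolated_nodes(nodes: Iterable[Node], edges: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (name, description) pairs for nodes that appear in no edge."""
    connected = {endpoint for edge in edges for endpoint in edge}
    return [(n.name, n.description) for n in nodes if n.name not in connected]


# Made-up sample data for illustration only.
nodes = [
    Node("Alice", "PERSON", "The main character"),
    Node("Lantern", "UNKNOWN", "An object mentioned once"),
    Node("Harbor", "PLACE", "Where the story opens"),
]
edges = [("Alice", "Lantern")]

unknown = nodes_of_type(nodes)
unconnected = isolated_nodes(nodes, edges)
```

The descriptions returned for UNKNOWN entries are what a follow-up prompt could use to propose new entity types.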
@liukidar
Contributor

Hello! These are great ideas.
In particular, for point 1 it should be enough to modify the LLM interface to track token usage and related information.
Could you please elaborate on what you mean by point 4? Currently, data loading is managed entirely by the library, so you shouldn't have to worry about anything related to it.
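
As a rough illustration of tracking at the LLM interface, the sketch below wraps an arbitrary "send" callable and accumulates call counts, token counts, and elapsed time. Everything here is an assumption: `TrackedLLM`, `UsageStats`, and `fake_send` are hypothetical names, the callable is assumed to return `(text, prompt_tokens, completion_tokens)`, and the per-1k prices are placeholders, not real pricing.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class UsageStats:
    calls: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    seconds: float = 0.0

    def cost(self, usd_per_1k_prompt: float, usd_per_1k_completion: float) -> float:
        """Estimate cost from per-1k-token prices supplied by the caller."""
        return (
            self.prompt_tokens * usd_per_1k_prompt
            + self.completion_tokens * usd_per_1k_completion
        ) / 1000


class TrackedLLM:
    """Wrap a send function and record usage for every call."""

    def __init__(self, send: Callable[[str], Tuple[str, int, int]]):
        self._send = send
        self.stats = UsageStats()

    def __call__(self, prompt: str) -> str:
        start = time.perf_counter()
        text, prompt_tokens, completion_tokens = self._send(prompt)
        self.stats.seconds += time.perf_counter() - start
        self.stats.calls += 1
        self.stats.prompt_tokens += prompt_tokens
        self.stats.completion_tokens += completion_tokens
        return text


def fake_send(prompt: str) -> Tuple[str, int, int]:
    # Stand-in for a real client; counts whitespace-separated words as "tokens".
    reply = "ok"
    return reply, len(prompt.split()), len(reply.split())


llm = TrackedLLM(fake_send)
llm("extract entities from this text")
llm("answer the question")
```

Keeping one tracker for indexing and another for querying would give the separate numbers asked for in point 1.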

@PavelAgurov
Author

Hi!

Yes, that's correct, but I have to provide all parameters even if I only want to load a graph:

grag = GraphRAG(
    working_dir="./book_example",
    domain=DOMAIN,
    example_queries="\n".join(EXAMPLE_QUERIES),
    entity_types=ENTITY_TYPES
)

But I think that example_queries, for example, is not required for querying. It's needed only to build the graph.
In general, I see the use case as "build the graph once and use it many times later".
Of course, we should also think about adding new files to the graph, so maybe you could save all the information provided during graph building in a special file and load it from there.
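
The "special file" idea could be sketched as follows. This is not something the library is known to provide: the file name `build_config.json` and both helper functions are hypothetical, and the only thing taken from the snippet above is the set of constructor parameters.

```python
import json
import tempfile
from pathlib import Path

CONFIG_NAME = "build_config.json"  # hypothetical file name, not a library convention


def save_build_config(working_dir, *, domain, example_queries, entity_types):
    """Write the build-time parameters next to the graph data."""
    path = Path(working_dir) / CONFIG_NAME
    path.parent.mkdir(parents=True, exist_ok=True)
    config = {
        "domain": domain,
        "example_queries": example_queries,
        "entity_types": entity_types,
    }
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path


def load_build_config(working_dir):
    """Read the parameters back so they need not be repeated in code."""
    path = Path(working_dir) / CONFIG_NAME
    return json.loads(path.read_text(encoding="utf-8"))


# Round trip in a temporary directory, with made-up values.
with tempfile.TemporaryDirectory() as working_dir:
    save_build_config(
        working_dir,
        domain="Analyze the story",
        example_queries="Who is Alice?\nWhere does the story open?",
        entity_types=["PERSON", "PLACE"],
    )
    restored = load_build_config(working_dir)

# Later, the saved values could be passed back to the constructor, e.g.:
# grag = GraphRAG(working_dir="./book_example", **load_build_config("./book_example"))
```

Storing the parameters with the graph also covers the case of adding new files later, since the original build settings remain available.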

@PavelAgurov
Author

And one more idea is to create a special simple prompt like "give me noun(s) for entity (entities) based on descriptions". It could be useful for this scenario:

  • build graph
  • find all UNKNOWN and their descriptions
  • ask to build new entity_types based on descriptions
  • add new entities
  • re-build graph
  • check if I still have UNKNOWN
  • ....
  • check if I have not connected nodes
  • ...
  • save graph

Also, it may be good to add a cache, as in langchain, to avoid recomputation when the same prompt comes up while rebuilding the graph, because I guess it will take 4-5 cycles.
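
A minimal sketch of such a prompt cache, assuming responses can be keyed on the model name plus the exact prompt text. `PromptCache` and `fake_llm` are hypothetical names, and this in-memory version is only meant to show the idea; it is not langchain's cache or anything built into the library.

```python
import hashlib
import json
from typing import Callable, Dict, List


class PromptCache:
    """Return a stored answer when the same model and prompt are seen again."""

    def __init__(self, llm: Callable[[str], str]):
        self._llm = llm
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Hash a canonical JSON payload so the key is stable and fixed-length.
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def ask(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        answer = self._llm(prompt)
        self._store[key] = answer
        return answer


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; records each prompt it actually receives.
    calls.append(prompt)
    return prompt.upper()


cache = PromptCache(fake_llm)
first = cache.ask("demo-model", "describe node A")
second = cache.ask("demo-model", "describe node A")  # served from the cache
third = cache.ask("demo-model", "describe node B")
```

Across 4-5 rebuild cycles, only prompts whose text changed would reach the model; persisting `_store` to disk would be needed for the cache to survive between runs.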

@PavelAgurov
Author

What could be very useful in an enterprise solution is to add metadata to connections so that a file can be deleted from the graph. I'm not sure that it's possible at all, though, because a connection can be extracted from multiple sources. Maybe it's better to recalculate the full graph with cached data for each prompt.
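
One way to picture the multiple-sources problem is to keep, for each connection, the set of files it was extracted from, and to drop the connection only when that set becomes empty. The sketch below is purely illustrative: `ProvenanceGraph` and its methods are hypothetical and say nothing about how the library handles deletion.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


class ProvenanceGraph:
    """Track which files each edge came from (hypothetical structure)."""

    def __init__(self) -> None:
        self._sources: Dict[Edge, Set[str]] = defaultdict(set)

    def add_edge(self, source: str, target: str, file_id: str) -> None:
        self._sources[(source, target)].add(file_id)

    def delete_file(self, file_id: str) -> List[Edge]:
        """Forget a file; return the edges that no longer have any source."""
        removed = []
        for edge in list(self._sources):  # copy keys, since we delete while looping
            self._sources[edge].discard(file_id)
            if not self._sources[edge]:
                del self._sources[edge]
                removed.append(edge)
        return removed

    def edges(self) -> List[Edge]:
        return sorted(self._sources)


graph = ProvenanceGraph()
graph.add_edge("A", "B", "file1")
graph.add_edge("A", "B", "file2")  # same connection, found in a second file
graph.add_edge("B", "C", "file1")

removed = graph.delete_file("file1")
remaining = graph.edges()
```

Here the A-B connection survives because file2 still supports it, while B-C disappears with its only source. Descriptions merged from several files would need similar per-source bookkeeping, which this sketch does not attempt.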

@liukidar
Contributor

> Hi!
>
> Yes, that's correct, but I have to provide all parameters even if I only want to load a graph:
>
> grag = GraphRAG(
>     working_dir="./book_example",
>     domain=DOMAIN,
>     example_queries="\n".join(EXAMPLE_QUERIES),
>     entity_types=ENTITY_TYPES
> )
>
> But I think that example_queries, for example, is not required for querying. It's needed only to build the graph. In general, I see the use case as "build the graph once and use it many times later". Of course, we should also think about adding new files to the graph, so maybe you could save all the information provided during graph building in a special file and load it from there.

This makes a lot of sense; indeed, none of that information is used when querying a graph. I will make this clear in the examples.

@liukidar
Contributor

> And one more idea is to create a special simple prompt like "give me noun(s) for entity (entities) based on descriptions". It could be useful for this scenario:
>
>   • build graph
>   • find all UNKNOWN and their descriptions
>   • ask to build new entity_types based on descriptions
>   • add new entities
>   • re-build graph
>   • check if I still have UNKNOWN
>   • ....
>   • check if I have not connected nodes
>   • ...
>   • save graph
>
> Also, it may be good to add a cache, as in langchain, to avoid recomputation when the same prompt comes up while rebuilding the graph, because I guess it will take 4-5 cycles.

I see. These features make sense, but they are a bit out of scope for what we are aiming to provide right now (something completely automatic). What we can do is try to expose methods that allow contributions in the directions you are suggesting.

@liukidar
Contributor

liukidar commented Dec 27, 2024

> What could be very useful in an enterprise solution is to add metadata to connections so that a file can be deleted from the graph. I'm not sure that it's possible at all, though, because a connection can be extracted from multiple sources. Maybe it's better to recalculate the full graph with cached data for each prompt.

This is indeed quite challenging, but it is exactly what we have been working on for the past month. It is going to be a major update, but hopefully it should be ready in a week or so, and it should expose a method to delete all the content related to a specific file id. (Recalculating is not an efficient option, so we are excluding it.)
