Ideas for features #52

PavelAgurov · 2024-12-19T09:37:41Z

It would be very useful to have a count of used tokens, plus the cost and timing of indexing and querying.
Ability to get the list of UNKNOWN entities from the graph, with their descriptions.
Ability to get the unconnected nodes from the graph, with their descriptions.
A clear way to load a previously created graph from disk (right now I have to provide all parameters in the constructor even if they are not needed).
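
To make the second and third requests concrete, here is a minimal sketch. It assumes the graph can be exported as a list of nodes (name, type, description) and a list of edges given as name pairs; the `Node` class and both helper functions are hypothetical and are not part of the library.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Node:
    # Hypothetical export format: one record per entity in the graph.
    name: str
    type: str
    description: str


def unknown_nodes(nodes: Iterable[Node]) -> List[Node]:
    """Return the nodes whose entity type is the UNKNOWN marker."""
    return [node for node in nodes if node.type == "UNKNOWN"]


def unconnected_nodes(nodes: Iterable[Node], edges: Iterable[Tuple[str, str]]) -> List[Node]:
    """Return the nodes that do not appear as an endpoint of any edge."""
    connected = {name for edge in edges for name in edge}
    return [node for node in nodes if node.name not in connected]
```

Both helpers return the full records, so the descriptions are available for whatever comes next.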

liukidar · 2024-12-23T16:17:44Z

Hello! These are great ideas.
In particular, for point 1 it should be enough to modify the LLM interface so that it tracks token counts and the related information.
Can you please elaborate on what you mean by point 4? Currently the data loading is completely managed by the library, so you don't have to worry about anything related to it.
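
As a rough illustration of the token-tracking idea for point 1, the sketch below wraps an arbitrary "send prompt" callable and accumulates call counts, token counts, and wall-clock time. `UsageStats`, `TrackedLLM`, and the `(text, prompt_tokens, completion_tokens)` return shape are assumptions made for this example, not the library's actual LLM interface.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class UsageStats:
    calls: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    seconds: float = 0.0

    def cost(self, usd_per_1k_prompt: float, usd_per_1k_completion: float) -> float:
        """Estimate the cost from per-1000-token prices supplied by the caller."""
        return (
            self.prompt_tokens * usd_per_1k_prompt
            + self.completion_tokens * usd_per_1k_completion
        ) / 1000


class TrackedLLM:
    """Wraps a callable returning (text, prompt_tokens, completion_tokens)."""

    def __init__(self, send: Callable[[str], Tuple[str, int, int]]):
        self._send = send
        self.stats = UsageStats()

    def __call__(self, prompt: str) -> str:
        start = time.perf_counter()
        text, prompt_tokens, completion_tokens = self._send(prompt)
        # Accumulate timing and usage for every call that completes.
        self.stats.seconds += time.perf_counter() - start
        self.stats.calls += 1
        self.stats.prompt_tokens += prompt_tokens
        self.stats.completion_tokens += completion_tokens
        return text
```

Using one tracker for indexing and another for querying would give separate totals for the two phases.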

PavelAgurov · 2024-12-23T21:49:22Z

Hi!

Yes, that's correct, but I have to provide all parameters even if I only want to load an existing graph:

grag = GraphRAG(
    working_dir="./book_example",
    domain=DOMAIN,
    example_queries="\n".join(EXAMPLE_QUERIES),
    entity_types=ENTITY_TYPES
)

But I think that example_queries, for example, is not required for querying. It's needed only to build the graph.
In general, I see the use case as "build the graph" once and use it many times later.
Of course, we should also think about adding new files to the graph, so maybe you can save all the information provided during graph building into a special file and load it later.
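
One way the "special file" idea could look is sketched below. It assumes the build-time parameters are the three shown in the constructor call above and stores them as JSON inside the working directory; the file name `build_config.json` and both helper functions are hypothetical, not something the library provides.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

CONFIG_NAME = "build_config.json"  # hypothetical file name


def save_build_config(
    working_dir: str, domain: str, example_queries: str, entity_types: List[str]
) -> Path:
    """Write the build-time parameters next to the graph data."""
    path = Path(working_dir) / CONFIG_NAME
    path.parent.mkdir(parents=True, exist_ok=True)
    config = {
        "domain": domain,
        "example_queries": example_queries,
        "entity_types": entity_types,
    }
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path


def load_build_config(working_dir: str) -> Dict[str, Any]:
    """Read the parameters back so they do not have to be repeated in code."""
    path = Path(working_dir) / CONFIG_NAME
    return json.loads(path.read_text(encoding="utf-8"))
```

With this in place, loading could be reduced to something like `GraphRAG(working_dir=wd, **load_build_config(wd))`, assuming the constructor keeps accepting these keyword arguments.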

PavelAgurov · 2024-12-23T21:53:46Z

One more idea is to create a special simple prompt like "give me noun(s) for entity(entities) based on descriptions". It can be useful for this scenario:

build graph
find all UNKNOWN and their descriptions
ask to build new entity_types based on descriptions
add new entities
re-build graph
check if I still have UNKNOWN
....
check if I have unconnected nodes
...
save graph

It may also be good to add a cache, as in langchain, to avoid recalculation when the same prompt comes up while rebuilding the graph, because I guess it will take 4-5 cycles.
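
A minimal sketch of such a prompt cache, assuming responses are plain strings and can be keyed by a hash of the exact prompt text. `PromptCache` is a hypothetical helper written for this example; it does not reproduce langchain's cache API or anything in the library.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Dict


class PromptCache:
    """Stores LLM responses on disk, keyed by the SHA-256 of the prompt."""

    def __init__(self, path: str):
        self._path = Path(path)
        self._data: Dict[str, str] = {}
        if self._path.exists():
            self._data = json.loads(self._path.read_text(encoding="utf-8"))

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get_or_compute(self, prompt: str, compute: Callable[[str], str]) -> str:
        key = self._key(prompt)
        if key not in self._data:
            # Cache miss: call the model once and persist the answer.
            self._data[key] = compute(prompt)
            self._path.write_text(json.dumps(self._data), encoding="utf-8")
        return self._data[key]
```

Because the cache lives on disk, identical prompts would be answered without a new LLM call across rebuild cycles, while any prompt that changed (for example after new entity types were added) would still be recomputed.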

PavelAgurov · 2024-12-23T21:57:21Z

What could be very useful in an enterprise solution is adding metadata to connections so that a file can be deleted from the graph. I'm not sure it's possible at all, though, because a connection can be extracted from multiple sources. Maybe it's better to recalculate the full graph with cached data for each prompt.

liukidar · 2024-12-27T11:12:48Z

> Hi!
>
> Yes, that's correct, but I have to provide all parameters even if I only want to load an existing graph:
>
> grag = GraphRAG(
>     working_dir="./book_example",
>     domain=DOMAIN,
>     example_queries="\n".join(EXAMPLE_QUERIES),
>     entity_types=ENTITY_TYPES
> )
>
> But I think that example_queries, for example, is not required for querying. It's needed only to build the graph. In general, I see the use case as "build the graph" once and use it many times later. Of course, we should also think about adding new files to the graph, so maybe you can save all the information provided during graph building into a special file and load it later.

This makes a lot of sense. Indeed, none of that information is used when querying an existing graph. I will make this clear in the examples.

liukidar · 2024-12-27T11:15:13Z

> One more idea is to create a special simple prompt like "give me noun(s) for entity(entities) based on descriptions". It can be useful for this scenario:
>
> build graph
> find all UNKNOWN and their descriptions
> ask to build new entity_types based on descriptions
> add new entities
> re-build graph
> check if I still have UNKNOWN
> ....
> check if I have unconnected nodes
> ...
> save graph
>
> It may also be good to add a cache, as in langchain, to avoid recalculation when the same prompt comes up while rebuilding the graph, because I guess it will take 4-5 cycles.

I see. These features make sense, but they are a bit out of scope for what we are aiming to provide right now (something completely automatic). What we can do is try to expose methods that allow contributions in the directions you are suggesting.

liukidar · 2024-12-27T11:16:29Z

> What could be very useful in an enterprise solution is adding metadata to connections so that a file can be deleted from the graph. I'm not sure it's possible at all, though, because a connection can be extracted from multiple sources. Maybe it's better to recalculate the full graph with cached data for each prompt.

This is indeed quite challenging, but it is exactly what we have been working on for the past month. It is going to be a major update and should hopefully be ready in a week or so. Recalculating is not an efficient option, so we are excluding it; instead, the update should expose a method to delete all the content related to a specific file id.
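
To illustrate why per-connection metadata makes deletion possible even when a connection comes from several sources, here is a small sketch. It is only an assumption about how such bookkeeping could work, not the implementation in the upcoming update; `ProvenanceGraph` and its methods are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


class ProvenanceGraph:
    """Keeps, for every edge, the set of file ids it was extracted from."""

    def __init__(self) -> None:
        self._sources: Dict[Edge, Set[str]] = defaultdict(set)

    def add_edge(self, source: str, target: str, file_id: str) -> None:
        self._sources[(source, target)].add(file_id)

    def delete_file(self, file_id: str) -> List[Edge]:
        """Forget a file; drop only the edges that no other file supports."""
        removed = []
        for edge in list(self._sources):
            self._sources[edge].discard(file_id)
            if not self._sources[edge]:
                del self._sources[edge]
                removed.append(edge)
        return removed

    def edges(self) -> List[Edge]:
        return sorted(self._sources)
```

An edge extracted from two files survives the deletion of either one and disappears only when both are gone, so no full recalculation is needed.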
