Automatic Entities Extraction #43

Open
ziudeso opened this issue Dec 9, 2024 · 12 comments

Comments

@ziudeso

ziudeso commented Dec 9, 2024

Is your feature request related to a problem? Please describe.
As far as I understand, fast-graphrag requires you to specify the entities to process.

Describe the solution you'd like
I'd love for the entities to be generated automatically, so that the entire process requires no manual intervention.

Describe alternatives you've considered
LightRAG

Additional context

@liukidar
Contributor

liukidar commented Dec 9, 2024

Hello,
the idea of letting you specify the entities to extract is to allow you to extract only the knowledge that you believe is important for your task. We believe this is a very important part of graph creation. We are working on a simple, straightforward interface to help you choose the right entity types for your problem (basically, picking from options that an LLM generates from your task prompt). Would that be a reasonable solution?

@ziudeso
Author

ziudeso commented Dec 9, 2024

@liukidar thanks for your prompt reply. My two cents: knowing the entities means knowing a client's domain, and that is rarely the case in business scenarios where you want to distribute your RAG to a heterogeneous public.

Generic entities around a business case are super welcome; however, I believe they should be document-oriented rather than domain-oriented, don't you agree? I believe it would lower the adoption barrier. Nonetheless, if you implement the prompt system you mentioned, I'd be super happy to try it.

@liukidar
Contributor

liukidar commented Dec 9, 2024

I see, what you say makes sense. We could maybe modify the prompt so that the LLM treats the provided type list as inspiration rather than as an exhaustive list. I'll think about this.
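
For illustration, the difference between an exhaustive type list and an "inspiration only" list could come down to a single instruction in the extraction prompt. The sketch below is hypothetical: `build_entity_prompt`, its `strict` flag, and the prompt wording are assumptions made for this example and are not fast-graphrag's actual prompt or API.

```python
def build_entity_prompt(text: str, entity_types: list[str], strict: bool = False) -> str:
    """Build an extraction prompt; `strict` toggles a closed vs. open type list.

    Hypothetical helper for illustration only, not part of fast-graphrag.
    """
    types = ", ".join(entity_types)
    if strict:
        # Closed list: the model is told to ignore anything outside these types.
        rule = f"Extract only entities of these types: {types}."
    else:
        # Open list: the types are hints and the model may introduce new ones.
        rule = (
            f"Example entity types (for inspiration only, not exhaustive): {types}. "
            "Add other types when the text clearly calls for them."
        )
    return f"{rule}\n\nText:\n{text}"


open_prompt = build_entity_prompt("Alice met Bob in Rome.", ["Person", "Place"])
strict_prompt = build_entity_prompt("Alice met Bob in Rome.", ["Person", "Place"], strict=True)
```

Keeping both modes behind one flag would let users who know their domain keep a fixed schema, while others get open-ended extraction.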

@ziudeso
Author

ziudeso commented Dec 9, 2024

And what about extracting the entity types from the input docs or their (recursive) summary?

@liukidar
Contributor

liukidar commented Dec 9, 2024

I mean, we can definitely leave that as an option, but I believe there should be some user intervention in the process (also, reading the input to define the entity types can become a rather expensive operation if done for each document). We could allow creating an ontology from a subset of documents and then using that (but this would be very similar to the 'prompt generation' solution I mentioned above).
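
A minimal sketch of the "ontology from a subset of documents" idea, under stated assumptions: `build_ontology` and `fake_propose` are hypothetical names, and `fake_propose` is an offline stand-in for whatever LLM call would propose entity types for a document. Sampling a fixed number of documents keeps the cost bounded no matter how large the corpus is.

```python
import random
from collections import Counter
from typing import Callable, Iterable


def build_ontology(
    docs: list[str],
    propose_types: Callable[[str], Iterable[str]],
    sample_size: int = 5,
    top_k: int = 10,
    seed: int = 0,
) -> list[str]:
    """Propose entity types on a random subset of docs and keep the most frequent."""
    rng = random.Random(seed)
    subset = rng.sample(docs, min(sample_size, len(docs)))
    counts = Counter()
    for doc in subset:
        # Normalize names and count each type at most once per document.
        counts.update({t.strip().title() for t in propose_types(doc)})
    return [t for t, _ in counts.most_common(top_k)]


# Stand-in for an LLM call (assumption), used only so the sketch runs offline.
def fake_propose(doc: str) -> list[str]:
    return ["person", "organization"] if "Inc" in doc else ["person", "place"]


docs = ["Alice works at Acme Inc.", "Bob lives in Rome.", "Carol visited Paris."]
entity_types = build_ontology(docs, fake_propose, sample_size=3, top_k=2)
```

The resulting list could then be reviewed by the user before being passed on as the entity types, which keeps the human intervention mentioned above.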

@ziudeso
Author

ziudeso commented Dec 10, 2024

Sure, looking forward to it. Please comment back once you integrate it. Great job!!
Btw: have you heard of Lazyrag? https://www.microsoft.com/en-us/research/blog/lazygraphrag-setting-a-new-standard-for-quality-and-cost/

They heavily cut the costs of entity generation using NLP

@ziudeso ziudeso closed this as completed Dec 10, 2024
@ziudeso ziudeso reopened this Dec 10, 2024
@ziudeso
Author

ziudeso commented Dec 10, 2024

I'll close it. Please let me know when the prompt is integrated.
Thanks

@ziudeso ziudeso closed this as completed Dec 10, 2024
@ziudeso
Author

ziudeso commented Dec 17, 2024

Hi! Any updates on the prompt?
Thanks

@ziudeso ziudeso reopened this Dec 24, 2024
@ziudeso
Author

ziudeso commented Dec 24, 2024

Hi, did you load the prompt eventually?
Merry Christmas!

@liukidar
Contributor

liukidar commented Dec 25, 2024

> Sure, looking forward to it. Please comment back once you integrate it. Great job!! Btw: have you heard of Lazyrag? https://www.microsoft.com/en-us/research/blog/lazygraphrag-setting-a-new-standard-for-quality-and-cost/
>
> They heavily cut the costs of entity generation using NLP

Hello, yes, we are also looking into Lazyrag to see if there are any interesting ideas we can integrate into our system.

> Hi, did you load the prompt eventually?

We have been experimenting with different ideas, but ultimately we have been focusing on other things and haven't reached a final decision. We were also thinking of providing "negative" instead of positive entities (i.e., concepts you do not want to extract rather than what you want to extract), but we are still benchmarking things. What do you think about that?

> Merry Christmas!

Thank you! You too :)
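
One way to picture the "negative entities" idea is as a post-filter over whatever was extracted. This is only a sketch under assumptions: `Entity` and `drop_excluded` are hypothetical names, and the actual system might instead express the exclusions inside the extraction prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    type: str


def drop_excluded(entities: list[Entity], excluded_types: set[str]) -> list[Entity]:
    """Keep every entity whose type is not on the exclusion list (case-insensitive)."""
    blocked = {t.lower() for t in excluded_types}
    return [e for e in entities if e.type.lower() not in blocked]


extracted = [
    Entity("Alice", "Person"),
    Entity("2024-12-25", "Date"),
    Entity("Brescia", "Place"),
]
kept = drop_excluded(extracted, {"date"})
```

With an exclusion list, unknown types pass through by default, which is the opposite default from a positive type list.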

@ziudeso
Author

ziudeso commented Dec 25, 2024

Hi Luca!! I just noticed that you're from Brescia! Replying to an issue on Christmas Day does you credit, thank you! =D If you like, we can exchange emails so we can talk about this more openly, maybe on a call. Mine is: [email protected]. OK, back to the reply:

Thanks for your feedback @liukidar! The negative entities idea would be cool, but not a game changer business-wise. Lots of the use cases clients are willing to pay for involve ingesting a mostly unknown KB, and in my experience they can bear the toll of a more "expensive" ingestion to improve accuracy. https://github.com/HKUDS/LightRAG does a decent job at creating entities, but I think it can be hugely improved.

Let's have a chat!
Happy Christmas again!!

@btebbutt

I'd like to iterate on this discussion.

It would be great to make clearer whether the entity types are a fixed schema or just initial suggestions. It would also be cool if you could skip specifying the schema, and the system settled on one only after K extractions, based on the thresholds at that point in time.
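
A rough sketch of what "settle on a schema after K extractions" might look like. Everything here is an assumption made for illustration: the `AdaptiveSchema` class, the `k` and `min_share` parameters, and the rule that a type is kept if it appeared in at least `min_share` of the first `k` extractions.

```python
from collections import Counter


class AdaptiveSchema:
    """Accept any entity type for the first k extractions, then freeze the frequent ones."""

    def __init__(self, k, min_share):
        self.k = k                  # number of extractions to observe before freezing
        self.min_share = min_share  # minimum share of extractions a type must appear in
        self.counts = Counter()
        self.seen = 0
        self.frozen = None          # becomes a set of type names once frozen

    def observe(self, types_in_doc):
        """Record the types found in one extraction; freeze the schema after the k-th."""
        if self.frozen is not None:
            return
        self.counts.update(set(types_in_doc))
        self.seen += 1
        if self.seen >= self.k:
            self.frozen = {
                t for t, c in self.counts.items() if c / self.seen >= self.min_share
            }

    def allows(self, entity_type):
        """Before freezing every type is allowed; afterwards only frozen ones."""
        return self.frozen is None or entity_type in self.frozen


schema = AdaptiveSchema(k=3, min_share=0.5)
for types in (["Person", "Place"], ["Person"], ["Person", "Gene"]):
    schema.observe(types)
```

In this run only `Person` appears in at least half of the first three extractions, so it is the only type in the frozen schema.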

4 participants
@ziudeso @liukidar @btebbutt and others