Automatic Entities Extraction #43

ziudeso · 2024-12-09T06:30:11Z

Is your feature request related to a problem? Please describe.
As far as I understand, fast-graphrag requires specifying the entity types to extract.

Describe the solution you'd like
I'd love for the entity types to be generated automatically, so that the entire process requires no manual intervention.

Describe alternatives you've considered
LightRAG

Additional context

liukidar · 2024-12-09T19:17:08Z

Hello,
the idea behind letting you specify the entity types is to let you extract only the knowledge that you believe is important for your task. We believe this is a very important step in graph creation. We are working on a simple and straightforward interface to help you choose the right entity types for your problem (basically, picking from a set of options generated by an LLM from your task prompt). Would that be a reasonable solution?

ziudeso · 2024-12-09T19:35:40Z

@liukidar thanks for your prompt reply. My two cents: knowing the entities means knowing a client's domain, and that is rarely the case in business scenarios where you want to distribute your RAG to a heterogeneous audience.

Generic entities around a business case are super welcome; however, I believe they should perhaps be document-oriented rather than domain-oriented, don't you agree? I think it would lower the adoption barrier. Nonetheless, if you implement the prompt system you mentioned, I'd be super happy to try it.

liukidar · 2024-12-09T20:03:48Z

I see, what you say makes sense. We could maybe modify the prompt so that the LLM treats the provided type list not as exhaustive but just as inspiration. I'll think about this.

ziudeso · 2024-12-09T20:23:48Z

And what about extracting the entity types from the input docs or from their (recursive) summaries?

liukidar · 2024-12-09T20:39:29Z

I mean, we can definitely leave that as an option, but I believe there should be some user intervention in the process (also, reading the input to define the entity types can become a rather expensive operation if done for each document). We could allow creating an ontology from a subset of documents and using that (but this would be very similar to the "prompt generation" solution I mentioned above).
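A minimal Python sketch of the "ontology from a subset of documents" option, mainly to show why it bounds cost: the type proposer runs once per sampled document rather than once per document in the corpus. `ontology_from_sample` and the `propose_types` callable are hypothetical names (in practice `propose_types` would wrap an LLM call); nothing here is fast-graphrag's actual API.

```python
import random
from typing import Callable, Iterable, List


def ontology_from_sample(
    documents: List[str],
    propose_types: Callable[[str], Iterable[str]],
    sample_size: int,
    seed: int = 0,
) -> List[str]:
    """Build one shared entity-type list from a random subset of documents.

    `propose_types` is a hypothetical stand-in for an LLM call that suggests
    entity types for a single document. It runs at most `sample_size` times,
    so the cost stays fixed as the corpus grows.
    """
    rng = random.Random(seed)  # seeded so the chosen subset is reproducible
    sample = rng.sample(documents, min(sample_size, len(documents)))
    types = set()
    for doc in sample:
        # Normalise so "person" and " Person " count as the same type.
        types.update(t.strip().upper() for t in propose_types(doc))
    return sorted(types)


if __name__ == "__main__":
    def keyword_proposer(doc: str) -> List[str]:
        # Toy proposer for the demo; a real one would ask an LLM.
        found = []
        if "invoice" in doc:
            found.append("invoice")
        if "customer" in doc:
            found.append("Customer")
        return found

    corpus = ["customer complaint", "invoice 42 for customer", "meeting notes"]
    print(ontology_from_sample(corpus, keyword_proposer, sample_size=3))  # ['CUSTOMER', 'INVOICE']
```

Upper-casing the proposed types before merging is an arbitrary choice in this sketch; it only keeps trivial variants of the same type from appearing twice.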

ziudeso · 2024-12-10T07:30:33Z

Sure, looking forward to it, please comment back when you integrate it. Great job!!
Btw, have you heard of LazyGraphRAG? https://www.microsoft.com/en-us/research/blog/lazygraphrag-setting-a-new-standard-for-quality-and-cost/

They heavily cut the cost of entity generation by using NLP.

ziudeso · 2024-12-10T17:01:07Z

I'll close this; please let me know when the prompt is integrated.
Thanks

ziudeso · 2024-12-17T14:46:27Z

Hi! Any updates on the prompt?
Thanks

ziudeso · 2024-12-24T14:00:50Z

Hi, did you eventually add the prompt?
Merry Christmas!

liukidar · 2024-12-25T08:57:11Z

> Sure, looking forward to it, please comment back when you integrate it. Great job!! Btw, have you heard of LazyGraphRAG? https://www.microsoft.com/en-us/research/blog/lazygraphrag-setting-a-new-standard-for-quality-and-cost/
>
> They heavily cut the cost of entity generation by using NLP.

Hello, yes, we are also looking into LazyGraphRAG to see whether it has any interesting ideas we can integrate into our system.

> Hi, did you eventually add the prompt?

We have been experimenting with different ideas, but in the end we focused on other things and haven't reached a final decision. We were also thinking of letting you provide "negative" instead of positive entity types (i.e., concepts you do not want to extract rather than those you do), but we are still benchmarking things. What do you think about that?

> Merry Christmas!

Thank you! You too :)
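The "negative" entity-type idea could look roughly like this, assuming extraction already returns (name, type) pairs. `Entity` and `filter_negative_types` are hypothetical illustrations, not part of fast-graphrag, and the exclusion is applied after extraction here purely for clarity; a real implementation might instead pass the excluded types to the extraction prompt.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Entity:
    # Hypothetical record for one extracted entity.
    name: str
    type: str


def filter_negative_types(entities: List[Entity], excluded_types: Iterable[str]) -> List[Entity]:
    """Drop entities whose type the user marked as unwanted ("negative" types).

    Matching is case-insensitive. Everything not explicitly excluded is kept,
    so no domain schema has to be known in advance.
    """
    excluded = {t.strip().lower() for t in excluded_types}
    return [e for e in entities if e.type.strip().lower() not in excluded]


if __name__ == "__main__":
    extracted = [
        Entity("Brescia", "LOCATION"),
        Entity("LightRAG", "SOFTWARE"),
        Entity("2024-12-25", "DATE"),
    ]
    kept = filter_negative_types(extracted, ["date"])
    print([e.name for e in kept])  # ['Brescia', 'LightRAG']
```

The appeal for unknown knowledge bases is that the user only has to name what is clearly noise, instead of enumerating every type that matters.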

ziudeso · 2024-12-25T09:08:06Z

Hi Luca!! I just noticed you're from Brescia! Replying to an issue on Christmas does you credit, thanks! =D If you like, we can exchange emails so we can talk about it more openly, maybe on a call; mine is: [email protected]. OK, back to the reply:

Thanks for your feedback, @liukidar! The negative-entities idea would be cool, but not a game changer business-wise: many of the use cases clients are willing to pay for involve ingesting a mostly unknown knowledge base (KB), and in my experience they can bear the cost of a more "expensive" ingestion to improve accuracy. https://github.com/HKUDS/LightRAG does a decent job at creating entities, but I think it can be hugely improved.

Let's have a chat!
Happy Christmas again!!

btebbutt · 2025-01-21T22:49:30Z

I'd like to iterate on this discussion.

It would be great if it were made clearer whether the entity types are a fixed schema or just initial suggestions. It would also be cool if you could leave the schema unspecified, and the system would settle on one only after K extractions, based on the thresholds at that point in time.
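One possible reading of this proposal, as a Python sketch: accept any entity type during the first K extractions, then freeze a schema containing the types whose share of all observed types reaches a threshold. `SchemaInducer`, `k`, and `min_share` are hypothetical names, and using a relative-frequency threshold is an assumption, since the comment does not say which thresholds would be used.

```python
from collections import Counter
from typing import Iterable, Optional, Set


class SchemaInducer:
    """Hypothetical sketch: leave the entity-type schema open for the first
    `k` extractions, then freeze it to the types whose share of all observed
    type mentions is at least `min_share`."""

    def __init__(self, k: int, min_share: float) -> None:
        self.k = k
        self.min_share = min_share
        self.counts: Counter = Counter()
        self.seen = 0
        self.schema: Optional[Set[str]] = None  # None = schema not settled yet

    def observe(self, types: Iterable[str]) -> None:
        """Record the entity types returned by one extraction call."""
        if self.schema is not None:
            return  # schema already settled; later extractions do not change it
        self.counts.update(types)
        self.seen += 1
        if self.seen >= self.k:
            total = sum(self.counts.values())
            # If no types were seen at all, counts is empty and so is the schema.
            self.schema = {t for t, c in self.counts.items() if c / total >= self.min_share}

    def allows(self, entity_type: str) -> bool:
        """Every type is allowed until the schema settles; afterwards only schema types."""
        return self.schema is None or entity_type in self.schema


if __name__ == "__main__":
    inducer = SchemaInducer(k=3, min_share=0.2)
    inducer.observe(["PERSON", "COMPANY", "PERSON"])
    inducer.observe(["COMPANY", "PRODUCT"])
    inducer.observe(["PERSON", "COMPANY", "COLOR"])
    print(sorted(inducer.schema))  # ['COMPANY', 'PERSON']
```

Types that appear rarely during the first K extractions (PRODUCT and COLOR above) are dropped once the schema settles; a lower `min_share` would keep more of the long tail.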

ziudeso closed this as completed Dec 10, 2024

ziudeso reopened this Dec 10, 2024

ziudeso closed this as completed Dec 10, 2024

ziudeso reopened this Dec 24, 2024
