Automatic Entities Extraction #43
Hello,
@liukidar thanks for your prompt reply. My two cents: knowing the entities means knowing a client's domain, and this is rarely the case in business scenarios where you want to distribute your RAG to a heterogeneous public. Generic entities around a business case are super welcome, but I believe they should be document-oriented rather than domain-oriented, don't you agree? I believe it would lower the adoption barrier. Nonetheless, if you implement the prompt system you mentioned, I'd be super happy to try it.
I see, what you say makes sense. We could maybe modify the prompt so that the LLM interprets the provided type list not as exhaustive but just as inspiration. I'll think about this.
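To make the "inspiration, not exhaustive" idea concrete, here is a minimal sketch of how such a prompt could be assembled. The function name `build_entity_prompt`, the `exhaustive` flag, and the prompt wording are all hypothetical; none of this is fast-graphrag's actual prompt or API.

```python
def build_entity_prompt(text: str, suggested_types: list[str], exhaustive: bool = False) -> str:
    """Build an entity-extraction prompt (hypothetical wording, for illustration only)."""
    type_list = ", ".join(suggested_types)
    if exhaustive:
        # Closed schema: the LLM is told to stay within the given types.
        rule = f"Only extract entities of these types: {type_list}."
    else:
        # Open schema: the types are hints and the LLM may add its own.
        rule = (
            f"Example entity types (for inspiration only, not an exhaustive list): {type_list}. "
            "Introduce a new type whenever none of these fits."
        )
    return f"Extract the entities from the text below.\n{rule}\n\nText:\n{text}"


prompt = build_entity_prompt("Acme hired Jane Doe in 2021.", ["person", "company"])
```

With `exhaustive=True` the same list acts as a fixed schema, so one parameter would cover both behaviours discussed in this thread.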
And what about extracting it from the input docs or their (recursive) summary?
I mean, we can definitely leave that as an option, but I believe there should be some user intervention in the process (also, reading the input to define the entity types can become a rather expensive operation if done for each document). We could allow creating an ontology from a subset of documents and then using that (but this would be very similar to the 'prompt generation' solution I mentioned above).
Sure, looking forward to it, please comment back when you integrate it. Great job!! They heavily cut the costs of entity generation using NLP.
I'll close it, please let me know when the prompt is integrated.
Hi! Any updates on the prompt?
Hi, did you load the prompt eventually?
Hello, yes, we are also looking into Lazyrag to see if there are any interesting ideas we can integrate in our system.
We have been experimenting with different ideas, but ultimately we have been focusing on other things and haven't reached a final decision. We were also thinking of providing "negative" instead of positive entities (i.e., concepts you do not want to extract rather than what you want to extract, but we are still benchmarking things). What do you think about that?
Thank you! You too :)
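One way to read the "negative" entities idea above is as a filter applied after extraction. The sketch below assumes each extracted entity is a dict with `name` and `type` keys; that shape and the helper `drop_excluded` are hypothetical and not taken from fast-graphrag.

```python
def drop_excluded(entities: list[dict], excluded_types: list[str]) -> list[dict]:
    """Keep every entity whose type is not in the user-supplied exclusion list."""
    excluded = {t.lower() for t in excluded_types}  # case-insensitive match
    return [e for e in entities if e["type"].lower() not in excluded]


extracted = [
    {"name": "Jane Doe", "type": "Person"},
    {"name": "2021", "type": "Date"},
]
kept = drop_excluded(extracted, ["date"])
```

The same exclusion list could instead be passed to the LLM in the prompt; filtering afterwards is just the simplest variant to illustrate.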
Hi Luca!! I just noticed you're from Brescia! Replying to an issue at Christmas does you credit, thank you! =D If you like, we can exchange emails so we can discuss this more openly, maybe on a call; mine is: [email protected]. OK, back to the reply: thanks for your feedback @liukidar! The negative entities idea would be cool but not a game changer business-wise. Many of the use cases clients are willing to pay for involve ingesting a mostly unknown KB, and in my experience they can bear the cost of a more "expensive" ingestion to improve accuracy. https://github.com/HKUDS/LightRAG does a decent job at creating entities, but I think it can be hugely improved. Let's have a chat!
I'd like to iterate on this discussion. It would be great if it were made clearer whether the entity types are a fixed schema or just initial suggestions. It would also be cool if not specifying the schema were possible, with the system settling on one only after K extractions, based on the thresholds at that point in time.
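A rough sketch of the "settle after K extractions" idea, under my own assumptions: the schema stays open for the first `k` extractions, then freezes to the types whose share of all observed type mentions reaches `min_share`. The class `EntitySchema` and both parameters are hypothetical, not part of fast-graphrag.

```python
from collections import Counter


class EntitySchema:
    """Open schema that freezes after k extractions (illustrative only)."""

    def __init__(self, k: int, min_share: float):
        self.k = k                  # number of extractions before settling
        self.min_share = min_share  # minimum share of mentions for a type to be kept
        self.seen = 0
        self.counts = Counter()
        self.types = None           # None means the schema is still open

    def observe(self, extracted_types: list[str]) -> None:
        """Record the entity types produced by one extraction."""
        if self.types is not None:
            return  # already settled; later extractions do not change the schema
        self.seen += 1
        self.counts.update(extracted_types)
        if self.seen >= self.k:
            total = sum(self.counts.values())
            self.types = {t for t, c in self.counts.items() if c / total >= self.min_share}


schema = EntitySchema(k=2, min_share=0.3)
schema.observe(["person", "company"])
schema.observe(["person", "date"])
```

In this example `person` accounts for 2 of 4 mentions, so it is the only type that clears the 0.3 threshold once the schema settles.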
Is your feature request related to a problem? Please describe.
As far as I understand, fast-graphrag requires specifying the entities to process.
Describe the solution you'd like
I'd love to generate the entities automatically so that the entire process requires no manual intervention.
Describe alternatives you've considered
LightRAG
Additional context