Foundation Models Integration - Data Contributions #71
The gpt4all-datalake project provides an API for contributing data. I compared several vector databases: Weaviate, Pinecone, and Chroma. For search, Weaviate's GraphQL API is very useful for integration, and data product owners can easily submit their data to a Weaviate vector database.
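A minimal sketch of what such a contribution could look like with the Weaviate Python client (v3-style API). The endpoint URL, the `DataProduct` class, and its properties are assumptions for illustration; the real schema would come from the data mesh pattern's data contracts.

```python
import weaviate

# Connect to a Weaviate instance (URL is an assumption; point it at
# wherever the cluster is actually deployed).
client = weaviate.Client("http://localhost:8080")

# Hypothetical class for contributed data products.
schema = {
    "class": "DataProduct",
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "content", "dataType": ["text"]},
        {"name": "owner", "dataType": ["text"]},
    ],
}

# Create the class once; this call raises an error if it already exists,
# so guard it in real deployments.
client.schema.create_class(schema)

# A data product owner submits one curated record.
client.data_object.create(
    data_object={
        "title": "Example contribution",
        "content": "Curated text intended for LLM training.",
        "owner": "team-a",
    },
    class_name="DataProduct",
)
```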
Hi @caldeirav, I'd like to install the Weaviate vector database on Red Hat AI and show examples of how to send data to Weaviate. Many thanks.
@neoxu999 Weaviate looks like a good candidate. I think the key is to ensure we can integrate the vector database with our MLOps automation first and foremost; once this is successful, we can start looking at data contributions and data tracing / lineage requirements in detail.
@neoxu999 Do you think it is possible to introduce Weaviate into the Data Mesh pattern deployment now? Since we are installing a new instance, we can then start by running simple examples such as the ones in the OpenAI cookbook (see the sketch below), before we introduce our own training pipeline. Reference: https://github.com/openai/openai-cookbook/tree/main/examples/vector_databases
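In the spirit of the cookbook's vector database examples, a simple semantic search over the contributed data might look like the following. This is a sketch assuming the hypothetical `DataProduct` class from above and a vectorizer module (e.g. text2vec-openai or text2vec-transformers) enabled on the instance, since `nearText` requires one.

```python
import weaviate

client = weaviate.Client("http://localhost:8080")  # assumed endpoint

# GraphQL Get query via the client's query builder: retrieve the three
# records most semantically similar to the search concept.
result = (
    client.query
    .get("DataProduct", ["title", "content"])
    .with_near_text({"concepts": ["foundation model training data"]})
    .with_limit(3)
    .do()
)
print(result["data"]["Get"]["DataProduct"])
```

The same query can be issued as raw GraphQL against Weaviate's `/v1/graphql` endpoint, which is what makes the GraphQL API convenient for integration with other tooling.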
@caldeirav good to know we have a new instance. |
The data mesh pattern should provide a way for data product owners to contribute curated data for LLM training. A good approach and reference is the datalake approach used by gpt4all:
https://github.com/nomic-ai/gpt4all-datalake