Foundation Models Integration - Data Contributions #71

Open
caldeirav opened this issue Jul 2, 2023 · 5 comments
Labels
kind/enhancement New feature or request

Comments

@caldeirav
Collaborator

caldeirav commented Jul 2, 2023

The Data Mesh pattern should provide a way for data product owners to contribute curated data for LLM training. A good reference is the datalake approach used by gpt4all:

https://github.com/nomic-ai/gpt4all-datalake

@caldeirav caldeirav converted this from a draft issue Jul 2, 2023
@caldeirav caldeirav added the kind/enhancement New feature or request label Jul 2, 2023
@caldeirav caldeirav moved this from 🆕 New to 📋 Backlog in Data Mesh Pattern Backlog Jul 2, 2023
@neoxu999
Collaborator

neoxu999 commented Jul 9, 2023

The gpt4all-datalake provides an API for contributing data:
https://api.gpt4all.io/v1/ingest/chat
{
  "source": "gpt4all-chat",
  "submitter_id": "EliteHacker#42",
  "agent_id": "gpt4all-j-v1.2-jazzy",
  "ingest_id": "string",
  "conversation": [
    {
      "content": "Hello, how can I assist you today?",
      "role": "assistant",
      "rating": "negative",
      "edited_content": "Hello, how may I assist you today?"
    },
    {
      "content": "Write me python code to contribute data to the GPT4All Datalake!",
      "role": "user"
    }
  ],
  "prompt_template": "string"
}
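
As a rough illustration of how a data product owner might assemble and submit such a payload, here is a minimal Python sketch. The field names and the URL are taken from the example above; the helper names `build_contribution` and `make_request` are hypothetical, and the sketch assumes the endpoint accepts a plain JSON POST, which has not been verified here.

```python
import json
import urllib.request

# URL quoted in the comment above; whether it is still live is not verified.
INGEST_URL = "https://api.gpt4all.io/v1/ingest/chat"


def build_contribution(submitter_id, agent_id, ingest_id, conversation, prompt_template):
    """Assemble a payload with the same fields as the example above."""
    for turn in conversation:
        # Every turn in the example carries at least these two keys.
        if "content" not in turn or "role" not in turn:
            raise ValueError("each conversation turn needs 'content' and 'role'")
    return {
        "source": "gpt4all-chat",
        "submitter_id": submitter_id,
        "agent_id": agent_id,
        "ingest_id": ingest_id,
        "conversation": list(conversation),
        "prompt_template": prompt_template,
    }


def make_request(payload, url=INGEST_URL):
    """Wrap the payload in a JSON POST request (assumed transport; nothing is sent here)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending would then be `urllib.request.urlopen(make_request(payload))`. Keeping payload construction separate from the network call makes it easy to validate contributions before anything leaves the data product.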

I compared three vector databases: Weaviate, Pinecone and Chroma.
Weaviate has a native REST API for creating objects, which is very convenient and worth trying.
https://weaviate.io/developers/weaviate/api/rest/batch
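
To make the REST route concrete, here is a minimal sketch of a batch object request. It assumes a Weaviate instance at `http://localhost:8080` and a class named `DataProductRecord`; both are placeholders for illustration, not part of any existing deployment. The `/v1/batch/objects` path and the `objects` / `class` / `properties` body layout follow the batch documentation linked above.

```python
import json
import urllib.request

WEAVIATE_URL = "http://localhost:8080"  # assumption: local test instance
CLASS_NAME = "DataProductRecord"        # hypothetical class name


def build_batch(records, class_name=CLASS_NAME):
    """Turn plain dicts into the body expected by the batch objects endpoint."""
    return {
        "objects": [
            {"class": class_name, "properties": dict(record)}
            for record in records
        ]
    }


def batch_request(records, base_url=WEAVIATE_URL):
    """Build (but do not send) the POST request for /v1/batch/objects."""
    body = json.dumps(build_batch(records)).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/batch/objects",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A data product owner would call `urllib.request.urlopen(batch_request(records))` against a running instance; authentication and schema creation are left out of this sketch.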

For search, Weaviate's GraphQL API is very useful for integration:
https://weaviate.io/developers/weaviate/api/graphql
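
A search call could look roughly like the sketch below, which builds a simple `Get` query and wraps it in a POST to `/v1/graphql`. The base URL, class name and property names are assumptions for illustration; a real query would more likely add a vector search argument such as `nearText`, which depends on which vectorizer module is enabled.

```python
import json
import urllib.request


def build_get_query(class_name, properties, limit=5):
    """Compose a minimal GraphQL Get query for one class."""
    if not properties:
        raise ValueError("at least one property must be requested")
    fields = " ".join(properties)
    return "{ Get { %s(limit: %d) { %s } } }" % (class_name, int(limit), fields)


def graphql_request(query, base_url="http://localhost:8080"):
    """Build (but do not send) the POST request for the GraphQL endpoint."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/graphql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The results should come back as JSON under `data.Get.<ClassName>`, so downstream pipeline steps can consume them without a Weaviate-specific client.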

Data product owners can easily submit their data to the Weaviate vector database.

@neoxu999
Collaborator

neoxu999 commented Jul 9, 2023

Hi @caldeirav,

I'd like to install the Weaviate vector database on Red Hat AI and show examples of how to send data to Weaviate.
What do you reckon?

Many thanks,
Neo

@caldeirav
Collaborator Author

@neoxu999 Weaviate looks like a good candidate. I think the key is to ensure we can integrate the vector database with our MLOps automation first and foremost; once this is successful, we can start looking at data contributions and data tracing / lineage requirements in detail.

@caldeirav
Collaborator Author

@neoxu999 Do you think it is possible to introduce Weaviate into the Data Mesh pattern deployment now? As we are installing a new instance, we can then start running simple examples such as the ones in the OpenAI Cookbook, before we introduce our own training pipeline.

Reference: https://github.com/openai/openai-cookbook/tree/main/examples/vector_databases

@caldeirav caldeirav moved this from 📋 Backlog to 🏗 In progress in Data Mesh Pattern Backlog Aug 6, 2023
@neoxu999
Collaborator

neoxu999 commented Aug 7, 2023

@caldeirav Good to know we have a new instance.
Yes, I can try the OpenAI Cookbook examples before installing Weaviate on the Data Mesh Pattern.
