Foundation Models Integration - Data Contributions #71

caldeirav · 2023-07-02T05:12:48Z

The Data Mesh pattern should provide a way for data product owners to contribute curated data for LLM training. A good reference is the data lake approach used by GPT4All:

https://github.com/nomic-ai/gpt4all-datalake

neoxu999 · 2023-07-09T12:42:10Z

The gpt4all-datalake provides an API for contributing data:
https://api.gpt4all.io/v1/ingest/chat

Example request body:

{
  "source": "gpt4all-chat",
  "submitter_id": "EliteHacker#42",
  "agent_id": "gpt4all-j-v1.2-jazzy",
  "ingest_id": "string",
  "conversation": [
    {
      "content": "Hello, how can I assist you today?",
      "role": "assistant",
      "rating": "negative",
      "edited_content": "Hello, how may I assist you today?"
    },
    {
      "content": "Write me python code to contribute data to the GPT4All Datalake!",
      "role": "user"
    }
  ],
  "prompt_template": "string"
}
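A minimal Python sketch of building and submitting a payload with the fields shown above, using only the standard library. It assumes the endpoint accepts an HTTP POST with a JSON body; the helper names `build_chat_payload` and `submit` are hypothetical, and the per-turn check is an illustrative choice rather than a documented requirement of the datalake.

```python
import json
import urllib.request

# Endpoint from the thread. Sending it a POST with a JSON body is an assumption.
INGEST_URL = "https://api.gpt4all.io/v1/ingest/chat"


def build_chat_payload(submitter_id: str, agent_id: str, conversation: list,
                       ingest_id: str, prompt_template: str,
                       source: str = "gpt4all-chat") -> dict:
    """Assemble a payload with the same fields as the example request body.

    Hypothetical helper: the field names come from the example, but the
    validation below is an illustrative choice, not a documented rule.
    """
    for turn in conversation:
        # Each turn in the example carries at least "content" and "role".
        if "content" not in turn or "role" not in turn:
            raise ValueError(f"conversation turn missing content/role: {turn!r}")
    return {
        "source": source,
        "submitter_id": submitter_id,
        "agent_id": agent_id,
        "ingest_id": ingest_id,
        "conversation": conversation,
        "prompt_template": prompt_template,
    }


def submit(payload: dict, url: str = INGEST_URL, timeout: float = 10.0) -> int:
    """Send the payload as JSON (assumed POST) and return the HTTP status code.

    urllib raises HTTPError for 4xx/5xx responses, so only successes return here.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping payload construction separate from the network call lets a data product owner validate a contribution locally before anything is sent.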

I compared several vector databases: Weaviate, Pinecone and Chroma.
The Weaviate vector database has a native REST API for creating objects, including in batches, which is very convenient and worth trying.
https://weaviate.io/developers/weaviate/api/rest/batch
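As a rough sketch of the batch endpoint linked above, the snippet below wraps plain property dicts into a `{"objects": [...]}` body and posts it to `/v1/batch/objects`. The instance URL `http://localhost:8080` and the class name `DataProduct` are hypothetical, and it assumes the class already exists in the schema (or that auto-schema is enabled).

```python
import json
import urllib.request

# Hypothetical Weaviate instance; adjust to the actual deployment.
WEAVIATE_URL = "http://localhost:8080"


def build_batch_body(class_name: str, records: list) -> dict:
    """Wrap plain property dicts into the objects list of a batch request."""
    return {"objects": [{"class": class_name, "properties": r} for r in records]}


def batch_create(body: dict, base_url: str = WEAVIATE_URL):
    """POST the body to /v1/batch/objects and return the parsed JSON response."""
    request = urllib.request.Request(
        f"{base_url}/v1/batch/objects",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

A data product owner could then call `batch_create(build_batch_body("DataProduct", rows))` with `rows` taken from their curated dataset; the per-object results in the response are worth checking, since a batch can partially succeed.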

For search, Weaviate's GraphQL API is very useful for integration:
https://weaviate.io/developers/weaviate/api/graphql
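A minimal sketch of a semantic search through the GraphQL endpoint: it composes a `Get` query with the `nearText` operator and posts it to `/v1/graphql`. The class `DataProduct`, its fields and the instance URL are hypothetical, and `nearText` only works if a text vectorizer module is configured for that class.

```python
import json
import urllib.request


def build_near_text_query(class_name: str, concepts: list,
                          fields: list, limit: int = 5) -> str:
    """Compose a Weaviate GraphQL Get query using the nearText operator."""
    # A JSON list of strings is also valid GraphQL list-of-strings syntax.
    concept_list = json.dumps(concepts)
    field_list = " ".join(fields)
    return (
        "{ Get { "
        f"{class_name}(nearText: {{concepts: {concept_list}}}, limit: {limit}) "
        f"{{ {field_list} }} "
        "} }"
    )


def search(query: str, base_url: str = "http://localhost:8080") -> dict:
    """POST the query to /v1/graphql (hypothetical instance) and parse the reply."""
    request = urllib.request.Request(
        f"{base_url}/v1/graphql",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Matching objects come back under `data` → `Get` → the class name, so downstream consumers only need the fields they asked for.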

Data product owners can easily submit their data to the Weaviate vector database.

neoxu999 · 2023-07-09T12:46:23Z

Hi @caldeirav,

I'd like to install the Weaviate vector database on Red Hat AI and show examples of how to send data to it.
What do you reckon?

Many thanks,
Neo

caldeirav · 2023-07-13T16:24:54Z

@neoxu999 Weaviate looks like a good candidate. I think the key is to first ensure we can integrate the vector database with our MLOps automation; once that works, we can look at data contributions and data tracing / lineage requirements in detail.

caldeirav · 2023-08-06T03:55:40Z

@neoxu999 Do you think it is possible to introduce Weaviate into the Data Mesh pattern deployment now? As we are installing a new instance, we can start by running simple examples such as the ones in the OpenAI Cookbook, before we introduce our own training pipeline.

Reference: https://github.com/openai/openai-cookbook/tree/main/examples/vector_databases

neoxu999 · 2023-08-07T05:48:03Z

@caldeirav Good to know we have a new instance.
Yes, I can try the OpenAI Cookbook examples before installing Weaviate on the Data Mesh pattern.

caldeirav added this to Data Mesh Pattern Backlog Jul 2, 2023

caldeirav converted this from a draft issue Jul 2, 2023

caldeirav added the kind/enhancement New feature or request label Jul 2, 2023

caldeirav assigned neoxu999 Jul 2, 2023

caldeirav moved this from 🆕 New to 📋 Backlog in Data Mesh Pattern Backlog Jul 2, 2023

caldeirav moved this from 📋 Backlog to 🏗 In progress in Data Mesh Pattern Backlog Aug 6, 2023
