[OctoAI model provider] Llama3 update (meta-llama#494)
Showing 7 changed files with 311 additions and 474 deletions.
@@ -6,8 +6,43 @@
  "id": "LERqQn5v8-ak"
  },
  "source": [
-"# **Getting to know Llama 2: Everything you need to start building**\n",
-"Our goal in this session is to provide a guided tour of Llama 2, including understanding different Llama 2 models, how and where to access them, Generative AI and Chatbot architectures, prompt engineering, RAG (Retrieval Augmented Generation), Fine-tuning and more. All this is implemented with a starter code for you to take it and use it in your Llama 2 projects."
+"# **Getting to know Llama 3: Everything you need to start building**\n",
+"Our goal in this session is to provide a guided tour of Llama 3, including understanding different Llama 3 models, how and where to access them, Generative AI and Chatbot architectures, prompt engineering, RAG (Retrieval Augmented Generation), Fine-tuning and more. All this is implemented with a starter code for you to take it and use it in your Llama 3 projects."
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "h3YGMDJidHtH"
+},
+"source": [
+"### **Install dependencies**"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"id": "VhN6hXwx7FCp"
+},
+"outputs": [],
+"source": [
+"# Install dependencies and initialize\n",
+"%pip install \\\n",
+" langchain==0.1.19 \\\n",
+" matplotlib \\\n",
+" octoai-sdk==0.10.1 \\\n",
+" openai \\\n",
+" sentence_transformers \\\n",
+" pdf2image \\\n",
+" pdfminer \\\n",
+" pdfminer.six \\\n",
+" unstructured \\\n",
+" faiss-cpu \\\n",
+" pillow-heif \\\n",
+" opencv-python \\\n",
+" unstructured-inference \\\n",
+" pikepdf"
+]
+},
 {
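
A quick way to confirm that the pinned packages resolved as expected (a minimal sketch; the expected values simply mirror the pins in the install cell above):

```python
# Sanity-check the pinned versions from the install cell.
from importlib.metadata import version

print(version("langchain"))   # expected: 0.1.19
print(version("octoai-sdk"))  # expected: 0.10.1
```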
@@ -58,7 +93,7 @@
 " A[Users] --> B(Applications e.g. mobile, web)\n",
 " B --> |Hosted API|C(Platforms e.g. Custom, OctoAI, HuggingFace, Replicate)\n",
 " B -- optional --> E(Frameworks e.g. LangChain)\n",
-" C-->|User Input|D[Llama 2]\n",
+" C-->|User Input|D[Llama 3]\n",
 " D-->|Model Output|C\n",
 " E --> C\n",
 " classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n",
@@ -69,19 +104,15 @@
 " flowchart TD\n",
 " A[User Prompts] --> B(Frameworks e.g. LangChain)\n",
 " B <--> |Database, Docs, XLS|C[fa:fa-database External Data]\n",
-" B -->|API|D[Llama 2]\n",
+" B -->|API|D[Llama 3]\n",
 " classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n",
 " \"\"\")\n",
 "\n",
-"def llama2_family():\n",
+"def llama3_family():\n",
 " mm(\"\"\"\n",
 " graph LR;\n",
-" llama-2 --> llama-2-7b\n",
-" llama-2 --> llama-2-13b\n",
-" llama-2 --> llama-2-70b\n",
-" llama-2-7b --> llama-2-7b-chat\n",
-" llama-2-13b --> llama-2-13b-chat\n",
-" llama-2-70b --> llama-2-70b-chat\n",
+" llama-3 --> llama-3-8b-instruct\n",
+" llama-3 --> llama-3-70b-instruct\n",
 " classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n",
 " \"\"\")\n",
 "\n",
@@ -91,7 +122,7 @@
 " users --> apps\n",
 " apps --> frameworks\n",
 " frameworks --> platforms\n",
-" platforms --> Llama 2\n",
+" platforms --> Llama 3\n",
 " classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n",
 " \"\"\")\n",
 "\n",
@@ -115,8 +146,8 @@
 " user --> prompt\n",
 " prompt --> i_safety\n",
 " i_safety --> context\n",
-" context --> Llama_2\n",
-" Llama_2 --> output\n",
+" context --> Llama_3\n",
+" Llama_3 --> output\n",
 " output --> o_safety\n",
 " i_safety --> memory\n",
 " o_safety --> memory\n",
@@ -165,7 +196,7 @@
 "id": "i4Np_l_KtIno"
 },
 "source": [
-"##**1 - Understanding Llama 2**"
+"##**1 - Understanding Llama 3**"
 ]
 },
 {
@@ -174,14 +205,13 @@
 "id": "PGPSI3M5PGTi"
 },
 "source": [
-"### **1.1 - What is Llama 2?**\n",
+"### **1.1 - What is Llama 3?**\n",
 "\n",
 "* State of the art (SOTA), Open Source LLM\n",
-"* 7B, 13B, 70B\n",
+"* Llama 3 8B, 70B\n",
 "* Pretrained + Chat\n",
 "* Choosing model: Size, Quality, Cost, Speed\n",
-"* [Research paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)\n",
-"\n",
+"* [Llama 3 blog](https://ai.meta.com/blog/meta-llama-3/)\n",
 "* [Responsible use guide](https://ai.meta.com/llama/responsible-use-guide/)"
 ]
 },
@@ -208,7 +238,7 @@
 },
 "outputs": [],
 "source": [
-"llama2_family()"
+"llama3_family()"
 ]
 },
 {
@@ -217,11 +247,10 @@
 "id": "aYeHVVh45bdT"
 },
 "source": [
-"###**1.2 - Accessing Llama 2**\n",
+"###**1.2 - Accessing Llama 3**\n",
 "* Download + Self Host (on-premise)\n",
 "* Hosted API Platform (e.g. [OctoAI](https://octoai.cloud/), [Replicate](https://replicate.com/meta))\n",
-"* Hosted Container Platform (e.g. [Azure](https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/introducing-llama-2-on-azure/ba-p/3881233), [AWS](https://aws.amazon.com/blogs/machine-learning/llama-2-foundation-models-from-meta-are-now-available-in-amazon-sagemaker-jumpstart/), [GCP](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/139))\n",
-"\n"
+"* Hosted Container Platform (e.g. [Azure](https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/introducing-llama-2-on-azure/ba-p/3881233), [AWS](https://aws.amazon.com/blogs/machine-learning/llama-2-foundation-models-from-meta-are-now-available-in-amazon-sagemaker-jumpstart/), [GCP](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/139))"
 ]
 },
 {
@@ -230,7 +259,7 @@
 "id": "kBuSay8vtzL4"
 },
 "source": [
-"### **1.3 - Use Cases of Llama 2**\n",
+"### **1.3 - Use Cases of Llama 3**\n",
 "* Content Generation\n",
 "* Chatbots\n",
 "* Summarization\n",
@@ -245,42 +274,9 @@
 "id": "sd54g0OHuqBY"
 },
 "source": [
-"##**2 - Using Llama 2**\n",
+"##**2 - Using Llama 3**\n",
 "\n",
-"In this notebook, we are going to access [Llama 13b chat model](https://octoai.cloud/tools/text/chat?mode=demo&model=llama-2-13b-chat-fp16) using hosted API from OctoAI."
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {
-"id": "h3YGMDJidHtH"
-},
-"source": [
-"### **2.1 - Install dependencies**"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {
-"id": "VhN6hXwx7FCp"
-},
-"outputs": [],
-"source": [
-"# Install dependencies and initialize\n",
-"%pip install -qU \\\n",
-" octoai-sdk \\\n",
-" langchain \\\n",
-" sentence_transformers \\\n",
-" pdf2image \\\n",
-" pdfminer \\\n",
-" pdfminer.six \\\n",
-" unstructured \\\n",
-" faiss-cpu \\\n",
-" pillow-heif \\\n",
-" opencv-python \\\n",
-" unstructured-inference \\\n",
-" pikepdf"
+"In this notebook, we are going to access [Llama 3 8b instruct model](https://octoai.cloud/text/chat?model=meta-llama-3-8b-instruct&mode=api) using hosted API from OctoAI."
 ]
 },
 {
@@ -292,9 +288,9 @@
 "outputs": [],
 "source": [
 "# model on OctoAI platform that we will use for inferencing\n",
-"# We will use llama 13b chat model hosted on OctoAI server ()\n",
+"# We will use llama 3 8b instruct model hosted on OctoAI server\n",
 "\n",
-"llama2_13b = \"llama-2-13b-chat-fp16\""
+"llama3_8b = \"meta-llama-3-8b-instruct\""
 ]
 },
 {
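
The client cell in the next hunk reads `OCTOAI_API_TOKEN` from the environment at construction time. A minimal sketch for setting it interactively in a notebook (the variable name comes from the code below; the prompt flow itself is an assumption, not part of this commit):

```python
import os
from getpass import getpass

# Set the OctoAI token before the OpenAI-compatible client is constructed.
if "OCTOAI_API_TOKEN" not in os.environ:
    os.environ["OCTOAI_API_TOKEN"] = getpass("OCTOAI_API_TOKEN: ")
```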
@@ -326,21 +322,21 @@
 },
 "outputs": [],
 "source": [
-"# we will use OctoAI's hosted API\n",
-"from octoai.client import Client\n",
+"# We will use OpenAI's APIs to talk to OctoAI's hosted model endpoint\n",
+"from openai import OpenAI\n",
 "\n",
-"client = Client(OCTOAI_API_TOKEN)\n",
+"client = OpenAI(\n",
+" base_url = \"https://text.octoai.run/v1\",\n",
+" api_key = os.environ[\"OCTOAI_API_TOKEN\"]\n",
+")\n",
 "\n",
 "# text completion with input prompt\n",
 "def Completion(prompt):\n",
 " output = client.chat.completions.create(\n",
 " messages=[\n",
-" {\n",
-" \"role\": \"user\",\n",
-" \"content\": prompt\n",
-" }\n",
+" {\"role\": \"user\", \"content\": prompt}\n",
 " ],\n",
-" model=\"llama-2-13b-chat-fp16\",\n",
+" model=llama3_8b,\n",
 " max_tokens=1000\n",
 " )\n",
 " return output.choices[0].message.content\n",
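
With the client and `Completion` helper above in scope, a short smoke test; the `models.list()` check assumes OctoAI's OpenAI-compatible endpoint also serves the standard `/v1/models` route:

```python
# Plain text completion through the chat API.
print(Completion("The typical color of the sky is: "))

# Optional: confirm the model identifier against what the service hosts.
for model in client.models.list():
    print(model.id)  # look for meta-llama-3-8b-instruct
```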
@@ -349,16 +345,10 @@
 "def ChatCompletion(prompt, system_prompt=None):\n",
 " output = client.chat.completions.create(\n",
 " messages=[\n",
-" {\n",
-" \"role\": \"system\",\n",
-" \"content\": system_prompt\n",
-" },\n",
-" {\n",
-" \"role\": \"user\",\n",
-" \"content\": prompt\n",
-" }\n",
+" {\"role\": \"system\", \"content\": system_prompt},\n",
+" {\"role\": \"user\", \"content\": prompt}\n",
 " ],\n",
-" model=\"llama-2-13b-chat-fp16\",\n",
+" model=llama3_8b,\n",
 " max_tokens=1000\n",
 " )\n",
 " return output.choices[0].message.content"
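
And a matching call for `ChatCompletion`; note the helper forwards `system_prompt` as-is, so pass a real string rather than leaving it `None`:

```python
# Chat completion with an explicit system message.
print(ChatCompletion(
    "What is the typical color of the sky?",
    system_prompt="Answer in one short sentence.",
))
```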
@@ -370,7 +360,7 @@
 "id": "5Jxq0pmf6L73"
 },
 "source": [
-"### **2.2 - Basic completion**"
+"# **2.1 - Basic completion**"
 ]
 },
 {
@@ -391,7 +381,7 @@
 "id": "StccjUDh6W0Q"
 },
 "source": [
-"### **2.3 - System prompts**\n"
+"## **2.2 - System prompts**\n"
 ]
 },
 {
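
A system prompt constrains every subsequent turn, which is the point of this section; a small illustration using the helper defined earlier (the persona here is an arbitrary example):

```python
# The system message steers tone and format across the whole exchange.
print(ChatCompletion(
    "Tell me about Paris.",
    system_prompt="You are a terse travel guide. Answer in at most two sentences.",
))
```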
@@ -415,7 +405,7 @@
 "id": "Hp4GNa066pYy"
 },
 "source": [
-"### **2.4 - Response formats**\n",
+"### **2.3 - Response formats**\n",
 "* Can support different formatted outputs e.g. text, JSON, etc."
 ]
 },
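
For the JSON case, instruct-tuned Llama 3 usually complies with an output-format instruction, but valid JSON is not guaranteed, so parse defensively (a sketch using the helper above):

```python
import json

# Request machine-readable output and validate it.
raw = ChatCompletion(
    "List three European capitals.",
    system_prompt="Respond with a JSON array of strings and nothing else.",
)
try:
    print(json.loads(raw))
except json.JSONDecodeError:
    print("Model returned non-JSON output:", raw)
```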
@@ -483,7 +473,7 @@
 "\n",
 "* User Prompts\n",
 "* Input Safety\n",
-"* Llama 2\n",
+"* Llama 3\n",
 "* Output Safety\n",
 "\n",
 "* Memory & Context"
@@ -743,12 +733,9 @@
 "### **4.3 - Retrieval Augmented Generation (RAG)**\n",
 "* Prompt Eng Limitations - Knowledge cutoff & lack of specialized data\n",
 "\n",
-"* Retrieval Augmented Generation(RAG) allows us to retrieve snippets of information from external data sources and augment it to the user's prompt to get tailored responses from Llama 2.\n",
+"* Retrieval Augmented Generation(RAG) allows us to retrieve snippets of information from external data sources and augment it to the user's prompt to get tailored responses from Llama 3.\n",
 "\n",
-"For our demo, we are going to download an external PDF file from a URL and query against the content in the pdf file to get contextually relevant information back with the help of Llama!\n",
-"\n",
-"\n",
-"\n"
+"For our demo, we are going to download an external PDF file from a URL and query against the content in the pdf file to get contextually relevant information back with the help of Llama!"
 ]
 },
 {
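
The flow this section describes, end to end, in a minimal sketch: load a PDF, chunk it, embed and index the chunks, then answer over the index. The URL is a placeholder, `llama_model` is the OctoAI endpoint defined in the LangChain setup below, and the loader and classes are assumptions based on the dependencies pinned earlier, not the notebook's own cells:

```python
from langchain.chains import RetrievalQA
from langchain.document_loaders import OnlinePDFLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Load and chunk the external PDF (placeholder URL).
docs = OnlinePDFLoader("https://example.com/paper.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# Embed the chunks and build a FAISS index to retrieve from.
index = FAISS.from_documents(chunks, HuggingFaceEmbeddings())

# Retrieve relevant chunks and let Llama 3 answer over them.
qa = RetrievalQA.from_chain_type(llm=llama_model, retriever=index.as_retriever())
print(qa.invoke({"query": "What is this document about?"})["result"])
```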
@@ -797,24 +784,16 @@
 "source": [
 "# langchain setup\n",
 "from langchain.llms.octoai_endpoint import OctoAIEndpoint\n",
-"# Use the Llama 2 model hosted on OctoAI\n",
-"# Temperature: Adjusts randomness of outputs, greater than 1 is random and 0 is deterministic, 0.75 is a good starting value\n",
 "\n",
+"# Use the Llama 3 model hosted on OctoAI\n",
+"# max_tokens: Maximum number of tokens to generate. A word is generally 2-3 tokens\n",
+"# temperature: Adjusts randomness of outputs, greater than 1 is random and 0 is deterministic, 0.75 is a good starting value\n",
 "# top_p: When decoding text, samples from the top p percentage of most likely tokens; lower to ignore less likely tokens\n",
-"# max_new_tokens: Maximum number of tokens to generate. A word is generally 2-3 tokens\n",
 "llama_model = OctoAIEndpoint(\n",
-" endpoint_url=\"https://text.octoai.run/v1/chat/completions\",\n",
-" model_kwargs={\n",
-" \"model\": llama2_13b,\n",
-" \"messages\": [\n",
-" {\n",
-" \"role\": \"system\",\n",
-" \"content\": \"You are a helpful, respectful and honest assistant.\"\n",
-" }\n",
-" ],\n",
-" \"max_tokens\": 1000,\n",
-" \"top_p\": 1,\n",
-" \"temperature\": 0.75\n",
-" },\n",
+" model=llama3_8b,\n",
+" max_tokens=1000,\n",
+" temperature=0.75,\n",
+" top_p=1\n",
 ")"
 ]
 },
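
A quick smoke test of the wrapper above; `OctoAIEndpoint` is a LangChain LLM, so `.invoke()` takes a prompt string and returns the completion text:

```python
# One-off call through the LangChain wrapper.
print(llama_model.invoke("In one sentence, what is retrieval augmented generation?"))
```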
@@ -973,10 +952,11 @@
 },
 "source": [
 "#### **Resources**\n",
-"- [GitHub - Llama 2](https://github.com/facebookresearch/llama)\n",
-"- [Github - LLama 2 Recipes](https://github.com/facebookresearch/llama-recipes)\n",
-"- [Llama 2](https://ai.meta.com/llama/)\n",
-"- [Research Paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)\n",
+"- [GitHub - Llama](https://github.com/facebookresearch/llama)\n",
+"- [Github - LLama Recipes](https://github.com/facebookresearch/llama-recipes)\n",
+"- [Llama](https://ai.meta.com/llama/)\n",
+"- [Research Paper on Llama 2](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)\n",
+"- [Llama 3 Page](https://ai.meta.com/blog/meta-llama-3/)\n",
 "- [Model Card](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md)\n",
 "- [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/)\n",
 "- [Acceptable Use Policy](https://ai.meta.com/llama/use-policy/)\n",
@@ -992,9 +972,9 @@
 "source": [
 "#### **Authors & Contact**\n",
 " * [email protected], [Amit Sangani | LinkedIn](https://www.linkedin.com/in/amitsangani/)\n",
-" * [email protected], [Mohsen Agsen | LinkedIn](https://www.linkedin.com/in/mohsen-agsen-62a9791/)\n",
+" * [email protected], [Mohsen Agsen | LinkedIn](https://www.linkedin.com/in/dr-thierry-moreau/)\n",
 "\n",
-"Adapted to run on OctoAI by Thierry Moreau - [email protected]"
+"Adapted to run on OctoAI and use Llama 3 by [email protected] [Thierry Moreau | LinkedIn](https://www.linkedin.com/in/dr-thierry-moreau/)"
 ]
 }
 ],