[OctoAI model provider] Llama3 update (meta-llama#494)
HamidShojanazeri authored May 10, 2024
2 parents ce4e5fb + 54f0949 commit b2eec4f
Showing 7 changed files with 311 additions and 474 deletions.
@@ -6,8 +6,43 @@
"id": "LERqQn5v8-ak"
},
"source": [
"# **Getting to know Llama 2: Everything you need to start building**\n",
"Our goal in this session is to provide a guided tour of Llama 2, including understanding different Llama 2 models, how and where to access them, Generative AI and Chatbot architectures, prompt engineering, RAG (Retrieval Augmented Generation), Fine-tuning and more. All this is implemented with a starter code for you to take it and use it in your Llama 2 projects."
"# **Getting to know Llama 3: Everything you need to start building**\n",
"Our goal in this session is to provide a guided tour of Llama 3, including understanding different Llama 3 models, how and where to access them, Generative AI and Chatbot architectures, prompt engineering, RAG (Retrieval Augmented Generation), Fine-tuning and more. All this is implemented with a starter code for you to take it and use it in your Llama 3 projects."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "h3YGMDJidHtH"
},
"source": [
"### **Install dependencies**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "VhN6hXwx7FCp"
},
"outputs": [],
"source": [
"# Install dependencies and initialize\n",
"%pip install \\\n",
" langchain==0.1.19 \\\n",
" matplotlib \\\n",
" octoai-sdk==0.10.1 \\\n",
" openai \\\n",
" sentence_transformers \\\n",
" pdf2image \\\n",
" pdfminer \\\n",
" pdfminer.six \\\n",
" unstructured \\\n",
" faiss-cpu \\\n",
" pillow-heif \\\n",
" opencv-python \\\n",
" unstructured-inference \\\n",
" pikepdf"
]
},
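To confirm the pinned environment resolved correctly, a quick sanity check can be run in a following cell; this is a hypothetical addition, not part of the notebook diff:

```python
# Hypothetical sanity check: verify the pinned packages are importable.
import langchain
import openai

print(langchain.__version__)  # expected: 0.1.19, matching the pin above
```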
{
@@ -58,7 +93,7 @@
" A[Users] --> B(Applications e.g. mobile, web)\n",
" B --> |Hosted API|C(Platforms e.g. Custom, OctoAI, HuggingFace, Replicate)\n",
" B -- optional --> E(Frameworks e.g. LangChain)\n",
" C-->|User Input|D[Llama 2]\n",
" C-->|User Input|D[Llama 3]\n",
" D-->|Model Output|C\n",
" E --> C\n",
" classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n",
@@ -69,19 +104,15 @@
" flowchart TD\n",
" A[User Prompts] --> B(Frameworks e.g. LangChain)\n",
" B <--> |Database, Docs, XLS|C[fa:fa-database External Data]\n",
" B -->|API|D[Llama 2]\n",
" B -->|API|D[Llama 3]\n",
" classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n",
" \"\"\")\n",
"\n",
"def llama2_family():\n",
"def llama3_family():\n",
" mm(\"\"\"\n",
" graph LR;\n",
" llama-2 --> llama-2-7b\n",
" llama-2 --> llama-2-13b\n",
" llama-2 --> llama-2-70b\n",
" llama-2-7b --> llama-2-7b-chat\n",
" llama-2-13b --> llama-2-13b-chat\n",
" llama-2-70b --> llama-2-70b-chat\n",
" llama-3 --> llama-3-8b-instruct\n",
" llama-3 --> llama-3-70b-instruct\n",
" classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n",
" \"\"\")\n",
"\n",
@@ -91,7 +122,7 @@
" users --> apps\n",
" apps --> frameworks\n",
" frameworks --> platforms\n",
" platforms --> Llama 2\n",
" platforms --> Llama 3\n",
" classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n",
" \"\"\")\n",
"\n",
@@ -115,8 +146,8 @@
" user --> prompt\n",
" prompt --> i_safety\n",
" i_safety --> context\n",
" context --> Llama_2\n",
" Llama_2 --> output\n",
" context --> Llama_3\n",
" Llama_3 --> output\n",
" output --> o_safety\n",
" i_safety --> memory\n",
" o_safety --> memory\n",
@@ -165,7 +196,7 @@
"id": "i4Np_l_KtIno"
},
"source": [
"##**1 - Understanding Llama 2**"
"##**1 - Understanding Llama 3**"
]
},
{
@@ -174,14 +205,13 @@
"id": "PGPSI3M5PGTi"
},
"source": [
"### **1.1 - What is Llama 2?**\n",
"### **1.1 - What is Llama 3?**\n",
"\n",
"* State of the art (SOTA), Open Source LLM\n",
"* 7B, 13B, 70B\n",
"* Llama 3 8B, 70B\n",
"* Pretrained + Chat\n",
"* Choosing model: Size, Quality, Cost, Speed\n",
"* [Research paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)\n",
"\n",
"* [Llama 3 blog](https://ai.meta.com/blog/meta-llama-3/)\n",
"* [Responsible use guide](https://ai.meta.com/llama/responsible-use-guide/)"
]
},
@@ -208,7 +238,7 @@
},
"outputs": [],
"source": [
"llama2_family()"
"llama3_family()"
]
},
{
@@ -217,11 +247,10 @@
"id": "aYeHVVh45bdT"
},
"source": [
"###**1.2 - Accessing Llama 2**\n",
"###**1.2 - Accessing Llama 3**\n",
"* Download + Self Host (on-premise)\n",
"* Hosted API Platform (e.g. [OctoAI](https://octoai.cloud/), [Replicate](https://replicate.com/meta))\n",
"* Hosted Container Platform (e.g. [Azure](https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/introducing-llama-2-on-azure/ba-p/3881233), [AWS](https://aws.amazon.com/blogs/machine-learning/llama-2-foundation-models-from-meta-are-now-available-in-amazon-sagemaker-jumpstart/), [GCP](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/139))\n",
"\n"
"* Hosted Container Platform (e.g. [Azure](https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/introducing-llama-2-on-azure/ba-p/3881233), [AWS](https://aws.amazon.com/blogs/machine-learning/llama-2-foundation-models-from-meta-are-now-available-in-amazon-sagemaker-jumpstart/), [GCP](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/139))"
]
},
{
@@ -230,7 +259,7 @@
"id": "kBuSay8vtzL4"
},
"source": [
"### **1.3 - Use Cases of Llama 2**\n",
"### **1.3 - Use Cases of Llama 3**\n",
"* Content Generation\n",
"* Chatbots\n",
"* Summarization\n",
@@ -245,42 +274,9 @@
"id": "sd54g0OHuqBY"
},
"source": [
"##**2 - Using Llama 2**\n",
"##**2 - Using Llama 3**\n",
"\n",
"In this notebook, we are going to access [Llama 13b chat model](https://octoai.cloud/tools/text/chat?mode=demo&model=llama-2-13b-chat-fp16) using hosted API from OctoAI."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "h3YGMDJidHtH"
},
"source": [
"### **2.1 - Install dependencies**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "VhN6hXwx7FCp"
},
"outputs": [],
"source": [
"# Install dependencies and initialize\n",
"%pip install -qU \\\n",
" octoai-sdk \\\n",
" langchain \\\n",
" sentence_transformers \\\n",
" pdf2image \\\n",
" pdfminer \\\n",
" pdfminer.six \\\n",
" unstructured \\\n",
" faiss-cpu \\\n",
" pillow-heif \\\n",
" opencv-python \\\n",
" unstructured-inference \\\n",
" pikepdf"
"In this notebook, we are going to access [Llama 3 8b instruct model](https://octoai.cloud/text/chat?model=meta-llama-3-8b-instruct&mode=api) using hosted API from OctoAI."
]
},
{
@@ -292,9 +288,9 @@
"outputs": [],
"source": [
"# model on OctoAI platform that we will use for inferencing\n",
"# We will use llama 13b chat model hosted on OctoAI server ()\n",
"# We will use llama 3 8b instruct model hosted on OctoAI server\n",
"\n",
"llama2_13b = \"llama-2-13b-chat-fp16\""
"llama3_8b = \"meta-llama-3-8b-instruct\""
]
},
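Later cells read the API token from `os.environ["OCTOAI_API_TOKEN"]`; the cell that sets it is elided from this diff. A minimal sketch of how it might be set, assuming you have an OctoAI API token:

```python
# Hypothetical setup cell: store the OctoAI API token in the environment
# so the OpenAI-compatible client defined below can pick it up.
import os
from getpass import getpass

if "OCTOAI_API_TOKEN" not in os.environ:
    os.environ["OCTOAI_API_TOKEN"] = getpass("OctoAI API token: ")
```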
{
@@ -326,21 +322,21 @@
},
"outputs": [],
"source": [
"# we will use OctoAI's hosted API\n",
"from octoai.client import Client\n",
"# We will use OpenAI's APIs to talk to OctoAI's hosted model endpoint\n",
"from openai import OpenAI\n",
"\n",
"client = Client(OCTOAI_API_TOKEN)\n",
"client = OpenAI(\n",
" base_url = \"https://text.octoai.run/v1\",\n",
" api_key = os.environ[\"OCTOAI_API_TOKEN\"]\n",
")\n",
"\n",
"# text completion with input prompt\n",
"def Completion(prompt):\n",
" output = client.chat.completions.create(\n",
" messages=[\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": prompt\n",
" }\n",
" {\"role\": \"user\", \"content\": prompt}\n",
" ],\n",
" model=\"llama-2-13b-chat-fp16\",\n",
" model=llama3_8b,\n",
" max_tokens=1000\n",
" )\n",
" return output.choices[0].message.content\n",
@@ -349,16 +345,10 @@
"def ChatCompletion(prompt, system_prompt=None):\n",
" output = client.chat.completions.create(\n",
" messages=[\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": system_prompt\n",
" },\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": prompt\n",
" }\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": prompt}\n",
" ],\n",
" model=\"llama-2-13b-chat-fp16\",\n",
" model=llama3_8b,\n",
" max_tokens=1000\n",
" )\n",
" return output.choices[0].message.content"
@@ -370,7 +360,7 @@
"id": "5Jxq0pmf6L73"
},
"source": [
"### **2.2 - Basic completion**"
"# **2.1 - Basic completion**"
]
},
{
@@ -391,7 +381,7 @@
"id": "StccjUDh6W0Q"
},
"source": [
"### **2.3 - System prompts**\n"
"## **2.2 - System prompts**\n"
]
},
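The demo cell for this section is elided from the diff; a hedged example of steering tone and format with a system prompt, reusing the `ChatCompletion` helper defined earlier:

```python
# Illustrative: the system prompt constrains style, the user prompt carries the task.
response = ChatCompletion(
    "Summarize what Retrieval Augmented Generation is.",
    system_prompt="Answer in exactly one sentence, in plain language.",
)
print(response)
```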
{
@@ -415,7 +405,7 @@
"id": "Hp4GNa066pYy"
},
"source": [
"### **2.4 - Response formats**\n",
"### **2.3 - Response formats**\n",
"* Can support different formatted outputs e.g. text, JSON, etc."
]
},
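One way to request structured output (a sketch only; the model is not guaranteed to comply, so the result should be validated before use):

```python
import json

# Illustrative: ask for JSON explicitly, then parse defensively.
raw = ChatCompletion(
    "List three Llama 3 use cases.",
    system_prompt='Respond only with valid JSON shaped like '
                  '{"use_cases": ["...", "...", "..."]}',
)
try:
    print(json.loads(raw))
except json.JSONDecodeError:
    print("Model did not return valid JSON:", raw)
```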
@@ -483,7 +473,7 @@
"\n",
"* User Prompts\n",
"* Input Safety\n",
"* Llama 2\n",
"* Llama 3\n",
"* Output Safety\n",
"\n",
"* Memory & Context"
@@ -743,12 +733,9 @@
"### **4.3 - Retrieval Augmented Generation (RAG)**\n",
"* Prompt Eng Limitations - Knowledge cutoff & lack of specialized data\n",
"\n",
"* Retrieval Augmented Generation(RAG) allows us to retrieve snippets of information from external data sources and augment it to the user's prompt to get tailored responses from Llama 2.\n",
"\n",
"For our demo, we are going to download an external PDF file from a URL and query against the content in the pdf file to get contextually relevant information back with the help of Llama!\n",
"* Retrieval Augmented Generation(RAG) allows us to retrieve snippets of information from external data sources and augment it to the user's prompt to get tailored responses from Llama 3.\n",
"\n",
"\n",
"\n"
"For our demo, we are going to download an external PDF file from a URL and query against the content in the pdf file to get contextually relevant information back with the help of Llama!"
]
},
{
@@ -797,24 +784,16 @@
"source": [
"# langchain setup\n",
"from langchain.llms.octoai_endpoint import OctoAIEndpoint\n",
"# Use the Llama 2 model hosted on OctoAI\n",
"# Temperature: Adjusts randomness of outputs, greater than 1 is random and 0 is deterministic, 0.75 is a good starting value\n",
"\n",
"# Use the Llama 3 model hosted on OctoAI\n",
"# max_tokens: Maximum number of tokens to generate. A word is generally 2-3 tokens\n",
"# temperature: Adjusts randomness of outputs, greater than 1 is random and 0 is deterministic, 0.75 is a good starting value\n",
"# top_p: When decoding text, samples from the top p percentage of most likely tokens; lower to ignore less likely tokens\n",
"# max_new_tokens: Maximum number of tokens to generate. A word is generally 2-3 tokens\n",
"llama_model = OctoAIEndpoint(\n",
" endpoint_url=\"https://text.octoai.run/v1/chat/completions\",\n",
" model_kwargs={\n",
" \"model\": llama2_13b,\n",
" \"messages\": [\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": \"You are a helpful, respectful and honest assistant.\"\n",
" }\n",
" ],\n",
" \"max_tokens\": 1000,\n",
" \"top_p\": 1,\n",
" \"temperature\": 0.75\n",
" },\n",
" model=llama3_8b,\n",
" max_tokens=1000,\n",
" temperature=0.75,\n",
" top_p=1\n",
")"
]
},
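The PDF-loading and retrieval cells are elided from this diff. A sketch of the remaining RAG pipeline under langchain 0.1.x, reusing `llama_model` from above; the PDF URL is a placeholder, and these import paths may raise deprecation warnings on newer langchain releases:

```python
from langchain.chains import RetrievalQA
from langchain.document_loaders import OnlinePDFLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Load and chunk the external PDF (URL is illustrative).
docs = OnlinePDFLoader("https://example.com/some-paper.pdf").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# Index the chunks, then answer questions over the retrieved context.
vectorstore = FAISS.from_documents(chunks, HuggingFaceEmbeddings())
qa = RetrievalQA.from_chain_type(llm=llama_model, retriever=vectorstore.as_retriever())
print(qa.invoke({"query": "What is the main contribution of this paper?"})["result"])
```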
@@ -973,10 +952,11 @@
},
"source": [
"#### **Resources**\n",
"- [GitHub - Llama 2](https://github.com/facebookresearch/llama)\n",
"- [Github - LLama 2 Recipes](https://github.com/facebookresearch/llama-recipes)\n",
"- [Llama 2](https://ai.meta.com/llama/)\n",
"- [Research Paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)\n",
"- [GitHub - Llama](https://github.com/facebookresearch/llama)\n",
"- [Github - LLama Recipes](https://github.com/facebookresearch/llama-recipes)\n",
"- [Llama](https://ai.meta.com/llama/)\n",
"- [Research Paper on Llama 2](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)\n",
"- [Llama 3 Page](https://ai.meta.com/blog/meta-llama-3/)\n",
"- [Model Card](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md)\n",
"- [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/)\n",
"- [Acceptable Use Policy](https://ai.meta.com/llama/use-policy/)\n",
@@ -992,9 +972,9 @@
"source": [
"#### **Authors & Contact**\n",
" * [email protected], [Amit Sangani | LinkedIn](https://www.linkedin.com/in/amitsangani/)\n",
" * [email protected], [Mohsen Agsen | LinkedIn](https://www.linkedin.com/in/mohsen-agsen-62a9791/)\n",
" * [email protected], [Mohsen Agsen | LinkedIn](https://www.linkedin.com/in/dr-thierry-moreau/)\n",
"\n",
"Adapted to run on OctoAI by Thierry Moreau - [email protected]"
"Adapted to run on OctoAI and use Llama 3 by [email protected] [Thierry Moreay | LinkedIn]()"
]
}
],