[OctoAI model provider] Llama3 update (meta-llama#494)
HamidShojanazeri authored May 10, 2024
2 parents ce4e5fb + 54f0949 commit b2eec4f
Showing 7 changed files with 311 additions and 474 deletions.
@@ -6,8 +6,43 @@
"id": "LERqQn5v8-ak"
},
"source": [
"# **Getting to know Llama 2: Everything you need to start building**\n",
"Our goal in this session is to provide a guided tour of Llama 2, including understanding different Llama 2 models, how and where to access them, Generative AI and Chatbot architectures, prompt engineering, RAG (Retrieval Augmented Generation), Fine-tuning and more. All this is implemented with a starter code for you to take it and use it in your Llama 2 projects."
"# **Getting to know Llama 3: Everything you need to start building**\n",
"Our goal in this session is to provide a guided tour of Llama 3, including understanding different Llama 3 models, how and where to access them, Generative AI and Chatbot architectures, prompt engineering, RAG (Retrieval Augmented Generation), Fine-tuning and more. All this is implemented with a starter code for you to take it and use it in your Llama 3 projects."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "h3YGMDJidHtH"
},
"source": [
"### **Install dependencies**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "VhN6hXwx7FCp"
},
"outputs": [],
"source": [
"# Install dependencies and initialize\n",
"%pip install \\\n",
" langchain==0.1.19 \\\n",
" matplotlib \\\n",
" octoai-sdk==0.10.1 \\\n",
" openai \\\n",
" sentence_transformers \\\n",
" pdf2image \\\n",
" pdfminer \\\n",
" pdfminer.six \\\n",
" unstructured \\\n",
" faiss-cpu \\\n",
" pillow-heif \\\n",
" opencv-python \\\n",
" unstructured-inference \\\n",
" pikepdf"
]
},
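To confirm the pinned environment resolved correctly, a quick sanity check can be run in a following cell; this is a hypothetical addition, not part of the notebook diff:

```python
# Hypothetical sanity check: verify the pinned packages are importable.
import langchain
import openai

print(langchain.__version__)  # expected: 0.1.19, matching the pin above
```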
{
@@ -58,7 +93,7 @@
" A[Users] --> B(Applications e.g. mobile, web)\n",
" B --> |Hosted API|C(Platforms e.g. Custom, OctoAI, HuggingFace, Replicate)\n",
" B -- optional --> E(Frameworks e.g. LangChain)\n",
" C-->|User Input|D[Llama 2]\n",
" C-->|User Input|D[Llama 3]\n",
" D-->|Model Output|C\n",
" E --> C\n",
" classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n",
@@ -69,19 +104,15 @@
" flowchart TD\n",
" A[User Prompts] --> B(Frameworks e.g. LangChain)\n",
" B <--> |Database, Docs, XLS|C[fa:fa-database External Data]\n",
" B -->|API|D[Llama 2]\n",
" B -->|API|D[Llama 3]\n",
" classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n",
" \"\"\")\n",
"\n",
"def llama2_family():\n",
"def llama3_family():\n",
" mm(\"\"\"\n",
" graph LR;\n",
" llama-2 --> llama-2-7b\n",
" llama-2 --> llama-2-13b\n",
" llama-2 --> llama-2-70b\n",
" llama-2-7b --> llama-2-7b-chat\n",
" llama-2-13b --> llama-2-13b-chat\n",
" llama-2-70b --> llama-2-70b-chat\n",
" llama-3 --> llama-3-8b-instruct\n",
" llama-3 --> llama-3-70b-instruct\n",
" classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n",
" \"\"\")\n",
"\n",
@@ -91,7 +122,7 @@
" users --> apps\n",
" apps --> frameworks\n",
" frameworks --> platforms\n",
" platforms --> Llama 2\n",
" platforms --> Llama 3\n",
" classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n",
" \"\"\")\n",
"\n",
@@ -115,8 +146,8 @@
" user --> prompt\n",
" prompt --> i_safety\n",
" i_safety --> context\n",
" context --> Llama_2\n",
" Llama_2 --> output\n",
" context --> Llama_3\n",
" Llama_3 --> output\n",
" output --> o_safety\n",
" i_safety --> memory\n",
" o_safety --> memory\n",
@@ -165,7 +196,7 @@
"id": "i4Np_l_KtIno"
},
"source": [
"##**1 - Understanding Llama 2**"
"##**1 - Understanding Llama 3**"
]
},
{
@@ -174,14 +205,13 @@
"id": "PGPSI3M5PGTi"
},
"source": [
"### **1.1 - What is Llama 2?**\n",
"### **1.1 - What is Llama 3?**\n",
"\n",
"* State of the art (SOTA), Open Source LLM\n",
"* 7B, 13B, 70B\n",
"* Llama 3 8B, 70B\n",
"* Pretrained + Chat\n",
"* Choosing model: Size, Quality, Cost, Speed\n",
"* [Research paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)\n",
"\n",
"* [Llama 3 blog](https://ai.meta.com/blog/meta-llama-3/)\n",
"* [Responsible use guide](https://ai.meta.com/llama/responsible-use-guide/)"
]
},
@@ -208,7 +238,7 @@
},
"outputs": [],
"source": [
"llama2_family()"
"llama3_family()"
]
},
{
@@ -217,11 +247,10 @@
"id": "aYeHVVh45bdT"
},
"source": [
"###**1.2 - Accessing Llama 2**\n",
"###**1.2 - Accessing Llama 3**\n",
"* Download + Self Host (on-premise)\n",
"* Hosted API Platform (e.g. [OctoAI](https://octoai.cloud/), [Replicate](https://replicate.com/meta))\n",
"* Hosted Container Platform (e.g. [Azure](https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/introducing-llama-2-on-azure/ba-p/3881233), [AWS](https://aws.amazon.com/blogs/machine-learning/llama-2-foundation-models-from-meta-are-now-available-in-amazon-sagemaker-jumpstart/), [GCP](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/139))\n",
"\n"
"* Hosted Container Platform (e.g. [Azure](https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/introducing-llama-2-on-azure/ba-p/3881233), [AWS](https://aws.amazon.com/blogs/machine-learning/llama-2-foundation-models-from-meta-are-now-available-in-amazon-sagemaker-jumpstart/), [GCP](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/139))"
]
},
{
@@ -230,7 +259,7 @@
"id": "kBuSay8vtzL4"
},
"source": [
"### **1.3 - Use Cases of Llama 2**\n",
"### **1.3 - Use Cases of Llama 3**\n",
"* Content Generation\n",
"* Chatbots\n",
"* Summarization\n",
@@ -245,42 +274,9 @@
"id": "sd54g0OHuqBY"
},
"source": [
"##**2 - Using Llama 2**\n",
"##**2 - Using Llama 3**\n",
"\n",
"In this notebook, we are going to access [Llama 13b chat model](https://octoai.cloud/tools/text/chat?mode=demo&model=llama-2-13b-chat-fp16) using hosted API from OctoAI."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "h3YGMDJidHtH"
},
"source": [
"### **2.1 - Install dependencies**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "VhN6hXwx7FCp"
},
"outputs": [],
"source": [
"# Install dependencies and initialize\n",
"%pip install -qU \\\n",
" octoai-sdk \\\n",
" langchain \\\n",
" sentence_transformers \\\n",
" pdf2image \\\n",
" pdfminer \\\n",
" pdfminer.six \\\n",
" unstructured \\\n",
" faiss-cpu \\\n",
" pillow-heif \\\n",
" opencv-python \\\n",
" unstructured-inference \\\n",
" pikepdf"
"In this notebook, we are going to access [Llama 3 8b instruct model](https://octoai.cloud/text/chat?model=meta-llama-3-8b-instruct&mode=api) using hosted API from OctoAI."
]
},
{
@@ -292,9 +288,9 @@
"outputs": [],
"source": [
"# model on OctoAI platform that we will use for inferencing\n",
"# We will use llama 13b chat model hosted on OctoAI server ()\n",
"# We will use llama 3 8b instruct model hosted on OctoAI server\n",
"\n",
"llama2_13b = \"llama-2-13b-chat-fp16\""
"llama3_8b = \"meta-llama-3-8b-instruct\""
]
},
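Later cells read the API token from `os.environ["OCTOAI_API_TOKEN"]`; the cell that sets it is elided from this diff. A minimal sketch of how it might be set, assuming you have an OctoAI API token:

```python
# Hypothetical setup cell: store the OctoAI API token in the environment
# so the OpenAI-compatible client defined below can pick it up.
import os
from getpass import getpass

if "OCTOAI_API_TOKEN" not in os.environ:
    os.environ["OCTOAI_API_TOKEN"] = getpass("OctoAI API token: ")
```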
{
@@ -326,21 +322,21 @@
},
"outputs": [],
"source": [
"# we will use OctoAI's hosted API\n",
"from octoai.client import Client\n",
"# We will use OpenAI's APIs to talk to OctoAI's hosted model endpoint\n",
"from openai import OpenAI\n",
"\n",
"client = Client(OCTOAI_API_TOKEN)\n",
"client = OpenAI(\n",
" base_url = \"https://text.octoai.run/v1\",\n",
" api_key = os.environ[\"OCTOAI_API_TOKEN\"]\n",
")\n",
"\n",
"# text completion with input prompt\n",
"def Completion(prompt):\n",
" output = client.chat.completions.create(\n",
" messages=[\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": prompt\n",
" }\n",
" {\"role\": \"user\", \"content\": prompt}\n",
" ],\n",
" model=\"llama-2-13b-chat-fp16\",\n",
" model=llama3_8b,\n",
" max_tokens=1000\n",
" )\n",
" return output.choices[0].message.content\n",
@@ -349,16 +345,10 @@
"def ChatCompletion(prompt, system_prompt=None):\n",
" output = client.chat.completions.create(\n",
" messages=[\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": system_prompt\n",
" },\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": prompt\n",
" }\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": prompt}\n",
" ],\n",
" model=\"llama-2-13b-chat-fp16\",\n",
" model=llama3_8b,\n",
" max_tokens=1000\n",
" )\n",
" return output.choices[0].message.content"
@@ -370,7 +360,7 @@
"id": "5Jxq0pmf6L73"
},
"source": [
"### **2.2 - Basic completion**"
"# **2.1 - Basic completion**"
]
},
{
@@ -391,7 +381,7 @@
"id": "StccjUDh6W0Q"
},
"source": [
"### **2.3 - System prompts**\n"
"## **2.2 - System prompts**\n"
]
},
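The demo cell for this section is elided from the diff; a hedged example of steering tone and format with a system prompt, reusing the `ChatCompletion` helper defined earlier:

```python
# Illustrative: the system prompt constrains style, the user prompt carries the task.
response = ChatCompletion(
    "Summarize what Retrieval Augmented Generation is.",
    system_prompt="Answer in exactly one sentence, in plain language.",
)
print(response)
```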
{
@@ -415,7 +405,7 @@
"id": "Hp4GNa066pYy"
},
"source": [
"### **2.4 - Response formats**\n",
"### **2.3 - Response formats**\n",
"* Can support different formatted outputs e.g. text, JSON, etc."
]
},
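One way to request structured output (a sketch only; the model is not guaranteed to comply, so the result should be validated before use):

```python
import json

# Illustrative: ask for JSON explicitly, then parse defensively.
raw = ChatCompletion(
    "List three Llama 3 use cases.",
    system_prompt='Respond only with valid JSON shaped like '
                  '{"use_cases": ["...", "...", "..."]}',
)
try:
    print(json.loads(raw))
except json.JSONDecodeError:
    print("Model did not return valid JSON:", raw)
```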
@@ -483,7 +473,7 @@
"\n",
"* User Prompts\n",
"* Input Safety\n",
"* Llama 2\n",
"* Llama 3\n",
"* Output Safety\n",
"\n",
"* Memory & Context"
@@ -743,12 +733,9 @@
"### **4.3 - Retrieval Augmented Generation (RAG)**\n",
"* Prompt Eng Limitations - Knowledge cutoff & lack of specialized data\n",
"\n",
"* Retrieval Augmented Generation(RAG) allows us to retrieve snippets of information from external data sources and augment it to the user's prompt to get tailored responses from Llama 2.\n",
"\n",
"For our demo, we are going to download an external PDF file from a URL and query against the content in the pdf file to get contextually relevant information back with the help of Llama!\n",
"* Retrieval Augmented Generation(RAG) allows us to retrieve snippets of information from external data sources and augment it to the user's prompt to get tailored responses from Llama 3.\n",
"\n",
"\n",
"\n"
"For our demo, we are going to download an external PDF file from a URL and query against the content in the pdf file to get contextually relevant information back with the help of Llama!"
]
},
{
@@ -797,24 +784,16 @@
"source": [
"# langchain setup\n",
"from langchain.llms.octoai_endpoint import OctoAIEndpoint\n",
"# Use the Llama 2 model hosted on OctoAI\n",
"# Temperature: Adjusts randomness of outputs, greater than 1 is random and 0 is deterministic, 0.75 is a good starting value\n",
"\n",
"# Use the Llama 3 model hosted on OctoAI\n",
"# max_tokens: Maximum number of tokens to generate. A word is generally 2-3 tokens\n",
"# temperature: Adjusts randomness of outputs, greater than 1 is random and 0 is deterministic, 0.75 is a good starting value\n",
"# top_p: When decoding text, samples from the top p percentage of most likely tokens; lower to ignore less likely tokens\n",
"# max_new_tokens: Maximum number of tokens to generate. A word is generally 2-3 tokens\n",
"llama_model = OctoAIEndpoint(\n",
" endpoint_url=\"https://text.octoai.run/v1/chat/completions\",\n",
" model_kwargs={\n",
" \"model\": llama2_13b,\n",
" \"messages\": [\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": \"You are a helpful, respectful and honest assistant.\"\n",
" }\n",
" ],\n",
" \"max_tokens\": 1000,\n",
" \"top_p\": 1,\n",
" \"temperature\": 0.75\n",
" },\n",
" model=llama3_8b,\n",
" max_tokens=1000,\n",
" temperature=0.75,\n",
" top_p=1\n",
")"
]
},
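The PDF-loading and retrieval cells are elided from this diff. A sketch of the remaining RAG pipeline under langchain 0.1.x, reusing `llama_model` from above; the PDF URL is a placeholder, and these import paths may raise deprecation warnings on newer langchain releases:

```python
from langchain.chains import RetrievalQA
from langchain.document_loaders import OnlinePDFLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Load and chunk the external PDF (URL is illustrative).
docs = OnlinePDFLoader("https://example.com/some-paper.pdf").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# Index the chunks, then answer questions over the retrieved context.
vectorstore = FAISS.from_documents(chunks, HuggingFaceEmbeddings())
qa = RetrievalQA.from_chain_type(llm=llama_model, retriever=vectorstore.as_retriever())
print(qa.invoke({"query": "What is the main contribution of this paper?"})["result"])
```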
@@ -973,10 +952,11 @@
},
"source": [
"#### **Resources**\n",
"- [GitHub - Llama 2](https://github.com/facebookresearch/llama)\n",
"- [Github - LLama 2 Recipes](https://github.com/facebookresearch/llama-recipes)\n",
"- [Llama 2](https://ai.meta.com/llama/)\n",
"- [Research Paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)\n",
"- [GitHub - Llama](https://github.com/facebookresearch/llama)\n",
"- [Github - LLama Recipes](https://github.com/facebookresearch/llama-recipes)\n",
"- [Llama](https://ai.meta.com/llama/)\n",
"- [Research Paper on Llama 2](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)\n",
"- [Llama 3 Page](https://ai.meta.com/blog/meta-llama-3/)\n",
"- [Model Card](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md)\n",
"- [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/)\n",
"- [Acceptable Use Policy](https://ai.meta.com/llama/use-policy/)\n",
@@ -992,9 +972,9 @@
"source": [
"#### **Authors & Contact**\n",
" * [email protected], [Amit Sangani | LinkedIn](https://www.linkedin.com/in/amitsangani/)\n",
" * [email protected], [Mohsen Agsen | LinkedIn](https://www.linkedin.com/in/mohsen-agsen-62a9791/)\n",
" * [email protected], [Mohsen Agsen | LinkedIn](https://www.linkedin.com/in/dr-thierry-moreau/)\n",
"\n",
"Adapted to run on OctoAI by Thierry Moreau - [email protected]"
"Adapted to run on OctoAI and use Llama 3 by [email protected] [Thierry Moreay | LinkedIn]()"
]
}
],