Merge pull request #24 from togethercomputer/add-quickstart

Added quickstart MoA code + architecture diagram
togethercomputer · Jun 24, 2024 · 9138bc7
2 parents cd19906 + 0d0bd48

Showing 3 changed files with 126 additions and 14 deletions.
diff --git a/README.md b/README.md
@@ -1,35 +1,98 @@
-# Mixture-of-Agents Enhances Large Language Model Capabilities
+# Mixture-of-Agents (MoA)
 
 [![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](LICENSE)
 [![arXiv](https://img.shields.io/badge/ArXiv-2406.04692-b31b1b.svg)](https://arxiv.org/abs/2406.04692)
 [![Discord](https://img.shields.io/badge/Discord-Together%20AI-blue?logo=discord&logoColor=white)](https://discord.com/invite/9Rk6sSeWEG)
 [![Twitter](https://img.shields.io/twitter/url/https/twitter.com/togethercompute.svg?style=social&label=Follow%20%40togethercompute)](https://twitter.com/togethercompute)
 
-## Overview
+<a href="https://arxiv.org/abs/2406.04692">
+  <img alt="Mixture-of-Agents (MoA) architecture explained" src="./assets/together-moa-explained.png">
+</a>
+
+<p align="center">
+  <a href="#overview"><strong>Overview</strong></a> ·
+  <a href="#quickstart-moa-in-50-loc"><strong>Quickstart</strong></a> ·
+  <a href="#interactive-cli-demo"><strong>Demo</strong></a> ·
+  <a href="#evaluation"><strong>Evaluation</strong></a> ·
+  <a href="#results"><strong>Results</strong></a> ·
+  <a href="#credits"><strong>Credits</strong></a>
+</p>
 
-<div align="center">
-  <img src="assets/moa.jpg" alt="moa" style="width: 100%; display: block; margin-left: auto; margin-right: auto;" />
-  <br>
-</div>
+## Overview
 
-Mixture of Agents (MoA) is a novel approach that leverages the collective strengths of multiple LLMs to enhance performance, achieving state-of-the-art results. By employing a layered architecture where each layer comprises several LLM agents, MoA significantly outperforms GPT-4 Omni’s 57.5% on AlpacaEval 2.0 with a score of 65.1%, using only open-source models!
+Mixture of Agents (MoA) is a novel approach that leverages the collective strengths of multiple LLMs to enhance performance, achieving state-of-the-art results. It uses a layered architecture in which each layer comprises several LLM agents, and each agent takes the outputs of the previous layer as auxiliary input. **Using only open-source models, MoA scores 65.1% on AlpacaEval 2.0, significantly outperforming GPT-4 Omni's 57.5%.**
+
+## Quickstart: MoA in 50 LOC
+
+To get started using MoA in your own apps, see `moa.py`. You'll need to:
+
+1. Install the Together Python library: `pip install together`
+2. Get your [Together API Key](https://api.together.xyz/settings/api-keys) & export it: `export TOGETHER_API_KEY={your_key}`
+3. Run the Python file: `python moa.py`
+
+```py
+# Mixture-of-Agents in 50 lines of code – see moa.py
+import asyncio
+import os
+from together import AsyncTogether, Together
+
+client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))
+async_client = AsyncTogether(api_key=os.environ.get("TOGETHER_API_KEY"))
+reference_models = [
+    "Qwen/Qwen2-72B-Instruct",
+    "Qwen/Qwen1.5-72B-Chat",
+    "mistralai/Mixtral-8x22B-Instruct-v0.1",
+    "databricks/dbrx-instruct",
+]
+aggregator_model = "mistralai/Mixtral-8x22B-Instruct-v0.1"
+aggregator_system_prompt = "...synthesize these responses into a single, high-quality response... Responses from models:"
+user_prompt = "What are some fun things to do in SF?"
+
+async def run_llm(model):
+    response = await async_client.chat.completions.create(
+        model=model,
+        messages=[{"role": "user", "content": user_prompt}],
+        temperature=0.7,
+        max_tokens=100,
+    )
+    return response.choices[0].message.content
+
+async def main():
+    results = await asyncio.gather(*[run_llm(model) for model in reference_models])
+
+    # Numbered references go in the system prompt; the question stays the user turn.
+    references = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(results))
+    final_stream = client.chat.completions.create(
+        model=aggregator_model,
+        messages=[
+            {"role": "system", "content": f"{aggregator_system_prompt}\n{references}"},
+            {"role": "user", "content": user_prompt},
+        ],
+        stream=True,
+    )
+
+    for chunk in final_stream:
+        print(chunk.choices[0].delta.content or "", end="", flush=True)
+
+asyncio.run(main())
+```
 
-## Interactive Demo
+## Interactive CLI Demo
 
-We first present an interactive demo. It showcases a simple multi-turn chatbot where the final response is aggregated from various reference models.
+This interactive CLI demo showcases a simple multi-turn chatbot where the final response is aggregated from various reference models.
 
 ### Setup
 
 1. Export Your API Key:
 
-   Ensure you have your Together API key and export it as an environment variable:
+   Ensure you have your [Together API key](https://api.together.xyz/settings/api-keys) and export it as an environment variable:
 
    ```bash
    export TOGETHER_API_KEY={your_key}
    ```
 
 2. Install Requirements:
-   
+
    ```bash
    pip install -r requirements.txt
    ```
@@ -42,13 +105,12 @@ To run the interactive demo, execute the following script with Python:
 python bot.py
 ```
 
-The script will prompt you to input instructions interactively. Here's how to use it:
+The CLI will prompt you to input instructions interactively:
 
 1. Start by entering your instruction at the ">>>" prompt.
 2. The system will process your input using the predefined reference models.
 3. It will generate a response based on the aggregated outputs from these models.
 4. You can continue the conversation by inputting more instructions, with the system maintaining the context of the multi-turn interaction.
-5. enter `exit` to exit the chatbot.
 
 ### Configuration
 
@@ -65,7 +127,7 @@ You can configure the demo by specifying the following parameters:
 ## Evaluation
 
 We provide scripts to quickly reproduce some of the results presented in our paper.
-For convenience, we have included the code from [AlpacaEval](https://github.com/tatsu-lab/alpaca_eval), 
+For convenience, we have included the code from [AlpacaEval](https://github.com/tatsu-lab/alpaca_eval),
 [MT-Bench](https://github.com/lm-sys/FastChat), and [FLASK](https://github.com/kaistAI/FLASK), with necessary modifications.
 We extend our gratitude to these projects for creating the benchmarks.
 

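The Interactive CLI Demo section says the chatbot maintains the context of the multi-turn interaction. A minimal offline sketch of that idea follows; it is an illustration, not the code in `bot.py`. The `MoAChat` class, the `echo_call` stub, and the short "Responses from models:" system message are hypothetical, and a real version would call the Together API instead of the stub. Every reference model and the aggregator see the full conversation, and only the aggregated reply is written back into the history.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

Message = Dict[str, str]
# Hypothetical async model call: (model, messages) -> reply text.
CallFn = Callable[[str, List[Message]], Awaitable[str]]


class MoAChat:
    """Multi-turn MoA chat that keeps a running history (illustrative sketch)."""

    def __init__(self, call: CallFn, proposers: List[str], aggregator: str) -> None:
        self.call = call
        self.proposers = proposers
        self.aggregator = aggregator
        self.history: List[Message] = []

    async def ask(self, instruction: str) -> str:
        self.history.append({"role": "user", "content": instruction})
        # Every reference model answers with the whole conversation so far.
        drafts = await asyncio.gather(*(self.call(m, self.history) for m in self.proposers))
        numbered = "\n".join(f"{i + 1}. {d}" for i, d in enumerate(drafts))
        system: Message = {"role": "system", "content": "Responses from models:\n" + numbered}
        # The aggregator sees the numbered drafts plus the same history.
        reply = await self.call(self.aggregator, [system] + self.history)
        # Only the aggregated reply is stored, so the next turn builds on it.
        self.history.append({"role": "assistant", "content": reply})
        return reply


async def echo_call(model: str, messages: List[Message]) -> str:
    """Offline stub: report the model and how many user turns it has seen."""
    turns = sum(1 for m in messages if m["role"] == "user")
    return f"{model}@{turns}"


async def demo() -> None:
    chat = MoAChat(echo_call, ["p1", "p2"], "agg")
    for instruction in ["Plan a day in SF", "Make it cheaper"]:
        print(await chat.ask(instruction))  # prints agg@1, then agg@2


asyncio.run(demo())
```

Storing only the aggregated answer (not every reference draft) keeps the history the same size as an ordinary single-model chat.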
diff --git a/assets/together-moa-explained.png b/assets/together-moa-explained.png
diff --git a/moa.py b/moa.py
@@ -0,0 +1,50 @@
+# Mixture-of-Agents in 50 lines of code
+import asyncio
+import os
+from together import AsyncTogether, Together
+
+client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))
+async_client = AsyncTogether(api_key=os.environ.get("TOGETHER_API_KEY"))
+
+user_prompt = "What are some fun things to do in SF?"
+reference_models = [
+    "Qwen/Qwen2-72B-Instruct",
+    "Qwen/Qwen1.5-72B-Chat",
+    "mistralai/Mixtral-8x22B-Instruct-v0.1",
+    "databricks/dbrx-instruct",
+]
+aggregator_model = "mistralai/Mixtral-8x22B-Instruct-v0.1"
+aggregator_system_prompt = """You have been provided with a set of responses from various open-source models to the latest user query. Your task is to synthesize these responses into a single, high-quality response. It is crucial to critically evaluate the information provided in these responses, recognizing that some of it may be biased or incorrect. Your response should not simply replicate the given answers but should offer a refined, accurate, and comprehensive reply to the instruction. Ensure your response is well-structured, coherent, and adheres to the highest standards of accuracy and reliability.
+
+Responses from models:"""
+
+
+async def run_llm(model):
+    """Run a single LLM call with a reference model."""
+    response = await async_client.chat.completions.create(
+        model=model,
+        messages=[{"role": "user", "content": user_prompt}],
+        temperature=0.7,
+        max_tokens=512,
+    )
+    print(model)  # log each reference model as it finishes
+    return response.choices[0].message.content
+
+
+async def main():
+    """Query the reference models in parallel, then stream the aggregator's synthesis."""
+    results = await asyncio.gather(*[run_llm(model) for model in reference_models])
+
+    # Numbered references go in the system prompt; the question stays the user turn.
+    references = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(results))
+    final_stream = client.chat.completions.create(
+        model=aggregator_model,
+        messages=[
+            {"role": "system", "content": f"{aggregator_system_prompt}\n{references}"},
+            {"role": "user", "content": user_prompt},
+        ],
+        stream=True,
+    )
+
+    for chunk in final_stream:
+        print(chunk.choices[0].delta.content or "", end="", flush=True)
+
+
+asyncio.run(main())
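`moa.py` runs a single layer of reference models followed by one aggregator, while the README's Overview describes a layered architecture. A minimal offline sketch of extending it to several layers is shown below, assuming a hypothetical async `call(model, messages)` function in place of `async_client.chat.completions.create`. In each intermediate layer, every proposer answers with the previous layer's responses attached as numbered references; the final layer is a single aggregator. The names `run_moa`, `with_references`, and `fake_call`, the abbreviated prompt, and the convention that `layers` counts the aggregator layer are our own assumptions, not part of the repository.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

Message = Dict[str, str]
# Hypothetical async model call: (model, messages) -> reply text.
# In moa.py this role is played by async_client.chat.completions.create(...).
CallFn = Callable[[str, List[Message]], Awaitable[str]]

# Abbreviated version of the aggregator system prompt from moa.py.
AGGREGATOR_PROMPT = (
    "You have been provided with a set of responses from various open-source "
    "models to the latest user query. Your task is to synthesize these "
    "responses into a single, high-quality response.\n\nResponses from models:"
)


def with_references(user_prompt: str, references: List[str]) -> List[Message]:
    """Attach numbered prior-layer responses as a system message; keep the user query."""
    if not references:
        return [{"role": "user", "content": user_prompt}]
    numbered = "\n".join(f"{i + 1}. {ref}" for i, ref in enumerate(references))
    return [
        {"role": "system", "content": f"{AGGREGATOR_PROMPT}\n{numbered}"},
        {"role": "user", "content": user_prompt},
    ]


async def run_moa(call: CallFn, user_prompt: str, proposers: List[str],
                  aggregator: str, layers: int = 3) -> str:
    """Run layers-1 proposer layers, then one aggregator layer (hypothetical helper)."""
    references: List[str] = []
    for _ in range(layers - 1):
        # Every proposer in this layer sees the previous layer's responses.
        messages = with_references(user_prompt, references)
        references = list(await asyncio.gather(*(call(m, messages) for m in proposers)))
    # Final layer: a single aggregator synthesizes the last set of responses.
    return await call(aggregator, with_references(user_prompt, references))


async def fake_call(model: str, messages: List[Message]) -> str:
    """Offline stub: report the model name and how many numbered references it saw."""
    system = messages[0]["content"] if messages[0]["role"] == "system" else ""
    seen = sum(1 for line in system.splitlines() if line[:1].isdigit())
    return f"{model}:{seen}"


print(asyncio.run(run_moa(fake_call, "What are some fun things to do in SF?", ["a", "b"], "agg")))
# prints agg:2 (two proposer layers, then the aggregator sees 2 references)
```

A production version might also pass `return_exceptions=True` to `asyncio.gather` and drop any exceptions from the results, so that one unavailable reference model does not abort the whole run.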