Merge pull request #23 from skalade/ollama_docs
add ollama instructions to docs
tom-papatheodore authored Jun 28, 2024
2 parents 9bc2ae4 + 5d2fcb5 commit 771f74c
117 changes: 117 additions & 0 deletions docs/jobs.md
@@ -253,6 +253,123 @@ a new (or open an existing) notebook and access the GPUs on the compute node:
Please see the [Python Environment](./software.md#python-environment) section to understand how the base Python environment and `pytorch` and `tensorflow` modules can be customized.
```

## Large Language Models (Ollama)

Users can experiment with open-weight models running on GPUs with [Ollama](https://ollama.com/). Ollama is a popular framework that enables easy interaction with Large Language Models (LLMs), and it uses [llama.cpp](https://github.com/ggerganov/llama.cpp) as a backend.

It is easiest to run these steps from a JupyterLab environment (as outlined in the [Jupyter](#jupyter) section), since that allows you to spawn multiple terminal windows and dedicate one to the Ollama server. However, you can do all of this from an interactive session just as well.

**Step 0:**

Grab a compute node in an interactive session.

```{note}
Remember that the login node is NOT meant for compute-intensive tasks like serving LLMs, so please make sure to allocate a compute node to follow along with this section.
```

```bash
salloc -N <number-of-nodes> -t <walltime> -p <partition>
```
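
For example, a request for a single node with a two-hour walltime might look like the following (keep `<partition>` as the GPU partition appropriate for your system):

```bash
salloc -N 1 -t 02:00:00 -p <partition>
```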

**Step 1:**

Download the Ollama executable and start the server.

```bash
curl -L https://ollama.com/download/ollama-linux-amd64 --output ollama
chmod a+x ollama
```
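
Once the download finishes, a quick sanity check is to print the client version (this should work even before the server is started):

```bash
./ollama --version
```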

If you are comfortable creating multiple terminal sessions on the same compute node, then simply run the serve command and open a new terminal session on that node to interact with the server.
```bash
OLLAMA_MODELS=<path-to-store-models> ./ollama serve
```

Otherwise you can run the server in the background with optional logging as follows:
```bash
OLLAMA_MODELS=<path-to-store-models> ./ollama serve 2>&1 | tee log > /dev/null &
```
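
Either way, you can quickly confirm that the server is reachable before moving on; hitting the root endpoint should return a short status message (typically `Ollama is running`):

```bash
# Quick health check against the default Ollama port
curl http://localhost:11434
```
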
```{note}
By default, the models downloaded in Step 2 below will be saved in `~/.ollama`. However, your `$HOME` directory only has a storage capacity of 25GB and can quickly fill up with larger models. Therefore, we recommend using the `OLLAMA_MODELS` environment variable to change the directory where the models are saved to a location within your `$WORK` directory, which has a much larger capacity.
```
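
As a concrete illustration of the note above, you could export the variable once, pointing it at a directory under `$WORK` (the `ollama-models` path below is just a hypothetical choice), and then start the server:

```bash
# Hypothetical model directory under $WORK; any location with enough space works
export OLLAMA_MODELS=$WORK/ollama-models
mkdir -p "$OLLAMA_MODELS"
./ollama serve
```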

**Step 2:**

Ollama hosts a list of available open-weight models on its [site](https://ollama.com/library). In this example we will pull the Llama3 8B model -- one of the most popular open-weight models, released by [Meta](https://llama.meta.com/llama3/).

```bash
./ollama pull llama3
```

As described in Step 1, these models will be saved in the directory specified by the `OLLAMA_MODELS` environment variable.
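
To confirm that the download completed and see which models the server currently has available, you can list them:

```bash
./ollama list
```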

**Step 3:**

The Ollama server is OpenAI API compatible and listens on port **11434** by default. This means we can send it requests using curl, much as outlined in the [OpenAI API reference documentation](https://platform.openai.com/docs/api-reference/making-requests).

```bash
curl http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" \
-d '{
"model": "llama3",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Why did the chicken cross the road?"
}
]
}'
{"id":"chatcmpl-318","object":"chat.completion","created":1717762800,"model":"llama3","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":"The classic question!\n\nAccording to the most popular answer, the chicken crossed the road to get to the other side! But let's be honest, there are many creative and humorous responses to this question too.\n\nDo you have a favorite reason why the chicken might have crossed the road?"},"finish_reason":"stop"}],"usage":{"prompt_tokens":28,"completion_tokens":57,"total_tokens":85}}

```
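
If you only want the assistant's reply rather than the full JSON response, you can pipe the output through `jq` (assuming `jq` is available on the system) to extract the message content:

```bash
curl -s http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" \
    -d '{
          "model": "llama3",
          "messages": [{"role": "user", "content": "Why did the chicken cross the road?"}]
        }' | jq -r '.choices[0].message.content'
```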

Similarly, in Python, one can use the OpenAI Python package to interface with the Ollama server. To do so, you will first need to install the `openai` package in your user install directory or within a Python virtual environment.


```bash
pip3 install openai
```

Now you can use the Python OpenAI client to invoke your locally run Llama3 model.

```python
from openai import OpenAI
client = OpenAI(base_url="http://localhost:11434/v1", api_key="none")
response = client.chat.completions.create(
model="llama3",
messages=[{"role": "user", "content": "Hello this is a test"}],
)
print(response)
```
```
ChatCompletion(id='chatcmpl-400', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="Hello! This is indeed a test. I'm happy to be a part of it. How can I help you with your test?", role='assistant', function_call=None, tool_calls=None))], created=1717763172, model='llama3', object='chat.completion', system_fingerprint='fp_ollama', usage=CompletionUsage(completion_tokens=28, prompt_tokens=13, total_tokens=41))
```
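
The same client can also stream tokens as they are generated, which is convenient for longer responses. Below is a minimal sketch, assuming the same local server and the `llama3` model pulled above:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="none")

# Request a streamed response and print tokens as they arrive
stream = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Tell me a short joke."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```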

**Step 4:**

Shutting down the Ollama server.

When your Slurm job ends (whether due to reaching its walltime limit or being manually canceled with `scancel <jobid>`), all user processes (including the Ollama server) will be cleaned up before the node is put back into the queue. However, if you need or want to shut down the server manually, you can do so in several different ways. Here, we'll show two of them.

Recall that in Step 1 we used an `&` to put the Ollama serve process in the background so we could continue using the same terminal to interact with the server. Therefore, we need to find the background process running the server so we can shut it down.

Option 1: `ps` + `kill`
```bash
ps -ef | grep "ollama serve" # Look for the PID associated with this command in the results
kill -9 <PID> # Kill the process
```

Option 2: `jobs` + `fg`
```bash
jobs # This will show your background processes labeled as [1], [2], etc.
fg <id> # Bring the process back to the foreground. E.g., `fg 1`
# Then press Ctrl+C to stop the process
```


<!---
## Job dependencies (TODO)
-->
