From baed67774edca3ec60c7d7c359995fe6170b9e5c Mon Sep 17 00:00:00 2001 From: Suraj Subramanian <5676233+subramen@users.noreply.github.com> Date: Fri, 10 Nov 2023 13:21:08 -0500 Subject: [PATCH 1/7] add embeddings post --- content/posts/embeddings-101.md | 36 +++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) create mode 100644 content/posts/embeddings-101.md diff --git a/content/posts/embeddings-101.md b/content/posts/embeddings-101.md new file mode 100644 index 0000000..4a0dcc9 --- /dev/null +++ b/content/posts/embeddings-101.md @@ -0,0 +1,36 @@ +--- +title: "All About Embeddings" +summary: "and the multidimensional worlds our AI inhabit" +date: 2023-11-10 +draft: False +tags: ['machine-learning'] +--- + +> “What are embeddings?” + +> “Embeddings are a numerical representation of text that capture rich semantic informa-” + +> “No, not the definition. What are embeddings, really?” + +Embeddings are a fascinating concept, as they form the internal language for machine learning models. This interests me because language plays such a central role in human intelligence, and structures in language reflect how we perceive the world. In the same way, embeddings offer a window into the world through the eyes of a machine learning model. + +## Embedding the human experience +Language is not just about the way words sound and how they're spelled. It's also about nonverbal cues like gestures and facial expressions, as well as the emotions and feelings that come with them. Even the way we pause or remain silent can communicate a lot about what's going on inside us. All these elements help us interpret and convey our inner experiences of the outside world. + +So any superior AI must be equipped to at least perceive as much as we do. With deep neural architectures - mainly the Transformer - it is possible to translate many types of information into a machine-friendly embedding language. We have a way to represent what we see (videos and images), read (text), hear (audio), touch (thermal), how we move (IMU) and even [what we smell](https://arxiv.org/abs/1910.10685) in a format that computers can understand. This allows machines to combine, study, and make sense of information from different sources just like humans do. A homegrown example is the [ImageBind model](https://imagebind.metademolab.com/) from Meta AI that recognizes the connections between these different modalities and can analyze them together. + + +## What makes deep-learning embeddings so powerful? +Embeddings from deep-learning models are effective because they are not based on a hand-crafted ontology (like WordNet). Through trial-and-error, these models repeatedly refine the contextual understanding they need to be successful at a task. This contextual understanding is captured in the form of embeddings, which can be used for various downstream tasks. + +Text embeddings are a good and relevant example (especially in the age of retrieval-augmented generation). Many [state-of-the-art embedding models](https://huggingface.co/spaces/mteb/leaderboard) are trained using contrastive learning tasks, where they must identify related and unrelated pairs of text. By doing so, the model creates internal text representations that are highly optimized for document retrieval. This means that the model can effectively capture the nuances and relationships between different texts, allowing it to perform well on a variety of NLP tasks beyond text retrieval. + +## Why are vectors the data structure of choice for embeddings? 
+Vectors are so prevalent in ML because they follow the rules of linear algebra. You can use vectors to figure out how related or far apart two ideas are by taking their dot product. + +{{< figure src="https://corpling.hypotheses.org/files/2018/04/cosine_sim-500x426.jpg" align="center" caption="imgsrc: https://corpling.hypotheses.org/495" >}} + +You can also scale and combine them in different ways, which makes vectors good for creating complex ideas from simpler ones, like: + +```refined_taste = [pizza] + 10*[pineapple] +``` From 8bfb622ada57f42f139041e7ca587f3a4ffd7d3c Mon Sep 17 00:00:00 2001 From: Suraj Subramanian <5676233+subramen@users.noreply.github.com> Date: Fri, 10 Nov 2023 14:07:21 -0500 Subject: [PATCH 2/7] Update embeddings-101.md --- content/posts/embeddings-101.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/content/posts/embeddings-101.md b/content/posts/embeddings-101.md index 4a0dcc9..234d92f 100644 --- a/content/posts/embeddings-101.md +++ b/content/posts/embeddings-101.md @@ -1,15 +1,13 @@ --- title: "All About Embeddings" -summary: "and the multidimensional worlds our AI inhabit" +summary: "... and the multidimensional worlds our AI inhabit." date: 2023-11-10 draft: False tags: ['machine-learning'] --- > “What are embeddings?” - > “Embeddings are a numerical representation of text that capture rich semantic informa-” - > “No, not the definition. What are embeddings, really?” Embeddings are a fascinating concept, as they form the internal language for machine learning models. This interests me because language plays such a central role in human intelligence, and structures in language reflect how we perceive the world. In the same way, embeddings offer a window into the world through the eyes of a machine learning model. @@ -32,5 +30,6 @@ Vectors are so prevalent in ML because they follow the rules of linear algebra. You can also scale and combine them in different ways, which makes vectors good for creating complex ideas from simpler ones, like: -```refined_taste = [pizza] + 10*[pineapple] +``` +refined_taste = [pizza] + 10*[pineapple] ``` From 5188d199a3da09a35519bd81e05fb59619eabe95 Mon Sep 17 00:00:00 2001 From: Suraj Subramanian <5676233+subramen@users.noreply.github.com> Date: Mon, 29 Jan 2024 19:17:08 -0500 Subject: [PATCH 3/7] Create llama-loves-python.md --- content/posts/llama-loves-python.md | 32 +++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) create mode 100644 content/posts/llama-loves-python.md diff --git a/content/posts/llama-loves-python.md b/content/posts/llama-loves-python.md new file mode 100644 index 0000000..be4eee3 --- /dev/null +++ b/content/posts/llama-loves-python.md @@ -0,0 +1,32 @@ +--- +title: "Constraining LLM outputs" +summary: "" +date: 2024-1-29 +draft: True +tags: ['machine-learning'] +--- + +LLMs can be notoriously stochastic in their output formats. The smaller the model, the more immune they are to prompts directing them to format their response in a certain way. Doesn't matter if you politely request or COMMAND THEM IN ALL-CAPS - these parrots can tend to fly to their own tune. + +While using llama-cpp-python, I came across Llama Grammars - a novel method of constraining the output of an LLM to a specific format. I am not sure how this exactly works under the hood, but it works! I packed two tasks into one prompt, asked it to output me a JSON, and provided a simple format that the LLM obediently followed. 
No fluff or filler text, and mostly correct JSON that I could json.loads into my application. + +Generating a new output schema isn't too much work, but you still have a few hoops to jump through. A simpler way to constrain the output - just tell it to write python! Not actual python code, but format its output as if it was a valid data structure returned from a function. + +Let's say you want it to write bullet points based on some context... prompt the LLM to write a python list. +``` +prompt = { + "system": "Given a passage of text, concisely summarize the passage in simple language. Format your response as a python list of bullet points", + "user": f"PASSAGE: {passage}", + "output": "SUMMARY: ```python\n summary: List[str] = " +} +``` + +What about an ontology based on the text? Prompt the LLM to format it as a list of dicts. Give it an example too, just to make sure. +``` +prompt = { + "system": "Write an ontology of entities contained in the passage as list. Format your response as a python list" + "user": f"PASSAGE: {passage}", + "output": "ONTOLOGY: ```python\n# ontology = [{'entity': 'Japan', 'class': 'country'}, {'entity': 'pizza', class: 'food'}]\n\nontology = " +} +``` + From 1ab85708676cd3acdff23b5f4d0a4b374b19ad10 Mon Sep 17 00:00:00 2001 From: Suraj Subramanian <5676233+subramen@users.noreply.github.com> Date: Tue, 30 Jan 2024 10:30:21 -0500 Subject: [PATCH 4/7] Update gh-pages.yaml --- .github/workflows/gh-pages.yaml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/gh-pages.yaml b/.github/workflows/gh-pages.yaml index dbd0e1d..e1b44c1 100644 --- a/.github/workflows/gh-pages.yaml +++ b/.github/workflows/gh-pages.yaml @@ -13,7 +13,7 @@ on: hugoVersion: description: "Hugo Version" required: false - default: "0.102.1" + default: "0.121.0" # Allow one concurrent deployment concurrency: @@ -36,7 +36,7 @@ jobs: build: runs-on: ubuntu-latest env: - HUGO_VERSION: "0.102.1" + HUGO_VERSION: "0.121.0" steps: - name: Check version if: ${{ github.event.inputs.hugoVersion }} From 36dd973a976208a0360e29765b108e2012e77ec7 Mon Sep 17 00:00:00 2001 From: Suraj Subramanian <5676233+subramen@users.noreply.github.com> Date: Tue, 30 Jan 2024 10:40:13 -0500 Subject: [PATCH 5/7] Update gh-pages.yaml --- .github/workflows/gh-pages.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/gh-pages.yaml b/.github/workflows/gh-pages.yaml index e1b44c1..23bf532 100644 --- a/.github/workflows/gh-pages.yaml +++ b/.github/workflows/gh-pages.yaml @@ -43,7 +43,7 @@ jobs: run: export HUGO_VERSION="${{ github.event.inputs.hugoVersion }}" - name: Install Hugo CLI run: | - wget -O ${{ runner.temp }}/hugo.deb https://github.com/gohugoio/hugo/releases/download/v${HUGO_VERSION}/hugo_${HUGO_VERSION}_Linux-64bit.deb \ + wget -O ${{ runner.temp }}/hugo.deb https://github.com/gohugoio/hugo/releases/download/v${HUGO_VERSION}/hugo_${HUGO_VERSION}_Linux-arm64.deb \ && sudo dpkg -i ${{ runner.temp }}/hugo.deb - name: Checkout uses: actions/checkout@v3 From 2b65c70137e574cc0ac5904d62bbf5fcc9e137d7 Mon Sep 17 00:00:00 2001 From: Suraj Subramanian <5676233+subramen@users.noreply.github.com> Date: Tue, 30 Jan 2024 10:47:31 -0500 Subject: [PATCH 6/7] Update gh-pages.yaml --- .github/workflows/gh-pages.yaml | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/.github/workflows/gh-pages.yaml b/.github/workflows/gh-pages.yaml index 23bf532..d12ba8e 100644 --- a/.github/workflows/gh-pages.yaml +++ b/.github/workflows/gh-pages.yaml @@ 
-41,10 +41,15 @@ jobs: - name: Check version if: ${{ github.event.inputs.hugoVersion }} run: export HUGO_VERSION="${{ github.event.inputs.hugoVersion }}" - - name: Install Hugo CLI - run: | - wget -O ${{ runner.temp }}/hugo.deb https://github.com/gohugoio/hugo/releases/download/v${HUGO_VERSION}/hugo_${HUGO_VERSION}_Linux-arm64.deb \ - && sudo dpkg -i ${{ runner.temp }}/hugo.deb + # - name: Install Hugo CLI + # run: | + # wget -O ${{ runner.temp }}/hugo.deb https://github.com/gohugoio/hugo/releases/download/v${HUGO_VERSION}/hugo_${HUGO_VERSION}_Linux-arm64.deb \ + # && sudo dpkg -i ${{ runner.temp }}/hugo.deb + - name: Install a binary from GitHub releases + uses: jaxxstorm/action-install-gh-release@v1.10.0 + with: + repo: gohugoio/hugo + tag: v0.121.0 - name: Checkout uses: actions/checkout@v3 with: From 8c73abaa48896109681e69ada9dce72da84eb26f Mon Sep 17 00:00:00 2001 From: Suraj Subramanian <5676233+subramen@users.noreply.github.com> Date: Tue, 30 Jan 2024 11:30:34 -0500 Subject: [PATCH 7/7] Update llama-loves-python.md --- content/posts/llama-loves-python.md | 32 +++++++++++++++++++---------- 1 file changed, 21 insertions(+), 11 deletions(-) diff --git a/content/posts/llama-loves-python.md b/content/posts/llama-loves-python.md index be4eee3..fa948f9 100644 --- a/content/posts/llama-loves-python.md +++ b/content/posts/llama-loves-python.md @@ -1,18 +1,22 @@ --- -title: "Constraining LLM outputs" -summary: "" -date: 2024-1-29 -draft: True -tags: ['machine-learning'] +title: "Hacky Prompt Engineering" +summary: "Using Python-Formatted Output to Constrain LLM Responses" +date: 2024-01-02 +draft: False +tags: ['machine-learning', 'llm'] --- -LLMs can be notoriously stochastic in their output formats. The smaller the model, the more immune they are to prompts directing them to format their response in a certain way. Doesn't matter if you politely request or COMMAND THEM IN ALL-CAPS - these parrots can tend to fly to their own tune. -While using llama-cpp-python, I came across Llama Grammars - a novel method of constraining the output of an LLM to a specific format. I am not sure how this exactly works under the hood, but it works! I packed two tasks into one prompt, asked it to output me a JSON, and provided a simple format that the LLM obediently followed. No fluff or filler text, and mostly correct JSON that I could json.loads into my application. +Large language models (LLMs) can be unpredictable in their output formats, making it challenging to direct them to produce specific results. A list of bullet points might be numbered or asterisked, for example. Sometimes - especially with Llama 2 - they also output unnecessary filler text ("Sure! Here is the output you requested...") in a bid to sound conversational. When the output is consumed directly by a human, these inconsistencies are forgivable. When they are to be consumed by another program or within an application, parsing non-uniform outputs can be a challenge. -Generating a new output schema isn't too much work, but you still have a few hoops to jump through. A simpler way to constrain the output - just tell it to write python! Not actual python code, but format its output as if it was a valid data structure returned from a function. +## Llama Grammars: A Novel Approach to Constraining LLM Outputs +While working with llama.cpp, I learnt about [Llama Grammars](https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md), a method that allows us to specify a strict format for the LLM's output. 
Although I'm not quite clear on how this method works under the hood, it (mostly) works! By providing a schema and prompting the LLM to only answer in JSON, we can obtain mostly correct JSON outputs without any fluff or filler text. + +The catch - constructing a new grammar file can be somewhat tricky (did you see the notation?!), and yet the LLM finds a way to stray away from the expected format. And because I don't know yet how it works, I'm hesitant to use it in my application. + +## Python-Formatted Output +So instead of generating a new output schema, I used a simpler approach to constrain the output of the LLM. The semantic of programming languages is a schema by itself; there is only one way to represent a python list of strings. By telling the LLM to write its output as if it were a valid data structure returned from a function, we can achieve consistently-formatted outputs. For example, if we want the LLM to provide bullet points based on some context, we can prompt it to write it in a python list. -Let's say you want it to write bullet points based on some context... prompt the LLM to write a python list. ``` prompt = { "system": "Given a passage of text, concisely summarize the passage in simple language. Format your response as a python list of bullet points", @@ -20,13 +24,19 @@ prompt = { "output": "SUMMARY: ```python\n summary: List[str] = " } ``` +This ensures my application downstream doesn't have to deal with asterisks or numbers. It can directly `eval` the LLM output into a data structure. -What about an ontology based on the text? Prompt the LLM to format it as a list of dicts. Give it an example too, just to make sure. +Similarly, if we want an ontology based on the text, we can ask the LLM to format its output as a list of dicts. Providing an example helps ensure that the LLM understands the desired format. Here's an example prompt: ``` prompt = { - "system": "Write an ontology of entities contained in the passage as list. Format your response as a python list" + "system": "Write an ontology of entities contained in the passage as a list. Format your response as a python list", "user": f"PASSAGE: {passage}", "output": "ONTOLOGY: ```python\n# ontology = [{'entity': 'Japan', 'class': 'country'}, {'entity': 'pizza', class: 'food'}]\n\nontology = " } ``` +The [Prompt Engineering Guide](https://github.com/facebookresearch/llama-recipes/blob/main/examples/Prompt_Engineering_with_Llama_2.ipynb) does something similar, by asking the LLM to output only in JSON. + + +## Keep It Simple +Llama Grammars are cool, but I think they are better suited for more elaborate outputs, or where I'm asking the LLM to do multiple tasks in a single prompt. The "hacky" prompt engineering technique using Python- or JSON-formatted outputs is a simple way to constrain the output of large language models and make it directly usable for other applications.
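To make the final parsing step concrete, here is a minimal sketch of the python-list trick described in the post above. It is illustrative only: `generate()` is a hypothetical stand-in for whatever LLM client is in use (llama-cpp-python, an API call, etc.) and returns a canned completion so the sketch runs end to end, and `ast.literal_eval` is used instead of a bare `eval` as a safer way to turn the completion into a Python object. The same parsing works for the list-of-dicts ontology output.

```
# Minimal sketch of the python-formatted-output trick; see assumptions above.
import ast
from typing import List

def generate(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call (llama-cpp-python, an API, ...).
    # Returns a canned completion here so the sketch runs end to end.
    return '["Embeddings map data to vectors.", "Similar items end up close together."]\n```'

def summarize(passage: str) -> List[str]:
    # The "output" field primes the model to continue a python list literal.
    prompt = (
        "Given a passage of text, concisely summarize the passage in simple language. "
        "Format your response as a python list of bullet points.\n"
        f"PASSAGE: {passage}\n"
        "SUMMARY: ```python\n summary: List[str] = "
    )
    completion = generate(prompt)
    # Keep only the list literal: drop anything after a closing code fence.
    list_text = completion.split("```")[0].strip()
    return ast.literal_eval(list_text)

if __name__ == "__main__":
    print(summarize("Embeddings are numerical representations of data..."))
```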