diff --git a/.github/workflows/gh-pages.yaml b/.github/workflows/gh-pages.yaml
index dbd0e1d..d12ba8e 100644
--- a/.github/workflows/gh-pages.yaml
+++ b/.github/workflows/gh-pages.yaml
@@ -13,7 +13,7 @@ on:
       hugoVersion:
         description: "Hugo Version"
         required: false
-        default: "0.102.1"
+        default: "0.121.0"
 
 # Allow one concurrent deployment
 concurrency:
@@ -36,15 +36,20 @@ jobs:
   build:
     runs-on: ubuntu-latest
     env:
-      HUGO_VERSION: "0.102.1"
+      HUGO_VERSION: "0.121.0"
     steps:
       - name: Check version
         if: ${{ github.event.inputs.hugoVersion }}
         run: export HUGO_VERSION="${{ github.event.inputs.hugoVersion }}"
-      - name: Install Hugo CLI
-        run: |
-          wget -O ${{ runner.temp }}/hugo.deb https://github.com/gohugoio/hugo/releases/download/v${HUGO_VERSION}/hugo_${HUGO_VERSION}_Linux-64bit.deb \
-          && sudo dpkg -i ${{ runner.temp }}/hugo.deb
+      # - name: Install Hugo CLI
+      #   run: |
+      #     wget -O ${{ runner.temp }}/hugo.deb https://github.com/gohugoio/hugo/releases/download/v${HUGO_VERSION}/hugo_${HUGO_VERSION}_Linux-arm64.deb \
+      #     && sudo dpkg -i ${{ runner.temp }}/hugo.deb
+      - name: Install a binary from GitHub releases
+        uses: jaxxstorm/action-install-gh-release@v1.10.0
+        with:
+          repo: gohugoio/hugo
+          tag: v0.121.0
       - name: Checkout
         uses: actions/checkout@v3
         with:
diff --git a/content/posts/embeddings-101.md b/content/posts/embeddings-101.md
new file mode 100644
index 0000000..234d92f
--- /dev/null
+++ b/content/posts/embeddings-101.md
@@ -0,0 +1,35 @@
+---
+title: "All About Embeddings"
+summary: "... and the multidimensional worlds our AI inhabit."
+date: 2023-11-10
+draft: False
+tags: ['machine-learning']
+---
+
+> “What are embeddings?”
+> “Embeddings are a numerical representation of text that capture rich semantic informa-”
+> “No, not the definition. What are embeddings, really?”
+
+Embeddings are a fascinating concept, as they form the internal language for machine learning models. This interests me because language plays such a central role in human intelligence, and structures in language reflect how we perceive the world. In the same way, embeddings offer a window into the world through the eyes of a machine learning model.
+
+## Embedding the human experience
+Language is not just about the way words sound and how they're spelled. It's also about nonverbal cues like gestures and facial expressions, as well as the emotions and feelings that come with them. Even the way we pause or remain silent can communicate a lot about what's going on inside us. All these elements help us interpret and convey our inner experiences of the outside world.
+
+So any superior AI must be equipped to at least perceive as much as we do. With deep neural architectures - mainly the Transformer - it is possible to translate many types of information into a machine-friendly embedding language. We have a way to represent what we see (videos and images), read (text), hear (audio), touch (thermal), how we move (IMU) and even [what we smell](https://arxiv.org/abs/1910.10685) in a format that computers can understand. This allows machines to combine, study, and make sense of information from different sources just like humans do. A homegrown example is the [ImageBind model](https://imagebind.metademolab.com/) from Meta AI that recognizes the connections between these different modalities and can analyze them together.
+
+
+## What makes deep-learning embeddings so powerful?
+Embeddings from deep-learning models are effective because they are not based on a hand-crafted ontology (like WordNet). Through trial and error, these models repeatedly refine the contextual understanding they need to be successful at a task. This contextual understanding is captured in the form of embeddings, which can be used for various downstream tasks.
+
+Text embeddings are a good and relevant example (especially in the age of retrieval-augmented generation). Many [state-of-the-art embedding models](https://huggingface.co/spaces/mteb/leaderboard) are trained using contrastive learning tasks, where they must identify related and unrelated pairs of text. By doing so, the model creates internal text representations that are highly optimized for document retrieval. This means that the model can effectively capture the nuances and relationships between different texts, allowing it to perform well on a variety of NLP tasks beyond text retrieval.
+
+## Why are vectors the data structure of choice for embeddings?
+Vectors are so prevalent in ML because they follow the rules of linear algebra. You can use vectors to figure out how related or far apart two ideas are by taking their dot product (or, once the vectors are normalized for length, their cosine similarity).
+
+{{< figure src="https://corpling.hypotheses.org/files/2018/04/cosine_sim-500x426.jpg" align="center" caption="imgsrc: https://corpling.hypotheses.org/495" >}}
+
+You can also scale and combine them in different ways, which makes vectors good for creating complex ideas from simpler ones, like:
+
+```
+refined_taste = [pizza] + 10*[pineapple]
+```
diff --git a/content/posts/llama-loves-python.md b/content/posts/llama-loves-python.md
new file mode 100644
index 0000000..fa948f9
--- /dev/null
+++ b/content/posts/llama-loves-python.md
@@ -0,0 +1,42 @@
+---
+title: "Hacky Prompt Engineering"
+summary: "Using Python-Formatted Output to Constrain LLM Responses"
+date: 2024-01-02
+draft: False
+tags: ['machine-learning', 'llm']
+---
+
+
+Large language models (LLMs) can be unpredictable in their output formats, making it challenging to direct them to produce specific results. A list of bullet points might be numbered or asterisked, for example. Sometimes - especially with Llama 2 - they also output unnecessary filler text ("Sure! Here is the output you requested...") in a bid to sound conversational. When the output is consumed directly by a human, these inconsistencies are forgivable. When they are to be consumed by another program or within an application, parsing non-uniform outputs can be a challenge.
+
+## Llama Grammars: A Novel Approach to Constraining LLM Outputs
+While working with llama.cpp, I learnt about [Llama Grammars](https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md), a method that allows us to specify a strict format for the LLM's output. Although I'm not quite clear on how this method works under the hood, it (mostly) works! By providing a schema and prompting the LLM to only answer in JSON, we can obtain mostly correct JSON outputs without any fluff or filler text.
+
+The catch: constructing a new grammar file can be somewhat tricky (did you see the notation?!), and even then the LLM sometimes finds a way to stray from the expected format. And because I don't yet know how it works, I'm hesitant to use it in my application.
+
+## Python-Formatted Output
+So instead of generating a new output schema, I used a simpler approach to constrain the output of the LLM. The syntax of a programming language is a schema in itself; there is essentially only one way to write a Python list of strings. By telling the LLM to write its output as if it were a valid data structure returned from a function, we can achieve consistently formatted outputs. For example, if we want the LLM to provide bullet points based on some context, we can prompt it to write them as a Python list.
+
+```
+prompt = {
+    "system": "Given a passage of text, concisely summarize the passage in simple language. Format your response as a python list of bullet points",
+    "user": f"PASSAGE: {passage}",
+    "output": "SUMMARY: ```python\n summary: List[str] = "
+}
+```
+This ensures my application downstream doesn't have to deal with asterisks or numbers. It can directly `eval` the LLM output into a data structure.
+
+Similarly, if we want an ontology based on the text, we can ask the LLM to format its output as a list of dicts. Providing an example helps ensure that the LLM understands the desired format. Here's an example prompt:
+```
+prompt = {
+    "system": "Write an ontology of entities contained in the passage as a list. Format your response as a python list",
+    "user": f"PASSAGE: {passage}",
+    "output": "ONTOLOGY: ```python\n# ontology = [{'entity': 'Japan', 'class': 'country'}, {'entity': 'pizza', 'class': 'food'}]\n\nontology = "
+}
+```
+
+The [Prompt Engineering Guide](https://github.com/facebookresearch/llama-recipes/blob/main/examples/Prompt_Engineering_with_Llama_2.ipynb) does something similar by asking the LLM to output only in JSON.
+
+
+## Keep It Simple
+Llama Grammars are cool, but I think they are better suited for more elaborate outputs, or where I'm asking the LLM to do multiple tasks in a single prompt. The "hacky" prompt engineering technique using Python- or JSON-formatted outputs is a simple way to constrain the output of large language models and make it directly usable for other applications.
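
A note on the `eval` step in `llama-loves-python.md`: the post says the application can directly `eval` the model's completion. Below is a minimal sketch of that parsing step, assuming the completion continues the `summary: List[str] = ` prefix with a list literal and may be followed by a closing code fence and filler text. It swaps `eval` for the standard library's `ast.literal_eval`, which accepts only Python literals and so will not execute arbitrary code emitted by the model. The function name `parse_python_list` and the sample completion are hypothetical and not part of this change.

```python
import ast
from typing import List

# Three backticks, built at runtime so this example does not contain a literal fence.
FENCE = "`" * 3


def parse_python_list(completion: str) -> List[str]:
    """Parse a completion that should start with a Python list literal of strings."""
    # Keep only the text before a closing code fence, if the model emitted one.
    literal = completion.split(FENCE, 1)[0].strip()
    # literal_eval raises ValueError or SyntaxError on anything that is not a literal.
    value = ast.literal_eval(literal)
    if not isinstance(value, list) or not all(isinstance(item, str) for item in value):
        raise ValueError("expected a list of strings")
    return value


# Hypothetical completion, standing in for real model output.
completion = '["Cats sleep a lot.", "They hunt at night."]\n' + FENCE + "\nHope this helps!"
print(parse_python_list(completion))
```

If the model strays from the format, the call raises `ValueError` or `SyntaxError`, which the application can catch and treat as a signal to retry the prompt.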