Finishing touches #2

Closed
wants to merge 1 commit into from
8 changes: 4 additions & 4 deletions Llama-Guard/MODEL_CARD.md
@@ -1,10 +1,10 @@
# Model Details

Llama-Guard is a 7B parameter [Llama 2](https://arxiv.org/abs/2307.09288)-based input-output safeguard model. It can be used for classifying content in both LLM inputs (prompt classification) and in LLM responses (response classification).
Llama Guard is a 7B parameter [Llama 2](https://arxiv.org/abs/2307.09288)-based input-output safeguard model. It can be used for classifying content in both LLM inputs (prompt classification) and in LLM responses (response classification).

It acts as an LLM: it generates text in its output that indicates whether a given prompt or response is safe/unsafe, and if unsafe based on a policy, it also lists the violating subcategories. Here is an example:

![](Llama-Guard_example.png)
![](Llama Guard_example.png)

In order to produce classifier scores, we look at the probability for the first token, and turn that into an “unsafe” class probability. Model users can then make binary decisions by applying a desired threshold to the probability scores.
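
To make the scoring concrete, here is a minimal sketch (not the reference implementation) that reads the probability of the first generated token and thresholds it. It assumes a Hugging Face `transformers` causal-LM interface; the model id, token lookup, and the 0.5 threshold are illustrative, and the full Llama Guard task template (guidelines plus conversation) still has to be supplied as the prompt.

```python
# Minimal sketch (not the reference implementation): turn the probability of the
# first generated token into an "unsafe" classifier score, then threshold it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/LlamaGuard-7b"  # assumed Hugging Face id; use whichever checkpoint you downloaded

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

def unsafe_probability(prompt: str) -> float:
    """P(first generated token == 'unsafe'), used here as the classifier score."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next (first generated) token
    probs = torch.softmax(logits, dim=-1)
    # Simplified lookup: take the first sub-token of "unsafe" under this tokenizer.
    unsafe_id = tokenizer.encode("unsafe", add_special_tokens=False)[0]
    return probs[unsafe_id].item()

# `prompt` must be the full task template (guidelines + conversation), not reproduced here.
# Typical completions look like "safe" or "unsafe\nO3" (verdict first, then violated categories).
score = unsafe_probability("<task template + conversation>")
is_unsafe = score > 0.5  # binary decision via a deployer-chosen threshold
```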

@@ -32,7 +32,7 @@ own internal policies and is meant to demonstrate the value of our method to
tune LLMs into classifiers that show high performance and high degrees of
adaptability to different policies.

### The Llama-Guard Safety Taxonomy & Risk Guidelines
### The Llama Guard Safety Taxonomy & Risk Guidelines

Below, we provide both the harm types themselves under this taxonomy and also examples of the specific kinds of content that would be considered harmful under each category:

@@ -85,6 +85,6 @@ in our paper: [LINK TO PAPER].

| | Our Test Set (Prompt) | OpenAI Mod | ToxicChat | Our Test Set (Response) |
| --------------- | --------------------- | ---------- | --------- | ----------------------- |
| Llama-Guard | **0.945** | 0.847 | **0.626** | **0.953** |
| Llama Guard | **0.945** | 0.847 | **0.626** | **0.953** |
| OpenAI API | 0.764 | **0.856** | 0.588 | 0.769 |
| Perspective API | 0.728 | 0.787 | 0.532 | 0.699 |
29 changes: 12 additions & 17 deletions Llama-Guard/README.md
@@ -1,42 +1,37 @@
# GuardLlama
# Llama Guard

GuardLlama is a new experimental model that provides input and output guardrails
Llama Guard is a new experimental model that provides input and output guardrails
for LLM deployments.

# Download

In order to download the model weights and tokenizer, please visit the Meta
website and accept our License.
In order to download the model weights and tokenizer, please visit the [Meta
website](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and accept our License.

Once your request is approved, you will receive a signed URL over email. Then
run the download.sh script, passing the URL provided when prompted to start the
download.

Pre-requisites: Make sure you have wget and md5sum installed. Then to run the
script: ./download.sh.
script: `./download.sh`.

Keep in mind that the links expire after 24 hours and a certain amount of
downloads. If you start seeing errors such as 403: Forbidden, you can always
downloads. If you start seeing errors such as `403: Forbidden`, you can always
re-request a link.

# Access on HuggingFace

[TODO CHANGE LINK] We are also providing downloads on Hugging Face. You must
first request a download from the Meta website using the same email address as
your Hugging Face account. After doing so, you can request access to any of the
models on Hugging Face and within 1-2 days your account will be granted access
to all versions.

# Quick Start
Since Llama Guard is a fine-tuned Llama-7B model (see our [model card](MODEL_CARD.md) for more information), the same quick start
steps outlined in our [README file](https://github.com/facebookresearch/llama/blob/main/README.md) for Llama2 apply here.

TODO to be written.
In addition to that, we added examples using Llama Guard in the [Llama 2 recipes repository](https://github.com/facebookresearch/llama-recipes).
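
For instance, one way to exercise the model after downloading the weights is through Hugging Face `transformers`. The snippet below is a rough sketch under assumptions: the model id and the chat-template call are illustrative and not taken from this repository, and the recipes linked above remain the maintained examples.

```python
# Illustrative sketch only; see the llama-recipes repository linked above for maintained examples.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/LlamaGuard-7b"  # assumed Hugging Face model id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

# Classify a single user turn (prompt classification).
chat = [{"role": "user", "content": "How do I reset a forgotten email password?"}]
input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)

output = model.generate(input_ids=input_ids, max_new_tokens=32, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
# Expected shape of the result: "safe", or "unsafe" followed by the violated category codes.
```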

# Issues

Please report any software bug, or other problems with the models through one
Please report any software bug, or other problems with the models through one
of the following means:

- Reporting issues with the GuardLlama model:
- Reporting issues with the Llama Guard model:
[github.com/facebookresearch/purplellama](github.com/facebookresearch/purplellama)
- Reporting issues with Llama in general:
[github.com/facebookresearch/llama](github.com/facebookresearch/llama)
@@ -57,4 +52,4 @@ as our accompanying [Acceptable Use Policy](USE_POLICY).

# References

Research Paper: [TODO ADD LINK]
[Research Paper](https://ai.facebook.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/)
60 changes: 3 additions & 57 deletions Llama-Guard/download.sh
@@ -1,70 +1,16 @@
#!/usr/bin/env bash

# Copyright (c) Meta Platforms, Inc. and affiliates.
# This software may be used and distributed according to the terms of the Llama 2 Community License Agreement.

set -e

read -p "Enter the URL from email: " PRESIGNED_URL
echo ""
read -p "Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all: " MODEL_SIZE
TARGET_FOLDER="." # where all files should end up
mkdir -p ${TARGET_FOLDER}

if [[ $MODEL_SIZE == "" ]]; then
    MODEL_SIZE="7B,13B,70B,7B-chat,13B-chat,70B-chat"
fi

echo "Downloading LICENSE and Acceptable Usage Policy"
wget --continue ${PRESIGNED_URL/'*'/"LICENSE"} -O ${TARGET_FOLDER}"/LICENSE"
wget --continue ${PRESIGNED_URL/'*'/"USE_POLICY.md"} -O ${TARGET_FOLDER}"/USE_POLICY.md"

echo "Downloading tokenizer"
wget --continue ${PRESIGNED_URL/'*'/"tokenizer.model"} -O ${TARGET_FOLDER}"/tokenizer.model"
wget --continue ${PRESIGNED_URL/'*'/"tokenizer_checklist.chk"} -O ${TARGET_FOLDER}"/tokenizer_checklist.chk"
CPU_ARCH=$(uname -m)
if [ "$CPU_ARCH" = "arm64" ]; then
    (cd ${TARGET_FOLDER} && md5 tokenizer_checklist.chk)
else
    (cd ${TARGET_FOLDER} && md5sum -c tokenizer_checklist.chk)
fi

for m in ${MODEL_SIZE//,/ }
do
    if [[ $m == "7B" ]]; then
        SHARD=0
        MODEL_PATH="llama-2-7b"
    elif [[ $m == "7B-chat" ]]; then
        SHARD=0
        MODEL_PATH="llama-2-7b-chat"
    elif [[ $m == "13B" ]]; then
        SHARD=1
        MODEL_PATH="llama-2-13b"
    elif [[ $m == "13B-chat" ]]; then
        SHARD=1
        MODEL_PATH="llama-2-13b-chat"
    elif [[ $m == "70B" ]]; then
        SHARD=7
        MODEL_PATH="llama-2-70b"
    elif [[ $m == "70B-chat" ]]; then
        SHARD=7
        MODEL_PATH="llama-2-70b-chat"
    fi

    echo "Downloading ${MODEL_PATH}"
    mkdir -p ${TARGET_FOLDER}"/${MODEL_PATH}"

    for s in $(seq -f "0%g" 0 ${SHARD})
    do
        wget --continue ${PRESIGNED_URL/'*'/"${MODEL_PATH}/consolidated.${s}.pth"} -O ${TARGET_FOLDER}"/${MODEL_PATH}/consolidated.${s}.pth"
    done

    wget --continue ${PRESIGNED_URL/'*'/"${MODEL_PATH}/params.json"} -O ${TARGET_FOLDER}"/${MODEL_PATH}/params.json"
    wget --continue ${PRESIGNED_URL/'*'/"${MODEL_PATH}/checklist.chk"} -O ${TARGET_FOLDER}"/${MODEL_PATH}/checklist.chk"
    echo "Checking checksums"
    if [ "$CPU_ARCH" = "arm64" ]; then
        (cd ${TARGET_FOLDER}"/${MODEL_PATH}" && md5 checklist.chk)
    else
        (cd ${TARGET_FOLDER}"/${MODEL_PATH}" && md5sum -c checklist.chk)
    fi
done
mkdir -p ${TARGET_FOLDER}"/llama-guard"
wget --continue ${PRESIGNED_URL/'*'/"consolidated.00.pth"} -O ${TARGET_FOLDER}"/llama-guard/consolidated.00.pth"
wget --continue ${PRESIGNED_URL/'*'/"params.json"} -O ${TARGET_FOLDER}"/llama-guard/params.json"
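
A note on the `${PRESIGNED_URL/'*'/...}` idiom used throughout the script: the signed URL from the email contains a literal `*` wildcard, and each download substitutes the target path for that wildcard before fetching. A rough Python equivalent of the pattern (illustrative only, and still shelling out to `wget` as the script does) would be:

```python
# Sketch of the wildcard-substitution pattern used by download.sh (illustrative only).
import subprocess

def fetch(presigned_url: str, remote_path: str, local_path: str) -> None:
    """Replace the first '*' in the signed URL with the target path, then download it."""
    url = presigned_url.replace("*", remote_path, 1)  # mirrors ${PRESIGNED_URL/'*'/"..."}
    subprocess.run(["wget", "--continue", url, "-O", local_path], check=True)

# e.g. the two Llama Guard downloads added above:
# fetch(presigned_url, "consolidated.00.pth", "./llama-guard/consolidated.00.pth")
# fetch(presigned_url, "params.json", "./llama-guard/params.json")
```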
32 changes: 25 additions & 7 deletions README.md
@@ -3,29 +3,47 @@
</p>

<p align="center">
🤗 <a href="https://huggingface.co/meta-Llama">Hugging Face</a>&nbsp&nbsp | <a href="">Blog</a>&nbsp&nbsp | <a href="https://ai.facebook.com/llama/purple-llama">Website</a>&nbsp&nbsp | <a href="https://ai.meta.com/research/publications/purple-llama-cyberseceval-a-benchmark-for-evaluating-the-cybersecurity-risks-of-large-language-models/">CyberSec Eval Paper</a>&nbsp&nbsp | <a href="https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/">Llama Guard Paper</a>&nbsp&nbsp
🤗 <a href="https://huggingface.co/meta-Llama"> Models on Hugging Face</a>&nbsp | <a href="https://ai.facebook.com/blog/purple-llama-open-trust-safety-generative-ai"> Blog</a>&nbsp | <a href="https://ai.facebook.com/llama/purple-llama">Website</a>&nbsp | <a href="https://ai.meta.com/research/publications/purple-llama-cyberseceval-a-benchmark-for-evaluating-the-cybersecurity-risks-of-large-language-models/">CyberSec Eval Paper</a>&nbsp&nbsp | <a href="https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/">Llama Guard Paper</a>&nbsp
<br>

--------------------------------------------------------------------------------
# Purple Llama
Purple Llama is an umbrella project that over time will bring together tools and evals to help the community build responsibly with open generative AI models. The initial release will include tools and evals for Cyber Security and Input/Output safeguards, but we plan to contribute more in the near future.

## Why purple?
Borrowing a [concept](https://www.youtube.com/watch?v=ab_Fdp6FVDI) from the cybersecurity world, we believe that to truly mitigate the challenges which generative AI presents, we need to take both attack (red team) and defensive (blue team) postures. Purple teaming, composed of both red and blue team responsibilities, is a collaborative approach to evaluating and mitigating potential risks and the same ethos applies to generative AI and hence our investment in Purple Llama will be comprehensive.
Borrowing a [concept](https://www.youtube.com/watch?v=ab_Fdp6FVDI) from the cybersecurity world, we believe that to truly mitigate the challenges which generative AI presents, we need to take both attack (red team) and defensive (blue team) postures. Purple teaming, composed of both red and blue team responsibilities, is a collaborative approach to evaluating and mitigating potential risks and the same ethos applies to generative AI and hence our investment in Purple Llama will be comprehensive.

## License
Components within the Purple Llama project will be licensed permissively, enabling both research and commercial usage. We believe this is a major step towards enabling community collaboration and standardizing the development and usage of trust and safety tools for generative AI development. More concretely, evals and benchmarks are licensed under the MIT license, while any models use the Llama 2 Community license. See the table below:

| **Component Type** | **Components** | **License** |
|:----------|:------------:|:----------:|
| Evals/Benchmarks | Cyber Security Eval (others to come) | MIT |
| Models | Llama Guard | [Llama 2 Community License](https://github.com/facebookresearch/PurpleLlama/blob/main/LICENSE) |
| Evals/Benchmarks | Cyber Security Eval (others to come) | MIT |
| Models | Llama Guard | [Llama 2 Community License](https://github.com/facebookresearch/PurpleLlama/blob/main/LICENSE) |

## Getting Started
To get started and learn how to use Purple Llama components with Llama models, see the getting started guide [here](https://ai.meta.com/llama/get-started/). The guide provides information and resources to help you set up Llama including how to access the model, hosting, how-to and integration guides. Additionally, you will find supplemental materials to further assist you while responsibly building with Llama. The guide will be updated as more Purple Llama components get released.
## Evals & Benchmarks

### Cybersecurity
We are sharing what we believe is the first industry-wide set of cybersecurity safety evaluations for LLMs. These benchmarks are based on industry guidance and standards (e.g., CWE and MITRE ATT&CK) and built in collaboration with our security subject matter experts. With this initial release, we aim to provide tools that will help address some risks outlined in the [White House commitments on developing responsible AI](https://www.whitehouse.gov/briefing-room/statements-releases/2023/07/21/fact-sheet-biden-harris-administration-secures-voluntary-commitments-from-leading-artificial-intelligence-companies-to-manage-the-risks-posed-by-ai/), including:

- Metrics for quantifying LLM cybersecurity risks.
- Tools to evaluate the frequency of insecure code suggestions.
- Tools to evaluate LLMs to make it harder to generate malicious code or aid in carrying out cyberattacks.

We believe these tools will reduce the frequency of LLMs suggesting insecure AI-generated code and reduce their helpfulness to cyber adversaries. Our initial results show that there are meaningful cybersecurity risks for LLMs, both with recommending insecure code and for complying with malicious requests. See our [Cybersec Eval paper](https://ai.meta.com/research/publications/purple-llama-cyberseceval-a-benchmark-for-evaluating-the-cybersecurity-risks-of-large-language-models/) for more details.

## Input/Output Safeguards
As we outlined in Llama 2’s [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/), we recommend that all inputs and outputs to the LLM be checked and filtered in accordance with content guidelines appropriate to the application.
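
To make that recommendation concrete, here is a hypothetical sketch of the wrapper pattern: the prompt is screened before inference and the response is screened after it. Both `generate` and `is_safe` are placeholders supplied by the caller; in practice `is_safe` would be backed by a classifier such as Llama Guard and by content guidelines appropriate to the application.

```python
# Hypothetical wrapper illustrating input/output filtering around an LLM call.
from typing import Callable

def moderated_generate(
    prompt: str,
    generate: Callable[[str], str],  # your LLM call (placeholder)
    is_safe: Callable[[str], bool],  # your safety classifier, e.g. backed by Llama Guard (placeholder)
    refusal: str = "Sorry, I can't help with that.",
) -> str:
    if not is_safe(prompt):          # input (prompt) check
        return refusal
    response = generate(prompt)
    if not is_safe(response):        # output (response) check
        return refusal
    return response
```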

### Llama Guard
To support this, and empower the community, we are releasing [Llama Guard](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/), an openly-available model that performs competitively on common open benchmarks and provides developers with a pretrained model to help defend against generating potentially risky outputs.

As part of our ongoing commitment to open and transparent science, we are releasing our methodology and an extended discussion of model performance in our [Llama Guard paper](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/). This model has been trained on a mix of publicly-available datasets to enable detection of common types of potentially risky or violating content that may be relevant to a number of developer use cases. Ultimately, our vision is to enable developers to customize this model to support relevant use cases and to make it easier to adopt best practices and improve the open ecosystem.

## Getting Started
To get started and learn how to use Purple Llama components with Llama models, see the getting started guide [here](https://ai.meta.com/llama/get-started/). The guide provides information and resources to help you set up Llama including how to access the model, hosting, how-to and integration guides. Additionally, you will find supplemental materials to further assist you while responsibly building with Llama. The guide will be updated as more Purple Llama components get released.

## FAQ
For a running list of frequently asked questions, for not only Purple Llama components but also generally for Llama models, see the FAQ [here](https://ai.meta.com/llama/faq/).

## Join the Purple Llama community
See the [CONTRIBUTING](CONTRIBUTING.md) file for how to help out.
See the [CONTRIBUTING](CONTRIBUTING.md) file for how to help out.