Finishing touches #2

Closed
wants to merge 1 commit into from
8 changes: 4 additions & 4 deletions Llama-Guard/MODEL_CARD.md
@@ -1,10 +1,10 @@
# Model Details

Llama-Guard is a 7B parameter [Llama 2](https://arxiv.org/abs/2307.09288)-based input-output safeguard model. It can be used for classifying content in both LLM inputs (prompt classification) and in LLM responses (response classification).
Llama Guard is a 7B parameter [Llama 2](https://arxiv.org/abs/2307.09288)-based input-output safeguard model. It can be used for classifying content in both LLM inputs (prompt classification) and in LLM responses (response classification).

It acts as an LLM: it generates text in its output that indicates whether a given prompt or response is safe/unsafe, and if unsafe based on a policy, it also lists the violating subcategories. Here is an example:

![](Llama-Guard_example.png)
![](Llama Guard_example.png)

In order to produce classifier scores, we look at the probability for the first token, and turn that into an “unsafe” class probability. Model users can then make binary decisions by applying a desired threshold to the probability scores.
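
To make the scoring concrete, here is a minimal sketch (not the reference implementation) that reads the probability of the first generated token and thresholds it. It assumes a Hugging Face `transformers` causal-LM interface; the model id, token lookup, and the 0.5 threshold are illustrative, and the full Llama Guard task template (guidelines plus conversation) still has to be supplied as the prompt.

```python
# Minimal sketch (not the reference implementation): turn the probability of the
# first generated token into an "unsafe" classifier score, then threshold it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/LlamaGuard-7b"  # assumed Hugging Face id; use whichever checkpoint you downloaded

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

def unsafe_probability(prompt: str) -> float:
    """P(first generated token == 'unsafe'), used here as the classifier score."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next (first generated) token
    probs = torch.softmax(logits, dim=-1)
    # Simplified lookup: take the first sub-token of "unsafe" under this tokenizer.
    unsafe_id = tokenizer.encode("unsafe", add_special_tokens=False)[0]
    return probs[unsafe_id].item()

# `prompt` must be the full task template (guidelines + conversation), not reproduced here.
# Typical completions look like "safe" or "unsafe\nO3" (verdict first, then violated categories).
score = unsafe_probability("<task template + conversation>")
is_unsafe = score > 0.5  # binary decision via a deployer-chosen threshold
```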

@@ -32,7 +32,7 @@ own internal policies and is meant to demonstrate the value of our method to
tune LLMs into classifiers that show high performance and high degrees of
adaptability to different policies.

### The Llama-Guard Safety Taxonomy & Risk Guidelines
### The Llama Guard Safety Taxonomy & Risk Guidelines

Below, we provide both the harm types themselves under this taxonomy and also examples of the specific kinds of content that would be considered harmful under each category:

@@ -85,6 +85,6 @@ in our paper: [LINK TO PAPER].

| | Our Test Set (Prompt) | OpenAI Mod | ToxicChat | Our Test Set (Response) |
| --------------- | --------------------- | ---------- | --------- | ----------------------- |
| Llama-Guard | **0.945** | 0.847 | **0.626** | **0.953** |
| Llama Guard | **0.945** | 0.847 | **0.626** | **0.953** |
| OpenAI API | 0.764 | **0.856** | 0.588 | 0.769 |
| Perspective API | 0.728 | 0.787 | 0.532 | 0.699 |
29 changes: 12 additions & 17 deletions Llama-Guard/README.md
@@ -1,42 +1,37 @@
# GuardLlama
# Llama Guard

GuardLlama is a new experimental model that provides input and output guardrails
Llama Guard is a new experimental model that provides input and output guardrails
for LLM deployments.

# Download

In order to download the model weights and tokenizer, please visit the Meta
website and accept our License.
In order to download the model weights and tokenizer, please visit the [Meta
website](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and accept our License.

Once your request is approved, you will receive a signed URL over email. Then
run the download.sh script, passing the URL provided when prompted to start the
download.

Pre-requisites: Make sure you have wget and md5sum installed. Then to run the
script: ./download.sh.
script: `./download.sh`.

Keep in mind that the links expire after 24 hours and a certain amount of
downloads. If you start seeing errors such as 403: Forbidden, you can always
downloads. If you start seeing errors such as `403: Forbidden`, you can always
re-request a link.

# Access on HuggingFace

[TODO CHANGE LINK] We are also providing downloads on Hugging Face. You must
first request a download from the Meta website using the same email address as
your Hugging Face account. After doing so, you can request access to any of the
models on Hugging Face and within 1-2 days your account will be granted access
to all versions.

# Quick Start
Since Llama Guard is a fine-tuned Llama-7B model (see our [model card](MODEL_CARD.md) for more information), the same quick start
steps outlined in our [README file](https://github.com/facebookresearch/llama/blob/main/README.md) for Llama2 apply here.

TODO to be written.
In addition to that, we added examples using Llama Guard in the [Llama 2 recipes repository](https://github.com/facebookresearch/llama-recipes).
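
For instance, one way to exercise the model after downloading the weights is through Hugging Face `transformers`. The snippet below is a rough sketch under assumptions: the model id and the chat-template call are illustrative and not taken from this repository, and the recipes linked above remain the maintained examples.

```python
# Illustrative sketch only; see the llama-recipes repository linked above for maintained examples.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/LlamaGuard-7b"  # assumed Hugging Face model id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

# Classify a single user turn (prompt classification).
chat = [{"role": "user", "content": "How do I reset a forgotten email password?"}]
input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)

output = model.generate(input_ids=input_ids, max_new_tokens=32, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
# Expected shape of the result: "safe", or "unsafe" followed by the violated category codes.
```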

# Issues

Please report any software bug, or other problems with the models through one
Please report any software bug, or other problems with the models through one
of the following means:

- Reporting issues with the GuardLlama model:
- Reporting issues with the Llama Guard model:
[github.com/facebookresearch/purplellama](github.com/facebookresearch/purplellama)
- Reporting issues with Llama in general:
[github.com/facebookresearch/llama](github.com/facebookresearch/llama)
@@ -57,4 +52,4 @@ as our accompanying [Acceptable Use Policy](USE_POLICY).

# References

Research Paper: [TODO ADD LINK]
[Research Paper](https://ai.facebook.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/)
60 changes: 3 additions & 57 deletions Llama-Guard/download.sh
@@ -1,70 +1,16 @@
#!/usr/bin/env bash

# Copyright (c) Meta Platforms, Inc. and affiliates.
# This software may be used and distributed according to the terms of the Llama 2 Community License Agreement.

set -e

read -p "Enter the URL from email: " PRESIGNED_URL
echo ""
read -p "Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all: " MODEL_SIZE
TARGET_FOLDER="." # where all files should end up
mkdir -p ${TARGET_FOLDER}

if [[ $MODEL_SIZE == "" ]]; then
    MODEL_SIZE="7B,13B,70B,7B-chat,13B-chat,70B-chat"
fi

echo "Downloading LICENSE and Acceptable Usage Policy"
wget --continue ${PRESIGNED_URL/'*'/"LICENSE"} -O ${TARGET_FOLDER}"/LICENSE"
wget --continue ${PRESIGNED_URL/'*'/"USE_POLICY.md"} -O ${TARGET_FOLDER}"/USE_POLICY.md"

echo "Downloading tokenizer"
wget --continue ${PRESIGNED_URL/'*'/"tokenizer.model"} -O ${TARGET_FOLDER}"/tokenizer.model"
wget --continue ${PRESIGNED_URL/'*'/"tokenizer_checklist.chk"} -O ${TARGET_FOLDER}"/tokenizer_checklist.chk"
CPU_ARCH=$(uname -m)
if [ "$CPU_ARCH" = "arm64" ]; then
    (cd ${TARGET_FOLDER} && md5 tokenizer_checklist.chk)
else
    (cd ${TARGET_FOLDER} && md5sum -c tokenizer_checklist.chk)
fi

for m in ${MODEL_SIZE//,/ }
do
    if [[ $m == "7B" ]]; then
        SHARD=0
        MODEL_PATH="llama-2-7b"
    elif [[ $m == "7B-chat" ]]; then
        SHARD=0
        MODEL_PATH="llama-2-7b-chat"
    elif [[ $m == "13B" ]]; then
        SHARD=1
        MODEL_PATH="llama-2-13b"
    elif [[ $m == "13B-chat" ]]; then
        SHARD=1
        MODEL_PATH="llama-2-13b-chat"
    elif [[ $m == "70B" ]]; then
        SHARD=7
        MODEL_PATH="llama-2-70b"
    elif [[ $m == "70B-chat" ]]; then
        SHARD=7
        MODEL_PATH="llama-2-70b-chat"
    fi

    echo "Downloading ${MODEL_PATH}"
    mkdir -p ${TARGET_FOLDER}"/${MODEL_PATH}"

    for s in $(seq -f "0%g" 0 ${SHARD})
    do
        wget --continue ${PRESIGNED_URL/'*'/"${MODEL_PATH}/consolidated.${s}.pth"} -O ${TARGET_FOLDER}"/${MODEL_PATH}/consolidated.${s}.pth"
    done

    wget --continue ${PRESIGNED_URL/'*'/"${MODEL_PATH}/params.json"} -O ${TARGET_FOLDER}"/${MODEL_PATH}/params.json"
    wget --continue ${PRESIGNED_URL/'*'/"${MODEL_PATH}/checklist.chk"} -O ${TARGET_FOLDER}"/${MODEL_PATH}/checklist.chk"
    echo "Checking checksums"
    if [ "$CPU_ARCH" = "arm64" ]; then
        (cd ${TARGET_FOLDER}"/${MODEL_PATH}" && md5 checklist.chk)
    else
        (cd ${TARGET_FOLDER}"/${MODEL_PATH}" && md5sum -c checklist.chk)
    fi
done
mkdir -p ${TARGET_FOLDER}"/llama-guard"
wget --continue ${PRESIGNED_URL/'*'/"consolidated.00.pth"} -O ${TARGET_FOLDER}"/llama-guard/consolidated.00.pth"
wget --continue ${PRESIGNED_URL/'*'/"params.json"} -O ${TARGET_FOLDER}"/llama-guard/params.json"
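
A note on the `${PRESIGNED_URL/'*'/...}` idiom used throughout the script: the signed URL from the email contains a literal `*` wildcard, and each download substitutes the target path for that wildcard before fetching. A rough Python equivalent of the pattern (illustrative only, and still shelling out to `wget` as the script does) would be:

```python
# Sketch of the wildcard-substitution pattern used by download.sh (illustrative only).
import subprocess

def fetch(presigned_url: str, remote_path: str, local_path: str) -> None:
    """Replace the first '*' in the signed URL with the target path, then download it."""
    url = presigned_url.replace("*", remote_path, 1)  # mirrors ${PRESIGNED_URL/'*'/"..."}
    subprocess.run(["wget", "--continue", url, "-O", local_path], check=True)

# e.g. the two Llama Guard downloads added above:
# fetch(presigned_url, "consolidated.00.pth", "./llama-guard/consolidated.00.pth")
# fetch(presigned_url, "params.json", "./llama-guard/params.json")
```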
32 changes: 25 additions & 7 deletions README.md
@@ -3,29 +3,47 @@
</p>

<p align="center">
🤗 <a href="https://huggingface.co/meta-Llama">Hugging Face</a>&nbsp&nbsp | <a href="">Blog</a>&nbsp&nbsp | <a href="https://ai.facebook.com/llama/purple-llama">Website</a>&nbsp&nbsp | <a href="https://ai.meta.com/research/publications/purple-llama-cyberseceval-a-benchmark-for-evaluating-the-cybersecurity-risks-of-large-language-models/">CyberSec Eval Paper</a>&nbsp&nbsp | <a href="https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/">Llama Guard Paper</a>&nbsp&nbsp
🤗 <a href="https://huggingface.co/meta-Llama"> Models on Hugging Face</a>&nbsp | <a href="https://ai.facebook.com/blog/purple-llama-open-trust-safety-generative-ai"> Blog</a>&nbsp | <a href="https://ai.facebook.com/llama/purple-llama">Website</a>&nbsp | <a href="https://ai.meta.com/research/publications/purple-llama-cyberseceval-a-benchmark-for-evaluating-the-cybersecurity-risks-of-large-language-models/">CyberSec Eval Paper</a>&nbsp&nbsp | <a href="https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/">Llama Guard Paper</a>&nbsp
<br>

--------------------------------------------------------------------------------
# Purple Llama
Purple Llama is an umbrella project that over time will bring together tools and evals to help the community build responsibly with open generative AI models. The initial release will include tools and evals for Cyber Security and Input/Output safeguards, but we plan to contribute more in the near future.

## Why purple?
Borrowing a [concept](https://www.youtube.com/watch?v=ab_Fdp6FVDI) from the cybersecurity world, we believe that to truly mitigate the challenges which generative AI presents, we need to take both attack (red team) and defensive (blue team) postures. Purple teaming, composed of both red and blue team responsibilities, is a collaborative approach to evaluating and mitigating potential risks and the same ethos applies to generative AI and hence our investment in Purple Llama will be comprehensive.
Borrowing a [concept](https://www.youtube.com/watch?v=ab_Fdp6FVDI) from the cybersecurity world, we believe that to truly mitigate the challenges which generative AI presents, we need to take both attack (red team) and defensive (blue team) postures. Purple teaming, composed of both red and blue team responsibilities, is a collaborative approach to evaluating and mitigating potential risks and the same ethos applies to generative AI and hence our investment in Purple Llama will be comprehensive.

## License
Components within the Purple Llama project will be licensed permissively, enabling both research and commercial usage. We believe this is a major step towards enabling community collaboration and standardizing the development and usage of trust and safety tools for generative AI development. More concretely, evals and benchmarks are licensed under the MIT license, while any models use the Llama 2 Community license. See the table below:

| **Component Type** | **Components** | **License** |
|:----------|:------------:|:----------:|
| Evals/Benchmarks | Cyber Security Eval (others to come) | MIT |
| Models | Llama Guard | [Llama 2 Community License](https://github.com/facebookresearch/PurpleLlama/blob/main/LICENSE) |
| Evals/Benchmarks | Cyber Security Eval (others to come) | MIT |
| Models | Llama Guard | [Llama 2 Community License](https://github.com/facebookresearch/PurpleLlama/blob/main/LICENSE) |

## Getting Started
To get started and learn how to use Purple Llama components with Llama models, see the getting started guide [here](https://ai.meta.com/llama/get-started/). The guide provides information and resources to help you set up Llama including how to access the model, hosting, how-to and integration guides. Additionally, you will find supplemental materials to further assist you while responsibly building with Llama. The guide will be updated as more Purple Llama components get released.
## Evals & Benchmarks

### Cybersecurity
We are sharing what we believe is the first industry-wide set of cybersecurity safety evaluations for LLMs. These benchmarks are based on industry guidance and standards (e.g., CWE and MITRE ATT&CK) and built in collaboration with our security subject matter experts. With this initial release, we aim to provide tools that will help address some risks outlined in the [White House commitments on developing responsible AI](https://www.whitehouse.gov/briefing-room/statements-releases/2023/07/21/fact-sheet-biden-harris-administration-secures-voluntary-commitments-from-leading-artificial-intelligence-companies-to-manage-the-risks-posed-by-ai/), including:

- Metrics for quantifying LLM cybersecurity risks.
- Tools to evaluate the frequency of insecure code suggestions.
- Tools to evaluate LLMs to make it harder to generate malicious code or aid in carrying out cyberattacks.

We believe these tools will reduce the frequency of LLMs suggesting insecure AI-generated code and reduce their helpfulness to cyber adversaries. Our initial results show that there are meaningful cybersecurity risks for LLMs, both with recommending insecure code and for complying with malicious requests. See our [Cybersec Eval paper](https://ai.meta.com/research/publications/purple-llama-cyberseceval-a-benchmark-for-evaluating-the-cybersecurity-risks-of-large-language-models/) for more details.

## Input/Output Safeguards
As we outlined in Llama 2’s [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/), we recommend that all inputs and outputs to the LLM be checked and filtered in accordance with content guidelines appropriate to the application.
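
To make that recommendation concrete, here is a hypothetical sketch of the wrapper pattern: the prompt is screened before inference and the response is screened after it. Both `generate` and `is_safe` are placeholders supplied by the caller; in practice `is_safe` would be backed by a classifier such as Llama Guard and by content guidelines appropriate to the application.

```python
# Hypothetical wrapper illustrating input/output filtering around an LLM call.
from typing import Callable

def moderated_generate(
    prompt: str,
    generate: Callable[[str], str],  # your LLM call (placeholder)
    is_safe: Callable[[str], bool],  # your safety classifier, e.g. backed by Llama Guard (placeholder)
    refusal: str = "Sorry, I can't help with that.",
) -> str:
    if not is_safe(prompt):          # input (prompt) check
        return refusal
    response = generate(prompt)
    if not is_safe(response):        # output (response) check
        return refusal
    return response
```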

### Llama Guard
To support this, and empower the community, we are releasing [Llama Guard](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/), an openly-available model that performs competitively on common open benchmarks and provides developers with a pretrained model to help defend against generating potentially risky outputs.

As part of our ongoing commitment to open and transparent science, we are releasing our methodology and an extended discussion of model performance in our [Llama Guard paper](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/). This model has been trained on a mix of publicly-available datasets to enable detection of common types of potentially risky or violating content that may be relevant to a number of developer use cases. Ultimately, our vision is to enable developers to customize this model to support relevant use cases and to make it easier to adopt best practices and improve the open ecosystem.

## Getting Started
To get started and learn how to use Purple Llama components with Llama models, see the getting started guide [here](https://ai.meta.com/llama/get-started/). The guide provides information and resources to help you set up Llama including how to access the model, hosting, how-to and integration guides. Additionally, you will find supplemental materials to further assist you while responsibly building with Llama. The guide will be updated as more Purple Llama components get released.

## FAQ
For a running list of frequently asked questions, for not only Purple Llama components but also generally for Llama models, see the FAQ [here](https://ai.meta.com/llama/faq/).

## Join the Purple Llama community
See the [CONTRIBUTING](CONTRIBUTING.md) file for how to help out.
See the [CONTRIBUTING](CONTRIBUTING.md) file for how to help out.