Running an experiment requires three steps:
- Install dependencies.
- Set up LLM access.
- Launch an experiment.
You must install:
- Python 3.11
- pip
- python3.11-venv
- Git
- Docker
- Google Cloud SDK
- c++filt (must be available in `PATH`)
- clang-format (optional, needed for `project_src.py`)
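For reference, on a Debian/Ubuntu machine the basic packages might be installed roughly as sketched below; the package names are assumptions and vary by distribution, and Docker and the Google Cloud SDK are best installed by following their official guides:

```bash
# Illustrative only: exact package names vary by distribution.
sudo apt-get update
sudo apt-get install -y python3.11 python3.11-venv python3-pip git binutils clang-format
# binutils provides c++filt; install Docker and the Google Cloud SDK separately.
```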
Install required dependencies in a Python
virtual environment:
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
Set up LLM access via Vertex AI or OpenAI with the following steps.
Accessing Vertex AI models requires a Google Cloud Platform (GCP) project with Vertex AI enabled.
Then authenticate to GCP:
```bash
gcloud auth login
gcloud auth application-default login
gcloud auth application-default set-quota-project <your-project>
```
You'll also need to specify the GCP project and the locations where you have Vertex AI quota (comma-delimited):
```bash
export CLOUD_ML_PROJECT_ID=<gcp-project-id>
export VERTEX_AI_LOCATIONS=us-west1,us-west4,us-east4,us-central1,northamerica-northeast1
```
OpenAI requires an API key. Set it as an environment variable:
```bash
export OPENAI_API_KEY='<your-api-key>'
```
To generate and evaluate the fuzz targets in a benchmark set via local experiments:
```bash
./run_all_experiments.py \
    --model=<model-name> \
    --benchmarks-directory='./benchmark-sets/comparison' \
    [--ai-binary=<llm-access-binary>] \
    [--template-directory=prompts/custom_template] \
    [--work-dir=results-dir]
    [...]

# E.g., generate fuzz targets for TinyXML-2 with default template and fuzz for 30 seconds.
# ./run_all_experiments.py -y ./benchmark-sets/comparison/tinyxml2.yaml
```
where the `<model-name>` can be:
- `vertex_ai_code-bison` or `vertex_ai_code-bison-32k` for the Code Bison models on Vertex AI.
- `vertex_ai_gemini-pro` for Gemini Pro on Vertex AI.
- `gpt-3.5-turbo` or `gpt-4` for OpenAI.
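For instance, a local run of the comparison benchmark set with Gemini Pro (an illustrative combination of the flags and model names above) could look like:

```bash
./run_all_experiments.py \
    --model=vertex_ai_gemini-pro \
    --benchmarks-directory='./benchmark-sets/comparison' \
    --work-dir=results
```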
Experiments can also be run on Google Cloud using Google Cloud Build. You can
do this by passing
`--cloud <experiment-name> --cloud-experiment-bucket <bucket>`,
where `<bucket>` is the name of a Google Cloud Storage bucket in your Google Cloud project.
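For example, an illustrative cloud invocation (with `my-experiment` and `my-experiment-bucket` as placeholder names) might be:

```bash
./run_all_experiments.py \
    --model=vertex_ai_code-bison \
    --benchmarks-directory='./benchmark-sets/comparison' \
    --cloud my-experiment \
    --cloud-experiment-bucket my-experiment-bucket
```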
We currently offer two sets of benchmarks:
- `comparison`: A small selection of OSS-Fuzz C/C++ projects.
- `all`: All benchmarks across all OSS-Fuzz C/C++ projects.
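For example, to run against the full set (assuming it lives alongside `comparison` under `benchmark-sets/`), point `--benchmarks-directory` at it:

```bash
# Illustrative: run every benchmark instead of the comparison subset.
./run_all_experiments.py \
    --model=<model-name> \
    --benchmarks-directory='./benchmark-sets/all'
```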
Once finished, the framework will output experiment results like this:
```
================================================================================
*<project-name>, <function-name>*
build success rate: <build-rate>, crash rate: <crash-rate>, max coverage: <max-coverage>, max line coverage diff: <max-coverage-diff>
max coverage sample: <results-dir>/<benchmark-dir>/fixed_targets/<LLM-generated-fuzz-target>
max coverage diff sample: <results-dir>/<benchmark-dir>/fixed_targets/<LLM-generated-fuzz-target>
```
where `<build-rate>` is the number of fuzz targets that compile divided by the total number of fuzz targets generated by the LLM (e.g., 0.5 if 4 out of 8 fuzz targets build), `<crash-rate>` is the run-time crash rate, `<max-coverage>` measures the maximum line coverage of all targets, and `<max-coverage-diff>` shows the maximum new line coverage of LLM-generated targets over the existing human-written targets in OSS-Fuzz.
Note that `<max-coverage>` and `<max-coverage-diff>` are computed based on the code linked against the fuzz target, not the whole project.
For example:
```
================================================================================
*tinyxml2, tinyxml2::XMLDocument::Print*
build success rate: 1.0, crash rate: 0.125, max coverage: 0.29099427381572096, max line coverage diff: 0.11301753077209996
max coverage sample: <result-dir>/output-tinyxml2-tinyxml2-xmldocument-print/fixed_targets/08.cpp
max coverage diff sample: <result-dir>/output-tinyxml2-tinyxml2-xmldocument-print/fixed_targets/08.cpp
```
To visualize these results via a web UI, with more details on the exact prompts used, samples generated, and other logs, run:
```bash
python -m report.web -r <results-dir> -o <output-dir>
python -m http.server <port> -d <output-dir>
```
where `<results-dir>` is the directory passed to `--work-dir` in your experiments (default: `./results`).
Then navigate to `http://localhost:<port>` to view the results in a table.
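For example, with the default work directory and an arbitrarily chosen port:

```bash
# Illustrative: build the report from ./results and serve it on port 8000.
python -m report.web -r ./results -o ./report
python -m http.server 8000 -d ./report
```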
Configure and use the framework in the following five steps:
- Configure benchmark
- Set up prompt template
- Generate fuzz target
- Fix compilation error
- Evaluate fuzz target
Prepare a benchmark YAML that specifies the function to test; here is an example. Follow the link above to automatically generate one for a C/C++ project in OSS-Fuzz. Note that the project under test needs to be integrated into OSS-Fuzz in order to build.
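As an illustration only, a benchmark entry for the tinyxml2 example used elsewhere in this document might look roughly like the sketch below; the field names are assumptions modeled on the YAML files under `benchmark-sets/`, which are the authoritative reference:

```yaml
# Hypothetical sketch of a benchmark YAML; verify field names against benchmark-sets/.
project: tinyxml2
language: c++
target_path: /src/tinyxml2/xmltest.cpp
functions:
- signature: "void tinyxml2::XMLDocument::Print(tinyxml2::XMLPrinter *)"
  return_type: void
  params:
  - name: streamer
    type: "tinyxml2::XMLPrinter *"
```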
Prepare prompt templates. The LLM prompt will be constructed based on the files in this directory. It starts with a priming to define the main goal and important notices, followed by some example problems and solutions. Each example problem is in the same format as the final problem (i.e., a function signature to fuzz), and the solution is the corresponding human-written fuzz target for different functions from the same project or other projects. The prompt can also include more information about the function (e.g., its usage, source code, or parameter type definitions) and model-specific notes (e.g., common pitfalls to avoid).
You can pass an alternative template directory via `--template-directory`. The new template directory does not have to include all files: the framework will use files from `template_xml/` by default when they are missing. The default prompt is structured as follows:
```
<Priming>
<Model-specific notes>
<Examples>
<Final question + Function information>
```
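For example, one way to override only part of the default prompt might be the following sketch (the exact file names under `prompts/template_xml/` are assumptions; check that directory for the real ones):

```bash
# Copy only the file(s) you want to customize; anything missing from the
# custom directory falls back to the defaults in template_xml/.
mkdir -p prompts/custom_template
cp prompts/template_xml/priming.txt prompts/custom_template/  # assumed file name
./run_all_experiments.py --template-directory=prompts/custom_template [...]
```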
The script `run_all_experiments.py` will generate fuzz targets via the LLM using the prompt constructed above and measure their code coverage. All experiment data will be saved into the `--work-dir`.
When a fuzz target fails to build, the framework will automatically make five attempts to fix it before terminating. Each attempt asks the LLM to fix the fuzz target based on the build failure from OSS-Fuzz, parses the source code from the response, and re-compiles it.
If the fuzz target compiles successfully, the framework fuzzes it with libFuzzer and measures its line coverage. The fuzzing timeout is specified by the `--run-timeout` flag. Its line coverage is also compared against existing human-written fuzz targets from OSS-Fuzz in production.
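For example, to cap each fuzzing run at 30 seconds (assuming `--run-timeout` takes a value in seconds, as suggested by the TinyXML-2 example above), you could run:

```bash
# Illustrative: limit each target's fuzzing run to 30 seconds.
./run_all_experiments.py \
    --model=<model-name> \
    --benchmarks-directory='./benchmark-sets/comparison' \
    --run-timeout 30
```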
You can add a Git pre-push hook to auto-format/lint your code:
```bash
./helper/add_pre-push_hook
```
Or manually run the formatter/linter:
```bash
.github/helper/presubmit
```
We use https://github.com/jazzband/pip-tools to manage our Python dependencies.
```bash
# Edit requirements.in
pip install pip-tools  # Required to re-generate requirements.txt from requirements.in
pip-compile requirements.in > requirements.txt
pip install -r requirements.txt
```