# Directory Overview

Welcome to the `analysis/` directory! This folder contains analysis implementations for LLM360 models. Each subfolder is an independent, self-contained module with its own setup instructions, relying solely on the code within that subfolder.

1. Data memorization (`memorization/`) evaluates model memorization of the training data (see the sketch after this list).
2. LLM Unlearning (`unlearn/`) implements machine unlearning methods to remove hazardous knowledge from an LLM.
3. Safety360 (`safety360/`) contains modules to measure model safety:
   - `bold/` provides sentiment analysis with the BOLD dataset.
   - `toxic_detection/` measures the model's ability to identify toxic text.
   - `toxigen/` evaluates the model's toxicity in text generation.
   - `wmdp/` evaluates the model's hazardous knowledge.
4. Mechanistic Interpretability (`mechinterp/`) contains packages for visualizing the algorithms LLMs execute during inference.
5. Evaluation metrics (`metrics/`) contains modules for model evaluation:
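
To illustrate the kind of measurement the `memorization/` module performs (item 1 above), here is a minimal, hypothetical sketch of a prefix-continuation memorization check built on Hugging Face `transformers`. It is not the repository's actual API; the function name, checkpoint, and window lengths are placeholders, and the module's own README should be consulted for the exact metric.

```python
# Hypothetical sketch of a prefix-continuation memorization check.
# Not the actual memorization/ API; checkpoint and lengths are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def memorization_score(model, tokenizer, text, prompt_len=32, cont_len=32):
    """Fraction of the next `cont_len` tokens the model reproduces greedily
    when prompted with the first `prompt_len` tokens of `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    if ids.size(0) < prompt_len + cont_len:
        raise ValueError("text is too short for the requested window")
    prompt = ids[:prompt_len].unsqueeze(0)
    target = ids[prompt_len:prompt_len + cont_len]
    with torch.no_grad():
        out = model.generate(
            prompt,
            max_new_tokens=cont_len,
            min_new_tokens=cont_len,  # force a full-length continuation
            do_sample=False,          # greedy decoding
        )
    generated = out[0, prompt_len:prompt_len + cont_len]
    return (generated == target).float().mean().item()

if __name__ == "__main__":
    name = "LLM360/Amber"  # placeholder checkpoint
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    print(memorization_score(model, tok, "some training document text ..."))
```

A score near 1.0 under this kind of check would suggest the model has memorized the continuation verbatim, while a low score indicates it has not.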