# Directory Overview

Welcome to the `analysis/` directory! This folder contains analysis implementations for LLM360 models. Each subfolder is an independent, self-contained module with its own setup instructions, relying solely on the code within that subfolder.

1. Data memorization (`memorization/`) evaluates model memorization of the training data (see the sketch after this list).
2. LLM Unlearning (`unlearn/`) implements machine unlearning methods to remove hazardous knowledge from an LLM.
3. Safety360 (`safety360/`) contains modules to measure model safety:
   - `bold/` provides sentiment analysis with the BOLD dataset.
   - `toxic_detection/` measures the model's ability to identify toxic text.
   - `toxigen/` evaluates the model's toxicity in text generation.
   - `wmdp/` evaluates the model's hazardous knowledge.
4. Mechanistic Interpretability (`mechinterp/`) contains packages for visualizing the algorithms LLMs execute during inference.
5. Evaluation metrics (`metrics/`) contains modules for model evaluation:
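
To illustrate the kind of measurement the `memorization/` module performs (item 1 above), here is a minimal, hypothetical sketch of a prefix-continuation memorization check built on Hugging Face `transformers`. It is not the repository's actual API; the function name, checkpoint, and window lengths are placeholders, and the module's own README should be consulted for the exact metric.

```python
# Hypothetical sketch of a prefix-continuation memorization check.
# Not the actual memorization/ API; checkpoint and lengths are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def memorization_score(model, tokenizer, text, prompt_len=32, cont_len=32):
    """Fraction of the next `cont_len` tokens the model reproduces greedily
    when prompted with the first `prompt_len` tokens of `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    if ids.size(0) < prompt_len + cont_len:
        raise ValueError("text is too short for the requested window")
    prompt = ids[:prompt_len].unsqueeze(0)
    target = ids[prompt_len:prompt_len + cont_len]
    with torch.no_grad():
        out = model.generate(
            prompt,
            max_new_tokens=cont_len,
            min_new_tokens=cont_len,  # force a full-length continuation
            do_sample=False,          # greedy decoding
        )
    generated = out[0, prompt_len:prompt_len + cont_len]
    return (generated == target).float().mean().item()

if __name__ == "__main__":
    name = "LLM360/Amber"  # placeholder checkpoint
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    print(memorization_score(model, tok, "some training document text ..."))
```

A score near 1.0 under this kind of check would suggest the model has memorized the continuation verbatim, while a low score indicates it has not.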