This repository contains implementations and experiments exploring superposition phenomena in different neural network architectures. Superposition is a phenomenon where neural networks learn to encode multiple features in the same set of weights or neurons, effectively compressing information through overlapping representations.
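For intuition, here is a minimal, self-contained sketch of the kind of setup studied here (it is not code from this repository, and it assumes PyTorch is installed): a tiny model forced to reconstruct more sparse features than it has hidden dimensions, so several features end up stored in overlapping directions.

```python
# Minimal illustrative sketch -- not code from this repository. It mirrors the
# "toy models of superposition" setup: n_features sparse features are squeezed
# through a bottleneck of n_hidden < n_features dimensions with tied weights,
# so the model must reuse directions for several features. Assumes PyTorch.
import torch

n_features, n_hidden, batch = 6, 2, 1024
sparsity = 0.9  # probability that any given feature is zero in a sample

W = torch.randn(n_features, n_hidden, requires_grad=True)
b = torch.zeros(n_features, requires_grad=True)
opt = torch.optim.Adam([W, b], lr=1e-2)

for step in range(2000):
    x = torch.rand(batch, n_features)                   # feature magnitudes
    x = x * (torch.rand(batch, n_features) > sparsity)  # make them sparse
    recon = torch.relu(x @ W @ W.T + b)                 # encode, then decode
    loss = ((recon - x) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Non-zero off-diagonal entries of W @ W.T indicate features sharing directions.
print((W @ W.T).detach())
```

With sufficiently sparse inputs, the learned directions typically overlap rather than stay orthogonal, which is the superposition effect the experiments below investigate.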
├── images/ # Visualization outputs and diagrams
├── runs/ # Training logs and experiment results
├── scripts/ # Utility scripts and helpers
├── anthropic_toy_models.ipynb # Jupyter notebook with toy model implementations
├── environment.yaml # Conda environment specification
├── intro_transformer_superposition.py # Basic transformer superposition examples
├── intro_translation_superposition.py # Translation model superposition examples
├── toy_models_config.yaml # Configuration for toy model experiments
├── toy_models_reproduction.py # Scripts to reproduce toy model results
├── transformer_superposition.py # Advanced transformer implementations
├── translation_superposition.py # Advanced translation model experiments
└── LICENSE # Project license
To run the experiments, you'll need Python 3.7+ and conda installed. Set up the environment using:

```bash
conda env create -f environment.yaml
conda activate superposition
```
The repository includes several implementations:

- **Basic Examples**

  ```bash
  python3 intro_transformer_superposition.py
  python3 intro_translation_superposition.py
  ```

  Demonstrates superposition patterns in transformer and translation architectures.

- **Toy Model Reproduction**

  ```bash
  python3 toy_models_reproduction.py
  ```

  Reproduces results from toy model experiments using the configurations in `toy_models_config.yaml` (see the config-loading sketch after this list).

- **Advanced Experiments**

  ```bash
  python transformer_superposition.py
  python translation_superposition.py
  ```

  Contains more sophisticated experiments and analysis tools.
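The exact keys in `toy_models_config.yaml` are defined by the repository itself and aren't reproduced here; purely as a hedged sketch of how such a config is typically consumed (assuming PyYAML is available in the conda environment, and with hypothetical key names):

```python
# Hypothetical sketch only -- the real keys live in toy_models_config.yaml and
# may differ from the names used here. Assumes PyYAML is installed.
import yaml

with open("toy_models_config.yaml") as f:
    config = yaml.safe_load(f)

# Illustrative access with fallbacks; replace with the file's actual keys.
n_features = config.get("n_features", 6)
n_hidden = config.get("n_hidden", 2)
print(f"Toy model: {n_features} features in {n_hidden} hidden dimensions")
```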
The repository also provides:

- Implementation of superposition detection methods
- Interactive toy models for understanding basic concepts
- Visualization tools for analyzing learned representations
- Comparative analysis across different model architectures
- Experiments with various training configurations
- Tools for measuring and quantifying superposition effects (see the sketch after this list)
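The repository's own measurement tools live in the scripts above; purely as an illustration of one common way to quantify superposition (not necessarily the method used in this codebase), the overlap between learned feature directions can be summarized from a feature-embedding matrix `W`:

```python
# Illustrative only: a simple interference score for a feature-embedding
# matrix W (n_features x n_hidden). This is one common heuristic and is not
# taken from this repository's tools.
import numpy as np

def interference_score(W: np.ndarray) -> float:
    """Mean squared cosine similarity between distinct feature directions.

    0 means all feature directions are orthogonal (no superposition);
    larger values mean features share directions (more superposition).
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    U = W / np.clip(norms, 1e-12, None)      # unit-normalize each direction
    gram = U @ U.T                           # pairwise cosine similarities
    off_diag = gram - np.diag(np.diag(gram))
    n = W.shape[0]
    return float((off_diag ** 2).sum() / (n * (n - 1)))

# Example: 6 features squeezed into 2 dimensions must overlap heavily
rng = np.random.default_rng(0)
print(interference_score(rng.normal(size=(6, 2))))  # high interference
print(interference_score(np.eye(6)))                # 0.0, fully orthogonal
```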
Results and visualizations are stored in the `images/` directory. Training logs and metrics can be found in the `runs/` directory.
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the terms included in the LICENSE file.
If you use this code in your research, please cite:
```bibtex
@software{superposition_replication,
  title = {Model Superposition Replication Study},
  year  = {2024},
  url   = {https://github.com/mmjerge/superposition_replication}
}
```