- Set Up and Installation
- Download the Data
- Develop your Submission
- Run your Submission
- Score your Submission
To get started you will have to make a few decisions and install the repository along with its dependencies. Specifically:
- Decide if you would like to develop your submission in either PyTorch or JAX.
- Set up your workstation or VM. We recommend to use a setup similar to the benchmarking hardware.
The specs on the benchmarking machines are:
- 8xV100 16GB GPUs
- 240 GB in RAM
- 2 TB in storage (for datasets).
- Install the algorithmic package and dependencies either in a Python virtual environment or use a Docker (recommended) or Singularity/Apptainer container.
Prerequisites:
- Python minimum requirement >= 3.8
- CUDA 11.8
- NVIDIA Driver version 535.104.05
To set up a virtual enviornment and install this repository
-
Create new environment, e.g. via
conda
orvirtualenv
sudo apt-get install python3-venv python3 -m venv env source env/bin/activate
-
Clone this repository
git clone https://github.com/mlcommons/algorithmic-efficiency.git cd algorithmic-efficiency
-
Run the following pip3 install commands based on your chosen framework to install
algorithmic_efficiency
and its dependencies.For JAX:
pip3 install -e '.[pytorch_cpu]' pip3 install -e '.[jax_gpu]' -f 'https://storage.googleapis.com/jax-releases/jax_cuda_releases.html' pip3 install -e '.[full]'
For PyTorch
Note: the below command assumes you have CUDA 12.1 installed locally. This is the default in the provided Docker image. We recommend you match this CUDA version but if you decide to run with a different local CUDA version, please find the appropriate wheel url to pass to the
pip install
command forpytorch
.pip3 install -e '.[jax_cpu]' pip3 install -e '.[pytorch_gpu]' -f 'https://download.pytorch.org/whl/cu121' pip3 install -e '.[full]'
Per workload installations
You can also install the requirements for individual workloads, e.g. viapip3 install -e '.[librispeech]'
or all workloads at once via
pip3 install -e '.[full]'
We recommend using a Docker container to ensure a similar environment to our scoring and testing environments. Alternatively, a Singularity/Apptainer container can also be used (see instructions below).
Prerequisites:
- NVIDIA Driver version 535.104.05
- NVIDIA Container Toolkit so that the containers can locate the NVIDIA drivers and GPUs. See instructions here.
-
Clone this repository
cd ~ && git clone https://github.com/mlcommons/algorithmic-efficiency.git
-
Build Docker image
cd algorithmic-efficiency/docker docker build -t <docker_image_name> . --build-arg framework=<framework>
The
framework
flag can be eitherpytorch
,jax
orboth
. Specifying the framework will install the framework specific dependencies. Thedocker_image_name
is arbitrary.
To use the Docker container as an interactive virtual environment, you can run a container mounted to your local data and code directories and execute the bash
program. This may be useful if you are in the process of developing a submission.
-
Run detached Docker container. The
container_id
will be printed if the container is run successfully.docker run -t -d \ -v $HOME/data/:/data/ \ -v $HOME/experiment_runs/:/experiment_runs \ -v $HOME/experiment_runs/logs:/logs \ -v $HOME/algorithmic-efficiency:/algorithmic-efficiency \ --gpus all \ --ipc=host \ <docker_image_name> \ --keep_container_alive true
Note: You may have to use double quotes around
algorithmic-efficiency
[path] in the mounting-v
flag. If the above command fails try replacing the following line:-v $HOME/algorithmic-efficiency:/algorithmic-efficiency2 \
with
-v $HOME"/algorithmic-efficiency:/algorithmic-efficiency" \
-
Open a bash terminal
docker exec -it <container_id> /bin/bash
Since many compute clusters don't allow the usage of Docker due to securtiy concerns and instead encourage the use of Singularity/Apptainer (formerly Singularity, now called Apptainer), we also provide an Apptainer recipe (located at docker/Singularity.def
) that can be used to build an image by running
singularity build --fakeroot <singularity_image_name>.sif Singularity.def
Note that this can take several minutes. Then, to start a shell session with GPU support (by using the --nv
flag), we can run
singularity shell --bind $HOME/data:/data,$HOME/experiment_runs:/experiment_runs \
--nv <singularity_image_name>.sif
Note the --bind
flag which, similarly to Docker, allows to bind specific paths on the host system and the container, as explained here.
Also note that we generated Singularity.def
automatically from the Dockerfile
using spython, as follows:
pip3 install spython
cd algorithmic-efficiency/docker
python scripts/singularity_converter.py -i Dockerfile -o Singularity.def
Users that wish to customize their images are invited to check and modify the Singularity.def
recipe and the singularity_converter.py
script.
The workloads in this benchmark use 6 different datasets across 8 workloads. You may choose to download some or all of the datasets as you are developing your submission, but your submission will be scored across all 8 workloads. For instructions on obtaining and setting up the datasets see datasets/README.
To develop a submission you will write a Python module containing your training algorithm. Your training algorithm must implement a set of predefined API methods for the initialization and update steps.
Make a submissions subdirectory to store your submission modules e.g. algorithmic-effiency/submissions/my_submissions
.
You can find examples of submission modules under algorithmic-efficiency/prize_qualification_baselines
and algorithmic-efficiency/reference_algorithms
.
A submission for the external ruleset will consist of a submission module and a tuning search space definition.
- Copy the template submission module
submissions/template/submission.py
into your submissions directory e.g. inalgorithmic-efficiency/my_submissions
. - Implement at least the methods in the template submission module. Feel free to use helper functions and/or modules as you see fit. Make sure you adhere to to the competition rules. Check out the guidelines for allowed submissions, disallowed submissions and pay special attention to the software dependencies rule.
- Add a tuning configuration e.g.
tuning_search_space.json
file to your submission directory. For the tuning search space you can either:-
Define the set of feasible points by defining a value for "feasible_points" for the hyperparameters:
{ "learning_rate": { "feasible_points": 0.999 }, }
For a complete example see tuning_search_space.json.
-
Define a range of values for quasirandom sampling by specifing a
min
,max
andscaling
keys for the hyperparameter:{ "weight_decay": { "min": 5e-3, "max": 1.0, "scaling": "log", } }
For a complete example see tuning_search_space.json.
-
From your virtual environment or interactively running Docker container run your submission with submission_runner.py
:
JAX: to score your submission on a workload, from the algorithmic-efficency directory run:
python3 submission_runner.py \
--framework=jax \
--workload=mnist \
--experiment_dir=<path_to_experiment_dir>\
--experiment_name=<experiment_name> \
--submission_path=submissions/my_submissions/submission.py \
--tuning_search_space=<path_to_tuning_search_space>
Pytorch: to score your submission on a workload, from the algorithmic-efficency directory run:
python3 submission_runner.py \
--framework=pytorch \
--workload=<workload> \
--experiment_dir=<path_to_experiment_dir> \
--experiment_name=<experiment_name> \
--submission_path=<path_to_submission_module> \
--tuning_search_space=<path_to_tuning_search_space>
We recommend using PyTorch's Distributed Data Parallel (DDP) when using multiple GPUs on a single node. You can initialize ddp with torchrun. For example, on single host with 8 GPUs simply replace python3
in the above command by:
torchrun --redirects 1:0,2:0,3:0,4:0,5:0,6:0,7:0 --standalone --nnodes=1 --nproc_per_node=N_GPUS
where N_GPUS
is the number of available GPUs on the node.
So the complete command is:
torchrun --redirects 1:0,2:0,3:0,4:0,5:0,6:0,7:0 \
--standalone \
--nnodes=1 \
--nproc_per_node=N_GPUS \
submission_runner.py \
--framework=pytorch \
--workload=<workload> \
--experiment_dir=<path_to_experiment_dir> \
--experiment_name=<experiment_name> \
--submission_path=<path_to_submission_module> \
--tuning_search_space=<path_to_tuning_search_space>
The container entrypoint script provides the following flags:
--dataset
dataset: can be 'imagenet', 'fastmri', 'librispeech', 'criteo1tb', 'wmt', or 'ogbg'. Setting this flag will download data if~/data/<dataset>
does not exist on the host machine. Required for running a submission.--framework
framework: can be either 'pytorch' or 'jax'. If you just want to download data, this flag is required for-d imagenet
since we have two versions of data for imagenet. This flag is also required for running a submission.--submission_path
submission_path: path to submission file on container filesystem. If this flag is set, the container will run a submission, so it is required for running a submission.--tuning_search_space
tuning_search_space: path to file containing tuning search space on container filesystem. Required for running a submission.--experiment_name
experiment_name: name of experiment. Required for running a submission.--workload
workload: can be 'imagenet_resnet', 'imagenet_jax', 'librispeech_deepspeech', 'librispeech_conformer', 'ogbg', 'wmt', 'fastmri' or 'criteo1tb'. Required for running a submission.--max_global_steps
max_global_steps: maximum number of steps to run the workload for. Optional.--keep_container_alive
: can be true or false. Iftrue
the container will not be killed automatically. This is useful for developing or debugging.
To run the docker container that will run the submission runner run:
docker run -t -d \
-v $HOME/data/:/data/ \
-v $HOME/experiment_runs/:/experiment_runs \
-v $HOME/experiment_runs/logs:/logs \
--gpus all \
--ipc=host \
<docker_image_name> \
--dataset <dataset> \
--framework <framework> \
--submission_path <submission_path> \
--tuning_search_space <tuning_search_space> \
--experiment_name <experiment_name> \
--workload <workload> \
--keep_container_alive <keep_container_alive>
This will print the container ID to the terminal.
To find the container IDs of running containers
docker ps
To see output of the entrypoint script
docker logs <container_id>
To enter a bash session in the container
docker exec -it <container_id> /bin/bash
To score your submission we will score over all fixed workloads, held-out workloads and studies as described in the rules. We will sample 1 held-out workload per dataset for a total of 6 held-out workloads and will use the sampled held-out workloads in the scoring criteria for the matching fixed base workloads. In other words, the total number of runs expected for official scoring is:
- for external tuning ruleset: 350 = (8 (fixed workloads) + 6 (held-out workloads)) x 5 (studies) x 5 (trials)
- for self-tuning ruleset: 70 = (8 (fixed workloads) + 6 (held-out workloads)) x 5 (studies)
To run workloads for (a mock) scoring you may specify a "virtual" list of held-out workloads. It is important to note that the official set of held-out workloads will be sampled by the competition organizers during scoring time.
An example config for held-out workloads is stored in scoring/held_workloads_example.json
.
To generate a new sample of held out workloads run:
python3 generate_held_out_workloads.py --seed <optional_rng_seed> --output_filename <output_filename>
To run a number of studies and trials over all workload using Docker containers for each run:
python scoring/run_workloads.py \
--framework <framework> \
--experiment_name <experiment_name> \
--docker_image_url <docker_image_url> \
--submission_path <sumbission_path> \
--tuning_search_space <submission_path> \
--held_out_workloads_config_path held_out_workloads_example.json \
--num_studies <num_studies>
--seed <rng_seed>
Note that to run the above script you will need at least the jax_cpu
and pytorch_cpu
installations of the algorithmic-efficiency
package.
During submission development, it might be useful to do faster, approximate scoring (e.g. without 5
different studies or when some trials are missing) so the scoring scripts allow someflexibility.
To simulate official scoring, pass the --strict=True
flag in score_submission.py
. To get the raw scores and performance profiles of group of submissions or single submission:
python score_submissions.py --submission_directory <directory_with_submissions> --output_dir <output_dir> --compute_performance_profiles
We provide the scores and performance profiles for the paper baseline algorithms in the "Baseline Results" section in Benchmarking Neural Network Training Algorithms.
Good Luck!