chore(benchmarks): tidy up benchmark (awslabs#292)
Delete leftover script files. Add minor change to Jupyter Notebook
(`"ec2_metadata"` key in results table). Simplify pyproject.toml
dependencies list. Change some example parameters in the Hydra config
files, add clarifying comments. Rework the READMEs. Tune the
utils/prepare_nvme.sh to work for both Amazon Linux and Ubuntu EC2
instances. Update global .gitignore. Delete
utils/prepare_ec2_instance.sh, and add its content to the README. For
dataset scenario, add training time measurement around epochs. Minor
Python code improvements.
matthieu-d4r authored Jan 10, 2025
1 parent 79582bd commit 58de12a
Showing 25 changed files with 315 additions and 556 deletions.
6 changes: 4 additions & 2 deletions .gitignore
@@ -60,8 +60,10 @@ venv.bak/
.dmypy.json
dmypy.json

# PyTorch benchmarks: Hydra, NVMe directory, and CSV results
s3torchbenchmarking/**/multirun/
s3torchbenchmarking/**/nvme/
s3torchbenchmarking/**/*.csv

# Rust .gitignore (https://github.com/github/gitignore/blob/main/Rust.gitignore) -- cherry-picked ######################

255 changes: 128 additions & 127 deletions s3torchbenchmarking/README.md
@@ -1,179 +1,180 @@
# s3torchbenchmarking

This Python package houses a set of benchmarks for experimentally evaluating the performance of
the **Amazon S3 Connector for PyTorch** library.

With the use of the [Hydra](https://hydra.cc/) framework, we are able to define modular configuration pieces mapped to
various stages of the training pipeline. This approach allows one to mix and match configurations and measure the
performance impact on the end-to-end training process.

**Four scenarios** are available:

1. **Dataset benchmarks**
   - Compare our connector against other Dataset classes
   - All scenarios save data to S3
   - Measure performance in data fetching and indexing
2. **PyTorch's Distributed Checkpointing (DCP) benchmarks**
   - Assess our connector's performance versus PyTorch's default distributed checkpointing mechanism
   - For detailed information, refer to the [dedicated DCP `README`](src/s3torchbenchmarking/dcp/README.md)
3. **PyTorch Lightning Checkpointing benchmarks**
   - Evaluate our connector within the PyTorch Lightning framework
   - Compare against PyTorch Lightning's default checkpointing implementation
4. **PyTorch Checkpointing benchmarks**
   - TODO!

## Getting started

The benchmarking code is located in the `src/s3torchbenchmarking` module. The scenarios are designed to be run on an EC2
instance with one (or many) GPU(s).

### EC2 instance setup (recommended)

From your EC2 AWS Console, launch an instance with one (or many) GPU(s) (e.g., G5 instance type); we recommend using
an [AWS Deep Learning AMI (DLAMI)][dlami], such
as [AWS Deep Learning AMI GPU PyTorch 2.5 (Amazon Linux 2023)][dlami-pytorch].

> [!NOTE]
> Some benchmarks can be long-running. To avoid the shortcomings around expired AWS tokens, we recommend attaching a
> role to your EC2 instance with:
>
> - Full access to S3
> - (Optional) Full access to DynamoDB — for writing run results
>
> See the [Running the benchmarks](#running-the-benchmarks) section for more details.
For optimal results, it is recommended to run the benchmarks on a dedicated EC2 instance _without_ other
resource-intensive processes.

### Creating a new Conda environment (env)

> [!WARNING]
> While some DLAMIs provide a pre-configured Conda env (`source activate pytorch`), we have observed compatibility
> issues with the latest PyTorch versions (2.5.X) at the time of writing. We recommend creating a new one from scratch
> as detailed below.
Once your instance is running, `ssh` into it, and create a new Conda env:

```shell
conda create -n pytorch-benchmarks python=3.12
conda init
```

Then, activate it (_you will need to log out and back in first, as signaled by `conda init`_):

```shell
source activate pytorch-benchmarks
```

Finally, from within this directory, install the `s3torchbenchmarking` module:

```shell
# `-e` so local modifications get picked up, if any
pip install -e .
```

> [!NOTE]
> For some scenarios, you may be required to install the [Mountpoint for Amazon S3][mountpoint-s3] file client: please
> refer to their README for instructions.
### (Pre-requisite) Configure AWS Credentials

The benchmarks and other commands provided below rely on the standard [AWS credential discovery mechanism][credentials].
Supplement the command as necessary to ensure the AWS credentials are made available to the process, e.g., by setting
the `AWS_PROFILE` environment variable.

### Creating a dataset (optional; for "dataset" benchmarks only)

You can use your own dataset for the benchmarks, or you can generate one on-the-fly using the `s3torch-datagen` command.

For example:

```shell
s3torch-datagen -n 100k --shard-size 128MiB --s3-bucket my-bucket --region us-east-1
```
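As a rough sanity check, you can estimate how many shards such a run will produce. The sketch below is illustrative only: the average sample size is an assumption (not a value from the benchmark suite), so measure your own dataset for real numbers.

```python
# Back-of-the-envelope estimate of the shard count produced by a datagen run.
# ASSUMPTION: ~50 KiB average on-disk size per generated image; this is a
# guess for illustration, not a measured value.
AVG_SAMPLE_BYTES = 50 * 1024          # assumed average image size
NUM_SAMPLES = 100_000                 # -n 100k
SHARD_BYTES = 128 * 1024 * 1024       # --shard-size 128MiB

total_bytes = NUM_SAMPLES * AVG_SAMPLE_BYTES
num_shards = -(-total_bytes // SHARD_BYTES)  # ceiling division

print(f"~{total_bytes / 2**30:.1f} GiB across ~{num_shards} shards")
```

Under these assumptions, 100k samples come to roughly 4.8 GiB, or about 39 shards of 128 MiB each.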

## Running the benchmarks

You can run the different benchmarks by editing their corresponding config files, then running one of the shell
scripts below (specifically, you must provide a value for every key marked with `???`):

```shell
# Dataset benchmarks
vim ./conf/dataset.yaml # 1. edit config
./utils/run_dataset_benchmarks.sh # 2. run scenario

# PyTorch Checkpointing benchmarks
vim ./conf/pytorch_checkpointing.yaml # 1. edit config
./utils/run_checkpoints_benchmarks.sh # 2. run scenario

# PyTorch Lightning Checkpointing benchmarks
vim ./conf/lightning_checkpointing.yaml # 1. edit config
./utils/run_lighning_benchmarks.sh # 2. run scenario

# PyTorch’s Distributed Checkpointing (DCP) benchmarks
vim ./conf/dcp.yaml # 1. edit config
./utils/run_dcp_benchmarks.sh # 2. run scenario
```

> [!NOTE]
> Ensure the bucket is in the same region as the EC2 instance, to eliminate network latency effects in your
> measurements.
Each of those scripts relies on Hydra config files, located under the [`conf`](conf) directory. You may edit those as
you see fit to configure the runs: in particular, the parameter lists under the `hydra.sweeper.params` path will create
as many jobs as the Cartesian product of their values.
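To make the Cartesian-product point concrete, here is a minimal sketch of how sweep lists turn into individual jobs. The parameter names below are made up for illustration and are not actual keys from the `conf` files:

```python
from itertools import product

# Hypothetical sweep values, mimicking the shape of a `hydra.sweeper.params`
# section (these keys are illustrative, not real config keys).
sweep = {
    "batch_size": [16, 32, 64],
    "num_workers": [4, 8],
}

# Hydra launches one job per element of the Cartesian product of the lists.
jobs = [dict(zip(sweep, combo)) for combo in product(*sweep.values())]

print(f"{len(jobs)} jobs")  # 3 values x 2 values = 6 jobs
for i, job in enumerate(jobs):
    print(i, job)
```

So sweeping three batch sizes against two worker counts yields six jobs per run.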

Also, since the scripts pass any inline parameters you give them to Hydra, you can override their behavior this way:

```shell
./utils/run_dataset_benchmarks.sh +disambiguator=some_key
```

## Getting the results

### Scenario organization

Benchmark results are organized as follows, inside a default `./multirun` directory (e.g.):

```
./multirun
└── dataset
└── 2024-12-20_13-42-27
├── 0
│ ├── benchmark.log
│ └── job_results.json
├── 1
│ ├── benchmark.log
│ └── job_results.json
├── multirun.yaml
└── run_results.json
```

Scenarios are organized at the top level, each in its own directory named after the scenario (e.g., `dataset`). Within
each scenario directory, you'll find individual run directories, automatically named by Hydra using the creation
timestamp (e.g., `2024-12-20_13-42-27`).

Each run directory contains job subdirectories (e.g., `0`, `1`, etc.), corresponding to a specific subset of parameters.

### Experiment reporting

Experiments report various metrics, such as throughput and processing time; the exact metrics vary per scenario.
Results are stored in two locations:

1. In the job subdirectories:
- `benchmark.log`: Individual job logs (collected by Hydra)
- `job_results.json`: Individual job results
2. In the run directory:
- `multirun.yaml`: Global Hydra configuration for the run
- `run_results.json`: Comprehensive run results, including additional metadata
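For quick inspection outside the bundled Jupyter notebook, the per-job files can be aggregated with a short script. The sketch below fabricates a run layout matching the tree above; the JSON field (`throughput`) is an assumed example, not the actual result schema:

```python
import json
import tempfile
from pathlib import Path

def collect_job_results(run_dir: Path) -> list[dict]:
    """Gather every job_results.json found in the job subdirectories of a run."""
    rows = []
    for path in sorted(run_dir.glob("*/job_results.json")):
        with path.open() as f:
            rows.append({"job": path.parent.name, **json.load(f)})
    return rows

# Fabricate a run directory shaped like the tree shown above.
run_dir = Path(tempfile.mkdtemp()) / "dataset" / "2024-12-20_13-42-27"
for job_id, throughput in [("0", 123.4), ("1", 150.9)]:
    job_dir = run_dir / job_id
    job_dir.mkdir(parents=True)
    (job_dir / "job_results.json").write_text(json.dumps({"throughput": throughput}))

rows = collect_job_results(run_dir)
for row in rows:
    print(row)
```

Point `run_dir` at a real `./multirun/<scenario>/<timestamp>` directory to aggregate actual results the same way.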

If a DynamoDB table is defined in the [`conf/aws/dynamodb.yaml`](conf/aws/dynamodb.yaml) configuration file, results
will also be written to the specified table.
[dlami]: https://docs.aws.amazon.com/dlami/

[dlami-pytorch]: https://aws.amazon.com/releasenotes/aws-deep-learning-ami-gpu-pytorch-2-5-amazon-linux-2023/

[mountpoint-s3]: https://github.com/awslabs/mountpoint-s3/tree/main

[credentials]: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html

25 changes: 17 additions & 8 deletions s3torchbenchmarking/benchmark_results_aggregator.ipynb
@@ -12,7 +12,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"id": "6522fc8a931ffbc3",
"metadata": {
"ExecuteTime": {
@@ -39,7 +39,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a371fc9062af6126",
"metadata": {
"ExecuteTime": {
@@ -76,7 +76,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"id": "e14b9efad6ae3ad6",
"metadata": {
"ExecuteTime": {
@@ -127,6 +127,7 @@
" ),\n",
" **metrics_averaged,\n",
" \"config\": job_result[\"config\"],\n",
" \"ec2_metadata\": run_result[\"ec2_metadata\"],\n",
" }\n",
" rows.append(row)\n",
"\n",
@@ -143,7 +144,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"id": "be008fb6acf09055",
"metadata": {
"ExecuteTime": {
@@ -170,14 +171,18 @@
"source": [
"import pandas as pd\n",
"\n",
"_table = pd.DataFrame()\n",
"\n",
"if _run_results:\n",
"    _data = transform(_run_results)\n",
"    _table = pd.json_normalize(_data).set_index(\"version\")\n",
"\n",
"_table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b4eed2752e6add17",
"metadata": {
"ExecuteTime": {
@@ -191,7 +196,11 @@
"import random\n",
"\n",
"_suffix = \"\".join(random.choices(string.ascii_letters, k=5))\n",
"_filename = f\"benchmark_results_{_suffix}.csv\"\n",
"\n",
"if not _table.empty:\n",
"    _table.to_csv(_filename)\n",
"    print(f\"CSV written to {_filename}\")"
]
}
],