Add MLCube support for RNN speech recognition #491

davidjurado · 2021-06-24T22:18:09Z

Used PR #465 as reference.

Current implementation

We'll be updating this section as we merge MLCube PRs and make new MLCube releases.

Project setup

# Create Python environment and install MLCube Docker runner 
virtualenv -p python3 ./env && source ./env/bin/activate && pip install mlcube-docker

# Fetch the RNN speech recognition workload
git clone https://github.com/mlcommons/training && cd ./training
git fetch origin pull/491/head:feature/rnnt_mlcube && git checkout feature/rnnt_mlcube
cd ./rnn_speech_recognition/mlcube

Dataset

The Librispeech dataset will be downloaded, extracted, and processed. Sizes of the dataset in each step:

Dataset Step	MLCube Task	Format	Size
Download (Compressed dataset)	download_data	Tar files	~62 GB
Extract (Uncompressed dataset)	download_data	Flac files	~64 GB
Preprocess (Processed dataset)	preprocess_data	Wav files	~114 GB
Total	(After all tasks)	All	~240 GB
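
Since the table puts the total at roughly 240 GB, it may be worth checking free disk space before starting the download. The following is a minimal sketch, not part of this PR: the helper names (`required_gb`, `available_gb`) and the `TARGET_DIR` variable are hypothetical, and the sizes are the approximate figures from the table.

```shell
#!/bin/sh
# Approximate sizes in GB, copied from the table above.
DOWNLOAD_GB=62
EXTRACT_GB=64
PREPROCESS_GB=114

# Sum the three stages; the table's total counts all of them together.
required_gb() {
    echo $((DOWNLOAD_GB + EXTRACT_GB + PREPROCESS_GB))
}

# Free space, in whole GB, on the filesystem that holds the given directory.
# df -Pk prints POSIX-format output in 1 KiB blocks; column 4 is "Available".
available_gb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Hypothetical target: the current directory. Point this at your data directory.
TARGET_DIR="."
need=$(required_gb)
have=$(available_gb "$TARGET_DIR")
if [ "$have" -lt "$need" ]; then
    echo "Only ${have} GB free in ${TARGET_DIR}; about ${need} GB is needed."
else
    echo "${have} GB free in ${TARGET_DIR}; about ${need} GB is needed."
fi
```

The script only reports; it does not stop anything from running, so the decision to proceed stays with the user.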

Tasks execution

# Download Librispeech dataset. Default path = /workspace/data
# To override it, use data_dir=DATA_DIR
mlcube run --task download_data

# Preprocess the Librispeech dataset: converts the .flac audio files to .wav format
# Uses the DATA_DIR path defined in the previous step
mlcube run --task preprocess_data

# Run benchmark. Default paths = ./workspace/data
# Parameters to override: data_dir=DATA_DIR, output_dir=OUTPUT_DIR, parameters_file=PATH_TO_TRAINING_PARAMS
mlcube run --task train
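
The comments above name the overridable parameters but do not show a full command line. Here is a minimal sketch of how such command lines might be assembled, assuming that overrides are passed as trailing `name=value` arguments, as the comments suggest. This has not been verified against the PR branch. The `mlcube_cmd` helper and both paths are hypothetical, and the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical paths; replace them with locations on your machine.
DATA_DIR="/mnt/librispeech"
OUTPUT_DIR="/mnt/rnnt_output"

# Print an mlcube command line for a task, followed by any overrides.
# Assumption: overrides are given as trailing name=value arguments.
mlcube_cmd() {
    task="$1"
    shift
    cmd="mlcube run --task $task"
    for arg in "$@"; do
        cmd="$cmd $arg"
    done
    echo "$cmd"
}

mlcube_cmd download_data "data_dir=$DATA_DIR"
# The description lists no overrides for preprocess_data, so none are shown.
mlcube_cmd preprocess_data
mlcube_cmd train "data_dir=$DATA_DIR" "output_dir=$OUTPUT_DIR"
```

Printing the commands first makes it easy to review the paths before committing to a download of this size.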

We are targeting pull-type installation, so MLCube images should be available on Docker Hub. If an image is missing, try this:

mlcube run ... -Pdocker.build_strategy=always

github-actions · 2021-06-24T22:18:24Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

mwawrzos · 2023-05-19T07:37:15Z

Hello @davidjurado! I tried to follow the task execution steps, but the last step failed with the following error:

$ mlcube run --task train
Usage: mlcube.py train [OPTIONS]
Try 'mlcube.py train --help' for help.

Error: Missing option '--output_dir'.
2023-05-19 09:35:17 [...]

Your description says:

# Run benchmark. Default paths = ./workspace/data
# Parameters to override: data_dir=DATA_DIR, output_dir=OUTPUT_DIR, parameters_file=PATH_TO_TRAINING_PARAMS
mlcube run --task train

How do I override the output_dir?

nv-rborkar · 2024-03-08T03:39:40Z

@davidjurado, can you answer @mwawrzos's question? We can merge this accordingly.

Add MLCube support for RNN speech recognition

7d7eb58

davidjurado added 3 commits June 24, 2021 20:11

Fix typos and add README

df3daad

Add missing dependencie: typer

d75f49f

Fix documentation

bbd8023

davidjurado mentioned this pull request Jun 25, 2021

MLCube a Training Model mlcommons/mlcube#187

Closed

davidjurado added 2 commits June 28, 2021 07:14

Fix train task: Change sentpiece_model path

a754738

Fix train task target script and documentation

aaed6cf

davidjurado marked this pull request as draft June 28, 2021 15:28

davidjurado marked this pull request as ready for review June 28, 2021 15:29

Fix dataset description in README

544ccc9

davidjurado mentioned this pull request Jul 2, 2021

Add MLCube support for Image Segmentation Benchmark #494

Open

davidjurado added 4 commits July 22, 2021 18:00

Update to config 2.0

a8726bf

Update to MLCube config v2.0

4d92867

Change docker image name in mlcube comfig file

72d5246

Update readme

2b1e1e2

davidjurado force-pushed the feature/rnnt_mlcube branch from af4e361 to 2b1e1e2 Compare July 22, 2022 16:06

matthew-frank added rnn_speech_recognition RNN-T model on Librispeech dataset MLCube labels Dec 2, 2022

johntran-nv requested a review from mwawrzos March 16, 2023 18:50
