AzureML v2 CLI HuggingFace Fine-tuning Examples

Prerequisites

If you don't have an Azure subscription, create a free account.
Install and configure the v2 CLI machine learning extension.
(Optional) Read over the documentation for v2 CLI jobs.

Setup

Clone this repository and navigate to the examples directory:

git clone https://github.com/linydub/azureml-greenai-txtsum.git
cd azureml-greenai-txtsum/examples

Create a new Azure resource group and machine learning workspace, then set defaults for Azure CLI:

bash setup.sh

Create the workspace assets (environment, dataset, and compute target) used in the examples:

bash create-assets.sh

Examples

Run a fine-tuning job defined through the example's YAML specification:

az ml job create --file ./jobs/pytorch-job.yml --web --stream

Job	Description
jobs/pytorch-job.yml	Fine-tune an encoder-decoder Transformer model (BART) for dialogue summarization (SAMSum) with HuggingFace (PyTorch).
jobs/deepspeed-job.yml	Fine-tune an encoder-decoder Transformer model (BART) for dialogue summarization (SAMSum) with HuggingFace's DeepSpeed integration.
jobs/sweep-job.yml	Tune the hyperparameters of an encoder-decoder Transformer model (BART) for dialogue summarization (SAMSum) with a grid search sweep job.

Script usage (PyTorch)

`jobs/src/main.py` can be adapted or replaced with another script (e.g. `run_glue.py`) to fine-tune models for other NLP tasks.

*Script for text summarization was adapted from: https://github.com/huggingface/transformers/blob/master/examples/pytorch/summarization/run_summarization.py

New optional arguments

Name	Type	Default	Description
`dataset_path`	string	None	The path to the dataset files. Use this for loading data assets registered in the AzureML workspace.
`train_early_stopping`	bool	False	Whether to add `EarlyStoppingCallback` during training. The callback relies on `load_best_model_at_end` to set `best_metric` in `TrainerState`.
`early_stopping_patience`	int	1	Use with `metric_for_best_model` to stop training when the specified metric fails to improve for `early_stopping_patience` consecutive evaluation calls.
`early_stopping_threshold`	float	0.0	Use with `metric_for_best_model` and `early_stopping_patience` to set how much the specified metric must improve for an evaluation to count as an improvement.
`freeze_embeds`	bool	False	Whether to freeze the model's embedding modules.
`freeze_encoder`	bool	False	Whether to freeze the model's encoder.
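
To illustrate how `early_stopping_patience` and `early_stopping_threshold` interact, here is a minimal sketch of a patience counter that uses only the standard library. It is a simplified stand-in, not the `EarlyStoppingCallback` code from transformers and not the logic in `jobs/src/main.py`. The `EarlyStopper` class, the `evals_before_stop` helper, and the metric values are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class EarlyStopper:
    """Simplified patience/threshold tracker (illustrative only)."""

    patience: int = 1               # mirrors early_stopping_patience
    threshold: float = 0.0          # mirrors early_stopping_threshold
    greater_is_better: bool = True  # e.g. True for ROUGE, False for loss
    best: Optional[float] = None
    bad_evals: int = 0

    def update(self, metric: float) -> bool:
        """Record one evaluation result; return True if training should stop."""
        if self.best is None:
            improved = True
        else:
            # Signed gain relative to the best value seen so far.
            gain = metric - self.best if self.greater_is_better else self.best - metric
            improved = gain > self.threshold
        if improved:
            self.best = metric
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


def evals_before_stop(metrics: Iterable[float], **kwargs) -> Optional[int]:
    """Return the 1-based evaluation index at which training would stop, or None."""
    stopper = EarlyStopper(**kwargs)
    for index, metric in enumerate(metrics, start=1):
        if stopper.update(metric):
            return index
    return None


# Made-up evaluation scores: the third evaluation is worse than the second.
scores = [0.40, 0.42, 0.41, 0.43]
print(evals_before_stop(scores, patience=1))  # 3
```

With `patience=2` the same scores never trigger a stop, because the fourth evaluation improves on the best value before the counter reaches 2. Raising `threshold` makes small gains count as non-improvements.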

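Freezing, as in `freeze_embeds` and `freeze_encoder`, usually means turning off gradient updates for a subset of parameters. The sketch below shows that idea under stated assumptions: `freeze_params` is a hypothetical helper, not necessarily what `jobs/src/main.py` does, and `FakeParam` / `FakeModule` are stand-ins so the example runs without PyTorch installed. With a real HuggingFace seq2seq model, one would typically pass something like `model.get_encoder()` instead.

```python
def freeze_params(module) -> int:
    """Disable gradient updates for every parameter of `module`.

    Works with any object exposing a PyTorch-style `parameters()` iterator.
    Returns the number of parameters that were frozen.
    """
    count = 0
    for param in module.parameters():
        param.requires_grad = False
        count += 1
    return count


# Stand-ins for torch parameters and modules (hypothetical, illustration only).
class FakeParam:
    def __init__(self):
        self.requires_grad = True


class FakeModule:
    def __init__(self, n_params: int):
        self._params = [FakeParam() for _ in range(n_params)]

    def parameters(self):
        return iter(self._params)


encoder = FakeModule(3)
decoder = FakeModule(2)

# Freeze only the encoder; the decoder stays trainable.
frozen = freeze_params(encoder)
trainable = sum(p.requires_grad for p in decoder.parameters())
print(frozen, trainable)  # 3 2
```

Freezing the encoder or embeddings reduces the number of trainable parameters, which lowers memory use and compute per step at the possible cost of final quality.
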
Known issues

Contents

Directory	Description
`assets`	Example workspace assets
`jobs`	Example jobs for sample tasks
`jobs/src`	Example training script and configs
