Merge pull request #218 from kelbrown20/update-train-readme
Docs: Update training README

Showing 1 changed file with 114 additions and 92 deletions.

![Release](https://img.shields.io/github/v/release/instructlab/training)
![License](https://img.shields.io/github/license/instructlab/training)

- [Installing the library](#installing-the-library)
  - [Additional NVIDIA packages](#additional-nvidia-packages)
- [Using the library](#using-the-library)
- [Learning about training arguments](#learning-about-training-arguments)
  - [`TrainingArgs`](#trainingargs)
  - [`DeepSpeedOptions`](#deepspeedoptions)
  - [`FSDPOptions`](#fsdpoptions)
  - [`LoraOptions`](#loraoptions)
- [Learning about `TorchrunArgs` arguments](#learning-about-torchrunargs-arguments)
- [Example training run with arguments](#example-training-run-with-arguments)

To simplify the process of fine-tuning models with the [LAB method](https://arxiv.org/abs/2403.01081), this library provides a simple training interface.

## Installing the library

To get started with the library, install it via `pip`:

```bash
pip install instructlab-training
```

For development, clone this repository and install it in editable mode so that your local changes are picked up (the command below assumes the clone lives in `./training`):

```bash
pip install -e ./training
```

### Additional NVIDIA packages

This library uses `flash-attn` and other packages that rely on NVIDIA-specific CUDA tooling being installed.
If you are using NVIDIA hardware with CUDA, you need to install the following additional dependencies.

Basic install:

```bash
pip install .[cuda]
```

Editable install (development):

```bash
pip install -e .[cuda]
```

## Using the library

You can use this training library by importing the necessary items:

```py
from instructlab.training import (
    run_training,
    TorchrunArgs,
    TrainingArgs,
)
```

You can then define various training arguments that will serve as the parameters for your training runs. See:

- [Learning about training arguments](#learning-about-training-arguments)
- [Example training run with arguments](#example-training-run-with-arguments)

## Learning about training arguments

The `TrainingArgs` class provides most of the customization options
for training jobs. There are a number of options you can specify, such as setting
`DeepSpeed` config values or running a `LoRA` training job instead of a full fine-tune.

### `TrainingArgs`

| Field | Description |
| --- | --- |
| distributed_backend | Specifies which distributed training backend to use. Supported options are "fsdp" and "deepspeed". |
| disable_flash_attn | Disables flash attention when set to true. This allows for training on older devices. |
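
For instance, those two fields can be set on `TrainingArgs` like so (a minimal sketch; the remaining required arguments are omitted here and shown in the full example at the end of this README):

```py
training_args = TrainingArgs(
    # ... data paths and training hyperparameters go here ...
    distributed_backend = "fsdp",  # or "deepspeed"
    disable_flash_attn = False,    # set to True only on devices without flash attention support
)
```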

### `DeepSpeedOptions`

This library currently supports only a few options in `DeepSpeedOptions`.
The default is to run with DeepSpeed, so these options currently only
allow you to customize aspects of the ZeRO stage 2 optimizer.

| Field | Description |
| --- | --- |
| cpu_offload_optimizer_pin_memory | If true, offload to page-locked CPU memory. This could boost throughput at the cost of extra memory overhead. |
| save_samples | The number of samples to see before saving a DeepSpeed checkpoint. |

For more information about DeepSpeed, see [deepspeed.ai](https://www.deepspeed.ai/).
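
As a rough sketch, these options attach to your training arguments along the following lines (the import path and the `deepspeed_options` field name are assumptions for illustration; check the library for the exact names):

```py
from instructlab.training import DeepSpeedOptions, TrainingArgs

training_args = TrainingArgs(
    # ... other arguments as in the full example below ...
    distributed_backend = "deepspeed",
    deepspeed_options = DeepSpeedOptions(
        cpu_offload_optimizer_pin_memory = True,  # use page-locked CPU memory when offloading
        save_samples = 250000,                    # checkpoint cadence, in samples
    ),
)
```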

### `FSDPOptions`

Like DeepSpeed, we only expose a number of parameters for you to modify with FSDP.

> [!NOTE]
> For `sharding_strategy` - Only `SHARD_GRAD_OP` has been extensively tested and is actively supported by this library.
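
A sketch of how this might look on your training arguments (the import path, the `fsdp_options` field name, and the string form of the sharding strategy are assumptions for illustration):

```py
from instructlab.training import FSDPOptions, TrainingArgs

training_args = TrainingArgs(
    # ... other arguments as in the full example below ...
    distributed_backend = "fsdp",
    fsdp_options = FSDPOptions(
        sharding_strategy = "SHARD_GRAD_OP",  # the only strategy extensively tested with this library
    ),
)
```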

### `LoraOptions`

LoRA options currently supported:

| Field | Description |
| --- | --- |
| rank | The rank parameter for LoRA training. |
| alpha | The alpha parameter for LoRA training. |
| dropout | The dropout rate for LoRA training. |
| target_modules | The list of target modules for LoRA training. |
| quantize_data_type | The data type for quantization in LoRA training. Valid options are `None` and `"nf4"`. |

#### Example run with LoRA options

If you'd like to run a LoRA training job, you can pass a `LoraOptions` object to
`TrainingArgs`. For example (the `lora` field name and the values shown here are illustrative):

```py
training_args = TrainingArgs(
    lora = LoraOptions(
        rank = 4,
        alpha = 32,
        dropout = 0.1,
    ),
    # ... plus the usual data and training arguments ...
)
```

### Learning about `TorchrunArgs` arguments

When running the training script, we always invoke `torchrun`.

If you are running a single-GPU system or something that doesn't
otherwise require distributed training configuration, you can create a default object:

```python
run_training(
    torch_args=TorchrunArgs(),  # default torchrun settings
    train_args=training_args,
)
```

However, if you want to specify a more complex configuration,
the library currently supports all the options that [torchrun accepts
today](https://pytorch.org/docs/stable/elastic/run.html#definitions).

> [!NOTE]
> For more information about the `torchrun` arguments, please consult the [torchrun documentation](https://pytorch.org/docs/stable/elastic/run.html#definitions).

#### Example training run with `TorchrunArgs` arguments

For example, in an 8-GPU, 2-machine system, we would
specify the following torchrun config:
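
A sketch of that configuration (the rendezvous endpoint below is a placeholder; on the second machine you would set `node_rank = 1`):

```py
torchrun_args = TorchrunArgs(
    nnodes = 2,          # number of machines
    nproc_per_node = 8,  # num GPUs per machine
    node_rank = 0,       # rank of this machine; use 1 on the second machine
    rdzv_id = 123,
    rdzv_endpoint = '10.0.0.1:12345',  # placeholder address for the rendezvous host
)
```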

You can then launch the training job:

```py
run_training(
    torch_args=torchrun_args,  # the TorchrunArgs object defined above
    train_args=training_args
)
```

## Example training run with arguments

Define the training arguments which will serve as the
parameters for our training run:

```py
# define training-specific arguments
training_args = TrainingArgs(
    # define data-specific arguments
    model_path = "ibm-granite/granite-7b-base",
    data_path = "path/to/dataset.jsonl",
    ckpt_output_dir = "data/saved_checkpoints",
    data_output_dir = "data/outputs",

    # define model-training parameters
    max_seq_len = 4096,
    max_batch_len = 60000,
    num_epochs = 10,
    effective_batch_size = 3840,
    save_samples = 250000,
    learning_rate = 2e-6,
    warmup_steps = 800,
    is_padding_free = True, # set this to true when using Granite-based models
    random_seed = 42,
)
```

We'll also need to define the settings for running a multi-process job
via `torchrun`. To do this, create a `TorchrunArgs` object.

> [!TIP]
> Note, for single-GPU jobs, you can simply set `nnodes = 1` and `nproc_per_node = 1`.

```py
torchrun_args = TorchrunArgs(
    nnodes = 1, # number of machines
    nproc_per_node = 8, # num GPUs per machine
    node_rank = 0, # node rank for this machine
    rdzv_id = 123,
    rdzv_endpoint = '127.0.0.1:12345'
)
```

Finally, you can just call `run_training` and this library will handle
the rest 🙂.

```py
run_training(
    torch_args=torchrun_args,
    train_args=training_args,
)
```