Commit

Update MoE scripts
RissyRan committed Jan 17, 2025
1 parent 9acbaed commit 66af741
Showing 5 changed files with 8 additions and 17 deletions.
17 changes: 3 additions & 14 deletions training/trillium/Mixtral-8x7B-MaxText/README.md
@@ -11,9 +11,9 @@ Please follow this [link](https://github.com/AI-Hypercomputer/tpu-recipes/blob/m
### Test Env
jaxlib=0.4.35

-libtpu-nightly=20241028
+libtpu-nightly=20241119

-[maxtext](https://github.com/AI-Hypercomputer/maxtext.git)@e7292a3a572792a0d797fc8977b21d0f255729f1
+[maxtext](https://github.com/AI-Hypercomputer/maxtext.git)@261a8be0fc5e909ef9da0521df62549e650ebb79

### Starting workload

@@ -22,21 +22,10 @@ From the MaxText root directory, start your Mixtral workload.
Bf16 run:
```
python3 benchmarks/benchmark_runner.py --project=${PROJECT} --zone={zone} --device_type=v6e-256 --num_slices=1 --cluster_name=${CLUSTER_NAME} --base_output_directory=${OUTPUT_DIR} \
---model_name="mixtral_8x7b_dropped" --libtpu_version=20241028 --base_docker_image=maxtext_base_image
+--model_name="mixtral_8x7b_dropped" --libtpu_version=20241119 --base_docker_image=maxtext_base_image
```

From your workload logs, you should start seeing step time logs like the following:
```
completed step: 19, seconds: 8.409, TFLOP/s/device: 323.173, Tokens/s/device: 3896.752, total_weights: 8388608, loss: 0.031
```

Int8 run:
```
python3 benchmarks/benchmark_runner.py --project=${PROJECT} --zone={zone} --device_type=v6e-256 --num_slices=1 --cluster_name=${CLUSTER_NAME} --base_output_directory=${OUTPUT_DIR} \
--model_name="mixtral_8x7b_dropped_int8" --libtpu_version=20241028 --base_docker_image=maxtext_base_image
```

From your workload logs, you should start seeing step time logs like the following:
```
completed step: 18, seconds: 8.218, TFLOP/s/device: 330.683, Tokens/s/device: 3987.307, total_weights: 8388608, loss: 0.030
```
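
Reviewer note: the step-time log lines in the README hunk above can be sanity-checked with a short script. The sketch below parses the bf16 sample line and checks that `Tokens/s/device × seconds × devices` lands close to `total_weights`. It assumes that `v6e-256` with `--num_slices=1` means 256 devices and that `total_weights` counts tokens per step; both are interpretations of the log, not something this commit states. `parse_step_line` is a hypothetical helper, not part of MaxText.

```python
# Sample line copied from the README hunk in this commit (bf16 run).
LOG_LINE = (
    "completed step: 19, seconds: 8.409, TFLOP/s/device: 323.173, "
    "Tokens/s/device: 3896.752, total_weights: 8388608, loss: 0.031"
)


def parse_step_line(line):
    """Split a 'completed step' log line into a dict of metric name -> float."""
    prefix = "completed "
    if not line.startswith(prefix + "step:"):
        raise ValueError("not a step-time log line")
    metrics = {}
    for field in line[len(prefix):].split(", "):
        key, value = field.split(": ")
        metrics[key] = float(value)
    return metrics


metrics = parse_step_line(LOG_LINE)

# Assumption: v6e-256 with a single slice corresponds to 256 devices.
num_devices = 256
tokens_per_step = metrics["Tokens/s/device"] * metrics["seconds"] * num_devices

# Relative gap between the derived token count and the logged total_weights.
relative_gap = abs(tokens_per_step - metrics["total_weights"]) / metrics["total_weights"]
print(f"derived tokens/step: {tokens_per_step:.0f}, relative gap: {relative_gap:.2e}")
```

For this sample line the derived value agrees with `total_weights` to well under 0.1%, which suggests the logged throughput and step time are mutually consistent.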
@@ -1,2 +1,2 @@
python3 benchmarks/benchmark_runner.py --project=$PROJECT --zone=$ZONE --device_type=v6e-256 --num_slices=1 --cluster_name=${CLUSTER_NAME} --base_output_directory=${OUTPUT_DIR} \
---model_name="mixtral_8x7b_dropped" --libtpu_version=20241028 --base_docker_image maxtext_base_image
+--model_name="mixtral_8x7b_dropped" --libtpu_version=20241119 --base_docker_image maxtext_base_image
@@ -0,0 +1,2 @@
+python3 benchmarks/benchmark_runner.py --project=$PROJECT --zone=$ZONE --device_type=v6e-256 --num_slices=2 --cluster_name=${CLUSTER_NAME} --base_output_directory=${OUTPUT_DIR} \
+--model_name="mixtral_8x7b_dropped" --libtpu_version=20241119 --base_docker_image maxtext_base_image
@@ -0,0 +1,2 @@
+python3 benchmarks/benchmark_runner.py --project=$PROJECT --zone=$ZONE --device_type=v6e-256 --num_slices=4 --cluster_name=${CLUSTER_NAME} --base_output_directory=${OUTPUT_DIR} \
+--model_name="mixtral_8x7b_dropped" --libtpu_version=20241119 --base_docker_image maxtext_base_image
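
Reviewer note: the three scripts in this commit differ only in `--num_slices` (1, 2, 4). As a rough sketch, the same commands could be generated from one place. The code below only builds and prints the command lines and does not launch anything. The flags are the ones shown in the diff; the `benchmark_command` helper and the fallback placeholder values for the environment variables are hypothetical.

```python
import os
import shlex


def benchmark_command(num_slices, libtpu_version="20241119"):
    """Build the benchmark_runner.py argument list for a given slice count."""
    # Placeholder defaults below are hypothetical; set the real values in the environment.
    project = os.environ.get("PROJECT", "my-project")
    zone = os.environ.get("ZONE", "my-zone")
    cluster = os.environ.get("CLUSTER_NAME", "my-cluster")
    output_dir = os.environ.get("OUTPUT_DIR", "gs://my-bucket/output")
    return [
        "python3", "benchmarks/benchmark_runner.py",
        f"--project={project}",
        f"--zone={zone}",
        "--device_type=v6e-256",
        f"--num_slices={num_slices}",
        f"--cluster_name={cluster}",
        f"--base_output_directory={output_dir}",
        "--model_name=mixtral_8x7b_dropped",
        f"--libtpu_version={libtpu_version}",
        "--base_docker_image=maxtext_base_image",
    ]


# One command per slice count covered by the scripts in this commit.
commands = [benchmark_command(n) for n in (1, 2, 4)]
for cmd in commands:
    print(shlex.join(cmd))
```

Keeping the libtpu version as a single default argument would avoid the kind of multi-file version bump this commit had to make.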
2 changes: 0 additions & 2 deletions training/trillium/Mixtral-8x7B-MaxText/mixtral-8x7b-int8.sh

This file was deleted.
