[TASK] Separate AutoTP workflow #4894
An example of an AutoTP workflow, covering bloom 3.1b and opt 1.3b:
One lesson learned is that downloaded models need persistent storage; otherwise they are re-downloaded on every run, which wastes time.
Latest workflow run result: with model checkpoints fully cached, testing the two models (opt1.3b and bloom3b) took around 5 minutes.
The next step is to explore an accuracy metric. There are two choices: perplexity, or accuracy on a specific task.
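If perplexity is chosen, the core computation is small. A minimal sketch in plain Python, assuming per-token negative log-likelihoods have already been collected from the model (the helper name is illustrative, not an existing DeepSpeed utility):

```python
import math

def perplexity(nll_per_token):
    # Perplexity is the exponential of the mean per-token
    # negative log-likelihood over the evaluation set.
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# A model assigning each token probability 1/10 (NLL = ln 10)
# has perplexity 10, regardless of sequence length.
print(perplexity([math.log(10)] * 4))  # ~10.0
```

In a CI workflow, the test would assert that the AutoTP engine's perplexity stays within a small tolerance of the single-GPU HF baseline, rather than checking an absolute value.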
For correctness checking, DeepSpeedExamples has ds-hf-compare, which should be a good starting point. The script currently configures inference with kernel injection, however, so it will need modification to fit this usage.
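Conceptually, the correctness check reduces to comparing generations from the HF baseline model against the DeepSpeed (AutoTP) engine over the same prompts. A hypothetical sketch of that comparison step, once both sets of generations are in hand (`compare_outputs` is illustrative, not the actual ds-hf-compare API):

```python
def compare_outputs(baseline_texts, test_texts, min_match_ratio=1.0):
    # Pairwise exact-match comparison between baseline (HF) generations
    # and DeepSpeed generations; returns the match ratio and a pass flag.
    assert len(baseline_texts) == len(test_texts)
    matches = sum(a.strip() == b.strip()
                  for a, b in zip(baseline_texts, test_texts))
    ratio = matches / len(baseline_texts)
    return ratio, ratio >= min_match_ratio

# With greedy decoding the outputs should match exactly; a looser
# min_match_ratio could absorb benign fp16 nondeterminism.
print(compare_outputs(["hello world"], ["hello world"]))  # (1.0, True)
```

Exact string comparison only makes sense under greedy (deterministic) decoding; with sampling, a metric like perplexity is the safer check.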
Added a workflow with a ds-hf-compare.py test. The script needs some changes to run with two ranks, however. This workflow is ready to be tested on the DeepSpeed self-hosted runner. @mrwyattii I wonder how persistence works: if I set HF_HOME to /blob, will model checkpoints be downloaded to persistent storage and reused on the next run?
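For context on the HF_HOME question: the Hugging Face hub cache resolves to $HF_HOME/hub when HF_HOME is set (falling back to ~/.cache/huggingface/hub otherwise), so pointing HF_HOME at a persistent mount should keep checkpoints across runs. A simplified sketch of that resolution logic (the real resolution also honors HF_HUB_CACHE / HUGGINGFACE_HUB_CACHE overrides):

```python
import os
from pathlib import Path

def hf_hub_cache_dir(env=None):
    # Simplified model of how huggingface_hub picks its cache directory:
    # $HF_HOME/hub when HF_HOME is set, else ~/.cache/huggingface/hub.
    env = os.environ if env is None else env
    home = env.get("HF_HOME")
    base = Path(home) if home else Path.home() / ".cache" / "huggingface"
    return base / "hub"

# With HF_HOME=/blob, checkpoints land under the persistent mount:
print(hf_hub_cache_dir({"HF_HOME": "/blob"}))  # /blob/hub
```

So as long as /blob survives between runner jobs, a second run should find the checkpoints already cached and skip the download.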
@mrwyattii PR #4961 has been added. Initially there are two models in it, but I plan to add more models to the list. (I can't run more models on the runner on my desktop because of limited memory.)
As discussed in PR #4721, we need to increase test coverage for AutoTP to cover more models. Such a workflow can help avoid regressions like #4774.
This is a challenge within the current UT scope because of the following points:
The workflow may also run the following variants:
The expected result of this task is: