[TASK] Separate AutoTP workflow #4894
An example of an AutoTP workflow, covering bloom 3.1b and opt 1.3b:
One lesson learned is that downloaded models need persistent storage; otherwise they are re-downloaded on every run, which wastes time.
Latest workflow run result: with model checkpoints fully cached, testing the two models (opt1.3b and bloom3b) took around 5 minutes.
The next step is to explore an accuracy metric. There are two choices: perplexity, or accuracy on a specific task.
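If perplexity is chosen, the core computation is small. A minimal sketch in plain Python, assuming per-token negative log-likelihoods have already been collected from the model (the helper name is illustrative, not an existing DeepSpeed utility):

```python
import math

def perplexity(nll_per_token):
    # Perplexity is the exponential of the mean per-token
    # negative log-likelihood over the evaluation set.
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# A model assigning each token probability 1/10 (NLL = ln 10)
# has perplexity 10, regardless of sequence length.
print(perplexity([math.log(10)] * 4))  # ~10.0
```

In a CI workflow, the test would assert that the AutoTP engine's perplexity stays within a small tolerance of the single-GPU HF baseline, rather than checking an absolute value.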
For correctness checking, DeepSpeedExamples has ds-hf-compare, which should be a good starting point. The script currently configures inference with kernel injection, however, so it will need modification to fit this usage.
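Conceptually, the correctness check reduces to comparing generations from the HF baseline model against the DeepSpeed (AutoTP) engine over the same prompts. A hypothetical sketch of that comparison step, once both sets of generations are in hand (`compare_outputs` is illustrative, not the actual ds-hf-compare API):

```python
def compare_outputs(baseline_texts, test_texts, min_match_ratio=1.0):
    # Pairwise exact-match comparison between baseline (HF) generations
    # and DeepSpeed generations; returns the match ratio and a pass flag.
    assert len(baseline_texts) == len(test_texts)
    matches = sum(a.strip() == b.strip()
                  for a, b in zip(baseline_texts, test_texts))
    ratio = matches / len(baseline_texts)
    return ratio, ratio >= min_match_ratio

# With greedy decoding the outputs should match exactly; a looser
# min_match_ratio could absorb benign fp16 nondeterminism.
print(compare_outputs(["hello world"], ["hello world"]))  # (1.0, True)
```

Exact string comparison only makes sense under greedy (deterministic) decoding; with sampling, a metric like perplexity is the safer check.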
Added a workflow with a ds-hf-compare.py test. The script needs some changes to run with two ranks, however. This workflow is ready to be tested on the DeepSpeed self-hosted runner. @mrwyattii I wonder how persistence works: if I set HF_HOME to /blob, will model checkpoints be downloaded to persistent storage and reused on the next run?
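For context on the HF_HOME question: the Hugging Face hub cache resolves to $HF_HOME/hub when HF_HOME is set (falling back to ~/.cache/huggingface/hub otherwise), so pointing HF_HOME at a persistent mount should keep checkpoints across runs. A simplified sketch of that resolution logic (the real resolution also honors HF_HUB_CACHE / HUGGINGFACE_HUB_CACHE overrides):

```python
import os
from pathlib import Path

def hf_hub_cache_dir(env=None):
    # Simplified model of how huggingface_hub picks its cache directory:
    # $HF_HOME/hub when HF_HOME is set, else ~/.cache/huggingface/hub.
    env = os.environ if env is None else env
    home = env.get("HF_HOME")
    base = Path(home) if home else Path.home() / ".cache" / "huggingface"
    return base / "hub"

# With HF_HOME=/blob, checkpoints land under the persistent mount:
print(hf_hub_cache_dir({"HF_HOME": "/blob"}))  # /blob/hub
```

So as long as /blob survives between runner jobs, a second run should find the checkpoints already cached and skip the download.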
@mrwyattii PR #4961 has been added. Initially there are two models in it, but I plan to add more models to the list. (I can't run more models on the runner on my desktop because of limited memory.)
As discussed in PR #4721, we need to increase test coverage for AutoTP to cover more models. Such a workflow can help avoid regressions like #4774.
This is a challenge within the current UT scope because of the following points:
The workflow may also run the following variants:
The expected result of this task is: