Refactor unit test workflow; add smoke test workflow #419
Test groups are divided into three categories: 1) unit tests, 2) smoke tests, and 3) benchmark tests. Each has a dedicated tox entrypoint. Adds a test matrix covering the outer product of [FSDP, DeepSpeed] x [CPU offload, no CPU offload]. DEEPSPEED TESTS ARE BROKEN IN THIS COMMIT and are marked xfail, to be fixed in a later commit.
Signed-off-by: James Kunstle <[email protected]>
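The 2 x 2 matrix described in this commit can be sketched in plain Python with `itertools.product`. This is a minimal illustration, not the PR's actual test code: the backend strings, the `build_test_matrix` helper, and the `xfail` flag are hypothetical names chosen to mirror the commit message.

```python
import itertools

# Hypothetical axis values; the commit only names
# [FSDP, DeepSpeed] x [CPU offload, no CPU offload].
DISTRIBUTED_BACKENDS = ["fsdp", "deepspeed"]
CPU_OFFLOAD = [True, False]


def build_test_matrix():
    """Return one dict per (backend, cpu_offload) combination.

    DeepSpeed cases are flagged as expected failures, mirroring the
    commit's note that they are marked xfail until a later fix.
    """
    matrix = []
    for backend, offload in itertools.product(DISTRIBUTED_BACKENDS, CPU_OFFLOAD):
        matrix.append(
            {
                "backend": backend,
                "cpu_offload": offload,
                "xfail": backend == "deepspeed",
            }
        )
    return matrix


if __name__ == "__main__":
    for case in build_test_matrix():
        print(case)
```

In pytest, each entry would typically become a `pytest.param(...)` passed to `@pytest.mark.parametrize`, with `marks=pytest.mark.xfail(...)` attached to the DeepSpeed cases, so the broken combinations still run but do not fail the suite.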
Starting and stopping self-hosted EC2 runners is common code that we can move into separate, reusable workflows. Moved those steps into their own files and refactored `unit-tests.yaml` to call them instead of inlining them.
Signed-off-by: James Kunstle <[email protected]>
Users can dispatch a workflow that runs smoke tests against a selected branch.
Signed-off-by: James Kunstle <[email protected]>
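Besides the Actions UI, a `workflow_dispatch` run can be triggered through GitHub's REST endpoint `POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches`, whose JSON body carries the target `ref` (plus an optional `inputs` object if the workflow declares inputs). The sketch below only builds the request; the helper name, the workflow file name `smoke.yaml`, and the token handling are assumptions for illustration, not taken from this PR.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_dispatch_request(owner, repo, workflow_file, branch, token):
    """Build (but do not send) a workflow_dispatch request targeting `branch`.

    `workflow_file` is the workflow's file name (e.g. a hypothetical
    "smoke.yaml"); GitHub also accepts the numeric workflow ID here.
    """
    url = f"{GITHUB_API}/repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches"
    # The branch to run against goes in the "ref" field of the JSON body.
    body = json.dumps({"ref": branch}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it; GitHub answers a successful dispatch with `204 No Content`, so there is no run ID in the response to poll directly.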
@@ -0,0 +1,76 @@
# SPDX-License-Identifier: Apache-2.0
name: "[Reusable] Start EC2 self-hosted runner."
Nit: Do you need to put "Reusable" in the name?
That was a convention I saw in a blog post; we don't have to adopt it if it seems messy.
""" | ||
|
||
huggingface_hub.snapshot_download( | ||
token=os.getenv("HF_TOKEN", None), |
Is this token needed?
Possibly related: instructlab/instructlab#3100
This was an annoying behavior. `snapshot_download` seems like it should pick up HF_TOKEN from the environment, but it wasn't doing that, so I had to pass it in explicitly this way.
HF_TOKEN shouldn't be accessible to this test. Before instructlab/instructlab#3100, the CLI required a token to be passed even when it wasn't needed. Hopefully this test can simply stop passing the token.
Good point, you're totally right. Removed.
One concern I have about this change is that it will make the EC2 handling logic different from the rest of the repos. Given the limited expertise in this logic, I think it is worth keeping the different instructlab repos consistent. So if we are going to change it here, we should make the same changes in instructlab, sdg, and eval.
That's a fair point. I was hoping to break this logic out so that we only have to maintain these common reusable components, since the EC2 start/stop behavior is shared by a lot of scripts. @courtneypacheco, would there be any appetite for refactoring the existing workflows across the repos to use these reusable jobs? Inlining the reusable logic seems like a hassle at this point. I'd gladly help with this work.
@@ -0,0 +1,76 @@
# SPDX-License-Identifier: Apache-2.0
Related: instructlab/dev-docs#179 (comment)
This PR does two things: