adds upstream testing development document #404

Open
wants to merge 1 commit into
base: main
from

JamesKunstle
Contributor

Adds a dev doc describing our testing strategy for this repo.

@JamesKunstle JamesKunstle added documentation Improvements or additions to documentation CI/CD Affects CI/CD configuration labels Jan 21, 2025
@JamesKunstle JamesKunstle self-assigned this Jan 21, 2025
@JamesKunstle JamesKunstle marked this pull request as ready for review January 21, 2025 00:41
@mergify mergify bot added the ci-failure label Jan 21, 2025
@nathan-weinberg
Member

@JamesKunstle can you fix the markdown linting? You can run `make md-lint` to debug locally.

@JamesKunstle JamesKunstle force-pushed the upstream-testing-proposal branch from fc72782 to f8c0843 Compare January 21, 2025 21:40

```python
for variant in ["nvidia", "amd", "intel", "cpu", "mps"]:  # parallel at runner level
```
Member

This is going to be a real challenge with runner availability to cover these on every PR.

Contributor Author

Yeah, I figure that we'll scale out a little at a time, starting with CPU and NVIDIA, and adding runners to the matrix as they become available.
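
A minimal sketch of that staged rollout, assuming the variant list from the line under review. The function name and the `available_runners` set are hypothetical, introduced only for illustration; they are not part of the proposed document.

```python
# Hardware variants named in the proposal's test loop.
ALL_VARIANTS = ["nvidia", "amd", "intel", "cpu", "mps"]


def enabled_variants(available_runners):
    """Return the variants to test now, preserving the proposal's order."""
    return [v for v in ALL_VARIANTS if v in available_runners]


# Assumption: only CPU and NVIDIA runners exist at first.
initial = enabled_variants({"cpu", "nvidia"})
print(initial)  # ['nvidia', 'cpu']
```

Adding a runner type later only means growing the availability set; the test loop itself would not need to change.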

@JamesKunstle JamesKunstle force-pushed the upstream-testing-proposal branch from f8c0843 to c405770 Compare January 24, 2025 21:39
@JamesKunstle JamesKunstle requested a review from danmcp January 24, 2025 21:39
@JamesKunstle
Contributor Author

The current CI failure seems to be because the workflow can't access `vars.AWS_REGION`, even though the workflow itself hasn't changed in this PR and the branch is rebased onto main, where the unit test works. I'm not sure why that's happening.
@danmcp @courtneypacheco do y'all have any guidance? I was assuming that the protected variables would be exposed to the workflows once they were merged.

@danmcp
Member

danmcp commented Jan 24, 2025

> The current CI failure seems to be because the workflow can't access `vars.AWS_REGION`, even though the workflow itself hasn't changed in this PR and the branch is rebased onto main, where the unit test works. I'm not sure why that's happening. @danmcp @courtneypacheco do y'all have any guidance? I was assuming that the protected variables would be exposed to the workflows once they were merged.

I am looking at the difference between:

https://github.com/instructlab/training/blob/main/.github/workflows/unit-tests.yaml#L12

and

https://github.com/instructlab/training/blob/main/.github/workflows/e2e-nvidia-l4-x1.yml#L5

Were there reasons for the unit tests to be different?

See: https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/events-that-trigger-workflows#pull_request_target

@JamesKunstle
Contributor Author

No, there wasn't a reason for the two to be different, apart from my writing the unit-test workflow from scratch. The two should have the same behavior; I'll amend that.

I'm confused about the `vars` context availability, though, because this CI workflow has merged to the main branch and doesn't seem to be able to access that context.

@danmcp
Member

danmcp commented Jan 24, 2025

> No, there wasn't a reason for the two to be different, apart from my writing the unit-test workflow from scratch. The two should have the same behavior; I'll amend that.
>
> I'm confused about the `vars` context availability, though, because this CI workflow has merged to the main branch and doesn't seem to be able to access that context.

`pull_request` runs from the context of your merge commit, and `pull_request_target` runs from the context of instructlab/training. So when the workflow runs from the context of your branch, it can't see the var from the instructlab/training repo.
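
The distinction can be sketched as a small lookup. This is a simplified model of the behavior described in this thread, not GitHub's implementation; the `RunContext` type and `run_context` function are hypothetical names, and the fork case assumes the PR branch lives outside instructlab/training.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunContext:
    code_from: str          # where the workflow run's context comes from
    sees_base_config: bool  # whether base-repo secrets/vars are readable


def run_context(event: str, from_fork: bool) -> RunContext:
    """Simplified model of the two pull request triggers (not GitHub's code)."""
    if event == "pull_request_target":
        # Runs in the context of the base repository, even for fork PRs.
        return RunContext("base branch", True)
    if event == "pull_request":
        # Runs in the context of the PR's merge commit; per the thread,
        # a fork PR cannot read the base repo's vars.
        return RunContext("PR merge commit", not from_fork)
    raise ValueError(f"unmodeled event: {event}")


print(run_context("pull_request", from_fork=True))
print(run_context("pull_request_target", from_fork=True))
```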

@JamesKunstle
Contributor Author

That's super interesting, I totally missed that distinction in the docs. I'll look more closely at that.

I opened PR #411, which mirrors the `on` and `permissions` usage from the e2e workflow in the unit-tests workflow.

@JamesKunstle
Contributor Author

From the docs:
"For workflows that are triggered by the pull_request_target event, the GITHUB_TOKEN is granted read/write repository permission unless the permissions key is specified and the workflow can access secrets, even when it is triggered from a fork. Although the workflow runs in the context of the base of the pull request, you should make sure that you do not check out, build, or run untrusted code from the pull request with this event."

My new understanding is that we have to use `pull_request_target` because we are setting up a self-hosted EC2 runner and need the `secrets` / `vars` contexts to do this. We then limit the permissions of the job that runs the untrusted code.

If we weren't using these secure resources, we could use the `pull_request` trigger instead.
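
As a rough illustration of "limit the permissions of the job that runs the untrusted code", the check below walks a workflow that has already been loaded into a plain dict. The keys `on`, `jobs`, and `permissions` are real workflow keys; the function names and the job names `start-runner` and `run-tests` are hypothetical, and the sketch assumes a job-level `permissions` key takes precedence over a workflow-level one.

```python
def effective_permissions(workflow: dict, job_name: str):
    """Job-level `permissions` overrides the workflow-level key; None if unset."""
    job = workflow["jobs"][job_name]
    if "permissions" in job:
        return job["permissions"]
    return workflow.get("permissions")


def unrestricted_untrusted_jobs(workflow: dict, untrusted_jobs: list) -> list:
    """List untrusted jobs that lack `permissions: {}` under pull_request_target."""
    triggers = workflow.get("on", [])
    if isinstance(triggers, str):
        triggers = [triggers]
    if "pull_request_target" not in triggers:
        return []
    return [j for j in untrusted_jobs if effective_permissions(workflow, j) != {}]


# Hypothetical workflow: job names are illustrative only.
workflow = {
    "on": {"pull_request_target": {"branches": ["main"]}},
    "jobs": {
        "start-runner": {},
        "run-tests": {},  # runs PR code but does not restrict the token
    },
}
print(unrestricted_untrusted_jobs(workflow, ["run-tests"]))  # ['run-tests']
```

Setting `"permissions": {}` on `run-tests`, or at the top level of the workflow, would make the check return an empty list.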

@mergify mergify bot added ci-failure and removed ci-failure labels Jan 24, 2025
@JamesKunstle JamesKunstle force-pushed the upstream-testing-proposal branch from c405770 to 68f34ee Compare January 24, 2025 22:50
@mergify mergify bot removed the ci-failure label Jan 24, 2025
@danmcp
Member

danmcp commented Jan 24, 2025

> My new understanding is that we have to use `pull_request_target` because we are setting up a self-hosted EC2 runner and need the `secrets` / `vars` contexts to do this. We then limit the permissions of the job that runs the untrusted code.
>
> If we weren't using these secure resources, we could use the `pull_request` trigger instead.

That matches my understanding as well. I had originally made sure you had `permissions` set to `{}`, but I missed the `pull_request` vs. `pull_request_target` difference at the top.

Labels
CI/CD Affects CI/CD configuration documentation Improvements or additions to documentation
3 participants