Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh/actions unit test workflows #384

Merged
merged 1 commit into from
Jan 9, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
137 changes: 137 additions & 0 deletions .github/workflows/unittesting-ci-nvidia.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,137 @@
# SPDX-License-Identifier: Apache-2.0

name: "Run 'fast' marked unit tests via Tox::pytest"
# This tests should run only those tests that are marked as 'fast.'
# The opposite are those that would require the mark 'slow,' which would
# include longer-running integration and smoke tests.
#
# Essentially, this workflow should be used frequently for cheap tests,
# and a 'slow' marked workflow will be used later in review, manually triggered,
# to verify integration correctness.

on:
pull_request:
types: [opened, reopened, synchronize]
push:
branches:
- "main"
- "release-**"

env:
pytest_mark: "fast"
ec2_runner_variant: "m8g.xlarge" # 4 Graviton CPU, 16GB RAM

jobs:
start-ec2-runner:
runs-on: ubuntu-latest
outputs:
label: ${{ steps.start-ec2-runner.outputs.label }}
ec2-instance-id: ${{ steps.start-ec2-runner.outputs.label }}

steps:
- name: "Harden runner"
uses: step-security/harden-runner@0080882f6c36860b6ba35c610c98ce87d4e2f26f # v2.10.1
with:
egress-policy: audit

- name: "Configure AWS credentials"
uses: "aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502" # v4.0.2
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ vars.AWS_REGION }}

- name: "Start EC2 runner"
id: start-ec2-runner
uses: machulav/ec2-github-runner@1827d6ca7544d7044ddbd2e9360564651b463da2 # v2.3.7
with:
mode: start
github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
ec2-image-id: ${{ vars.AWS_EC2_AMI }}
ec2-instance-type: ${{ vars.AWS_REGION }}
subnet-id: subnet-024298cefa3bedd61
security-group-id: sg-06300447c4a5fbef3
iam-role-name: instructlab-ci-runner
aws-resource-tags: >
[
{"Key": "Name", "Value": "instructlab-ci-github-large-runner"},
{"Key": "GitHubRepository", "Value": "${{ github.repository }}"},
{"Key": "GitHubRef", "Value": "${{ github.ref }}"},
{"Key": "GitHubPR", "Value": "${{ github.event.number }}"}
]

run-unit-tests:
needs:
- start-ec2-runner
runs-on: ${{needs.start-ec2-runner.outputs.label}}
# This job MUST HAVE NO PERMISSIONS and no access to any secrets
# because it'll run incoming user code without discretion.
permissions: {} # this syntax disables permissions for all available options.
steps:
- name: "Harden runner"
uses: step-security/harden-runner@0080882f6c36860b6ba35c610c98ce87d4e2f26f # v2.10.1
with:
egress-policy: audit

- name: "Install packages"
run: |
cat /etc/os-release
sudo dnf install -y gcc gcc-c++ make git python3.11 python3.11-devel

- name: "Checkout code"
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0

- name: "Verify environment variables are setup correctly"
run: |
export CUDA_HOME="/usr/local/cuda"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export PATH="$PATH:$CUDA_HOME/bin"
nvidia-smi

# installs in $GITHUB_WORKSPACE/venv.
# only has to install Tox because Tox will do the other virtual environment management.
- name: "Setup Python virtual environment"
run: |
python3.11 -m venv --upgrade-deps venv
. venv/bin/activate
pip install tox

- name: "Show disk utilization BEFORE tests"
run: |
df -h

- name: "Run unit tests with Tox and Pytest"
run: |
tox -e py3-unit -- -m ${{env.pytest_mark}}

- name: "Show disk utilization AFTER tests"
run: |
df -h

stop-ec2-runner:
needs:
- start-ec2-runner
- run-unit-tests
runs-on: ubuntu-latest
steps:
- name: "Harden runner"
uses: step-security/harden-runner@0080882f6c36860b6ba35c610c98ce87d4e2f26f # v2.10.1
with:
egress-policy: audit
- name: "Configure AWS credentials"
uses: "aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502" # v4.0.2
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ vars.AWS_REGION }}

- name: "Stop EC2 runner"
id: start-ec2-runner
uses: machulav/ec2-github-runner@1827d6ca7544d7044ddbd2e9360564651b463da2 # v2.3.7
with:
mode: stop
github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
label: ${{ needs.start-ec2-runner.outputs.label }}
ec2-instance-type: ${{ env.ec2_runner_variant }}
4 changes: 2 additions & 2 deletions src/instructlab/training/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -30,8 +30,8 @@ class DeepSpeedOffloadStrategy(Enum):

# public API
class DistributedBackend(Enum):
FSDP: str = "fsdp"
DEEPSPEED: str = "deepspeed"
FSDP = "fsdp"
DEEPSPEED = "deepspeed"
nathan-weinberg marked this conversation as resolved.
Show resolved Hide resolved


# public API
Expand Down
7 changes: 7 additions & 0 deletions tests/test_init.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
# Third Party
import pytest


@pytest.mark.fast
def test_fake():
assert True
JamesKunstle marked this conversation as resolved.
Show resolved Hide resolved
Loading