ci: add automated and on demand testing of fluence
Problem: we cannot tell if or when fluence builds will break against upstream.
Solution: add a weekly run that builds and tests the images, and deploys them when the tests succeed.

For testing, I have added a complete example that uses a Job for each of fluence and the default-scheduler. A Job lets us run a container that generates output and completes, with no crash loop backoff or similar. I have added a complete testing setup using kind, and it is in one GitHub job so we can build both containers, load them into kind, and then run the tests. Note that MiniKube does NOT appear to work for custom schedulers; I suspect there are extensions/plugins that need to be added. Finally, I was able to figure out how to programmatically check both the pod metadata for the scheduler and the events; combined with the output, that should be sufficient (for now) to test that fluence is working.

Signed-off-by: vsoch <[email protected]>
Showing 7 changed files with 311 additions and 18 deletions.
@@ -0,0 +1,96 @@
#!/bin/bash

# This will test fluence with two jobs.
# We choose jobs because they generate output and complete, whereas pods
# are expected to keep running (and would then error).

set -eEu -o pipefail

# Ensure upstream exists.
# This test script assumes the fluence image and sidecar are already built.
make prepare

# Keep track of root directory to return to
here=$(pwd)

# A pull policy of Never ensures we use our loaded (just built) images
cd upstream/manifests/install/charts
helm install \
  --set scheduler.image=ghcr.io/flux-framework/fluence:latest \
  --set scheduler.sidecarimage=ghcr.io/flux-framework/fluence-sidecar:latest \
  --set scheduler.pullPolicy=Never \
  --set scheduler.sidecarPullPolicy=Never \
        schedscheduler-plugins as-a-second-scheduler/

# These containers should already be loaded into kind
echo "Sleeping 10 seconds waiting for scheduler deploy"
sleep 10
kubectl get pods

# This will get the fluence pod (which has scheduler and sidecar containers), which should be first
fluence_pod=$(kubectl get pods -o json | jq -r '.items[0].metadata.name')
echo "Found fluence pod ${fluence_pod}"

# Show logs for debugging, if needed
echo
echo "⭐️ kubectl logs ${fluence_pod} -c sidecar"
kubectl logs "${fluence_pod}" -c sidecar
echo
echo "⭐️ kubectl logs ${fluence_pod} -c scheduler-plugins-scheduler"
kubectl logs "${fluence_pod}" -c scheduler-plugins-scheduler

# We now want to apply the examples
cd "${here}/examples/test_example"

# Apply both example jobs
kubectl apply -f fluence-job.yaml
kubectl apply -f default-job.yaml

# Get them based on associated job
fluence_job_pod=$(kubectl get pods --selector=job-name=fluence-job -o json | jq -r '.items[0].metadata.name')
default_job_pod=$(kubectl get pods --selector=job-name=default-job -o json | jq -r '.items[0].metadata.name')

echo
echo "Fluence job pod is ${fluence_job_pod}"
echo "Default job pod is ${default_job_pod}"
sleep 3

# Shared function to check output: check_output <expected> <actual>
function check_output {
  expected="$1"
  actual="$2"
  if [[ "${expected}" != "${actual}" ]]; then
    echo "Expected output is ${expected}"
    echo "Actual output is ${actual}"
    exit 1
  fi
}

# Get output (and show)
default_output=$(kubectl logs "${default_job_pod}")
default_scheduled_by=$(kubectl get pod "${default_job_pod}" -o json | jq -r '.spec.schedulerName')
echo
echo "Default scheduler pod output: ${default_output}"
echo " Scheduled by: ${default_scheduled_by}"

fluence_output=$(kubectl logs "${fluence_job_pod}")
fluence_scheduled_by=$(kubectl get pod "${fluence_job_pod}" -o json | jq -r '.spec.schedulerName')
echo
echo "Fluence scheduler pod output: ${fluence_output}"
echo " Scheduled by: ${fluence_scheduled_by}"

# Check output explicitly (expected value first, then actual)
check_output "potato" "${fluence_output}"
check_output "not potato" "${default_output}"
check_output "default-scheduler" "${default_scheduled_by}"
check_output "fluence" "${fluence_scheduled_by}"

# But events tell us what actually happened, so let's parse through them and find our pods.
# This tells us the Event -> reason "Scheduled" and who it was reported by.
# The first should be fluence
reported_by=$(kubectl events --for "pod/${fluence_job_pod}" -o json | jq -c '[ .items[] | select( .reason | contains("Scheduled")) ]' | jq -r '.[0].reportingComponent')
check_output "fluence" "${reported_by}"

# And the second should be the default scheduler
reported_by=$(kubectl events --for "pod/${default_job_pod}" -o json | jq -c '[ .items[] | select( .reason | contains("Scheduled")) ]' | jq -r '.[0].reportingComponent')
check_output "default-scheduler" "${reported_by}"
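The test script waits with fixed sleeps (10 and 3 seconds) before it reads pods and logs, which can be too short on a slow runner. As a hedged alternative, a small polling helper can retry a command until it succeeds or the attempts run out. This sketch is not part of the commit: `retry_until` and its arguments are hypothetical names chosen for illustration.

```shell
#!/bin/bash

# retry_until ATTEMPTS DELAY CMD [ARGS...]
# Hypothetical helper (not in the commit): run CMD until it exits 0,
# trying at most ATTEMPTS times and sleeping DELAY seconds between tries.
# Returns 0 on the first success, 1 if every attempt fails.
function retry_until {
    local attempts="$1"
    local delay="$2"
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt
        if ((i < attempts)); then
            sleep "${delay}"
        fi
    done
    return 1
}
```

In the test script this could wrap a check such as `retry_until 30 2 kubectl logs "${fluence_job_pod}"`, which would poll for up to roughly a minute. `kubectl wait --for=condition=complete job/fluence-job --timeout=60s` is another option that avoids a custom helper.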
@@ -0,0 +1,137 @@
name: fluence build test

on:
  pull_request: []
  # Test on demand (dispatch) or once a week, Sunday
  # We combine the builds into one job to simplify not needing to share
  # containers between jobs. We also don't want to push unless the tests pass.
  workflow_dispatch:
  schedule:
  - cron: '0 0 * * 0'

jobs:
  build-fluence:
    env:
      container: ghcr.io/flux-framework/fluence
    runs-on: ubuntu-latest
    name: build fluence
    steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-go@v3
      with:
        go-version: ^1.19

    - name: Build Containers
      run: |
        make prepare
        make build REGISTRY=ghcr.io/flux-framework SCHEDULER_IMAGE=fluence
    - name: Save Container
      run: docker save ${{ env.container }} | gzip > fluence_latest.tar.gz

    - name: Upload container artifact
      uses: actions/upload-artifact@v2
      with:
        name: fluence
        path: fluence_latest.tar.gz

  build-sidecar:
    env:
      container: ghcr.io/flux-framework/fluence-sidecar
    runs-on: ubuntu-latest
    name: build sidecar
    steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-go@v3
      with:
        go-version: ^1.19

    - name: Build Container
      run: |
        make prepare
        make build-sidecar REGISTRY=ghcr.io/flux-framework SIDECAR_IMAGE=fluence-sidecar
    - name: Save Container
      run: docker save ${{ env.container }} | gzip > fluence_sidecar_latest.tar.gz

    - name: Upload container artifact
      uses: actions/upload-artifact@v2
      with:
        name: fluence_sidecar
        path: fluence_sidecar_latest.tar.gz

  test-fluence:
    needs: [build-fluence, build-sidecar]
    permissions:
      packages: write
    env:
      fluence_container: ghcr.io/flux-framework/fluence
      sidecar_container: ghcr.io/flux-framework/fluence-sidecar

    runs-on: ubuntu-latest
    name: test fluence
    steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-go@v3
      with:
        go-version: ^1.19

    - name: Download fluence artifact
      uses: actions/download-artifact@v2
      with:
        name: fluence
        path: /tmp

    - name: Download fluence_sidecar artifact
      uses: actions/download-artifact@v2
      with:
        name: fluence_sidecar
        path: /tmp

    - name: Load Docker images
      run: |
        ls /tmp/*.tar.gz
        docker load --input /tmp/fluence_sidecar_latest.tar.gz
        docker load --input /tmp/fluence_latest.tar.gz
        docker image ls -a | grep fluence
    - name: Create Kind Cluster
      uses: helm/[email protected]
      with:
        cluster_name: kind

    - name: Load Docker Containers into Kind
      env:
        fluence: ${{ env.fluence_container }}
        sidecar: ${{ env.sidecar_container }}
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      run: |
        kind load docker-image ${fluence}
        kind load docker-image ${sidecar}
    - name: Test Fluence
      run: /bin/bash ./.github/test.sh

    - name: Tag Weekly Images
      run: |
        # YEAR-MONTH-DAY, i.e. YYYY-MM-DD
        tag=$(date +%Y-%m-%d)
        echo "Tagging and releasing ${{ env.fluence_container}}:${tag}"
        docker tag ${{ env.fluence_container }}:latest ${{ env.fluence_container }}:${tag}
        echo "Tagging and releasing ${{ env.sidecar_container}}:${tag}"
        docker tag ${{ env.sidecar_container }}:latest ${{ env.sidecar_container }}:${tag}
    # If we get here, tests pass, and we can deploy
    - name: GHCR Login
      if: (github.event_name != 'pull_request')
      uses: docker/login-action@v2
      with:
        registry: ghcr.io
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}

    - name: Deploy Containers
      if: (github.event_name != 'pull_request')
      run: |
        docker push ${{ env.fluence_container }} --all-tags
        docker push ${{ env.sidecar_container }} --all-tags
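The weekly images are tagged with a plain `YYYY-MM-DD` date. Below is a minimal sketch of generating such a tag and sanity-checking its shape before handing it to `docker tag`. The function names `make_date_tag` and `is_date_tag` are hypothetical, and using UTC (`date -u`) is an assumption; the workflow itself calls `date` without specifying a timezone.

```shell
#!/bin/bash

# Hypothetical helper (not in the commit): print today's date as YYYY-MM-DD.
# The -u flag (UTC) is an assumption so the tag does not depend on the runner's TZ.
function make_date_tag {
    date -u +%Y-%m-%d
}

# Return 0 if the argument looks like YYYY-MM-DD, 1 otherwise.
function is_date_tag {
    [[ "$1" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]
}

tag=$(make_date_tag)
if is_date_tag "${tag}"; then
    echo "Would tag images with ${tag}"
else
    echo "Unexpected tag format: ${tag}" >&2
    exit 1
fi
```

The check only validates the shape of the string, not that it is a real calendar date; it is meant as a guard against an empty or malformed tag reaching the push step.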
@@ -0,0 +1,14 @@
apiVersion: batch/v1
kind: Job
metadata:
  name: default-job
spec:
  template:
    spec:
      schedulerName: default-scheduler
      containers:
      - name: default-job
        image: busybox
        command: [echo, not, potato]
      restartPolicy: Never
  backoffLimit: 4
@@ -0,0 +1,14 @@
apiVersion: batch/v1
kind: Job
metadata:
  name: fluence-job
spec:
  template:
    spec:
      schedulerName: fluence
      containers:
      - name: fluence-job
        image: busybox
        command: [echo, potato]
      restartPolicy: Never
  backoffLimit: 4
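The two Job manifests differ only in their name, `schedulerName`, and command. As an illustration only (the commit keeps them as two static files), the same YAML could be rendered from one shell function. `render_job` and its parameters are hypothetical names.

```shell
#!/bin/bash

# render_job NAME SCHEDULER COMMAND
# Hypothetical helper (not in the commit): print a Job manifest shaped like
# the two examples. COMMAND is inserted verbatim as a YAML flow sequence,
# e.g. "[echo, potato]".
function render_job {
    local name="$1"
    local scheduler="$2"
    local command="$3"
    cat <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: ${name}
spec:
  template:
    spec:
      schedulerName: ${scheduler}
      containers:
      - name: ${name}
        image: busybox
        command: ${command}
      restartPolicy: Never
  backoffLimit: 4
EOF
}

# Print the fluence variant to stdout
render_job fluence-job fluence "[echo, potato]"
```

The output could be piped to `kubectl apply -f -` instead of applying a file. Static files are easier to read and review, so this is mainly useful if more scheduler variants are added later.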