Experimental device requests #204

Draft · wants to merge 2 commits into main
3 changes: 3 additions & 0 deletions legacy/build-deploy-docker-compose.sh
@@ -1528,6 +1528,9 @@ do
fi
fi

# handle gpu configuration
Member:
We could wrap this experimental feature under a flag, like this: https://github.com/uselagoon/build-deploy-tool/blob/main/legacy/build-deploy-docker-compose.sh#L845

ADMIN_LAGOON_FEATURE_FLAG_X flags are only settable on the remote cluster, so there are some constraints on where this can be enabled.

if [ "$ADMIN_LAGOON_FEATURE_FLAG_EXPERIMENTAL_GPU_SUPPORT" = enabled ]; then
	. /kubectl-build-deploy/scripts/exec-gpu-generation.sh
fi

Member Author:

Do we need a flag at all? Any cluster that doesn't have the k8s plugins installed and GPU nodes available will just fail to schedule pods and cause a deployment failure.
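A quick sanity check, assuming the lagoon.sh/gpu node label and the NVIDIA device plugin resource name used by the script below, to verify a cluster can actually schedule these pods:

# list nodes carrying the label the generated nodeSelector targets
kubectl get nodes -l lagoon.sh/gpu=true

# confirm the NVIDIA device plugin is advertising allocatable GPUs
kubectl get nodes -o jsonpath='{.items[*].status.allocatable.nvidia\.com/gpu}'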

Member:

Right, I guess that's OK. We can re-evaluate the feature wrapping later. If we ever want to restrict users in IO cloud, we'll need some sort of flagging system, but that's a problem for another day.

. /kubectl-build-deploy/scripts/exec-gpu-generation.sh

# handle spot configurations
. /kubectl-build-deploy/scripts/exec-spot-generation.sh

25 changes: 25 additions & 0 deletions legacy/scripts/exec-gpu-generation.sh
@@ -0,0 +1,25 @@
#!/bin/bash

# Handle GPU device requests: a service opts in via the lagoon.gpu label in
# docker-compose.yml, optionally setting lagoon.gpu.size to request more than
# one device.
GPU_REQUEST=$(cat "$DOCKER_COMPOSE_YAML" | shyaml get-value services.$COMPOSE_SERVICE.labels.lagoon\\.gpu false)
if [ "$GPU_REQUEST" != "false" ]; then
  GPU_REQUEST_SIZE=$(cat "$DOCKER_COMPOSE_YAML" | shyaml get-value services.$COMPOSE_SERVICE.labels.lagoon\\.gpu\\.size false)
  if [ "$GPU_REQUEST_SIZE" != "false" ]; then
    GPU_SIZE=$GPU_REQUEST_SIZE
  else
    # default to a single GPU when no size label is set
    GPU_SIZE=1
  fi

  # Tolerate the GPU node taint, pin the pod onto GPU nodes, and request the
  # devices from the NVIDIA device plugin via this service's Helm values.
  echo -e "\
tolerations:
  - key: lagoon.sh/gpu
    operator: Equal
    value: 'true'
    effect: NoSchedule
nodeSelector:
  lagoon.sh/gpu: 'true'
resources:
  limits:
    nvidia.com/gpu: ${GPU_SIZE}
" >> /kubectl-build-deploy/${SERVICE_NAME}-values.yaml
fi
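
For reference, a compose service would opt in with labels like these. The service name "gpu-worker" and the size value are hypothetical examples; the label names are the ones the script reads:

# Hypothetical docker-compose.yml stanza; exec-gpu-generation.sh reads these labels:
cat > docker-compose.yml <<'EOF'
services:
  gpu-worker:
    labels:
      lagoon.gpu: "true"
      lagoon.gpu.size: "2"
EOF

# The same shyaml lookups the script performs:
cat docker-compose.yml | shyaml get-value services.gpu-worker.labels.lagoon\\.gpu false        # -> true
cat docker-compose.yml | shyaml get-value services.gpu-worker.labels.lagoon\\.gpu\\.size false  # -> 2

With those labels, the script would append a values snippet requesting two nvidia.com/gpu devices for that service.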