Debug GitHub Action to create new GCP Composer Environment #50
base: master
Conversation
This PR will make issue https://mapaction.atlassian.net/browse/DATAPIPE-81 irrelevant. It will also have implications for these issues:
Hi @andrewphilipsmith, it might be worth discussing this PR a bit when you're back. Essentially I'm wondering if doing this through GitHub Actions is the right approach, considering the Airflow environments will presumably be long-lived and so a different context from the routine tasks that I think GH Actions is more suited to. I completely support defining the Composer configuration as code, though, so changes can be tracked in Git and resources are reproducible. However, I'd recommend we use something like Terraform (which I have a fair amount of experience with) to do so. There's documentation here on how to configure a Google Composer instance, for example, which I'm happy to implement. I also have some leave coming up, so I will have time to get this set up. I would definitely want to set this up in a separate Google Cloud Project initially so as not to interfere with anything (once defined as code, it would be easy to target a different project afterwards).
This PR builds on #49 and debugs the GitHub Action which creates a new Composer Environment (aka managed Airflow instance) in Google Cloud Platform.
I am adding some of the relevant notes here until I find a better location for them in the relevant files in the repo.
Process
GCP Composer names each instance of Airflow and its associated K8s nodes an "Environment". A new environment can be created with the command:
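As a minimal sketch of that command (the environment name and location below are placeholders, and real invocations may need additional flags such as an image version):

```shell
# Create a new Composer Environment (placeholder name and location).
gcloud composer environments create my-composer-env \
  --location europe-west1
```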
The Action can only be triggered manually. When triggered, it asks the user to input the name to be used for the new Composer Environment. There is no default value.
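As a sketch, such a manual run could also be dispatched from the command line with the GitHub CLI (the workflow file name and input name here are assumptions, not taken from this repo):

```shell
# Dispatch the workflow manually, passing the environment name as an input.
gh workflow run create-composer-environment.yml \
  -f environment_name=my-composer-env
```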
TODO: The value of `secrets.GCLOUD_COMPOSER_ENVIRONMENT` should be updated with the new name.

When you create a new Environment, you cannot specify which bucket it will read the DAGs from. It will create a new bucket (with an incomprehensible name). The URL of the new bucket is queried with this command:
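A sketch of that query and the suffix-stripping step (the environment name and location are placeholders, and the gcloud call is shown commented with a stand-in value so the stripping logic is visible):

```shell
# Stand-in for the output of (placeholder names):
#   gcloud composer environments describe my-composer-env \
#     --location europe-west1 --format="get(config.dagGcsPrefix)"
dag_path="gs://example-dag-bucket/dags"

# Strip the trailing /dags so the value can be stored in GCLOUD_BUCKET_PATH.
bucket_path="${dag_path%/dags}"
echo "$bucket_path"
```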
This gives the URL including the `/dags` suffix. The `/dags` suffix needs to be stripped off before the next step.

The `Configs` action (which deploys the DAGs and plugin code to the Airflow instance) uses the `GCLOUD_BUCKET_PATH` secret as the destination to deploy to. Therefore this action needs to update the value of the `GCLOUD_BUCKET_PATH` secret.

TODO: Update the value of `secrets.GCLOUD_BUCKET_PATH`.
Open questions:
- The Action reads `requirements.txt` for `apache-airflow` and uses the version number specified there. Is this a sensible/sane/foolhardy idea?
- Should the `Container` and `Configs` actions be triggered on completion of this action?

Removing old Composer Environment and DAG bucket
There is currently no Action that removes an old Composer Environment and DAG bucket. Here are some notes on the process. The command `gcloud composer environments delete ...` does not remove the associated DAG bucket. Therefore the DAG bucket must be identified before the relevant Composer Environment is removed.
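A hedged sketch of the tear-down order implied by those notes (all names are placeholders; the gcloud/gsutil calls are shown commented, with a stand-in value for the describe output):

```shell
# 1. Identify the DAG bucket FIRST (stand-in for the output of):
#      gcloud composer environments describe old-env --location europe-west1 \
#        --format="get(config.dagGcsPrefix)"
dag_path="gs://old-dag-bucket/dags"
bucket="${dag_path%/dags}"

# 2. Delete the environment; this does NOT delete the bucket:
#      gcloud composer environments delete old-env --location europe-west1 --quiet

# 3. Remove the now-orphaned DAG bucket:
#      gsutil -m rm -r "$bucket"
echo "$bucket"
```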