
Debug GitHub Action to create new GCP Composer Environment #50

Draft
wants to merge 12 commits into base: master

Conversation

andrewphilipsmith
Contributor

@andrewphilipsmith andrewphilipsmith commented Jul 22, 2021

This PR builds on #49 and debugs the GitHub Action that creates a new Composer Environment (aka a managed Airflow instance) in Google Cloud Platform.

I am adding some of the relevant notes here until I find a better location for them in the relevant files in the repo.

Process

GCP Composer names each instance of Airflow and its associated K8s nodes an "Environment". A new environment can be created with the command:

gcloud composer environments create ...

The Action can only be triggered manually. When triggered, it asks the user to input the name for the new Composer Environment. There is no default value.

TODO: The value of secrets.GCLOUD_COMPOSER_ENVIRONMENT should be updated with the new name.

When you create a new Environment, you cannot specify which bucket it will read the DAGs from. It will create a new bucket (with an incomprehensible name). The URL of the new bucket is queried with this command:

gcloud composer environments describe --location=europe-west2 $new_composer_env_name --flatten=config.dagGcsPrefix

This returns the URL including the /dags suffix, which needs to be stripped off before the next step.
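A minimal shell sketch of this step, assuming the idiomatic `--format='value(...)'` output from `gcloud` (the bucket name below is made up for illustration):

```shell
# In the workflow the prefix would come from something like:
#   dag_prefix=$(gcloud composer environments describe "$new_composer_env_name" \
#     --location=europe-west2 --format='value(config.dagGcsPrefix)')
# A sample value stands in here (bucket name is hypothetical):
dag_prefix="gs://europe-west2-env-1a2b3c4d-bucket/dags"

# Strip the trailing /dags with shell parameter expansion.
bucket_path="${dag_prefix%/dags}"
echo "$bucket_path"   # gs://europe-west2-env-1a2b3c4d-bucket
```

Parameter expansion (`${var%pattern}`) avoids spawning `sed`/`awk` and only removes the suffix if it is present, so it is safe to run even if the prefix format changes.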

The Configs action (which deploys the dags and plugin code to the Airflow instance) uses the GCLOUD_BUCKET_PATH secret as the destination to deploy to. Therefore this action needs to update the value of the GCLOUD_BUCKET_PATH secret.

TODO: Update the value of secret.GCLOUD_BUCKET_PATH
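One hedged option for this TODO, using the official GitHub CLI rather than a 3rd-party action (requires a token with admin access to the repository; `OWNER/REPO` and the bucket value are illustrative placeholders, not values from this repo):

```shell
# Hypothetical sketch: push the new bucket path into the repository secret
# with the official GitHub CLI. OWNER/REPO and the value are placeholders.
bucket_path="gs://europe-west2-env-1a2b3c4d-bucket"
gh secret set GCLOUD_BUCKET_PATH --repo OWNER/REPO --body "$bucket_path"
```

This sidesteps the trust question about `hmanzur/actions-set-secret` below, at the cost of needing a suitably scoped token available to the workflow.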

Open questions:

  • It is possible to specify the version of Airflow in the new instance. One possibility is that this action parses the requirements.txt for apache-airflow and uses the version number specified there. Is this a sensible/sane/foolhardy idea?
  • Given this action can incur significant cost, is it possible to limit who can run this action?
  • What permissions does the service account used by GH Actions require on GCP to perform these tasks?
  • Do we need to set permissions of the new bucket, or are the defaults appropriate?
  • Do we trust the 3rd party GH Action https://github.com/hmanzur/actions-set-secret?
  • Do we want to automatically trigger the Container and Configs actions on completion of this action?
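On the first open question, a minimal sketch of parsing the pinned version out of requirements.txt, assuming an exact `apache-airflow==X.Y.Z` pin (the file contents and version below are illustrative):

```shell
# Write a sample requirements.txt (contents are illustrative).
printf 'apache-airflow==2.1.2\nrequests==2.26.0\n' > requirements.txt

# Split "apache-airflow==2.1.2" on '='; field 3 is the version
# (field 2 is empty because of the double '=').
airflow_version=$(grep -E '^apache-airflow==' requirements.txt | cut -d= -f3)
echo "$airflow_version"   # 2.1.2
```

This only handles exact pins; range specifiers (`>=`, `~=`) or an unpinned entry would need extra handling, which is perhaps an argument against the idea.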

Removing old Composer Environment and DAG bucket

There is currently no Action that removes an old Composer Environment and its DAG bucket. Here are some notes on the process. The command gcloud composer environments delete ... does not remove the associated DAG bucket. Therefore the DAG bucket must be identified before removing the relevant Composer Environment.
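The cleanup process described above could be sketched as follows (the environment name is hypothetical; the key point is capturing the bucket path before the environment is deleted, since the delete command leaves the bucket behind):

```shell
location=europe-west2
env_name=my-old-env   # hypothetical environment name

# Capture the DAG bucket before deleting the environment.
dag_prefix=$(gcloud composer environments describe "$env_name" \
  --location="$location" --format='value(config.dagGcsPrefix)')
bucket="${dag_prefix%/dags}"

# Delete the environment, then remove the now-orphaned bucket and its contents.
gcloud composer environments delete "$env_name" --location="$location" --quiet
gsutil -m rm -r "$bucket"
```

`gsutil rm -r` on a bucket URL deletes both the objects and the bucket itself, so no separate `rb` step is needed.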


@andrewphilipsmith andrewphilipsmith marked this pull request as draft July 22, 2021 17:01
@andrewphilipsmith andrewphilipsmith requested a review from felnne July 23, 2021 09:04
@felnne
Collaborator

felnne commented Jul 29, 2021

Hi @andrewphilipsmith, it might be worth discussing this PR a bit when you're back.

Essentially I'm wondering if doing this through GitHub Actions is the right approach, considering the Airflow environments will presumably be long lived and so a different context to routine tasks that I think GH Actions is more suited to.

I completely support making the Composer configuration defined as code though, so changes can be tracked in Git and resources are reproducible. However, I'd recommend we use something like Terraform (which I have a fair amount of experience with) to do so.

There's documentation here on how to configure a Google Composer instance, for example, which I'm happy to implement. I also have some leave coming up, so will have time to get this set up. I would definitely want to set this up in a separate Google Cloud Project initially so as not to interfere with anything (once it's defined as code, it would be easy to target a different project afterwards).
