diff --git a/.gitattributes b/.gitattributes
index dac878f73670d..6125c363103c5 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -15,8 +15,6 @@ Dockerfile.ci export-ignore
 ISSUE_TRIAGE_PROCESS.rst export-ignore
 CONTRIBUTING.rst export-ignore
-CI.rst export-ignore
-CI_DIAGRAMS.md export-ignore
 contributing_docs/ export-ignore
 .devcontainer export-ignore
diff --git a/CI.rst b/CI.rst
deleted file mode 100644
index dcba9820887ca..0000000000000
--- a/CI.rst
+++ /dev/null
@@ -1,669 +0,0 @@
- .. Licensed to the Apache Software Foundation (ASF) under one
-    or more contributor license agreements. See the NOTICE file
-    distributed with this work for additional information
-    regarding copyright ownership. The ASF licenses this file
-    to you under the Apache License, Version 2.0 (the
-    "License"); you may not use this file except in compliance
-    with the License. You may obtain a copy of the License at
-
- .. http://www.apache.org/licenses/LICENSE-2.0
-
- .. Unless required by applicable law or agreed to in writing,
-    software distributed under the License is distributed on an
-    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-    KIND, either express or implied. See the License for the
-    specific language governing permissions and limitations
-    under the License.
-
-.. contents:: :local:
-
-CI Environment
-==============
-
-Continuous Integration is an important component of making Apache Airflow robust and stable. We run
-a lot of tests for every pull request, for the main and v2-*-test branches, and regularly as scheduled jobs.
-
-Our execution environment for CI is `GitHub Actions `_. GitHub Actions
-(GA) is very well integrated with GitHub code and workflows, and it evolved fast in 2019/2020 to become
-a fully-fledged CI environment that is easy to use and develop for, so we decided to switch to it. Our previous
-CI system was Travis CI.
-
-However, part of our philosophy is that we are not tightly coupled with any of the CI
-environments we use. Most of our CI jobs are written as bash scripts which are executed as steps in
-the CI jobs, and we have a number of variables that determine build behaviour.
-
-You can also take a look at the `CI Sequence Diagrams `_ for a more graphical overview
-of how Airflow CI works.
-
-GitHub Actions runs
--------------------
-
-Our CI builds are highly optimized, leveraging the latest features provided
-by the GitHub Actions environment to reuse parts of the build process across
-different jobs.
-
-A significant portion of our CI runs utilize container images. Given that
-Airflow has numerous dependencies, we use Docker containers to ensure tests
-run in a well-configured and consistent environment. This approach is used
-for most tests, documentation building, and some advanced static checks.
-The environment comprises two types of images: CI images and PROD images.
-CI images are used for most tests and checks, while PROD images are used for
-Kubernetes tests.
-
-To run the tests, we need to ensure that the images are built using the
-latest sources and that the build process is efficient. A full rebuild of
-such an image from scratch might take approximately 15 minutes. Therefore,
-we've implemented optimization techniques that efficiently use the cache
-from the GitHub Docker registry. In most cases, this reduces the time
-needed to rebuild the image to about 4 minutes. However, when
-dependencies change, it can take around 6-7 minutes, and if the base
-image of Python releases a new patch-level version, it can take approximately
-12 minutes.
-
-Container Registry used as cache
---------------------------------
-
-We are using GitHub Container Registry to store the results of the ``Build Images``
-workflow, which are used in the ``Tests`` workflow.
-
-Currently, in the main version of Airflow, we run tests on all supported Python versions,
-which means that we have to build multiple images (one CI and one PROD for each Python version). 
-Yet we run many jobs (>15) for each of the CI images, so building the
-environment anew for every job would take a lot of time. Therefore we are utilising the
-``pull_request_target`` feature of GitHub Actions.
-
-This feature allows running a separate, independent workflow when the main workflow is run.
-This separate workflow is different from the main one because, by default, it runs using the ``main`` version
-of the sources, but also - and most of all - because it has WRITE access to the GitHub Container Image registry.
-
-This is especially important in our case, where Pull Requests to Airflow might come from any repository,
-and it would be a huge security issue if anyone from outside could
-utilise the WRITE access to the Container Image Registry via an external Pull Request.
-
-Thanks to the WRITE access and the fact that ``pull_request_target`` by default uses the ``main`` version of the
-sources, we can safely run some logic there that will check out the incoming Pull Request, build the container
-image from the sources of the incoming PR and push such an image to the GitHub Container Registry - so that
-this image can be built only once and used by all the jobs running tests. The image is tagged with the unique
-``COMMIT_SHA`` of the incoming Pull Request and the tests run in the Pull Request can simply pull such an image
-rather than build it from scratch. Pulling such an image takes ~1 minute, so we are saving
-a lot of precious time for the jobs.
-
-We use `GitHub Container Registry `_.
-``GITHUB_TOKEN`` is needed to push to the registry and we configured the scopes of the tokens in our jobs
-to be able to write to the registry.
-
-The latest cache is kept as ``:cache-amd64`` and ``:cache-arm64`` tagged cache (suitable for the
-``--cache-from`` directive of buildx - it contains metadata and cache for all segments in the image,
-and the cache is kept separately for each platform). 
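As an illustration of the per-commit tagging scheme described above, a test job can compute the image reference for the incoming Pull Request instead of rebuilding the image. This is only a sketch: the ``COMMIT_SHA``, ``PYTHON_VERSION`` and ``BRANCH`` values below are placeholders, and in a real workflow they come from the GitHub Actions context and build matrix.

```bash
# Sketch: compute the per-commit CI image reference used by test jobs.
# The values are placeholders; in CI they come from the workflow context.
COMMIT_SHA="cd27124534b46c9688a1d89e75fcd137ab5137e3"
PYTHON_VERSION="3.10"
BRANCH="main"

CI_IMAGE="ghcr.io/apache/airflow/${BRANCH}/ci/python${PYTHON_VERSION}:${COMMIT_SHA}"
echo "${CI_IMAGE}"
# A real job would then run: docker pull "${CI_IMAGE}"
# (~1 minute) instead of rebuilding the image from scratch (~15 minutes).
```

All matrix jobs of the same run compute the same reference, which is why the image only needs to be built once.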
-
-The ``latest`` images of CI and PROD are ``amd64``-only images, because there is no easy way
-to push multi-platform images without merging the manifests, and it is not really needed nor used for cache.
-
-
-Naming conventions for stored images
-====================================
-
-The images produced during the ``Build Images`` workflow of CI jobs are stored in the
-`GitHub Container Registry `_
-
-The images are stored with both the "latest" tag (for the last main push image that passes all the tests)
-as well as with the ``COMMIT_SHA`` id for images that were used in a particular build.
-
-The image names follow the patterns below (except for the Python image, all the images are stored in
-https://ghcr.io/ in the ``apache`` organization).
-
-The packages are available under the following URL (CONTAINER_NAME is the url-encoded name of the image). Note that "/" is
-now supported in ``ghcr.io`` as a part of the image name within the ``apache`` organization, but it has
-to be percent-encoded when you access it via the UI (/ = %2F)
-
-``https://github.com/apache/airflow/pkgs/container/``
-
-+--------------+----------------------------------------------------------+----------------------------------------------------------+
-| Image        | Name:tag (both cases latest version and per-build)       | Description                                              |
-+==============+==========================================================+==========================================================+
-| Python image | python:-slim-bookworm                                    | Base Python image used by both production and CI image.  |
-| (DockerHub)  |                                                          | Python maintainers release new versions of the images    |
-|              |                                                          | with security fixes every few weeks on DockerHub.        |
-+--------------+----------------------------------------------------------+----------------------------------------------------------+
-| CI image     | airflow//ci/python:latest                                | CI image - this is the image used for most of the tests. |
-|              | or                                                       | Contains all provider dependencies and tools useful      |
-|              | airflow//ci/python:                                      | for testing. This image is used in Breeze.               |
-+--------------+----------------------------------------------------------+----------------------------------------------------------+
-| PROD image   | airflow//prod/python:latest                              | Production image. This is the actual production image    |
-|              | or                                                       | optimized for size.                                      |
-|              | airflow//prod/python:                                    | It contains only compiled libraries and minimal set of   |
-|              |                                                          | dependencies to run Airflow.                             |
-+--------------+----------------------------------------------------------+----------------------------------------------------------+
-
-* might be either "main" or "v2-*-test"
-* - Python version (Major + Minor). Should be one of ["3.8", "3.9", "3.10", "3.11"].
-* - full-length SHA of commit either from the tip of the branch (for pushes/schedule) or
-  commit from the tip of the branch used for the PR.
-
-GitHub Registry Variables
-=========================
-
-Our CI uses GitHub Registry to pull and push images by default. Those variables are set automatically
-by GitHub Actions when you run Airflow workflows in your fork, so they should automatically use your
-own repository as GitHub Registry to build and keep the images as build image cache.
-
-The variables are set automatically in GitHub Actions:
-
-+--------------------------------+---------------------------+----------------------------------------------+
-| Variable                       | Default                   | Comment                                      |
-+================================+===========================+==============================================+
-| GITHUB_REPOSITORY              | ``apache/airflow``        | Prefix of the image. It indicates which      |
-|                                |                           | registry from GitHub to use for image cache  |
-|                                |                           | and to determine the name of the image.      |
-+--------------------------------+---------------------------+----------------------------------------------+
-| CONSTRAINTS_GITHUB_REPOSITORY  | ``apache/airflow``        | Repository where constraints are stored      |
-+--------------------------------+---------------------------+----------------------------------------------+
-| GITHUB_USERNAME                |                           | Username to use to login to GitHub           |
-|                                |                           |                                              |
-+--------------------------------+---------------------------+----------------------------------------------+
-| GITHUB_TOKEN                   |                           | Token to use to login to GitHub.             |
-|                                |                           | Only used when pushing images on CI.         |
-+--------------------------------+---------------------------+----------------------------------------------+
-
-The variables beginning with ``GITHUB_`` cannot be overridden in GitHub Actions by the workflow.
-Those variables are set by GitHub Actions automatically and they are immutable. Therefore, if
-you want to override them in your own CI workflow and use ``breeze``, you need to pass the
-values via the corresponding ``breeze`` flags ``--github-repository`` and
-``--github-token`` rather than by setting them as environment variables in your workflow.
-Unless you want to keep your own copy of constraints in orphaned ``constraints-*``
-branches, the ``CONSTRAINTS_GITHUB_REPOSITORY`` should remain ``apache/airflow``, regardless of which
-repository the CI job is run in.
-
-One of the variables you might want to override in your own GitHub Actions workflow when using ``breeze`` is
-``--github-repository`` - you might want to force it to ``apache/airflow``, because then the cache from
-the ``apache/airflow`` repository will be used and your builds will be much faster.
-
-Example command to build your CI image efficiently in your own CI workflow:
-
-.. code-block:: bash
-
-    # GITHUB_REPOSITORY is set automatically in GitHub Actions, so we need to override it with the flag
-    #
-    breeze ci-image build --github-repository apache/airflow --python 3.10
-    docker tag ghcr.io/apache/airflow/main/ci/python3.10 your-image-name:tag
-
-
-Authentication in GitHub Registry
-=================================
-
-We are using GitHub Container Registry as cache for our images. Authentication uses the ``GITHUB_TOKEN`` mechanism.
-Authentication is needed for pushing the images (WRITE) only in the "push" and "pull_request_target" workflows.
-When you are running the CI jobs in GitHub Actions, ``GITHUB_TOKEN`` is set automatically by the actions.
-
-
-CI run types
-============
-
-The Apache Airflow project utilizes several types of Continuous Integration (CI)
-jobs, each with a distinct purpose and context. These jobs are executed by the
-``ci.yaml`` workflow.
-
-In addition to the standard "PR" runs, we also execute "Canary" runs.
-These runs are designed to detect potential issues that could affect
-regular PRs early on, without causing all PRs to fail when such problems
-arise. This strategy ensures a more stable environment for contributors
-submitting their PRs. At the same time, it allows maintainers to proactively
-address issues highlighted by the "Canary" builds.
-
-Pull request run
-----------------
-
-These runs are triggered by pull requests from contributors' forks. The majority of
-Apache Airflow builds fall into this category. They are executed in the context of
-the contributor's "Fork", not the main Airflow Code Repository, meaning they only have
-"read" access to all GitHub resources, such as the container registry and code repository.
-This is necessary because the code in these PRs, including the CI job definition,
-might be modified by individuals who are not committers to the Apache Airflow Code Repository. 
-
-The primary purpose of these jobs is to verify if the PR builds cleanly, if the tests
-run correctly, and if the PR is ready for review and merge. These runs utilize cached
-images from the Private GitHub registry, including CI, Production Images, and base
-Python images. Furthermore, for these builds, we only execute Python tests if
-significant files have changed. For instance, if the PR involves a "no-code" change,
-no tests will be executed.
-
-Regular PR builds run in a "stable" environment:
-
-* fixed set of constraints (constraints that passed the tests) - except the PRs that change dependencies
-* limited matrix and set of tests (determined by selective checks based on what changed in the PR)
-* no ARM image builds are built in the regular PRs
-* lower probability of flaky tests for non-committer PRs (public runners and less parallelism)
-
-Maintainers can also run the "Pull Request run" from the "apache/airflow" repository by pushing
-to a branch in the "apache/airflow" repository. This is useful when you want to test a PR that
-changes the CI/CD infrastructure itself (for example changes to the CI/CD scripts or changes to
-the CI/CD workflows). In this case the PR is run in the context of the "apache/airflow" repository
-and has WRITE access to the GitHub Container Registry.
-
-Canary run
-----------
-
-This workflow is triggered when a pull request is merged into the "main" branch or pushed to any of
-the "v2-*-test" branches. The "Canary" run aims to upgrade dependencies to their latest versions
-and promptly pushes a preview of the CI/PROD image cache to the GitHub Registry. This allows pull
-requests to quickly utilize the new cache, which is particularly beneficial when the Dockerfile or
-installation scripts have been modified. Even if some tests fail, this cache will already include the
-latest Dockerfile and scripts. Upon successful execution, the run updates the constraint files in the
-"constraints-main" branch with the latest constraints and pushes both the cache and the latest CI/PROD
-images to the GitHub Registry.
-
-If the "Canary" build fails, it often indicates that a new version of our dependencies is incompatible
-with the current tests or Airflow code. Alternatively, it could mean that a breaking change has been
-merged into "main". Both scenarios require prompt attention from the maintainers. While a "broken main"
-due to our code should be fixed quickly, "broken dependencies" may take longer to resolve. Until the tests
-pass, the constraints will not be updated, meaning that regular PRs will continue using the older version
-of dependencies that passed one of the previous "Canary" runs.
-
-Scheduled runs
---------------
-
-The "scheduled" workflow, which is designed to run regularly (typically overnight),
-is triggered when a scheduled run occurs. This workflow is largely identical to the
-"Canary" run, with one key difference: the image is always built from scratch, not
-from a cache. This approach ensures that we can verify whether any "system" dependencies
-in the Debian base image have changed, and confirm that the build process remains reproducible.
-Since the process for a scheduled run mirrors that of a "Canary" run, no separate diagram is
-necessary to illustrate it.
-
-Workflows
-=========
-
-A general note about cancelling duplicated workflows: for the ``Build Images``, ``Tests`` and ``CodeQL``
-workflows we use the ``concurrency`` feature of GitHub Actions to automatically cancel "old" workflow runs
-of each type -- meaning if you push a new commit to a branch or to a pull request and there is a workflow
-running, GitHub Actions will cancel the old workflow run automatically. 
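The constraints produced by the "Canary" run described above are consumed by other builds via the orphaned ``constraints-*`` branches. The following is a hedged sketch of how such a constraints URL is composed; the ``raw.githubusercontent.com`` layout is an assumption based on the repository/branch names mentioned earlier, and the version value is a placeholder.

```bash
# Sketch: compose the URL of the constraints file updated by the "Canary" run.
# CONSTRAINTS_GITHUB_REPOSITORY defaults to apache/airflow; the constraints-main
# branch is an orphaned branch that is only updated when all canary tests pass.
CONSTRAINTS_GITHUB_REPOSITORY="apache/airflow"
PYTHON_VERSION="3.8"
CONSTRAINTS_URL="https://raw.githubusercontent.com/${CONSTRAINTS_GITHUB_REPOSITORY}/constraints-main/constraints-${PYTHON_VERSION}.txt"
echo "${CONSTRAINTS_URL}"
# A build would then install with:
# pip install -e . --constraint "${CONSTRAINTS_URL}"
```

Because the branch is only updated on a fully green canary run, regular PRs keep installing a dependency set that is known to pass the tests.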
-
-Build Images Workflow
----------------------
-
-This workflow builds images for the CI Workflow for Pull Requests coming from forks.
-
-It's a special type of workflow: ``pull_request_target``, which means that it is triggered when a pull request
-is opened. This also means that the workflow has WRITE permission to push the images
-used by CI jobs to the GitHub registry, so the images can be built only once and reused by all the CI jobs
-(including the matrix jobs). We've implemented it so that the ``Tests`` workflow waits
-until the images are built by the ``Build Images`` workflow before running.
-
-Those "Build Image" steps are skipped in case Pull Requests do not come from "forks" (i.e. those
-are internal PRs for the Apache Airflow repository). This is because in case of PRs coming from
-Apache Airflow (only committers can create those) the "pull_request" workflows have enough
-permissions to push images to the GitHub Registry.
-
-This workflow is also not triggered on normal pushes to our "main" branches, i.e. after a
-pull request is merged and whenever a ``scheduled`` run is triggered. Again, in these cases the "CI" workflow
-has enough permissions to push the images, so we simply do not run this workflow. 
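The fork-vs-internal distinction above can be sketched as a small decision helper. This is assumed logic written for illustration, not the actual workflow code; the function name and repository values are hypothetical.

```bash
# Sketch (assumed logic): does the separate "Build Images" workflow need to
# build images for a given pull request? Internal PRs already run with enough
# permissions in the "pull_request" workflow itself, so the separate
# pull_request_target build is only needed for PRs coming from forks.
needs_build_images() {
  local pr_repo="$1"   # repository the PR branch lives in
  if [ "${pr_repo}" = "apache/airflow" ]; then
    echo "false"       # internal PR: the CI workflow pushes images itself
  else
    echo "true"        # fork PR: Build Images workflow builds and pushes
  fi
}

needs_build_images "apache/airflow"
needs_build_images "some-contributor/airflow"
```

The same check is why maintainers can test CI/CD changes by pushing a branch directly to ``apache/airflow``: the build then happens inside the ``CI`` workflow itself.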
- -The workflow has the following jobs: - -+---------------------------+---------------------------------------------+ -| Job | Description | -| | | -+===========================+=============================================+ -| Build Info | Prints detailed information about the build | -+---------------------------+---------------------------------------------+ -| Build CI images | Builds all configured CI images | -+---------------------------+---------------------------------------------+ -| Build PROD images | Builds all configured PROD images | -+---------------------------+---------------------------------------------+ - -The images are stored in the `GitHub Container Registry `_ -and the names of those images follow the patterns described in -`Naming conventions for stored images <#naming-conventions-for-stored-images>`_ - -Image building is configured in "fail-fast" mode. When any of the images -fails to build, it cancels other builds and the source ``Tests`` workflow run -that triggered it. - - -Differences for main and release branches ------------------------------------------ - -The type of tests executed varies depending on the version or branch under test. For the "main" development branch, -we run all tests to maintain the quality of Airflow. However, when releasing patch-level updates on older -branches, we only run a subset of these tests. This is because older branches are exclusively used for releasing -Airflow and its corresponding image, not for releasing providers or helm charts. - -This behaviour is controlled by ``default-branch`` output of the build-info job. Whenever we create a branch for old version -we update the ``AIRFLOW_BRANCH`` in ``airflow_breeze/branch_defaults.py`` to point to the new branch and there are a few -places where selection of tests is based on whether this output is ``main``. They are marked as - in the "Release branches" -column of the table below. 
- -Tests Workflow --------------- - -This workflow is a regular workflow that performs all checks of Airflow code. - -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Job | Description | PR | Canary | Scheduled | Release branches | -+=================================+==========================================================+==========+==========+===========+===================+ -| Build info | Prints detailed information about the build | Yes | Yes | Yes | Yes | -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Push early cache & images | Pushes early cache/images to GitHub Registry and test | - | Yes | - | - | -| | speed of building breeze images from scratch | | | | | -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Check that image builds quickly | Checks that image builds quickly without taking a lot of | - | Yes | - | Yes | -| | time for ``pip`` to figure out the right set of deps. 
| | | | | -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Build CI images | Builds images in-workflow (not in the ``build images``) | - | Yes | Yes (1) | Yes (4) | -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Generate constraints/CI verify | Generate constraints for the build and verify CI image | Yes (2) | Yes (2) | Yes (2) | Yes (2) | -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Build PROD images | Builds images in-workflow (not in the ``build images``) | - | Yes | Yes (1) | Yes (4) | -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Build Bullseye PROD images | Builds images based on Bullseye debian | - | Yes | Yes | Yes | -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Run breeze tests | Run unit tests for Breeze | Yes | Yes | Yes | Yes | -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Test OpenAPI client gen | Tests if OpenAPIClient continues to generate | Yes | Yes | Yes | Yes | -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| React WWW tests | React UI tests for new Airflow UI | Yes | Yes | Yes | Yes | -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Test examples image building | Tests if PROD image build examples work | Yes | Yes 
| Yes       | Yes               |
-+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+
-| Test git clone on Windows       | Tests if Git clone works on Windows                      | Yes (5)  | Yes (5)  | Yes (5)   | Yes (5)           |
-+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+
-| Waits for CI Images             | Waits for and verifies CI Images                         | Yes (2)  | Yes (2)  | Yes (2)   | Yes (2)           |
-+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+
-| Static checks                   | Performs full static checks                              | Yes (6)  | Yes      | Yes       | Yes (7)           |
-+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+
-| Basic static checks             | Performs basic static checks (no image)                  | Yes (6)  | -        | -         | -                 |
-+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+
-| Build docs                      | Builds and tests publishing of the documentation         | Yes      | Yes      | Yes       | Yes               |
-+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+
-| Spellcheck docs                 | Spellcheck docs                                          | Yes      | Yes      | Yes       | Yes               |
-+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+
-| Tests wheel provider packages   | Tests if provider packages can be built and released     | Yes      | Yes      | Yes      | -                 |
-+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+
-| Tests Airflow compatibility     | Compatibility of provider packages with older Airflow    | Yes      | Yes      | Yes       | -                 | 
-+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Tests dist provider packages | Tests if dist provider packages can be built | - | Yes | Yes | - | -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Tests airflow release commands | Tests if airflow release command works | - | Yes | Yes | - | -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Tests (Backend/Python matrix) | Run the Pytest unit DB tests (Backend/Python matrix) | Yes | Yes | Yes | Yes (8) | -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| No DB tests | Run the Pytest unit Non-DB tests (with pytest-xdist) | Yes | Yes | Yes | Yes (8) | -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Integration tests | Runs integration tests (Postgres/Mysql) | Yes | Yes | Yes | Yes (9) | -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Quarantined tests | Runs quarantined tests (with flakiness and side-effects) | Yes | Yes | Yes | Yes (8) | -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Test airflow packages | Tests that Airflow package can be built and released | Yes | Yes | Yes | Yes | -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Helm tests | Run the Helm integration tests | Yes | Yes | Yes | - | 
-+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Helm release tests | Run the tests for Helm releasing | Yes | Yes | Yes | - | -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Summarize warnings | Summarizes warnings from all other tests | Yes | Yes | Yes | Yes | -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Wait for PROD Images | Waits for and verify PROD Images | Yes (2) | Yes (2) | Yes (2) | Yes (2) | -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Docker Compose test/PROD verify | Tests quick-start Docker Compose and verify PROD image | Yes | Yes | Yes | Yes | -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Tests Kubernetes | Run Kubernetes test | Yes | Yes | Yes | - | -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Update constraints | Upgrade constraints to latest ones | Yes (3) | Yes (3) | Yes (3) | Yes (3) | -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Push cache & images | Pushes cache/images to GitHub Registry (3) | - | Yes (3) | - | Yes | -+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+ -| Build CI ARM images | Builds CI images for ARM to detect any problems which | Yes (10) | - | Yes | - | -| | would only appear if we install all 
dependencies on ARM  |          |          |           |                   |
-+---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+
-
-``(1)`` Scheduled jobs build images from scratch - to test if everything works properly for clean builds
-
-``(2)`` The jobs wait for CI images to be available. They only actually run when an image build is needed (in
- case of simpler PRs that do not change dependencies or source code, images are not built)
-
-``(3)`` PROD and CI cache & images are pushed as "cache" (both AMD and ARM) and "latest" (only AMD)
-to GitHub Container registry and constraints are upgraded only if all tests are successful.
-The images are rebuilt in this step using constraints pushed in the previous step.
-Constraints are only actually pushed in the ``canary/scheduled`` runs.
-
-``(4)`` In main, the PROD image uses locally built providers using the "latest" version of the provider code. In the
-non-main version of the build, the latest released providers from PyPI are used.
-
-``(5)`` Always run with public runners to test if Git clone works on Windows.
-
-``(6)`` Run full set of static checks when selective-checks determine that they are needed (basically, when
-Python code has been modified).
-
-``(7)`` On non-main builds some of the static checks that are related to Providers are skipped via selective checks
-(``skip-pre-commits`` check).
-
-``(8)`` On non-main builds the unit tests for providers are skipped via selective checks removing the
-"Providers" test type.
-
-``(9)`` On non-main builds the integration tests for providers are skipped via ``skip-provider-tests`` selective
-check output.
-
-``(10)`` Only run the builds in case the PR is run by a committer from the "apache" repository, and in scheduled builds.
-
-
-CodeQL scan
------------
-
-The `CodeQL `_ security scan uses the GitHub security scanning framework to scan our code for security violations.
-It is run for JavaScript and Python code. 
-
-Publishing documentation
-------------------------
-
-Documentation from the ``main`` branch is automatically published on Amazon S3.
-
-To make this possible, GitHub Actions has secrets set up with credentials
-for an Amazon Web Services account - ``DOCS_AWS_ACCESS_KEY_ID`` and ``DOCS_AWS_SECRET_ACCESS_KEY``.
-
-This account has permission to write/list/put objects to the bucket ``apache-airflow-docs``. This bucket has
-public access configured, which means it is accessible through the website endpoint.
-For more information, see:
-`Hosting a static website on Amazon S3 `_
-
-Website endpoint: http://apache-airflow-docs.s3-website.eu-central-1.amazonaws.com/
-
-
-Debugging CI Jobs in GitHub Actions
-===================================
-
-The CI jobs are notoriously difficult to test, because you can only really see the results when you run them
-in the CI environment, and the environment in which they run depends on who runs them (they might be run either
-on our Self-Hosted runners (with 64 GB RAM, 8 CPUs) or on the GitHub Public runners (6 GB of RAM, 2 CPUs)), and
-the results will vastly differ depending on which environment is used. We are utilizing parallelism to make
-use of all the available CPU/Memory, but sometimes you need to enable debugging and force certain environments.
-An additional difficulty is that the ``Build Images`` workflow is of the ``pull_request_target`` type, which means that it
-will always run using the ``main`` version - no matter what is in your Pull Request.
-
-There are several ways you can debug the CI jobs when you are a maintainer.
-
-* When you want to test the build with all combinations of all Python versions, backends etc. on a regular PR,
-  add the ``full tests needed`` label to the PR. 
-* When you want to test a maintainer PR using public runners, add the ``public runners`` label to the PR -* When you want to see resources used by the run, add the ``debug ci resources`` label to the PR -* When you want to test changes to breeze that include changes to how images are built, you should push - your PR to the ``apache`` repository, not to your fork. This will run the images as part of the ``CI`` workflow - rather than using the ``Build images`` workflow, and use the same breeze version for building the image and testing -* When you want to test changes to the ``build-images.yml`` workflow, you should push your branch as the ``main`` - branch in your local fork. This will run the changed ``build-images.yml`` workflow as it will be in the ``main`` - branch of your fork - -Replicating the CI Jobs locally -=============================== - -The main goal of our CI philosophy is that no matter how complex the test and integration -infrastructure is, as a developer you should be able to reproduce and re-run any of the failed checks -locally. One part of it is pre-commit checks, which allow you to run the same static checks in CI -and locally; another part is the CI environment, which is replicated locally with Breeze. - -You can read more about Breeze in `README.rst `_ but in essence it is a script that allows -you to re-create the CI environment in your local development instance and interact with it. In its basic -form, when you do development you can run all the same tests that will be run in CI - but locally, -before you submit them as a PR. Another use case where Breeze is useful is when tests fail on CI. You can -take the full ``COMMIT_SHA`` of the failed build, pass it as the ``--image-tag`` parameter of Breeze, and it will -download the very same version of the image that was used in CI and run it locally. This way, you can very -easily reproduce any failed test that happens in CI - even if you do not check out the sources -connected with the run.
- -All our CI jobs are executed via ``breeze`` commands. You can replicate exactly what our CI is doing -by running the sequence of corresponding ``breeze`` commands. Make sure however that you look at both: - -* flags passed to ``breeze`` commands -* environment variables used when a ``breeze`` command is run - this is useful when we want - to set a common flag for all ``breeze`` commands in the same job or even the whole workflow. For - example the ``VERBOSE`` variable is set to ``true`` for all our workflows so that more detailed information - about internal commands executed in CI is printed. - -In the output of the CI jobs, you will find both the flags passed and the environment variables set. - -You can read more about it in `Breeze `_ and `Testing `_ - -Since we store images from every CI run, you should be able to easily reproduce any of the CI test problems -locally. You can do it by pulling and using the right image and running it with the right docker command. -For example, knowing that the CI job was for commit ``cd27124534b46c9688a1d89e75fcd137ab5137e3``: - -.. code-block:: bash - - docker pull ghcr.io/apache/airflow/main/ci/python3.8:cd27124534b46c9688a1d89e75fcd137ab5137e3 - - docker run -it ghcr.io/apache/airflow/main/ci/python3.8:cd27124534b46c9688a1d89e75fcd137ab5137e3 - - -But you usually need to pass more variables and a more complex setup if you want to connect to a database or -enable some integrations. Therefore it is easiest to use `Breeze `_ for that. -For example, if you need to reproduce a MySQL environment with Python 3.8 you can run: - -.. code-block:: bash - - breeze --image-tag cd27124534b46c9688a1d89e75fcd137ab5137e3 --python 3.8 --backend mysql - -You will be dropped into a shell with the exact version that was used during the CI run and you will -be able to run pytest tests manually, easily reproducing the environment that was used in CI.
Note that in -this case, you do not need to check out the sources that were used for that run - they are already part of -the image - but remember that any changes you make in those sources are lost when you leave the image, as -the sources are not mapped from your host machine. - -Depending on whether the scripts are run locally via `Breeze `_ or whether they -are run in the ``Build Images`` or ``Tests`` workflows, the variables below can take different values. - -You can use those variables when you try to reproduce the build locally (alternatively you can pass -those via the corresponding command line flags of the ``breeze shell`` command). - -+-----------------------------------------+-------------+--------------+------------+-------------------------------------------------+ -| Variable | Local | Build Images | CI | Comment | -| | development | workflow | Workflow | | -+=========================================+=============+==============+============+=================================================+ -| Basic variables | -+-----------------------------------------+-------------+--------------+------------+-------------------------------------------------+ -| ``PYTHON_MAJOR_MINOR_VERSION`` | | | | Major/Minor version of Python used. | -+-----------------------------------------+-------------+--------------+------------+-------------------------------------------------+ -| ``DB_RESET`` | false | true | true | Determines whether database should be reset | -| | | | | at the container entry. By default locally | -| | | | | the database is not reset, which allows to | -| | | | | keep the database content between runs in | -| | | | | case of Postgres or MySQL. However, | -| | | | | it requires to perform manual init/reset | -| | | | | if you stop the environment.
| -+-----------------------------------------+-------------+--------------+------------+-------------------------------------------------+ -| Forcing answer | -+-----------------------------------------+-------------+--------------+------------+-------------------------------------------------+ -| ``ANSWER`` | | yes | yes | This variable determines if answers to question | -| | | | | during the build process should be | -| | | | | automatically given. For local development, | -| | | | | the user is occasionally asked to provide | -| | | | | answers to questions such as - whether | -| | | | | the image should be rebuilt. By default | -| | | | | the user has to answer but in the CI | -| | | | | environment, we force the "yes" answer. | -+-----------------------------------------+-------------+--------------+------------+-------------------------------------------------+ -| Host variables | -+-----------------------------------------+-------------+--------------+------------+-------------------------------------------------+ -| ``HOST_USER_ID`` | | | | User id of the host user. | -+-----------------------------------------+-------------+--------------+------------+-------------------------------------------------+ -| ``HOST_GROUP_ID`` | | | | Group id of the host user. | -+-----------------------------------------+-------------+--------------+------------+-------------------------------------------------+ -| ``HOST_OS`` | | linux | linux | OS of the Host (darwin/linux/windows).
| -+-----------------------------------------+-------------+--------------+------------+-------------------------------------------------+ -| Git variables | -+-----------------------------------------+-------------+--------------+------------+-------------------------------------------------+ -| ``COMMIT_SHA`` | | GITHUB_SHA | GITHUB_SHA | SHA of the commit for which the build is run | -+-----------------------------------------+-------------+--------------+------------+-------------------------------------------------+ -| In container environment initialization | -+-----------------------------------------+-------------+--------------+------------+-------------------------------------------------+ -| ``SKIP_ENVIRONMENT_INITIALIZATION`` | false\* | false\* | false\* | Skip initialization of test environment | -| | | | | | -| | | | | \* set to true in pre-commits | -+-----------------------------------------+-------------+--------------+------------+-------------------------------------------------+ -| ``SKIP_IMAGE_UPGRADE_CHECK`` | false\* | false\* | false\* | Skip checking if image should be upgraded | -| | | | | | -| | | | | \* set to true in pre-commits | -+-----------------------------------------+-------------+--------------+------------+-------------------------------------------------+ -| ``SKIP_PROVIDER_TESTS`` | false\* | false\* | false\* | Skip running provider integration tests | -+-----------------------------------------+-------------+--------------+------------+-------------------------------------------------+ -| ``SKIP_SSH_SETUP`` | false\* | false\* | false\* | Skip setting up SSH server for tests.
| -| | | | | | -| | | | | \* set to true in GitHub CodeSpaces | -+-----------------------------------------+-------------+--------------+------------+-------------------------------------------------+ -| ``VERBOSE_COMMANDS`` | false | false | false | Determines whether every command | -| | | | | executed in docker should also be printed | -| | | | | before execution. This is a low-level | -| | | | | debugging feature of bash (set -x) enabled in | -| | | | | entrypoint and it should only be used if you | -| | | | | need to debug the bash scripts in container. | -+-----------------------------------------+-------------+--------------+------------+-------------------------------------------------+ -| Image build variables | -+-----------------------------------------+-------------+--------------+------------+-------------------------------------------------+ -| ``UPGRADE_TO_NEWER_DEPENDENCIES`` | false | false | false\* | Determines whether the build should | -| | | | | attempt to upgrade Python base image and all | -| | | | | PIP dependencies to latest ones matching | -| | | | | ``pyproject.toml`` limits. Tries to replicate | -| | | | | the situation of "fresh" user who just installs | -| | | | | airflow and uses latest version of matching | -| | | | | dependencies. By default we are using a | -| | | | | tested set of dependency constraints | -| | | | | stored in separated "orphan" branches | -| | | | | of the airflow repository | -| | | | | ("constraints-main", "constraints-2-0") | -| | | | | but when this flag is set to anything but false | -| | | | | (for example a random value), they are not used | -| | | | | and the "eager" upgrade strategy is used | -| | | | | when installing dependencies. We set it | -| | | | | to true in case of direct pushes (merges) | -| | | | | to main and scheduled builds so that | -| | | | | the constraints are tested.
In those builds, | -| | | | | in case we determine that the tests pass | -| | | | | we automatically push latest set of | -| | | | | "tested" constraints to the repository. | -| | | | | | -| | | | | Setting the value to a random value is the best | -| | | | | way to assure that constraints are upgraded | -| | | | | even if there is no change to | -| | | | | ``pyproject.toml`` | -| | | | | | -| | | | | This way our constraints are automatically | -| | | | | tested and updated whenever new versions | -| | | | | of libraries are released. | -| | | | | | -| | | | | \* true in case of direct pushes and | -| | | | | scheduled builds | -+-----------------------------------------+-------------+--------------+------------+-------------------------------------------------+ - -Adding new Python versions to CI -================================ - -In order to add a new version, the following operations should be done (the example uses Python 3.10): - -* copy the latest constraints in the ``constraints-main`` branch from the previous version and name them - using the new Python version (``constraints-3.10.txt``). Commit and push - -* build the image for both prod and CI locally using Breeze: - -.. code-block:: bash - - breeze ci-image build --python 3.10 - -* Find the 2 new images (prod, ci) created in the - `GitHub Container registry `_, - go to Package Settings, turn on ``Public Visibility`` and set the "Inherit access from Repository" flag.
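The constraints-copy step above can be sketched as follows. This is an illustrative sketch only: it uses a throwaway directory and a stand-in constraints file, whereas the real step operates on files checked out from the orphan ``constraints-main`` branch.

```shell
# Sketch: seed constraints for a new Python version (3.10) by copying
# the newest existing ones (3.9). The file content below is a stand-in
# for a real constraints file from the constraints-main branch.
set -eu
workdir="$(mktemp -d)"
cd "${workdir}"
printf 'apache-airflow==2.8.1\n' > constraints-3.9.txt
# the new Python version starts from an identical copy of the previous one
cp constraints-3.9.txt constraints-3.10.txt
cmp -s constraints-3.9.txt constraints-3.10.txt && echo "constraints seeded"
```

After committing and pushing the copied file, the ``breeze ci-image build --python 3.10`` step above can pick it up.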
diff --git a/Dockerfile.ci b/Dockerfile.ci index 84b03c2359d96..178c0fda044d9 100644 --- a/Dockerfile.ci +++ b/Dockerfile.ci @@ -1247,7 +1247,7 @@ LABEL org.apache.airflow.distro="debian" \ org.opencontainers.image.created="${AIRFLOW_IMAGE_DATE_CREATED}" \ org.opencontainers.image.authors="dev@airflow.apache.org" \ org.opencontainers.image.url="https://airflow.apache.org" \ - org.opencontainers.image.documentation="https://github.com/apache/airflow/IMAGES.rst" \ + org.opencontainers.image.documentation="https://airflow.apache.org/docs/docker-stack/index.html" \ org.opencontainers.image.source="https://github.com/apache/airflow" \ org.opencontainers.image.version="${AIRFLOW_VERSION}" \ org.opencontainers.image.revision="${COMMIT_SHA}" \ diff --git a/IMAGES.rst b/IMAGES.rst deleted file mode 100644 index 89a014f15de0b..0000000000000 --- a/IMAGES.rst +++ /dev/null @@ -1,561 +0,0 @@ - .. Licensed to the Apache Software Foundation (ASF) under one - or more contributor license agreements. See the NOTICE file - distributed with this work for additional information - regarding copyright ownership. The ASF licenses this file - to you under the Apache License, Version 2.0 (the - "License"); you may not use this file except in compliance - with the License. You may obtain a copy of the License at - - .. http://www.apache.org/licenses/LICENSE-2.0 - - .. Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - KIND, either express or implied. See the License for the - specific language governing permissions and limitations - under the License. - -.. contents:: :local: - -Airflow Docker images -===================== - -Airflow has two main images (built from Dockerfiles): - - * Production image (Dockerfile) - that can be used to build your own production-ready Airflow installation.
- You can read more about building and using the production image in the - `Docker stack `_ documentation. - The image is built using `Dockerfile `_. - - * CI image (Dockerfile.ci) - used for running tests and local development. The image is built using - `Dockerfile.ci `_. - -PROD image ----------- - -The PROD image is a multi-segment image. The first segment ``airflow-build-image`` contains all the -build essentials and related dependencies that allow installing airflow locally. By default the image is -built from a released version of Airflow from GitHub, but by providing some extra arguments you can also -build it from local sources. This is particularly useful in the CI environment where we are using the image -to run Kubernetes tests. See below for the list of arguments that should be provided to build -the production image from the local sources. - -The image is primarily optimised for the size of the final image, but also for speed of rebuilds - the -``airflow-build-image`` segment uses the same technique as the CI jobs for pre-installing dependencies. -It first pre-installs them from the right GitHub branch and only after that is the final airflow installation -done from either local sources or a remote location (PyPI or GitHub repository). - -You can read more details about building, extending and customizing the PROD image in the -`Latest documentation `_ - -CI image --------- - -The CI image is used by `Breeze `_ as the shell image but it is also used during CI tests. -The image is a single-segment image that contains an Airflow installation with "all" dependencies installed. -It is optimised for rebuild speed. It installs PIP dependencies from the current branch first - -so that any changes in ``pyproject.toml`` do not trigger reinstalling all dependencies. -There is a second installation step that re-installs the dependencies -from the latest sources, so that we are sure the latest dependencies are installed.
- -Building docker images from current sources -=========================================== - -The easiest way to build the CI/PROD images is to use ``_. It uses a number of -optimizations and caches to build the images efficiently and fast when you are developing Airflow and need to update to -the latest version. - -For the CI image: the Airflow package is always built from sources. When you execute the image, you can however use -the ``--use-airflow-version`` flag (or ``USE_AIRFLOW_VERSION`` environment variable) to remove -the preinstalled source version of Airflow and replace it with one of the possible installation methods: - -* "none" - airflow is removed and not installed -* "wheel" - airflow is removed and replaced with the "wheel" version available in dist -* "sdist" - airflow is removed and replaced with the "sdist" version available in dist -* "" - airflow is removed and installed from PyPI (with the specified version) - -For the PROD image: by default the production image is built from the latest sources when using Breeze, but when -you use it via the docker build command, it uses the latest installed version of airflow and providers. -However, you can choose different installation methods as described in -`Building PROD docker images from released PIP packages <#building-prod-docker-images-from-released-packages>`_. -Detailed reference for building the production image from different sources can be found in: -`Build Args reference `_ - -You can build the CI image using current sources with this command: - -.. code-block:: bash - - breeze ci-image build - -You can build the PROD image using current sources with this command: - -.. code-block:: bash - - breeze prod-image build - -By adding the ``--python `` parameter you can build the -image version for the chosen Python version. - -The images are built with default extras - different extras for the CI and production images - and you -can change the extras via the ``--extras`` parameter and add new ones with ``--additional-airflow-extras``.
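The ``--use-airflow-version`` values listed above can be summarized as a simple dispatch. This is an illustrative sketch only - the function name is hypothetical and the real handling lives in Breeze's entrypoint scripts:

```shell
# Hypothetical helper mirroring the --use-airflow-version semantics
# described above; it only reports the action that would be taken.
resolve_airflow_install() {
  case "$1" in
    none)  echo "airflow removed and not installed" ;;
    wheel) echo "airflow replaced with wheel from dist" ;;
    sdist) echo "airflow replaced with sdist from dist" ;;
    *)     echo "airflow installed from PyPI as apache-airflow==$1" ;;
  esac
}

resolve_airflow_install "wheel"   # -> airflow replaced with wheel from dist
resolve_airflow_install "2.8.1"   # -> airflow installed from PyPI as apache-airflow==2.8.1
```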
- -For example, if you want to build the Python 3.8 version of the production image with -"all" extras installed, you should run this command: - -.. code-block:: bash - - breeze prod-image build --python 3.8 --extras "all" - -If you just want to add new extras you can add them like this: - -.. code-block:: bash - - breeze prod-image build --python 3.8 --additional-airflow-extras "all" - -The command that builds the CI image is optimized to minimize the time needed to rebuild the image when -the source code of Airflow evolves. This means that if you already have the image locally downloaded and -built, the scripts will determine whether the rebuild is needed in the first place. Then the scripts will -make sure that a minimal number of steps is executed to rebuild parts of the image (for example, -PIP dependencies) and will give you an image consistent with the one used during Continuous Integration. - -The command that builds the production image is optimised for the size of the image. - -Building PROD docker images from released PIP packages -====================================================== - -You can also build production images from PIP packages by providing the ``--install-airflow-version`` -parameter to Breeze: - -.. code-block:: bash - - breeze prod-image build --python 3.8 --additional-airflow-extras=trino --install-airflow-version=2.0.0 - -This will build the image using a command similar to: - -.. code-block:: bash - - pip install \ - apache-airflow[async,amazon,celery,cncf.kubernetes,docker,elasticsearch,ftp,grpc,hashicorp,http,ldap,google,microsoft.azure,mysql,postgres,redis,sendgrid,sftp,slack,ssh,statsd,virtualenv]==2.0.0 \ - --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.0.0/constraints-3.8.txt" - -.. note:: - - Only ``pip`` installation is currently officially supported.
- - While there have been some successes with using other tools like `poetry `_ or - `pip-tools `_, they do not share the same workflow as - ``pip`` - especially when it comes to constraint vs. requirements management. - Installing via ``Poetry`` or ``pip-tools`` is not currently supported. - - There are known issues with ``bazel`` that might lead to circular dependencies when using it to install - Airflow. Please switch to ``pip`` if you encounter such problems. The ``bazel`` community is working on fixing - the problem in `this PR `_, so newer - versions of ``bazel`` might handle it. - - If you wish to install airflow using those tools, you should use the constraint files and convert - them to the appropriate format and workflow that your tool requires. - - -You can also build production images from a specific Git version by providing the ``--install-airflow-reference`` -parameter to Breeze (this time constraints are taken from the ``constraints-main`` branch which is the -HEAD of development for constraints): - -.. code-block:: bash - - pip install "https://github.com/apache/airflow/archive/.tar.gz#egg=apache-airflow" \ - --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-3.8.txt" - -You can also skip installing airflow and install it from locally provided files by using the -``--install-packages-from-context`` parameter to Breeze: - -.. code-block:: bash - - breeze prod-image build --python 3.8 --additional-airflow-extras=trino --install-packages-from-context - -In this case airflow and all packages (.whl files) should be placed in the ``docker-context-files`` folder. - -Using docker cache during builds -================================ - -The default mechanism used in Breeze for building CI images uses images pulled from the -GitHub Container Registry. This is done to speed up local builds and building images for CI runs - instead of -> 12 minutes for a rebuild of CI images, it usually takes about 1 minute when the cache is used.
-For CI images this is usually the best strategy - to use the default "pull" cache. This is the default strategy when -``_ builds are performed. - -For the production image - which is far smaller and faster to build - it's better to use the local build cache (the -standard mechanism that docker uses). This is the default strategy for production images when -``_ builds are performed. The first time you run it, it will take considerably longer than -if you use the pull mechanism, but then when you do small, incremental changes to local sources, -Dockerfile image and scripts, further rebuilds with local build cache will be considerably faster. - -You can also disable build cache altogether. This is the strategy used by the scheduled builds in CI - they -will always rebuild all the images from scratch. - -You can change the strategy by passing one of the ``registry`` (default), ``local``, -or ``disabled`` values to the ``--docker-cache`` flag when you run Breeze commands. For example: - -.. code-block:: bash - - breeze ci-image build --python 3.8 --docker-cache local - -Will build the CI image using local build cache (note that it will take quite a long time the first -time you run it). - -.. code-block:: bash - - breeze prod-image build --python 3.8 --docker-cache registry - -Will build the production image with the cache pulled from the registry. - - -.. code-block:: bash - - breeze prod-image build --python 3.8 --docker-cache disabled - -Will build the production image from scratch. - -You can also control docker caching by setting the ``DOCKER_CACHE`` variable to ``local``, ``registry``, or -``disabled`` and exporting it. - -.. code-block:: bash - - export DOCKER_CACHE="registry" - -or - -.. code-block:: bash - - export DOCKER_CACHE="local" - -or - -.. code-block:: bash - - export DOCKER_CACHE="disabled" - -Naming conventions -================== - -By default we are using cache for images in the GitHub Container Registry.
We are using the GitHub -Container Registry as a development image cache and as the CI registry for build images. -The images are all in the organization-wide "apache/" namespace. We are adding "airflow-" as a prefix for -the image names of all Airflow images. The images are linked to the repository -via the ``org.opencontainers.image.source`` label in the image. - -See https://docs.github.com/en/packages/learn-github-packages/connecting-a-repository-to-a-package - -The naming convention for the GitHub packages is as follows. - -Images with a commit SHA (built for pull requests and pushes) are snapshots of the -currently running build. They are built once per build and pulled by each test job. - -.. code-block:: bash - - ghcr.io/apache/airflow//ci/python: - for CI images - ghcr.io/apache/airflow//prod/python: - for production images - -Those images contain inlined cache. - -You can see all the current GitHub images at ``_ - -You can read more about the CI configuration and how CI jobs are using GitHub images -in ``_. - -Note that you need to be a committer and have the right to refresh the images in the GitHub Registry with -latest sources from main via ``./dev/refresh_images.sh``. -Only committers can push images directly. You need to log in with your Personal Access Token with -"packages" write scope to be able to push to those repositories or pull from them -in case of GitHub Packages. - -GitHub Container Registry: - -.. code-block:: bash - - docker login ghcr.io - -Since there are different naming conventions used for Airflow images and there are multiple images used, -`Breeze `_ provides an easy-to-use management interface for the images. Our -`CI system `_ is designed in such a way that it automatically refreshes caches, rebuilds -the images periodically and updates them whenever a new version of the base Python is released. -However, occasionally, you might need to rebuild images locally and push them directly to the registries -to refresh them.
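The naming convention above can be sketched in a few lines of shell - the branch, Python version and commit SHA below are example values taken from this document:

```shell
# Construct CI and PROD image names following the ghcr.io naming convention.
branch="main"
python_version="3.8"
commit_sha="9a621eaa394c0a0a336f8e1b31b35eff4e4ee86e"

ci_image="ghcr.io/apache/airflow/${branch}/ci/python${python_version}:${commit_sha}"
prod_image="ghcr.io/apache/airflow/${branch}/prod/python${python_version}:${commit_sha}"

echo "${ci_image}"
# -> ghcr.io/apache/airflow/main/ci/python3.8:9a621eaa394c0a0a336f8e1b31b35eff4e4ee86e
```

An image name built this way is exactly what ``docker pull`` expects, as in the examples earlier in this document.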
- - -Every developer can also pull and run images that are the result of a specific CI run in GitHub Actions. -This is a powerful tool that allows you to reproduce CI failures locally, enter the images and fix them much -faster. It is enough to pass ``--image-tag`` and the registry and Breeze will download and execute -commands using the same image that was used during the CI tests. - -For example, this command will run the same Python 3.8 image as was used in the build identified by the -9a621eaa394c0a0a336f8e1b31b35eff4e4ee86e commit SHA, with the rabbitmq integration enabled. - -.. code-block:: bash - - breeze --image-tag 9a621eaa394c0a0a336f8e1b31b35eff4e4ee86e --python 3.8 --integration rabbitmq - -You can see more details and examples in `Breeze `_ - -Customizing the CI image -======================== - -Customizing the CI image allows you to add your own dependencies to the image. - -The easiest way to build the customized image is to use the ``breeze`` script, but you can also build such a -customized image by running an appropriately crafted docker build in which you specify all the ``build-args`` -that you need to add to customize it. You can read about all the args and ways you can build the image -in the `<#ci-image-build-arguments>`_ chapter below. - -Here just a few examples are presented which should give you a general understanding of what you can customize. - -This builds the CI image based on Python 3.8 with additional airflow extras, additional Python dependencies and -additional apt dev dependencies. - -As of Airflow 2.3.0, it is required to build images with the ``DOCKER_BUILDKIT=1`` variable -(Breeze sets the ``DOCKER_BUILDKIT=1`` variable automatically) or via the ``docker buildx build`` command if -you have the ``buildx`` plugin installed. - -.. code-block:: bash - - DOCKER_BUILDKIT=1 docker build .
-f Dockerfile.ci \ - --pull \ - --build-arg PYTHON_BASE_IMAGE="python:3.8-slim-bookworm" \ - --build-arg ADDITIONAL_AIRFLOW_EXTRAS="jdbc" \ - --build-arg ADDITIONAL_PYTHON_DEPS="pandas" \ - --build-arg ADDITIONAL_DEV_APT_DEPS="gcc g++" \ - --tag my-image:0.0.1 - - -The same image can be built using ``breeze`` (it supports auto-completion of the options): - -.. code-block:: bash - - breeze ci-image build --python 3.8 --additional-airflow-extras=jdbc --additional-python-deps="pandas" \ - --additional-dev-apt-deps="gcc g++" - -You can customize more aspects of the image - such as additional commands executed before apt dependencies -are installed, or adding extra sources to install your dependencies from. You can see all the arguments -described below, but here is an example of a rather complex command to customize the image -based on the example in `this comment `_: - -.. code-block:: bash - - DOCKER_BUILDKIT=1 docker build . -f Dockerfile.ci \ - --pull \ - --build-arg PYTHON_BASE_IMAGE="python:3.8-slim-bookworm" \ - --build-arg AIRFLOW_INSTALLATION_METHOD="apache-airflow" \ - --build-arg ADDITIONAL_AIRFLOW_EXTRAS="slack" \ - --build-arg ADDITIONAL_PYTHON_DEPS="apache-airflow-providers-odbc \ - azure-storage-blob \ - sshtunnel \ - google-api-python-client \ - oauth2client \ - beautifulsoup4 \ - dateparser \ - rocketchat_API \ - typeform" \ - --build-arg ADDITIONAL_DEV_APT_DEPS="msodbcsql17 unixodbc-dev g++" \ - --build-arg ADDITIONAL_DEV_APT_COMMAND="curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add --no-tty - && curl https://packages.microsoft.com/config/debian/12/prod.list > /etc/apt/sources.list.d/mssql-release.list" \ - --build-arg ADDITIONAL_DEV_ENV_VARS="ACCEPT_EULA=Y" \ - --tag my-image:0.0.1 - -CI image build arguments ------------------------- - -The following build arguments (``--build-arg`` in docker build command) can be used for CI images: -
-+------------------------------------------+------------------------------------------+------------------------------------------+ -| Build argument | Default value | Description | -+==========================================+==========================================+==========================================+ -| ``PYTHON_BASE_IMAGE`` | ``python:3.8-slim-bookworm`` | Base Python image | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``PYTHON_MAJOR_MINOR_VERSION`` | ``3.8`` | major/minor version of Python (should | -| | | match base image) | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``DEPENDENCIES_EPOCH_NUMBER`` | ``2`` | increasing this number will reinstall | -| | | all apt dependencies | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``ADDITIONAL_PIP_INSTALL_FLAGS`` | | additional ``pip`` flags passed to the | -| | | installation commands (except when | -| | | reinstalling ``pip`` itself) | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``PIP_NO_CACHE_DIR`` | ``true`` | if true, then no pip cache will be | -| | | stored | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``HOME`` | ``/root`` | Home directory of the root user (CI | -| | | image has root user as default) | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``AIRFLOW_HOME`` | ``/root/airflow`` | Airflow's HOME (that's where logs and | -| | | sqlite databases are stored) | 
-+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``AIRFLOW_SOURCES`` | ``/opt/airflow`` | Mounted sources of Airflow | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``AIRFLOW_REPO`` | ``apache/airflow`` | the repository from which PIP | -| | | dependencies are pre-installed | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``AIRFLOW_BRANCH`` | ``main`` | the branch from which PIP dependencies | -| | | are pre-installed | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``AIRFLOW_CI_BUILD_EPOCH`` | ``1`` | increasing this value will reinstall PIP | -| | | dependencies from the repository from | -| | | scratch | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``AIRFLOW_CONSTRAINTS_LOCATION`` | | If not empty, it will override the | -| | | source of the constraints with the | -| | | specified URL or file. Note that the | -| | | file has to be in docker context so | -| | | it's best to place such file in | -| | | one of the folders included in | -| | | .dockerignore. for example in the | -| | | 'docker-context-files'. Note that the | -| | | location does not work for the first | -| | | stage of installation when the | -| | | stage of installation when the | -| | | ``AIRFLOW_PRE_CACHED_PIP_PACKAGES`` is | -| | | set to true. Default location from | -| | | GitHub is used in this case. 
| -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``AIRFLOW_CONSTRAINTS_REFERENCE`` | | reference (branch or tag) from GitHub | -| | | repository from which constraints are | -| | | used. By default it is set to | -| | | ``constraints-main`` but can be | -| | | ``constraints-2-0`` for 2.0.* versions | -| | | or it could point to specific version | -| | | for example ``constraints-2.0.0`` | -| | | is empty, it is auto-detected | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``AIRFLOW_EXTRAS`` | ``all`` | extras to install | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``UPGRADE_TO_NEWER_DEPENDENCIES`` | ``false`` | If set to a value different than "false" | -| | | the dependencies are upgraded to newer | -| | | versions. In CI it is set to build id | -| | | to make sure subsequent builds are not | -| | | reusing cached images with same value. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``AIRFLOW_PRE_CACHED_PIP_PACKAGES`` | ``true`` | Allows to pre-cache airflow PIP packages | -| | | from the GitHub of Apache Airflow | -| | | This allows to optimize iterations for | -| | | Image builds and speeds up CI jobs | -| | | But in some corporate environments it | -| | | might be forbidden to download anything | -| | | from public repositories. 
| -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``ADDITIONAL_AIRFLOW_EXTRAS`` | | additional extras to install | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``ADDITIONAL_PYTHON_DEPS`` | | additional Python dependencies to | -| | | install | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``DEV_APT_COMMAND`` | | Dev apt command executed before dev deps | -| | | are installed in the first part of image | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``ADDITIONAL_DEV_APT_COMMAND`` | | Additional Dev apt command executed | -| | | before dev dep are installed | -| | | in the first part of the image | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``DEV_APT_DEPS`` | Empty - install default dependencies | Dev APT dependencies installed | -| | (see ``install_os_dependencies.sh``) | in the first part of the image | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``ADDITIONAL_DEV_APT_DEPS`` | | Additional apt dev dependencies | -| | | installed in the first part of the image | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``ADDITIONAL_DEV_APT_ENV`` | | Additional env variables defined | -| | | when installing dev deps | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``AIRFLOW_PIP_VERSION`` | ``23.3.2`` | PIP version used. 
| -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``PIP_PROGRESS_BAR`` | ``on`` | Progress bar for PIP installation | -+------------------------------------------+------------------------------------------+------------------------------------------+ - -Here are some examples of how CI images can built manually. CI is always built from local sources. - -This builds the CI image in version 3.8 with default extras ("all"). - -.. code-block:: bash - - DOCKER_BUILDKIT=1 docker build . -f Dockerfile.ci \ - --pull \ - --build-arg PYTHON_BASE_IMAGE="python:3.8-slim-bookworm" --tag my-image:0.0.1 - - -This builds the CI image in version 3.8 with "gcp" extra only. - -.. code-block:: bash - - DOCKER_BUILDKIT=1 docker build . -f Dockerfile.ci \ - --pull \ - --build-arg PYTHON_BASE_IMAGE="python:3.8-slim-bookworm" \ - --build-arg AIRFLOW_EXTRAS=gcp --tag my-image:0.0.1 - - -This builds the CI image in version 3.8 with "apache-beam" extra added. - -.. code-block:: bash - - DOCKER_BUILDKIT=1 docker build . -f Dockerfile.ci \ - --pull \ - --build-arg PYTHON_BASE_IMAGE="python:3.8-slim-bookworm" \ - --build-arg ADDITIONAL_AIRFLOW_EXTRAS="apache-beam" --tag my-image:0.0.1 - -This builds the CI image in version 3.8 with "mssql" additional package added. - -.. code-block:: bash - - DOCKER_BUILDKIT=1 docker build . -f Dockerfile.ci \ - --pull \ - --build-arg PYTHON_BASE_IMAGE="python:3.8-slim-bookworm" \ - --build-arg ADDITIONAL_PYTHON_DEPS="mssql" --tag my-image:0.0.1 - -This builds the CI image in version 3.8 with "gcc" and "g++" additional apt dev dependencies added. - -.. code-block:: - - DOCKER_BUILDKIT=1 docker build . 
-f Dockerfile.ci \ - --pull - --build-arg PYTHON_BASE_IMAGE="python:3.8-slim-bookworm" \ - --build-arg ADDITIONAL_DEV_APT_DEPS="gcc g++" --tag my-image:0.0.1 - -This builds the CI image in version 3.8 with "jdbc" extra and "default-jre-headless" additional apt runtime dependencies added. - -.. code-block:: - - DOCKER_BUILDKIT=1 docker build . -f Dockerfile.ci \ - --pull \ - --build-arg PYTHON_BASE_IMAGE="python:3.8-slim-bookworm" \ - --build-arg AIRFLOW_EXTRAS=jdbc \ - --tag my-image:0.0.1 - -Running the CI image --------------------- - -The entrypoint in the CI image contains all the initialisation needed for tests to be immediately executed. -It is copied from ``scripts/docker/entrypoint_ci.sh``. - -The default behaviour is that you are dropped into bash shell. However if RUN_TESTS variable is -set to "true", then tests passed as arguments are executed - -The entrypoint performs those operations: - -* checks if the environment is ready to test (including database and all integrations). It waits - until all the components are ready to work - -* removes and re-installs another version of Airflow (if another version of Airflow is requested to be - reinstalled via ``USE_AIRFLOW_PYPI_VERSION`` variable. 
- -* Sets up Kerberos if Kerberos integration is enabled (generates and configures Kerberos token) - -* Sets up ssh keys for ssh tests and restarts the SSH server - -* Sets all variables and configurations needed for unit tests to run - -* Reads additional variables set in ``files/airflow-breeze-config/variables.env`` by sourcing that file - -* In case of CI run sets parallelism to 2 to avoid excessive number of processes to run - -* In case of CI run sets default parameters for pytest - -* In case of running integration/long_running/quarantined tests - it sets the right pytest flags - -* Sets default "tests" target in case the target is not explicitly set as additional argument - -* Runs system tests if RUN_SYSTEM_TESTS flag is specified, otherwise runs regular unit and integration tests diff --git a/README.md b/README.md index 56effc061bc84..5a6200e672c16 100644 --- a/README.md +++ b/README.md @@ -224,7 +224,7 @@ Those are - in the order of most common ways people install Airflow: `docker` tool, use them in Kubernetes, Helm Charts, `docker-compose`, `docker swarm`, etc. You can read more about using, customising, and extending the images in the [Latest docs](https://airflow.apache.org/docs/docker-stack/index.html), and - learn details on the internals in the [IMAGES.rst](https://github.com/apache/airflow/blob/main/IMAGES.rst) document. + learn details on the internals in the [images](https://airflow.apache.org/docs/docker-stack/index.html) document. - [Tags in GitHub](https://github.com/apache/airflow/tags) to retrieve the git project sources that were used to generate official source packages via git @@ -429,7 +429,7 @@ might decide to add additional limits (and justify them with comment). Want to help build Apache Airflow? Check out our [contributing documentation](https://github.com/apache/airflow/blob/main/contributing-docs/README.rst). 
-Official Docker (container) images for Apache Airflow are described in [IMAGES.rst](https://github.com/apache/airflow/blob/main/IMAGES.rst). +Official Docker (container) images for Apache Airflow are described in [images](dev/breeze/doc/ci/02_images.md). diff --git a/contributing-docs/06_development_environments.rst b/contributing-docs/06_development_environments.rst index a99cffabae5da..e442ed735a1f1 100644 --- a/contributing-docs/06_development_environments.rst +++ b/contributing-docs/06_development_environments.rst @@ -86,7 +86,7 @@ Benefits: - Breeze environment is almost the same as used in the CI automated builds. So, if the tests run in your Breeze environment, they will work in the CI as well. - See `<../../CI.rst>`_ for details about Airflow CI. + See `<../../dev/breeze/doc/ci/README.md>`_ for details about Airflow CI. Limitations: diff --git a/contributing-docs/testing/integration_tests.rst b/contributing-docs/testing/integration_tests.rst index 2d9766b00744a..df1a69b881c02 100644 --- a/contributing-docs/testing/integration_tests.rst +++ b/contributing-docs/testing/integration_tests.rst @@ -28,7 +28,8 @@ Enabling Integrations --------------------- Airflow integration tests cannot be run in the local virtualenv. They can only run in the Breeze -environment with enabled integrations and in the CI. See `CI `_ for details about Airflow CI. +environment with enabled integrations and in the CI. See `CI <../../dev/breeze/doc/ci/README.md>`_ for +details about Airflow CI. When you are in the Breeze environment, by default, all integrations are disabled. This enables only true unit tests to be executed in Breeze. 
You can enable the integration by passing the ``--integration ``
diff --git a/dev/MANUALLY_GENERATING_IMAGE_CACHE_AND_CONSTRAINTS.md b/dev/MANUALLY_GENERATING_IMAGE_CACHE_AND_CONSTRAINTS.md
index 9b981acd31377..e969858e59f1c 100644
--- a/dev/MANUALLY_GENERATING_IMAGE_CACHE_AND_CONSTRAINTS.md
+++ b/dev/MANUALLY_GENERATING_IMAGE_CACHE_AND_CONSTRAINTS.md
@@ -59,12 +59,12 @@ and users:
 * `Constraints files` - used by both, CI jobs (to fix the versions of dependencies used by CI jobs in regular PRs) and used by our users to reproducibly install released airflow versions.
-Normally, both are updated and refreshed automatically via [CI system](../CI.rst). However, there are some
-cases where we need to update them manually. This document describes how to do it.
+Normally, both are updated and refreshed automatically via [CI system](../dev/breeze/doc/ci/README.md).
+However, there are some cases where we need to update them manually. This document describes how to do it.
 # Automated image cache and constraints refreshing in CI
-Our [CI system](../CI.rst) is build in the way that it self-maintains. Regular scheduled builds and
+Our [CI](../dev/breeze/doc/ci/README.md) is built in a way that it self-maintains. Regular scheduled builds and
 merges to `main` branch builds (also known as `canary` builds) have separate maintenance step that take care about refreshing the cache that is used to speed up our builds and to speed up rebuilding of [Breeze](./breeze/doc/README.rst) images for development purpose. This is all happening automatically, usually:
@@ -72,8 +72,8 @@ rebuilding of [Breeze](./breeze/doc/README.rst) images for development purpose.
 * The latest [constraints](../contributing-docs/12_airflow_dependencies_and_extras.rst#pinned-constraint-files) are pushed to appropriate branch after all tests succeed in the `canary` build.
-* The [images](../IMAGES.rst) in `ghcr.io` registry are refreshed early at the beginning of the `canary` build.
This - is done twice during the canary build: +* The [images](breeze/doc/ci/02_images.md) in `ghcr.io` registry are refreshed early at the beginning of the + `canary` build. This is done twice during the canary build: * By the `Push Early Image Cache` job that is run at the beginning of the `canary` build. This cover the case when there are new dependencies added or Dockerfile/scripts change. Thanks to that step, subsequent PRs will be faster when they use the new Dockerfile/script. Those jobs **might fail** occasionally, diff --git a/dev/airflow-github b/dev/airflow-github index 97b04c6ecde83..5b07efa1b4aba 100755 --- a/dev/airflow-github +++ b/dev/airflow-github @@ -142,12 +142,9 @@ def is_core_commit(files: list[str]) -> bool: # non-released docs "COMMITTERS.rst", "contributing_docs/", - "IMAGES.rst", "INTHEWILD.md", "INSTALL", "README.md", - "CI.rst", - "CI_DIAGRAMS.md", "images/", "codecov.yml", "kubernetes_tests/", diff --git a/dev/breeze/doc/01_installation.rst b/dev/breeze/doc/01_installation.rst index 5f8201ecc2997..962d1839fd3a7 100644 --- a/dev/breeze/doc/01_installation.rst +++ b/dev/breeze/doc/01_installation.rst @@ -462,5 +462,7 @@ This will also remove breeze from the folder: ``${HOME}.local/bin/`` pipx uninstall apache-airflow-breeze +---- + Next step: Follow the `Customizing <02_customizing.rst>`_ guide to customize your environment. diff --git a/dev/breeze/doc/02_customizing.rst b/dev/breeze/doc/02_customizing.rst index ddb1875bac93b..78d24f30db7bc 100644 --- a/dev/breeze/doc/02_customizing.rst +++ b/dev/breeze/doc/02_customizing.rst @@ -125,5 +125,6 @@ For automation scripts, you can export the ``ANSWER`` variable (and set it to export ANSWER="yes" +------ Next step: Follow the `Developer tasks <03_developer_tasks.rst>`_ guide to learn how to use Breeze for regular development tasks. 
diff --git a/dev/breeze/doc/03_developer_tasks.rst b/dev/breeze/doc/03_developer_tasks.rst index a0e983a18444c..cf8294fa596e8 100644 --- a/dev/breeze/doc/03_developer_tasks.rst +++ b/dev/breeze/doc/03_developer_tasks.rst @@ -555,4 +555,6 @@ This is a lightweight solution that has its own limitations. More details on using the local virtualenv are available in the `Local Virtualenv <../../../contributing-docs/07_local_virtualenv.rst>`_. +------ + Next step: Follow the `Troubleshooting <04_troubleshooting.rst>`_ guide to troubleshoot your Breeze environment. diff --git a/dev/breeze/doc/04_troubleshooting.rst b/dev/breeze/doc/04_troubleshooting.rst index 230692d27c4fe..644eb2632af96 100644 --- a/dev/breeze/doc/04_troubleshooting.rst +++ b/dev/breeze/doc/04_troubleshooting.rst @@ -155,4 +155,6 @@ issue. You may try running the below commands in the same terminal and then try set HTTP_PROXY=null set HTTPS_PROXY=null +---- + Next step: Follow the `Test commands <05_test_commands.rst>`_ guide to running tests using Breeze. diff --git a/dev/breeze/doc/05_test_commands.rst b/dev/breeze/doc/05_test_commands.rst index afb68b460b6ea..abbc84b6364f5 100644 --- a/dev/breeze/doc/05_test_commands.rst +++ b/dev/breeze/doc/05_test_commands.rst @@ -603,5 +603,7 @@ All parameters of the command are here: :width: 100% :alt: Breeze k8s logs +----- + Next step: Follow the `Managing Breeze images <06_managing_docker_images.rst>`_ guide to learn how to manage CI and PROD images of Breeze. diff --git a/dev/breeze/doc/06_managing_docker_images.rst b/dev/breeze/doc/06_managing_docker_images.rst index ce551e17d7882..4304ba54076dd 100644 --- a/dev/breeze/doc/06_managing_docker_images.rst +++ b/dev/breeze/doc/06_managing_docker_images.rst @@ -108,7 +108,7 @@ customized variant of the image that contains everything you need. You can building the production image manually by using ``prod-image build`` command. 
Note, that the images can also be built using ``docker build`` command by passing appropriate -build-args as described in `IMAGES.rst `_ , but Breeze provides several flags that +build-args as described in `Images documentation `_ , but Breeze provides several flags that makes it easier to do it. You can see all the flags by running ``breeze prod-image build --help``, but here typical examples are presented: @@ -180,5 +180,7 @@ These are all available flags of ``verify-prod-image`` command: :width: 100% :alt: Breeze prod-image verify +------ + Next step: Follow the `Breeze maintenance tasks <07_breeze_maintenance_tasks.rst>`_ to learn about tasks that are useful when you are modifying Breeze itself. diff --git a/dev/breeze/doc/07_breeze_maintenance_tasks.rst b/dev/breeze/doc/07_breeze_maintenance_tasks.rst index 63101b01a0306..726f61282ff95 100644 --- a/dev/breeze/doc/07_breeze_maintenance_tasks.rst +++ b/dev/breeze/doc/07_breeze_maintenance_tasks.rst @@ -65,4 +65,6 @@ done via ``synchronize-local-mounts`` command. :width: 100% :alt: Breeze setup synchronize-local-mounts +----- + Next step: Follow the `CI tasks <08_ci_tasks.rst>`_ guide to learn how to use Breeze for regular development tasks. diff --git a/dev/breeze/doc/08_ci_tasks.rst b/dev/breeze/doc/08_ci_tasks.rst index 90b27ea869ac7..d6594cf8d9c36 100644 --- a/dev/breeze/doc/08_ci_tasks.rst +++ b/dev/breeze/doc/08_ci_tasks.rst @@ -19,6 +19,7 @@ CI tasks ======== Breeze hase a number of commands that are mostly used in CI environment to perform cleanup. +Detailed description of the CI design can be found in `CI design `_. .. contents:: :local: @@ -130,5 +131,7 @@ These are all available flags of ``find-backtracking-candidates`` command: :width: 100% :alt: Breeze ci find-backtracking-candidates +----- + Next step: Follow the `Release management tasks <09_release_management_tasks.rst>`_ guide to learn how release managers are using Breeze to release various Airflow artifacts. 
diff --git a/dev/breeze/doc/09_release_management_tasks.rst b/dev/breeze/doc/09_release_management_tasks.rst index 4c9ee55638204..cffe31a0c9fc8 100644 --- a/dev/breeze/doc/09_release_management_tasks.rst +++ b/dev/breeze/doc/09_release_management_tasks.rst @@ -597,5 +597,7 @@ This command will build one docker image per python version, with all the airflo :width: 100% :alt: Breeze build all airflow images +----- + Next step: Follow the `Advanced Breeze topics <10_advanced_breeze_topics.rst>`_ to learn more about Breeze internals. diff --git a/dev/breeze/doc/10_advanced_breeze_topics.rst b/dev/breeze/doc/10_advanced_breeze_topics.rst index 3885359cf3cc5..42d03dd321674 100644 --- a/dev/breeze/doc/10_advanced_breeze_topics.rst +++ b/dev/breeze/doc/10_advanced_breeze_topics.rst @@ -242,6 +242,6 @@ It's enabled by setting ``RECORD_BREEZE_OUTPUT_FILE`` to a file name where it wi By default it records the screenshots with default characters width and with "Breeze screenshot" title, but you can override it with ``RECORD_BREEZE_WIDTH`` and ``RECORD_BREEZE_TITLE`` variables respectively. - +------ **Thank you for getting that far** - we hope you will enjoy using Breeze! diff --git a/dev/breeze/doc/README.rst b/dev/breeze/doc/README.rst index 0c6c621ceaac1..570ca8b75da35 100644 --- a/dev/breeze/doc/README.rst +++ b/dev/breeze/doc/README.rst @@ -18,7 +18,7 @@ .. raw:: html
- Airflow Breeze - Development and Test Environment for Apache Airflow
diff --git a/dev/breeze/doc/adr/0005-preventing-using-contributed-code-when-building-images.md b/dev/breeze/doc/adr/0005-preventing-using-contributed-code-when-building-images.md
index 1c334b03abe6e..765a77c599c6d 100644
--- a/dev/breeze/doc/adr/0005-preventing-using-contributed-code-when-building-images.md
+++ b/dev/breeze/doc/adr/0005-preventing-using-contributed-code-when-building-images.md
@@ -159,5 +159,5 @@ Thanks to combination of features available in GitHub, the builds are secured ag
 code by users contributing PRs, that could get uncontrolled write access to Airflow repository.
 The negative consequence of this is that the build process becomes much more complex
-(see [CI](../../../../CI.rst) for complete description) and that some cases (like modifying build behaviour
+(see [CI](../ci/README.md) for complete description) and that some cases (like modifying build behaviour
 require additional process of testing by pushing the changes as `main` branch to a fork of Apache Airflow)
diff --git a/dev/breeze/doc/ci/01_ci_environment.md b/dev/breeze/doc/ci/01_ci_environment.md
new file mode 100644
index 0000000000000..c9501a13b208a
--- /dev/null
+++ b/dev/breeze/doc/ci/01_ci_environment.md
@@ -0,0 +1,129 @@
+
+
+
+**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [CI Environment](#ci-environment)
+  - [GitHub Actions workflows](#github-actions-workflows)
+  - [Container Registry used as cache](#container-registry-used-as-cache)
+  - [Authentication in GitHub Registry](#authentication-in-github-registry)
+
+
+
+# CI Environment
+
+Continuous Integration is an important component of making Apache Airflow
+robust and stable. We run a lot of tests for every pull request,
+for main and v2-\*-test branches and regularly as scheduled jobs.
+
+Our execution environment for CI is [GitHub Actions](https://github.com/features/actions).
+
+However,
part of the philosophy we have is that we are not tightly
+coupled with any of the CI environments we use. Most of our CI jobs are
+written as Python code packaged in the [Breeze](../../README.md) package,
+which are executed as steps in the CI jobs via `breeze` CLI commands.
+And we have a number of variables that determine build behaviour.
+
+## GitHub Actions workflows
+
+Our CI builds are highly optimized, leveraging the latest features
+provided by the GitHub Actions environment to reuse parts of the build
+process across different jobs.
+
+A significant portion of our CI runs utilize container images. Given
+that Airflow has numerous dependencies, we use Docker containers to
+ensure tests run in a well-configured and consistent environment. This
+approach is used for most tests, documentation building, and some
+advanced static checks. The environment comprises two types of images:
+CI images and PROD images. CI images are used for most tests and checks,
+while PROD images are used for Kubernetes tests.
+
+To run the tests, we need to ensure that the images are built using the
+latest sources and that the build process is efficient. A full rebuild
+of such an image from scratch might take approximately 15 minutes.
+Therefore, we've implemented optimization techniques that efficiently
+use the cache from the GitHub Docker registry. In most cases, this
+reduces the time needed to rebuild the image to about 4 minutes.
+However, when dependencies change, it can take around 6-7 minutes, and
+if the base image of Python releases a new patch-level, it can take
+approximately 12 minutes.
+
+## Container Registry used as cache
+
+We are using GitHub Container Registry to store the results of the
+`Build Images` workflow which is used in the `Tests` workflow.
+
+Currently, in the main version of Airflow, we run tests with all
+supported Python versions, which means that we have to build multiple
+images (one CI and one PROD for each Python version).
Yet we run many jobs (\>15)
+for each of the CI images. That is a lot of time to just build the
+environment to run. Therefore we are utilising the `pull_request_target`
+feature of GitHub Actions.
+
+This feature allows us to run a separate, independent workflow when the
+main workflow is run. This separate workflow is different from the main
+one because by default it runs using the `main` version of the sources, but
+also - and most of all - because it has WRITE access to the GitHub
+Container Image registry.
+
+This is especially important in our case where Pull Requests to Airflow
+might come from any repository, and it would be a huge security issue if
+anyone from outside could utilise the WRITE access to the Container
+Image Registry via an external Pull Request.
+
+Thanks to the WRITE access and the fact that the `pull_request_target` workflow named
+`Build Images` - by default - uses the `main` version of the sources,
+we can safely run code there, as it has been reviewed and merged.
+The workflow checks out the incoming Pull Request, builds
+the container image from the sources of the incoming PR (which happens in an
+isolated Docker build step for security) and pushes such image to the
+GitHub Docker Registry - so that this image can be built only once and
+used by all the jobs running tests. The image is tagged with the unique
+`COMMIT_SHA` of the incoming Pull Request, and the tests run in the `pull` workflow
+can simply pull such image rather than build it from scratch.
+Pulling such an image takes ~1 minute, so we save a
+lot of precious time for jobs.
+
+We use [GitHub Container Registry](https://docs.github.com/en/packages/guides/about-github-container-registry).
+A `GITHUB_TOKEN` is needed to push to the registry. We configured the
+scopes of the tokens in our jobs to be able to write to the registry,
+but only for the jobs that need it.
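The flow above can be sketched from a test job's point of view. The registry path and tag layout below are illustrative assumptions, not the exact ones used by Airflow's workflows:

``` bash
# Illustrative sketch - the registry path below is an assumption.
# The "Build Images" workflow pushes a CI image tagged with the commit
# SHA of the incoming PR; test jobs pull that tag instead of rebuilding.
COMMIT_SHA="1234567890abcdef"   # in real runs, provided by GitHub Actions
IMAGE="ghcr.io/apache/airflow/main/ci/python3.8:${COMMIT_SHA}"
# A test job would then run something like: docker pull "${IMAGE}"
echo "${IMAGE}"
```

Because the tag is derived only from the commit SHA, every job in the same run resolves the same image without any coordination.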
+
+The latest cache is kept as `:cache-linux-amd64` and `:cache-linux-arm64`
+tagged cache of our CI images (suitable for the `--cache-from` directive of
+buildx). It contains metadata and cache for all segments in the image,
+and the cache is kept separately for each platform.
+
+The `latest` images of CI and PROD are `amd64`-only images, because
+there is no easy way to push multiplatform images without
+merging the manifests, and it is not really needed nor used for cache.
+
+## Authentication in GitHub Registry
+
+We are using GitHub Container Registry as cache for our images.
+Authentication uses the GITHUB_TOKEN mechanism. Authentication is needed for
+pushing the images (WRITE) only in the `push` and `pull_request_target`
+workflows. When you are running the CI jobs in GitHub Actions,
+GITHUB_TOKEN is set automatically by the actions.
+
+----
+
+Read next about [Images](02_images.md)
diff --git a/dev/breeze/doc/ci/02_images.md b/dev/breeze/doc/ci/02_images.md
new file mode 100644
index 0000000000000..b057a2254c708
--- /dev/null
+++ b/dev/breeze/doc/ci/02_images.md
@@ -0,0 +1,585 @@
+
+
+
+**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [Airflow Docker images](#airflow-docker-images)
+  - [PROD image](#prod-image)
+  - [CI image](#ci-image)
+- [Building docker images from current sources](#building-docker-images-from-current-sources)
+- [Building PROD docker images from released PIP packages](#building-prod-docker-images-from-released-pip-packages)
+- [Using docker cache during builds](#using-docker-cache-during-builds)
+- [Naming conventions](#naming-conventions)
+- [Customizing the CI image](#customizing-the-ci-image)
+  - [CI image build arguments](#ci-image-build-arguments)
+  - [Running the CI image](#running-the-ci-image)
+- [Naming conventions for stored images](#naming-conventions-for-stored-images)
+
+
+
+# Airflow Docker images
+
+Airflow has two main images (built from Dockerfiles):
+
+- Production image (Dockerfile)
- that can be used to build your own
+  production-ready Airflow installation. You can read more about
+  building and using the production image in the
+  [Docker stack](https://airflow.apache.org/docs/docker-stack/index.html)
+  documentation. The image is built using [Dockerfile](Dockerfile).
+- CI image (Dockerfile.ci) - used for running tests and local
+  development. The image is built using [Dockerfile.ci](Dockerfile.ci).
+
+## PROD image
+
+The PROD image is a multi-segment image. The first segment
+`airflow-build-image` contains all the build essentials and related
+dependencies that allow installing airflow locally. By default the image
+is built from a released version of Airflow from GitHub, but by
+providing some extra arguments you can also build it from local sources.
+This is particularly useful in the CI environment where we are using the
+image to run Kubernetes tests. See below for the list of arguments that
+should be provided to build the production image from the local sources.
+
+The image is primarily optimised for the size of the final image, but also
+for speed of rebuilds - the `airflow-build-image` segment uses the same
+technique as the CI jobs for pre-installing dependencies. It first
+pre-installs them from the right GitHub branch, and only after that is the
+final airflow installation done from either local sources or a remote
+location (PyPI or GitHub repository).
+
+You can read more details about building, extending and customizing the
+PROD image in the [Latest
+documentation](https://airflow.apache.org/docs/docker-stack/index.html)
+
+## CI image
+
+The CI image is used by [Breeze](../README.rst) as the shell
+image, but it is also used during CI tests. The image is a single-segment
+image that contains an Airflow installation with "all" dependencies
+installed. It is optimised for rebuild speed. It installs PIP
+dependencies from the current branch first - so that any changes in
+`pyproject.toml` do not trigger reinstalling of all dependencies.
There
+is a second step of installation that re-installs the dependencies from
+the latest sources, so that we are sure that the latest dependencies are
+installed.
+
+# Building docker images from current sources
+
+The easiest way to build the CI/PROD images is to use
+[Breeze](../README.rst). It uses a number
+of optimizations and caches to build the images efficiently and fast when you are
+developing Airflow and need to update to the latest version.
+
+For the CI image: the Airflow package is always built from sources. When you
+execute the image, you can however use the `--use-airflow-version` flag
+(or `USE_AIRFLOW_VERSION` environment variable) to remove the
+preinstalled source version of Airflow and replace it with one of the
+possible installation methods:
+
+- "none" - airflow is removed and not installed
+- "wheel" - airflow is removed and replaced with the "wheel" version
+  available in dist
+- "sdist" - airflow is removed and replaced with the "sdist" version
+  available in dist
+- "\" - airflow is removed and installed from PyPI (with the
+  specified version)
+
+For the PROD image: by default the production image is built from the latest
+sources when using Breeze, but when you use it via the docker build command,
+it uses the latest installed version of airflow and providers. However,
+you can choose different installation methods as described in [Building
+PROD docker images from released PIP packages](#building-prod-docker-images-from-released-pip-packages). Detailed
+reference for building the production image from different sources can be
+found in: [Build Args reference](docs/docker-stack/build-arg-ref.rst#installing-airflow-using-different-methods)
+
+You can build the CI image using current sources with this command:
+
+``` bash
+breeze ci-image build
+```
+
+You can build the PROD image using current sources with this command:
+
+``` bash
+breeze prod-image build
+```
+
+By adding the `--python ` parameter you can
+build the image for the chosen Python version.
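The `--use-airflow-version` flag described above has an environment-variable equivalent. A minimal sketch follows; the breeze invocations are shown as comments and assume the interface described in the text (check `breeze shell --help` for the exact options):

``` bash
# Environment-variable equivalent of --use-airflow-version, per the text:
USE_AIRFLOW_VERSION="wheel"   # or "none", "sdist", or a PyPI version like "2.0.0"
export USE_AIRFLOW_VERSION
# Then, for example:
#   breeze shell                               # picks up the exported variable
#   breeze shell --use-airflow-version sdist   # or pass the flag explicitly
echo "${USE_AIRFLOW_VERSION}"
```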
+
+The images are built with default extras - different extras for the CI and
+production images - and you can change the extras via the `--extras`
+parameter and add new ones with `--additional-airflow-extras`.
+
+For example, if you want to build the Python 3.8 version of the production
+image with "all" extras installed, you should run this command:
+
+``` bash
+breeze prod-image build --python 3.8 --extras "all"
+```
+
+If you just want to add new extras you can add them like that:
+
+``` bash
+breeze prod-image build --python 3.8 --additional-airflow-extras "all"
+```
+
+The command that builds the CI image is optimized to minimize the time
+needed to rebuild the image when the source code of Airflow evolves.
+This means that if you already have the image locally downloaded and
+built, the scripts will determine whether the rebuild is needed in the
+first place. Then the scripts will make sure that a minimal number of
+steps is executed to rebuild parts of the image (for example, PIP
+dependencies) and will give you an image consistent with the one used
+during Continuous Integration.
+
+The command that builds the production image is optimised for the size of
+the image.
+
+# Building PROD docker images from released PIP packages
+
+You can also build production images from PIP packages by providing the
+`--install-airflow-version` parameter to Breeze:
+
+``` bash
+breeze prod-image build --python 3.8 --additional-airflow-extras=trino --install-airflow-version=2.0.0
+```
+
+This will build the image using a command similar to:
+
+``` bash
+pip install \
+  apache-airflow[async,amazon,celery,cncf.kubernetes,docker,elasticsearch,ftp,grpc,hashicorp,http,ldap,google,microsoft.azure,mysql,postgres,redis,sendgrid,sftp,slack,ssh,statsd,virtualenv]==2.0.0 \
+  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.0.0/constraints-3.8.txt"
+```
+
+> [!NOTE]
+> Only `pip` installation is currently officially supported.
+>
+> While there have been some successes with using other tools like
+> [poetry](https://python-poetry.org/) or
+> [pip-tools](https://pypi.org/project/pip-tools/), they do not share
+> the same workflow as `pip` - especially when it comes to constraint
+> vs. requirements management. Installing via `Poetry` or `pip-tools` is
+> not currently supported.
+>
+> There are known issues with `bazel` that might lead to circular
+> dependencies when using it to install Airflow. Please switch to `pip`
+> if you encounter such problems. The `Bazel` community is working on fixing
+> the problem in [this
+> PR](https://github.com/bazelbuild/rules_python/pull/1166), so newer
+> versions of `bazel` might handle it.
+>
+> If you wish to install airflow using those tools you should use the
+> constraint files and convert them to the appropriate format and workflow
+> that your tool requires.
+
+You can also build production images from a specific Git version by
+providing the `--install-airflow-reference` parameter to Breeze (this time
+constraints are taken from the `constraints-main` branch, which is the
+HEAD of development for constraints):
+
+``` bash
+pip install "https://github.com/apache/airflow/archive/.tar.gz#egg=apache-airflow" \
+  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-3.8.txt"
+```
+
+You can also skip installing airflow and install it from locally
+provided files by using the `--install-packages-from-context` parameter to
+Breeze:
+
+``` bash
+breeze prod-image build --python 3.8 --additional-airflow-extras=trino --install-packages-from-context
+```
+
+In this case airflow and all packages (.whl files) should be placed
+in the `docker-context-files` folder.
+
+# Using docker cache during builds
+
+The default mechanism used in Breeze for building CI images uses images
+pulled from GitHub Container Registry.
This is done to speed up local
+builds and building images for CI runs - instead of \> 12 minutes for
+a rebuild of CI images, it usually takes about 1 minute when the cache is
+used. For CI images this is usually the best strategy - to use the default
+"pull" cache. This is the default strategy when [Breeze](../README.rst)
+builds are performed.
+
+For the production image - which is far smaller and faster to build - it's
+better to use the local build cache (the standard mechanism that docker
+uses). This is the default strategy for production images when
+[Breeze](../README.rst) builds are
+performed. The first time you run it, it will take considerably longer
+than if you use the pull mechanism, but then when you make small,
+incremental changes to local sources, the Dockerfile image and scripts,
+further rebuilds with the local build cache will be considerably faster.
+
+You can also disable the build cache altogether. This is the strategy used
+by the scheduled builds in CI - they will always rebuild all the images
+from scratch.
+
+You can change the strategy by providing one of the `--docker-cache`
+values: `registry` (default), `local`, or `disabled` when you run
+Breeze commands. For example:
+
+``` bash
+breeze ci-image build --python 3.8 --docker-cache local
+```
+
+This will build the CI image using the local build cache (note that it will
+take quite a long time the first time you run it).
+
+``` bash
+breeze prod-image build --python 3.8 --docker-cache registry
+```
+
+This will build the production image with the cache from the registry.
+
+``` bash
+breeze prod-image build --python 3.8 --docker-cache disabled
+```
+
+This will build the production image from scratch.
+
+You can also control docker caching by setting the `DOCKER_CACHE`
+variable to `local`, `registry` or `disabled` and exporting it.
+
+``` bash
+export DOCKER_CACHE="registry"
+```
+
+or
+
+``` bash
+export DOCKER_CACHE="local"
+```
+
+or
+
+``` bash
+export DOCKER_CACHE="disabled"
+```
+
+# Naming conventions
+
+By default we are using cache for images in the GitHub Container
+Registry. We are using GitHub Container Registry as a development image
+cache and as a CI registry for build images. The images are all in the
+organization-wide "apache/" namespace. We add the "airflow-"
+prefix to the image names of all Airflow images. The images are linked
+to the repository via the `org.opencontainers.image.source` label in the
+image.
+
+See
+
+
+Naming convention for the GitHub packages.
+
+Images with a commit SHA (built for pull requests and pushes). Those are
+images that are snapshots of the currently run build. They are built once
+per build and pulled by each test job.
+
+``` bash
+ghcr.io/apache/airflow//ci/python: - for CI images
+ghcr.io/apache/airflow//prod/python: - for production images
+```
+
+Those images contain an inlined cache.
+
+You can see all the current GitHub images at
+
+
+Note that you need to be a committer and have the right to refresh the
+images in the GitHub Registry with the latest sources from main via
+(./dev/refresh_images.sh). Only committers can push images directly. You
+need to login with your Personal Access Token with "packages" write
+scope to be able to push to those repositories or pull from them in case
+of GitHub Packages.
+
+GitHub Container Registry
+
+``` bash
+docker login ghcr.io
+```
+
+Since there are different naming conventions used for Airflow images and
+there are multiple images used, [Breeze](../README.rst)
+provides an easy-to-use management interface for the images. The CI
+is designed in such a way that it should automatically
+refresh caches, rebuild the images periodically and update them whenever
+a new version of the base Python image is released.
However, occasionally, you might
+need to rebuild images locally and push them directly to the registries
+to refresh them.
+
+Every developer can also pull and run images that are the result of a specific
+CI run in GitHub Actions. This is a powerful tool that allows you to
+reproduce CI failures locally, enter the images and fix them much
+faster. It is enough to pass `--image-tag` and the registry, and Breeze
+will download and execute commands using the same image that was used
+during the CI tests.
+
+For example, this command will run the same Python 3.8 image as was used
+in the build identified by the 9a621eaa394c0a0a336f8e1b31b35eff4e4ee86e commit
+SHA, with the rabbitmq integration enabled:
+
+``` bash
+breeze --image-tag 9a621eaa394c0a0a336f8e1b31b35eff4e4ee86e --python 3.8 --integration rabbitmq
+```
+
+You can see more details and examples in [Breeze](../README.rst)
+
+# Customizing the CI image
+
+Customizing the CI image allows you to add your own dependencies to the
+image.
+
+The easiest way to build the customized image is to use the `breeze` script,
+but you can also build such a customized image by running an appropriately
+crafted docker build in which you specify all the `build-args` that you
+need to customize it. You can read about all the args and ways
+you can build the image in the
+[\#ci-image-build-arguments](#ci-image-build-arguments) chapter below.
+
+Here just a few examples are presented which should give you a general
+understanding of what you can customize.
+
+This builds the CI image in version 3.8 with the additional "jdbc" airflow
+extra, the additional "pandas" Python dependency and additional apt dev
+dependencies ("gcc", "g++").
+
+As of Airflow 2.3.0, it is required to build images with the
+`DOCKER_BUILDKIT=1` variable (Breeze sets the `DOCKER_BUILDKIT=1` variable
+automatically) or via the `docker buildx build` command if you have the `buildx`
+plugin installed.
+
+``` bash
+DOCKER_BUILDKIT=1 docker build .
-f Dockerfile.ci \
+  --pull \
+  --build-arg PYTHON_BASE_IMAGE="python:3.8-slim-bookworm" \
+  --build-arg ADDITIONAL_AIRFLOW_EXTRAS="jdbc" \
+  --build-arg ADDITIONAL_PYTHON_DEPS="pandas" \
+  --build-arg ADDITIONAL_DEV_APT_DEPS="gcc g++" \
+  --tag my-image:0.0.1
+```
+
+The same image can be built using `breeze` (it supports auto-completion
+of the options):
+
+``` bash
+breeze ci-image build --python 3.8 --additional-airflow-extras=jdbc --additional-python-deps="pandas" \
+  --additional-dev-apt-deps="gcc g++"
+```
+
+You can customize more aspects of the image - such as additional
+commands executed before apt dependencies are installed, or adding extra
+sources to install your dependencies from. You can see all the arguments
+described below, but here is an example of a rather complex command to
+customize the image, based on the example in [this
+comment](https://github.com/apache/airflow/issues/8605#issuecomment-690065621):
+
+``` bash
+DOCKER_BUILDKIT=1 docker build . -f Dockerfile.ci \
+  --pull \
+  --build-arg PYTHON_BASE_IMAGE="python:3.8-slim-bookworm" \
+  --build-arg AIRFLOW_INSTALLATION_METHOD="apache-airflow" \
+  --build-arg ADDITIONAL_AIRFLOW_EXTRAS="slack" \
+  --build-arg ADDITIONAL_PYTHON_DEPS="apache-airflow-providers-odbc \
+      azure-storage-blob \
+      sshtunnel \
+      google-api-python-client \
+      oauth2client \
+      beautifulsoup4 \
+      dateparser \
+      rocketchat_API \
+      typeform" \
+  --build-arg ADDITIONAL_DEV_APT_DEPS="msodbcsql17 unixodbc-dev g++" \
+  --build-arg ADDITIONAL_DEV_APT_COMMAND="curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add --no-tty - && curl https://packages.microsoft.com/config/debian/12/prod.list > /etc/apt/sources.list.d/mssql-release.list" \
+  --build-arg ADDITIONAL_DEV_ENV_VARS="ACCEPT_EULA=Y" \
+  --tag my-image:0.0.1
+```
+
+## CI image build arguments
+
+The following build arguments (`--build-arg` in docker build command)
+can be used for CI images:
+
+| Build argument | Default value | Description |
+|-----------------------------------|-------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `PYTHON_BASE_IMAGE` | `python:3.8-slim-bookworm` | Base Python image | +| `PYTHON_MAJOR_MINOR_VERSION` | `3.8` | major/minor version of Python (should match base image) | +| `DEPENDENCIES_EPOCH_NUMBER` | `2` | increasing this number will reinstall all apt dependencies | +| `ADDITIONAL_PIP_INSTALL_FLAGS` | | additional `pip` flags passed to the installation commands (except when reinstalling `pip` itself) | +| `PIP_NO_CACHE_DIR` | `true` | if true, then no pip cache will be stored | +| `HOME` | `/root` | Home directory of the root user (CI image has root user as default) | +| `AIRFLOW_HOME` | `/root/airflow` | Airflow's HOME (that's where logs and sqlite databases are stored) | +| `AIRFLOW_SOURCES` | `/opt/airflow` | Mounted sources of Airflow | +| `AIRFLOW_REPO` | `apache/airflow` | the repository from which PIP dependencies are pre-installed | +| `AIRFLOW_BRANCH` | `main` | the branch from which PIP dependencies are pre-installed | +| `AIRFLOW_CI_BUILD_EPOCH` | `1` | increasing this value will reinstall PIP dependencies from the repository from scratch | +| `AIRFLOW_CONSTRAINTS_LOCATION` | | If not empty, it will override the source of the constraints with the specified URL or file. | +| `AIRFLOW_CONSTRAINTS_REFERENCE` | | reference (branch or tag) from GitHub repository from which constraints are used. By default it is set to `constraints-main` but can be `constraints-2-X`. | +| `AIRFLOW_EXTRAS` | `all` | extras to install | +| `UPGRADE_TO_NEWER_DEPENDENCIES` | `false` | If set to a value different than "false" the dependencies are upgraded to newer versions. In CI it is set to build id. 
|
+| `AIRFLOW_PRE_CACHED_PIP_PACKAGES` | `true` | Allows pre-caching airflow PIP packages from the Apache Airflow GitHub repository. This optimizes iterations for image builds and speeds up CI jobs. |
+| `ADDITIONAL_AIRFLOW_EXTRAS` | | additional extras to install |
+| `ADDITIONAL_PYTHON_DEPS` | | additional Python dependencies to install |
+| `DEV_APT_COMMAND` | | Dev apt command executed before dev deps are installed in the first part of image |
+| `ADDITIONAL_DEV_APT_COMMAND` | | Additional Dev apt command executed before dev deps are installed in the first part of the image |
+| `DEV_APT_DEPS` | Empty - install default dependencies (see `install_os_dependencies.sh`) | Dev APT dependencies installed in the first part of the image |
+| `ADDITIONAL_DEV_APT_DEPS` | | Additional apt dev dependencies installed in the first part of the image |
+| `ADDITIONAL_DEV_APT_ENV` | | Additional env variables defined when installing dev deps |
+| `AIRFLOW_PIP_VERSION` | `23.3.2` | PIP version used. |
+| `PIP_PROGRESS_BAR` | `on` | Progress bar for PIP installation |
+
+Here are some examples of how CI images can be built manually. The CI
+image is always built from local sources.
+
+This builds the CI image in version 3.8 with default extras ("all").
+
+``` bash
+DOCKER_BUILDKIT=1 docker build . -f Dockerfile.ci \
+  --pull \
+  --build-arg PYTHON_BASE_IMAGE="python:3.8-slim-bookworm" --tag my-image:0.0.1
+```
+
+This builds the CI image in version 3.8 with the "gcp" extra only.
+
+``` bash
+DOCKER_BUILDKIT=1 docker build . -f Dockerfile.ci \
+  --pull \
+  --build-arg PYTHON_BASE_IMAGE="python:3.8-slim-bookworm" \
+  --build-arg AIRFLOW_EXTRAS=gcp --tag my-image:0.0.1
+```
+
+This builds the CI image in version 3.8 with the "apache-beam" extra added.
+
+``` bash
+DOCKER_BUILDKIT=1 docker build .
-f Dockerfile.ci \
+  --pull \
+  --build-arg PYTHON_BASE_IMAGE="python:3.8-slim-bookworm" \
+  --build-arg ADDITIONAL_AIRFLOW_EXTRAS="apache-beam" --tag my-image:0.0.1
+```
+
+This builds the CI image in version 3.8 with the "mssql" additional package
+added.
+
+``` bash
+DOCKER_BUILDKIT=1 docker build . -f Dockerfile.ci \
+  --pull \
+  --build-arg PYTHON_BASE_IMAGE="python:3.8-slim-bookworm" \
+  --build-arg ADDITIONAL_PYTHON_DEPS="mssql" --tag my-image:0.0.1
+```
+
+This builds the CI image in version 3.8 with "gcc" and "g++" additional
+apt dev dependencies added.
+
+``` bash
+DOCKER_BUILDKIT=1 docker build . -f Dockerfile.ci \
+  --pull \
+  --build-arg PYTHON_BASE_IMAGE="python:3.8-slim-bookworm" \
+  --build-arg ADDITIONAL_DEV_APT_DEPS="gcc g++" --tag my-image:0.0.1
+```
+
+This builds the CI image in version 3.8 with the "jdbc" extra and the
+"default-jre-headless" additional apt runtime dependency added.
+
+``` bash
+DOCKER_BUILDKIT=1 docker build . -f Dockerfile.ci \
+  --pull \
+  --build-arg PYTHON_BASE_IMAGE="python:3.8-slim-bookworm" \
+  --build-arg AIRFLOW_EXTRAS=jdbc \
+  --build-arg ADDITIONAL_RUNTIME_APT_DEPS="default-jre-headless" \
+  --tag my-image:0.0.1
+```
+
+## Running the CI image
+
+The entrypoint in the CI image contains all the initialisation needed
+for tests to be immediately executed. It is copied from
+`scripts/docker/entrypoint_ci.sh`.
+
+The default behaviour is that you are dropped into a bash shell. However,
+if the RUN_TESTS variable is set to "true", then tests passed as arguments
+are executed.
+
+The entrypoint performs these operations:
+
+- checks if the environment is ready to test (including database and all
+  integrations). It waits until all the components are ready to work
+- removes and re-installs another version of Airflow (if another version
+  of Airflow is requested to be reinstalled via the
+  `USE_AIRFLOW_PYPI_VERSION` variable)
+- Sets up Kerberos if the Kerberos integration is enabled (generates and
+  configures the Kerberos token)
+- Sets up ssh keys for ssh tests and restarts the SSH server
+- Sets all variables and configurations needed for unit tests to run
+- Reads additional variables set in
+  `files/airflow-breeze-config/variables.env` by sourcing that file
+- In case of a CI run, sets parallelism to 2 to avoid an excessive number
+  of processes running
+- In case of a CI run, sets default parameters for pytest
+- In case of running integration/long_running/quarantined tests, sets
+  the right pytest flags
+- Sets the default "tests" target in case the target is not explicitly set
+  as an additional argument
+- Runs system tests if the RUN_SYSTEM_TESTS flag is specified, otherwise
+  runs regular unit and integration tests
+
+
+# Naming conventions for stored images
+
+The images produced during the `Build Images` workflow of CI jobs are
+stored in the [GitHub Container
+Registry](https://github.com/orgs/apache/packages?repo_name=airflow)
+
+The images are stored with both the "latest" tag (for the last main push
+image that passes all the tests) as well as with the COMMIT_SHA id for
+images that were used in a particular build.
+
+The image names follow the patterns below (except the Python image, all the
+images are stored in the `apache` organization).
+
+The packages are available under the link below (CONTAINER_NAME is the
+url-encoded name of the image). Note that "/" is supported now in
+`ghcr.io` as a part of the image name within the `apache` organization,
+but it has to be percent-encoded when you access it via the UI (/ = %2F)
+
+`https://github.com/apache/airflow/pkgs/container/`
+
+| Image | Name:tag (both cases latest version and per-build) | Description |
+|--------------------------|----------------------------------------------------|---------------------------------------------------------------|
+| Python image (DockerHub) | python:\-slim-bookworm | Base Python image used by both production and CI image.
|
+| CI image | airflow/\/ci/python\:\ | CI image - this is the image used for most of the tests. |
+| PROD image | airflow/\/prod/python\:\ | Production image optimized for size; faster to build or pull. |
+
+- \ might be either "main" or "v2-\*-test"
+- \ - Python version (Major + Minor). Should be one of \["3.8",
+  "3.9", "3.10", "3.11"\].
+- \ - full-length SHA of commit either from the tip of the
+  branch (for pushes/schedule) or commit from the tip of the branch used
+  for the PR.
+- \ - tag of the image. It is either "latest" or \
+  (full-length SHA of commit either from the tip of the branch (for
+  pushes/schedule) or commit from the tip of the branch used for the
+  PR).
+
+----
+
+Read next about [Github Variables](03_github_variables.md)
diff --git a/dev/breeze/doc/ci/03_github_variables.md b/dev/breeze/doc/ci/03_github_variables.md
new file mode 100644
index 0000000000000..10983369784e1
--- /dev/null
+++ b/dev/breeze/doc/ci/03_github_variables.md
@@ -0,0 +1,74 @@
+
+
+
+
+**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [GitHub Registry Variables](#github-registry-variables)
+
+
+
+# GitHub Registry Variables
+
+Our CI uses the GitHub Registry to pull and push images to/from by default.
+Those variables are set automatically by GitHub Actions when you run
+Airflow workflows in your fork, so they should automatically use your
+own repository as the GitHub Registry to build and keep the images as the
+build image cache.
+
+The variables are automatically set in GitHub Actions:
+
+| Variable | Default | Comment |
+|-------------------------------|------------------|-----------------------------------------------------------------------|
+| GITHUB_REPOSITORY | `apache/airflow` | Prefix of the image. It indicates which registry from GitHub to use.
| +| CONSTRAINTS_GITHUB_REPOSITORY | `apache/airflow` | Repository where constraints are stored | +| GITHUB_USERNAME | | Username to use to login to GitHub | +| GITHUB_TOKEN | | Token to use to login to GitHub. Only used when pushing images on CI. | + +The Variables beginning with `GITHUB_` cannot be overridden in GitHub +Actions by the workflow. Those variables are set by GitHub Actions +automatically and they are immutable. Therefore if you want to override +them in your own CI workflow and use `breeze`, you need to pass the +values by corresponding `breeze` flags `--github-repository`, +`--github-token` rather than by setting them as environment variables in +your workflow. Unless you want to keep your own copy of constraints in +orphaned `constraints-*` branches, the `CONSTRAINTS_GITHUB_REPOSITORY` +should remain `apache/airflow`, regardless in which repository the CI +job is run. + +One of the variables you might want to override in your own GitHub +Actions workflow when using `breeze` is `--github-repository` - you +might want to force it to `apache/airflow`, because then the cache from +`apache/airflow` repository will be used and your builds will be much +faster. 
+ +Example command to build your CI image efficiently in your own CI +workflow: + +``` bash +# GITHUB_REPOSITORY is set automatically in Github Actions so we need to override it with flag +# +breeze ci-image build --github-repository apache/airflow --python 3.10 +docker tag ghcr.io/apache/airflow/main/ci/python3.10 your-image-name:tag +``` + +----- + +Read next about [Static checks](04_static_checks.md) diff --git a/dev/breeze/SELECTIVE_CHECKS.md b/dev/breeze/doc/ci/04_static_checks.md similarity index 97% rename from dev/breeze/SELECTIVE_CHECKS.md rename to dev/breeze/doc/ci/04_static_checks.md index e9f1f7547ef73..9162012aff113 100644 --- a/dev/breeze/SELECTIVE_CHECKS.md +++ b/dev/breeze/doc/ci/04_static_checks.md @@ -22,6 +22,13 @@ **Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)* - [Selective CI Checks](#selective-ci-checks) + - [Groups of files that selective check make decisions on](#groups-of-files-that-selective-check-make-decisions-on) + - [Selective check decision rules](#selective-check-decision-rules) + - [Skipping pre-commits (Static checks)](#skipping-pre-commits-static-checks) + - [Suspended providers](#suspended-providers) + - [Selective check outputs](#selective-check-outputs) + + # Selective CI Checks @@ -38,7 +45,7 @@ We have the following Groups of files for CI that determine which tests are run: there might simply change the whole environment of what is going on in CI (Container image, dependencies) * `Python production files` and `Javascript production files` - this area is useful in CodeQL Security scanning - if any of the python or javascript files for airflow "production" changed, this means that the security - scans should run + scans should run * `Always test files` - Files that belong to "Always" run tests. * `API tests files` and `Codegen test files` - those are OpenAPI definition files that impact Open API specification and determine that we should run dedicated API tests. 
@@ -49,7 +56,7 @@ We have the following Groups of files for CI that determine which tests are run: the `update-providers-dependencies` pre-commit. The provider.yaml is a single source of truth for each provider. * `DOC files` - change in those files indicate that we should run documentation builds (both airflow sources - and airflow documentation) + and airflow documentation) * `WWW files` - those are files for the WWW part of our UI (useful to determine if UI tests should run) * `System test files` - those are the files that are part of system tests (system tests are not automatically run in our CI, but Airflow stakeholders are running the tests and expose dashboards for them at @@ -140,6 +147,8 @@ when some files are not changed. Those are the rules implemented: * if no `All Providers Python files` and no `All Providers Yaml files` are changed - `check-provider-yaml-valid` check is skipped + + ## Suspended providers The selective checks will fail in PR if it contains changes to a suspended provider unless you set the @@ -217,3 +226,7 @@ or when new Hook class is added), we do not need to run full tests. That's why we do not base our `full tests needed` decision on changes in dependency files that are generated from the `provider.yaml` files. 
+ +----- + +Read next about [Workflows](05_workflows.md) diff --git a/dev/breeze/doc/ci/05_workflows.md b/dev/breeze/doc/ci/05_workflows.md new file mode 100644 index 0000000000000..ab914f2329ae2 --- /dev/null +++ b/dev/breeze/doc/ci/05_workflows.md @@ -0,0 +1,291 @@ + + + + +**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)* + +- [CI run types](#ci-run-types) + - [Pull request run](#pull-request-run) + - [Canary run](#canary-run) + - [Scheduled runs](#scheduled-runs) +- [Workflows](#workflows) + - [Build Images Workflow](#build-images-workflow) + - [Differences for main and release branches](#differences-for-main-and-release-branches) + - [Tests Workflow](#tests-workflow) + - [CodeQL scan](#codeql-scan) + - [Publishing documentation](#publishing-documentation) + + + +# CI run types + +The Apache Airflow project utilizes several types of Continuous +Integration (CI) jobs, each with a distinct purpose and context. These +jobs are executed by the `ci.yaml` workflow. + +In addition to the standard "PR" runs, we also execute "Canary" runs. +These runs are designed to detect potential issues that could affect +regular PRs early on, without causing all PRs to fail when such problems +arise. This strategy ensures a more stable environment for contributors +submitting their PRs. At the same time, it allows maintainers to +proactively address issues highlighted by the "Canary" builds. + +## Pull request run + +These runs are triggered by pull requests from contributors' forks. The +majority of Apache Airflow builds fall into this category. They are +executed in the context of the contributor's "Fork", not the main +Airflow Code Repository, meaning they only have "read" access to all +GitHub resources, such as the container registry and code repository. +This is necessary because the code in these PRs, including the CI job +definition, might be modified by individuals who are not committers to +the Apache Airflow Code Repository. 
+
+The primary purpose of these jobs is to verify if the PR builds cleanly,
+if the tests run correctly, and if the PR is ready for review and merge.
+These runs utilize cached images from the Private GitHub registry,
+including CI, Production Images, and base Python images. Furthermore,
+for these builds, we only execute Python tests if significant files have
+changed. For instance, if the PR involves a "no-code" change, no tests
+will be executed.
+
+Regular PR builds run in a "stable" environment:
+
+- fixed set of constraints (constraints that passed the tests) - except
+  the PRs that change dependencies
+- limited matrix and set of tests (determined by selective checks based
+  on what changed in the PR)
+- no ARM images are built in regular PRs
+- lower probability of flaky tests for non-committer PRs (public runners
+  and less parallelism)
+
+Maintainers can also run the "Pull Request run" from the
+"apache/airflow" repository by pushing to a branch in the
+"apache/airflow" repository. This is useful when you want to test a PR
+that changes the CI/CD infrastructure itself (for example changes to the
+CI/CD scripts or changes to the CI/CD workflows). In this case the PR is
+run in the context of the "apache/airflow" repository and has WRITE
+access to the GitHub Container Registry.
+
+## Canary run
+
+This workflow is triggered when a pull request is merged into the "main"
+branch or pushed to any of the "v2-\*-test" branches. The "Canary" run
+aims to upgrade dependencies to their latest versions and promptly
+pushes a preview of the CI/PROD image cache to the GitHub Registry. This
+allows pull requests to quickly utilize the new cache, which is
+particularly beneficial when the Dockerfile or installation scripts have
+been modified.
Even if some tests fail, this cache will already include
+the latest Dockerfile and scripts. Upon successful execution, the run
+updates the constraint files in the "constraints-main" branch with the
+latest constraints and pushes both the cache and the latest CI/PROD
+images to the GitHub Registry.
+
+If the "Canary" build fails, it often indicates that a new version of
+our dependencies is incompatible with the current tests or Airflow code.
+Alternatively, it could mean that a breaking change has been merged into
+"main". Both scenarios require prompt attention from the maintainers.
+While a "broken main" due to our code should be fixed quickly, "broken
+dependencies" may take longer to resolve. Until the tests pass, the
+constraints will not be updated, meaning that regular PRs will continue
+using the older version of dependencies that passed one of the previous
+"Canary" runs.
+
+## Scheduled runs
+
+The "scheduled" workflow, which is designed to run regularly (typically
+overnight), is triggered when a scheduled run occurs. This workflow is
+largely identical to the "Canary" run, with one key difference: the
+image is always built from scratch, not from a cache. This approach
+ensures that we can verify whether any "system" dependencies in the
+Debian base image have changed, and confirm that the build process
+remains reproducible. Since the process for a scheduled run mirrors that
+of a "Canary" run, no separate diagram is necessary to illustrate it.
+
+# Workflows
+
+A general note about cancelling duplicated workflows: for the
+`Build Images`, `Tests` and `CodeQL` workflows we use the `concurrency`
+feature of GitHub Actions to automatically cancel "old" workflow runs of
+each type -- meaning if you push a new commit to a branch or to a pull
+request and there is a workflow running, GitHub Actions will cancel the
+old workflow run automatically.
+
+## Build Images Workflow
+
+This workflow builds images for the CI Workflow for pull requests coming
+from forks.
+
+It's a special type of workflow: `pull_request_target`, which means that
+it is triggered when a pull request is opened but runs in the context of
+the target repository. This means that the workflow has write permission
+to push to the GitHub registry the images used by CI jobs, so the images
+can be built only once and reused by all the CI jobs (including the
+matrix jobs). We've implemented it so that the `Tests` workflow waits
+until the images are built by the `Build Images` workflow before running.
+
+Those "Build Images" steps are skipped in case pull requests do not come
+from "forks" (i.e. those are internal PRs for the Apache Airflow
+repository). This is because in case of PRs coming from Apache Airflow
+(only committers can create those) the "pull_request" workflows have
+enough permissions to push images to the GitHub Registry.
+
+This workflow is also not triggered on normal pushes to our "main"
+branches, i.e. after a pull request is merged or whenever a `scheduled`
+run is triggered. Again, in these cases the "CI" workflow has enough
+permissions to push the images, so we simply do not run this workflow.
+
+The workflow has the following jobs:
+
+| Job | Description |
+|-------------------|---------------------------------------------|
+| Build Info | Prints detailed information about the build |
+| Build CI images | Builds all configured CI images |
+| Build PROD images | Builds all configured PROD images |
+
+The images are stored in the [GitHub Container
+Registry](https://github.com/orgs/apache/packages?repo_name=airflow) and the names of those images follow the patterns
+described in [Images](02_images.md#naming-conventions)
+
+Image building is configured in "fail-fast" mode. When any of the images
+fails to build, it cancels the other builds and the source `Tests` workflow
+run that triggered it.
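The fork-vs-internal rule described above can be summarized with a simplified sketch (this is an illustration of the decision, not the actual workflow code):

``` bash
# Simplified sketch of the rule above: the separate "Build Images"
# workflow is only needed for PRs coming from forks, because fork PRs
# have read-only access to the GitHub Container Registry.
needs_build_images_workflow() {
    local pr_source_repo="$1"
    if [ "${pr_source_repo}" = "apache/airflow" ]; then
        echo "false"   # internal PR: the "pull_request" workflow can push images itself
    else
        echo "true"    # fork PR: images must be built by "Build Images"
    fi
}

needs_build_images_workflow "apache/airflow"      # prints: false
needs_build_images_workflow "contributor/airflow" # prints: true
```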
+
+## Differences for main and release branches
+
+The type of tests executed varies depending on the version or branch
+under test. For the "main" development branch, we run all tests to
+maintain the quality of Airflow. However, when releasing patch-level
+updates on older branches, we only run a subset of these tests. This is
+because older branches are exclusively used for releasing Airflow and
+its corresponding image, not for releasing providers or helm charts.
+
+This behaviour is controlled by the `default-branch` output of the
+build-info job. Whenever we create a branch for an old version, we update
+the `AIRFLOW_BRANCH` in `airflow_breeze/branch_defaults.py` to point to
+the new branch and there are a few places where the selection of tests is
+based on whether this output is `main`. They are marked as - in the
+"Release branches" column of the table below.
+
+## Tests Workflow
+
+This workflow is a regular workflow that performs all checks of Airflow
+code.
+
+| Job | Description | PR | Canary | Scheduled | Release branches |
+|----------------------------------|----------------------------------------------------------|-----------|----------|------------|-------------------|
+| Build info | Prints detailed information about the build | Yes | Yes | Yes | Yes |
+| Push early cache & images | Pushes early cache/images to GitHub Registry | | Yes | | |
+| Check that image builds quickly | Checks that image builds quickly | | Yes | | Yes |
+| Build CI images | Builds images in-workflow (not in the build images) | | Yes | Yes (1) | Yes (4) |
+| Generate constraints/CI verify | Generate constraints for the build and verify CI image | Yes (2) | Yes (2) | Yes (2) | Yes (2) |
+| Build PROD images | Builds images in-workflow (not in the build images) | | Yes | Yes (1) | Yes (4) |
+| Build Bullseye PROD images | Builds images based on Bullseye debian | | Yes | Yes | Yes |
+| Run breeze tests | Run unit tests for Breeze | Yes | Yes | Yes | Yes |
+| Test OpenAPI client gen |
Tests if OpenAPIClient continues to generate | Yes | Yes | Yes | Yes | +| React WWW tests | React UI tests for new Airflow UI | Yes | Yes | Yes | Yes | +| Test examples image building | Tests if PROD image build examples work | Yes | Yes | Yes | Yes | +| Test git clone on Windows | Tests if Git clone for for Windows | Yes (5) | Yes (5) | Yes (5) | Yes (5) | +| Waits for CI Images | Waits for and verify CI Images | Yes (2) | Yes (2) | Yes (2) | Yes (2) | +| Static checks | Performs full static checks | Yes (6) | Yes | Yes | Yes (7) | +| Basic static checks | Performs basic static checks (no image) | Yes (6) | | | | +| Build docs | Builds and tests publishing of the documentation | Yes | Yes | Yes | Yes | +| Spellcheck docs | Spellcheck docs | Yes | Yes | Yes | Yes | +| Tests wheel provider packages | Tests if provider packages can be built and released | Yes | Yes | Yes | | +| Tests Airflow compatibility | Compatibility of provider packages with older Airflow | Yes | Yes | Yes | | +| Tests dist provider packages | Tests if dist provider packages can be built | | Yes | Yes | | +| Tests airflow release commands | Tests if airflow release command works | | Yes | Yes | | +| Tests (Backend/Python matrix) | Run the Pytest unit DB tests (Backend/Python matrix) | Yes | Yes | Yes | Yes (8) | +| No DB tests | Run the Pytest unit Non-DB tests (with pytest-xdist) | Yes | Yes | Yes | Yes (8) | +| Integration tests | Runs integration tests (Postgres/Mysql) | Yes | Yes | Yes | Yes (9) | +| Quarantined tests | Runs quarantined tests (with flakiness and side-effects) | Yes | Yes | Yes | Yes (8) | +| Test airflow packages | Tests that Airflow package can be built and released | Yes | Yes | Yes | Yes | +| Helm tests | Run the Helm integration tests | Yes | Yes | Yes | | +| Helm release tests | Run the tests for Helm releasing | Yes | Yes | Yes | | +| Summarize warnings | Summarizes warnings from all other tests | Yes | Yes | Yes | Yes | +| Wait for PROD Images | Waits for and verify 
PROD Images | Yes (2) | Yes (2) | Yes (2) | Yes (2) | +| Docker Compose test/PROD verify | Tests quick-start Docker Compose and verify PROD image | Yes | Yes | Yes | Yes | +| Tests Kubernetes | Run Kubernetes test | Yes | Yes | Yes | | +| Update constraints | Upgrade constraints to latest ones | Yes (3) | Yes (3) | Yes (3) | Yes (3) | +| Push cache & images | Pushes cache/images to GitHub Registry (3) | | Yes (3) | | Yes | +| Build CI ARM images | Builds CI images for ARM | Yes (10) | | Yes | | + +`(1)` Scheduled jobs builds images from scratch - to test if everything +works properly for clean builds + +`(2)` The jobs wait for CI images to be available. It only actually runs when build image is needed (in +case of simpler PRs that do not change dependencies or source code, +images are not build) + +`(3)` PROD and CI cache & images are pushed as "cache" (both AMD and +ARM) and "latest" (only AMD) to GitHub Container registry and +constraints are upgraded only if all tests are successful. The images +are rebuilt in this step using constraints pushed in the previous step. +Constraints are only actually pushed in the `canary/scheduled` runs. + +`(4)` In main, PROD image uses locally build providers using "latest" +version of the provider code. In the non-main version of the build, the +latest released providers from PyPI are used. + +`(5)` Always run with public runners to test if Git clone works on +Windows. + +`(6)` Run full set of static checks when selective-checks determine that +they are needed (basically, when Python code has been modified). + +`(7)` On non-main builds some of the static checks that are related to +Providers are skipped via selective checks (`skip-pre-commits` check). + +`(8)` On non-main builds the unit tests for providers are skipped via +selective checks removing the "Providers" test type. + +`(9)` On non-main builds the integration tests for providers are skipped +via `skip-provider-tests` selective check output. 
+
+`(10)` The builds only run when the PR is created by a committer from
+the "apache" repository, and in scheduled builds.
+
+## CodeQL scan
+
+The [CodeQL](https://securitylab.github.com/tools/codeql) security scan
+uses the GitHub security scanning framework to scan our code for
+security violations. It is run for JavaScript and Python code.
+
+## Publishing documentation
+
+Documentation from the `main` branch is automatically published on
+Amazon S3.
+
+To make this possible, GitHub Actions has secrets set up with
+credentials for an Amazon Web Services account - `DOCS_AWS_ACCESS_KEY_ID`
+and `DOCS_AWS_SECRET_ACCESS_KEY`.
+
+This account has permission to write/list/put objects to the
+`apache-airflow-docs` bucket. This bucket has public access configured,
+which means it is accessible through the website endpoint. For more
+information, see: [Hosting a static website on Amazon
+S3](https://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteHosting.html)
+
+Website endpoint:
+
+-----
+
+Read next about [Diagrams](06_diagrams.md)
diff --git a/CI_DIAGRAMS.md b/dev/breeze/doc/ci/06_diagrams.md
similarity index 96%
rename from CI_DIAGRAMS.md
rename to dev/breeze/doc/ci/06_diagrams.md
index a06160d4b1cab..c9610ff009d6f 100644
--- a/CI_DIAGRAMS.md
+++ b/dev/breeze/doc/ci/06_diagrams.md
@@ -17,10 +17,21 @@ under the License.
 -->
+
+
+**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [CI Sequence diagrams](#ci-sequence-diagrams)
+  - [Pull request flow from fork](#pull-request-flow-from-fork)
+  - [Pull request flow from "apache/airflow" repo](#pull-request-flow-from-apacheairflow-repo)
+  - [Merge "Canary" run](#merge-canary-run)
+  - [Scheduled run](#scheduled-run)
+
+
+
 # CI Sequence diagrams
 
-You can see here the sequence diagrams of the flow happening during the CI Jobs. More detailed description
-for the CI flows can be found in the [CI.rst](CI.rst) document.
+You can see here the sequence diagrams of the flow happening during the CI Jobs.
 ## Pull request flow from fork
@@ -448,3 +459,7 @@ same as "Canary" run, with the difference that the image used to run the tests i
 cache - it's always built from the scratch. This way we can check that no "system" dependencies in
 debian base image have changed and that the build is still reproducible.
 No separate diagram is needed for scheduled run as it is identical to that of "Canary" run.
+
+-----
+
+Read next about [Debugging](07_debugging.md)
diff --git a/dev/breeze/doc/ci/07_debugging.md b/dev/breeze/doc/ci/07_debugging.md
new file mode 100644
index 0000000000000..64ee2d1e5a790
--- /dev/null
+++ b/dev/breeze/doc/ci/07_debugging.md
@@ -0,0 +1,63 @@
+
+
+
+**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [Debugging CI Jobs in GitHub Actions](#debugging-ci-jobs-in-github-actions)
+
+
+
+# Debugging CI Jobs in GitHub Actions
+
+The CI jobs are notoriously difficult to test, because you can only
+really see the results when you run them in the CI environment, and the
+environment in which they run depends on who runs them: they may run
+either on our self-hosted runners (with 64 GB RAM and 8 CPUs) or on the
+GitHub public runners (6 GB of RAM, 2 CPUs), and the results will vastly
+differ depending on which environment is used. We are utilizing
+parallelism to make use of all the available CPU/memory, but sometimes
+you need to enable debugging and force certain environments. An
+additional difficulty is that the `Build Images` workflow is of the
+`pull_request_target` type, which means that it will always run using
+the `main` version, no matter what is in your Pull Request.
+
+There are several ways you can debug the CI jobs as a maintainer:
+
+- When you want to test the build with all combinations of Python
+  versions, backends etc. on a regular PR, add the `full tests needed`
+  label to the PR.
+- When you want to test a maintainer PR using public runners, add the
+  `public runners` label to the PR
+- When you want to see the resources used by the run, add the
+  `debug ci resources` label to the PR
+- When you want to test changes to Breeze that include changes to how
+  images are built, you should push your PR to the `apache` repository,
+  not to your fork. This will build the images as part of the `CI`
+  workflow rather than using the `Build Images` workflow, and use the
+  same Breeze version for building the image and testing
+- When you want to test changes to the `build-images.yml` workflow, you
+  should push your branch as the `main` branch in your fork. This will
+  run the changed `build-images.yml` workflow as it will be in the
+  `main` branch of your fork
+
+-----
+
+Read next about [Running CI locally](08_running_ci_locally.md)
diff --git a/dev/breeze/doc/ci/08_running_ci_locally.md b/dev/breeze/doc/ci/08_running_ci_locally.md
new file mode 100644
index 0000000000000..6e1cbb0917536
--- /dev/null
+++ b/dev/breeze/doc/ci/08_running_ci_locally.md
@@ -0,0 +1,141 @@
+
+
+
+**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [Running the CI Jobs locally](#running-the-ci-jobs-locally)
+- [Upgrade to newer dependencies](#upgrade-to-newer-dependencies)
+
+
+
+# Running the CI Jobs locally
+
+The main goal of our CI philosophy is that no matter how complex the
+test and integration infrastructure is, as a developer you should be
+able to reproduce and re-run any of the failed checks locally. One part
+of this is the pre-commit checks, which allow you to run the same static
+checks in CI and locally; another part is the CI environment, which is
+replicated locally with Breeze.
+
+You can read more about Breeze in
+[README.rst](../README.rst) but in essence it is a script
+that allows you to re-create the CI environment in your local
+development instance and interact with it.
In its basic form, when you do
+development you can run all the same tests that will be run in CI - but
+locally, before you submit them as a PR. Another use case where Breeze
+is useful is when tests fail in CI. You can take the full `COMMIT_SHA`
+of the failed build, pass it as the `--image-tag` parameter of Breeze,
+and it will download the very same version of the image that was used in
+CI and run it locally. This way, you can very easily reproduce any
+failed test that happens in CI, even if you do not check out the sources
+connected with the run.
+
+All our CI jobs are executed via `breeze` commands. You can replicate
+exactly what our CI is doing by running the sequence of corresponding
+`breeze` commands. Make sure, however, that you look at both:
+
+- the flags passed to the `breeze` commands
+- the environment variables used when the `breeze` command is run - this
+  is useful when we want to set a common flag for all `breeze` commands
+  in the same job or even the whole workflow. For example, the `VERBOSE`
+  variable is set to `true` for all our workflows so that more detailed
+  information about internal commands executed in CI is printed.
+
+In the output of the CI jobs, you will find both the flags passed and
+the environment variables set.
+
+You can read more about it in [Breeze](../README.rst) and
+[Testing](contributing-docs/09_testing.rst)
+
+Since we store images from every CI run, you should be able to easily
+reproduce any of the CI test problems locally.
You can do it by pulling
+and using the right image and running it with the right docker command.
+For example, knowing that the CI job was for commit
+`cd27124534b46c9688a1d89e75fcd137ab5137e3`:
+
+``` bash
+docker pull ghcr.io/apache/airflow/main/ci/python3.8:cd27124534b46c9688a1d89e75fcd137ab5137e3
+
+docker run -it ghcr.io/apache/airflow/main/ci/python3.8:cd27124534b46c9688a1d89e75fcd137ab5137e3
+```
+
+But you usually need to pass more variables and a more complex setup if
+you want to connect to a database or enable some integrations. Therefore
+it is easiest to use [Breeze](../README.rst) for that. For
+example, if you need to reproduce a MySQL environment with Python 3.8,
+you can run:
+
+``` bash
+breeze --image-tag cd27124534b46c9688a1d89e75fcd137ab5137e3 --python 3.8 --backend mysql
+```
+
+You will be dropped into a shell with the exact version that was used
+during the CI run and you will be able to run pytest tests manually,
+easily reproducing the environment that was used in CI. Note that in
+this case, you do not need to check out the sources that were used for
+that run - they are already part of the image. But remember that any
+changes you make in those sources are lost when you leave the image, as
+the sources are not mapped from your host machine.
+
+Depending on whether the scripts are run locally via
+[Breeze](../README.rst) or in the
+`Build Images` or `Tests` workflows, the variables can take different
+values.
+
+You can use those variables when you try to reproduce the build locally
+(alternatively, you can pass those via corresponding command line flags
+to the `breeze shell` command).
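As an illustration of how an environment variable from the table below maps onto a `breeze` flag, here is a small hypothetical shell helper. The mapping is a sketch; the `--db-reset` and `--python` flag names are assumptions based on the documented breeze CLI:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: translate a couple of the CI environment
# variables into equivalent `breeze shell` command line flags.
build_breeze_cmd() {
  local cmd="breeze shell"
  if [ "${DB_RESET:-false}" = "true" ]; then
    cmd="${cmd} --db-reset"  # DB_RESET=true resets the DB at container entry
  fi
  if [ -n "${PYTHON_MAJOR_MINOR_VERSION:-}" ]; then
    cmd="${cmd} --python ${PYTHON_MAJOR_MINOR_VERSION}"
  fi
  echo "${cmd}"
}

# Run in a subshell so the exported variables do not leak.
( export DB_RESET=true PYTHON_MAJOR_MINOR_VERSION=3.8; build_breeze_cmd )
# prints: breeze shell --db-reset --python 3.8
```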
+
+| Variable                                | Local development  | Build Images workflow  | CI Workflow  | Comment                                                                        |
+|-----------------------------------------|--------------------|------------------------|--------------|--------------------------------------------------------------------------------|
+| Basic variables                         |                    |                        |              |                                                                                |
+| PYTHON_MAJOR_MINOR_VERSION              |                    |                        |              | Major/Minor version of Python used.                                            |
+| DB_RESET                                | false              | true                   | true         | Determines whether the database should be reset at the container entry.        |
+| Forcing answer                          |                    |                        |              |                                                                                |
+| ANSWER                                  |                    | yes                    | yes          | Determines whether answers to questions should be given automatically.         |
+| Host variables                          |                    |                        |              |                                                                                |
+| HOST_USER_ID                            |                    |                        |              | User id of the host user.                                                      |
+| HOST_GROUP_ID                           |                    |                        |              | Group id of the host user.                                                     |
+| HOST_OS                                 |                    | linux                  | linux        | OS of the Host (darwin/linux/windows).                                         |
+| Git variables                           |                    |                        |              |                                                                                |
+| COMMIT_SHA                              |                    | GITHUB_SHA             | GITHUB_SHA   | SHA of the commit for which the build is run.                                  |
+| In container environment initialization |                    |                        |              |                                                                                |
+| SKIP_ENVIRONMENT_INITIALIZATION         | false*             | false*                 | false*       | Skip initialization of test environment. * set to true in pre-commits          |
+| SKIP_IMAGE_UPGRADE_CHECK                | false*             | false*                 | false*       | Skip checking if the image should be upgraded. * set to true in pre-commits    |
+| SKIP_PROVIDER_TESTS                     | false*             | false*                 | false*       | Skip running provider integration tests.                                       |
+| SKIP_SSH_SETUP                          | false*             | false*                 | false*       | Skip setting up the SSH server for tests. * set to true in GitHub CodeSpaces   |
+| VERBOSE_COMMANDS                        | false              | false                  | false        | Determines whether every command executed in docker should be printed.         |
+| Image build variables                   |                    |                        |              |                                                                                |
+| UPGRADE_TO_NEWER_DEPENDENCIES           | false              | false                  | false*       | Determines whether the build should attempt to upgrade dependencies.           |
+
+# Upgrade to newer dependencies
+
+By default we are using a tested set of dependency constraints stored in
+separate "orphan" branches of the airflow repository
+(`constraints-main`, `constraints-2-0`), but when this flag is set to
+anything but false (for example, a random value), they are not used and
+the "eager" upgrade strategy is used when installing dependencies. We
+set it to true in case of direct pushes (merges) to main and in
+scheduled builds, so that the constraints are tested. In those builds,
+when we determine that the tests pass, we automatically push the latest
+set of "tested" constraints to the repository. Setting the value to a
+random value is the best way to ensure that constraints are upgraded
+even if there is no change to `pyproject.toml`. This way our constraints
+are automatically tested and updated whenever new versions of libraries
+are released.
+(*) true in case of direct pushes and scheduled builds
+
+----
+
+**Thank you** for reading this far. We hope that you have learned a lot about Airflow's CI.
diff --git a/dev/breeze/doc/ci/README.md b/dev/breeze/doc/ci/README.md
new file mode 100644
index 0000000000000..c28a5977da3bb
--- /dev/null
+++ b/dev/breeze/doc/ci/README.md
@@ -0,0 +1,29 @@
+
+
+This directory contains the detailed design of the Airflow CI setup.
+ +* [CI Environment](01_ci_environment.md) - contains description of the CI environment +* [Image Naming](02_image_naming.md) - contains description of the naming conventions for the images +* [GitHub Variables](03_github_variables.md) - contains description of the GitHub variables used in CI +* [Static checks](04_static_checks.md) - contains description of the static checks performed in CI +* [Workflows](05_workflows.md) - contains description of the workflows used in CI +* [Diagrams](06_diagrams.md) - contains diagrams of the CI workflows +* [Debugging](07_debugging.md) - contains description of debugging CI issues +* [Running CI Locally](08_running_ci_locally.md) - contains description of running CI locally diff --git a/contributing-docs/images/AirflowBreeze_logo.png b/dev/breeze/doc/images/AirflowBreeze_logo.png similarity index 100% rename from contributing-docs/images/AirflowBreeze_logo.png rename to dev/breeze/doc/images/AirflowBreeze_logo.png diff --git a/docs/docker-stack/build.rst b/docs/docker-stack/build.rst index eac84cbd57c35..b94aaff59ff87 100644 --- a/docs/docker-stack/build.rst +++ b/docs/docker-stack/build.rst @@ -1014,7 +1014,7 @@ The architecture of the images .............................. You can read more details about the images - the context, their parameters and internal structure in the -`IMAGES.rst `_ document. +`Images documentation `_. Pip packages caching diff --git a/generated/PYPI_README.md b/generated/PYPI_README.md index 27a4b546c2941..70f85ac1eb8aa 100644 --- a/generated/PYPI_README.md +++ b/generated/PYPI_README.md @@ -165,7 +165,7 @@ release provided they have access to the appropriate platform and tools. Want to help build Apache Airflow? Check out our [contributing documentation](https://github.com/apache/airflow/blob/main/contributing-docs/README.rst). -Official Docker (container) images for Apache Airflow are described in [IMAGES.rst](https://github.com/apache/airflow/blob/main/IMAGES.rst). 
+Official Docker (container) images for Apache Airflow are described in [images](dev/breeze/doc/ci/02_images.md). ## Voting Policy