From e585e3848c87ee9d789e0e26c0b1664ae6d44248 Mon Sep 17 00:00:00 2001 From: Dave Flynn Date: Wed, 24 Apr 2024 09:37:27 +0800 Subject: [PATCH] Refined CI doc --- docs/docs/guides/scenario-ci.html | 1249 +++++++++++++++++++++++++++++ docs/docs/guides/scenario-ci.md | 44 +- 2 files changed, 1278 insertions(+), 15 deletions(-) create mode 100644 docs/docs/guides/scenario-ci.html diff --git a/docs/docs/guides/scenario-ci.html b/docs/docs/guides/scenario-ci.html new file mode 100644 index 00000000..b82b588f --- /dev/null +++ b/docs/docs/guides/scenario-ci.html @@ -0,0 +1,1249 @@ +Continuous Integration (CI)

Recce CI integration with GitHub Action

+

Recce provides the recce run command for CI/CD pipelines. You can integrate Recce with GitHub Actions (or other CI tools) to compare the data models between two environments when a new pull request is created.

+

The following guide shows how to configure Recce in GitHub Actions.

+

Prerequisites

+

Before integrating Recce with GitHub Actions, you will need to configure the following items:

+ +

Set up Recce with GitHub Actions

+

We suggest setting up two GitHub Actions workflows in your GitHub repository: one for the production environment and another for the development environment.

+

The production workflow is triggered on every merge to the main branch.

+

The development workflow is triggered on every push to the pull-request branch.

+

Base Workflow (Main Branch)

+

This workflow runs the dbt commands for the production environment, then packages the dbt artifacts and uploads them to a third-party storage system outside GitHub. This example stores the artifacts in an AWS S3 bucket.

+
name: Recce CI Base Branch
+
+on:
+  push:
+    branches:
+      - main
+
+concurrency:
+  group: recce-ci-base
+  cancel-in-progress: true
+
+jobs:
+  build:
+    name: DBT Runner
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v3
+
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.10.x"
+
+      - name: Install dependencies
+        run: |
+          pip install -r requirements.txt
+
+      - name: Run DBT
+        run: |
+          dbt deps
+          dbt seed --target ${{ env.DBT_BASE_TARGET }} --target-path target-base
+          dbt run --target ${{ env.DBT_BASE_TARGET }} --target-path target-base
+          dbt docs generate --target ${{ env.DBT_BASE_TARGET }} --target-path target-base
+        env:
+          # Set the dbt target name of the base environment
+          DBT_BASE_TARGET: prod
+
+      - name: Package DBT artifacts
+        run: |
+          tar -czvf dbt-artifacts.tar.gz target-base
+          mv dbt-artifacts.tar.gz $GITHUB_WORKSPACE/${{ github.sha }}.tar.gz
+
+      - name: Upload to S3
+        run: |
+          aws s3 cp $GITHUB_WORKSPACE/${{ github.sha }}.tar.gz s3://${{ env.AWS_S3_BUCKET }}/${{ github.sha }}.tar.gz
+        env:
+          # Set these in your repository secrets
+          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
+          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
+          AWS_REGION: ${{ secrets.AWS_REGION }}
+          AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}
+
+ +
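
The packaging step names the archive after the commit SHA so that the pull-request workflow can later look it up by its base SHA. The following is a minimal local sketch of that naming and packaging logic, not part of the workflow itself. The helper name `package_artifacts` and its arguments are hypothetical; it only mirrors the `tar` step and does not upload anything to S3.

```shell
# Hypothetical helper: archive a dbt artifact directory as <sha>.tar.gz.
# Arguments: $1 = commit SHA
#            $2 = artifact directory (default: target-base)
#            $3 = output directory (default: current directory)
package_artifacts() {
  local sha="$1"
  local artifact_dir="${2:-target-base}"
  local out_dir="${3:-.}"

  if [ ! -d "$artifact_dir" ]; then
    echo "artifact directory not found: $artifact_dir" >&2
    return 1
  fi

  # -C keeps the paths inside the archive relative, so extracting it
  # recreates the artifact directory wherever the archive is unpacked.
  tar -czf "$out_dir/$sha.tar.gz" \
    -C "$(dirname "$artifact_dir")" "$(basename "$artifact_dir")" || return 1

  # Print the archive path so a caller can pass it to an upload command.
  echo "$out_dir/$sha.tar.gz"
}
```

For example, `package_artifacts "$(git rev-parse HEAD)"` would produce an archive named after the current commit, which could then be passed to `aws s3 cp`.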

Current Workflow (Pull Request Branch)

+

This workflow first downloads the dbt artifacts built in the base environment from the third-party storage system, falling back to building them with dbt if they are not found. It then runs the dbt commands for the development environment and uses Recce to compare the data models between the base and current environments.

+
name: Recce CI Current Branch
+
+on:
+  pull_request:
+    branches: [main]
+
+jobs:
+  check-pull-request:
+    name: Check pull request by Recce CI
+    runs-on: ubuntu-latest
+    permissions:
+      pull-requests: write
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 0
+
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.10.x"
+
+      - name: Install dependencies
+        run: |
+          pip install -r requirements.txt
+
+      - name: Install Recce
+        run: |
+          pip install recce
+
+      - name: Prepare DBT Base environment
+        run: |
+          if aws s3 cp s3://$AWS_S3_BUCKET/${{ github.event.pull_request.base.sha }}.tar.gz .; then
+            echo "Base environment found in S3"
+            tar -xvf ${{ github.event.pull_request.base.sha }}.tar.gz
+          else
+            echo "Base environment not found in S3. Running dbt to create base environment"
+            git checkout ${{ github.event.pull_request.base.sha }}
+            dbt deps
+            dbt seed --target ${{ env.DBT_BASE_TARGET }} --target-path target-base
+            dbt run --target ${{ env.DBT_BASE_TARGET }} --target-path target-base
+            dbt docs generate --target ${{ env.DBT_BASE_TARGET }} --target-path target-base
+          fi
+        env:
+          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
+          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
+          AWS_REGION: ${{ secrets.AWS_REGION }}
+          AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}
+          # Set the dbt target name of the base environment
+          DBT_BASE_TARGET: prod
+
+      - name: Prepare DBT Current environment
+        run: |
+          git checkout ${{ github.event.pull_request.head.sha }}
+          dbt deps
+          dbt seed --target ${{ env.DBT_CURRENT_TARGET }}
+          dbt run --target ${{ env.DBT_CURRENT_TARGET }}
+          dbt docs generate --target ${{ env.DBT_CURRENT_TARGET }}
+        env:
+          # Set the dbt target name of the current environment
+          DBT_CURRENT_TARGET: dev
+
+      - name: Run Recce CI
+        run: |
+          recce run --github-pull-request-url ${{ github.event.pull_request.html_url }}
+
+      - name: Archive Recce State File
+        uses: actions/upload-artifact@v4
+        id: recce-artifact-uploader
+        with:
+          name: recce-state-file
+          path: recce_state.json
+
+      - name: Comment on pull request
+        uses: thollander/actions-comment-pull-request@v2
+        with:
+          message: |
+            Recce `run` successfully completed.
+            Please download the [artifact](${{ env.ARTIFACT_URL }}) for the state file.
+        env:
+          ARTIFACT_URL: ${{ steps.recce-artifact-uploader.outputs.artifact-url }}
+
+ +
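
The workflow above writes base artifacts to `target-base` and current artifacts to dbt's default `target` directory, which is where Recce is expected to look for them. The following is a minimal pre-flight sketch that could run before `recce run` to fail early with a clear message. The helper name `check_artifacts` is hypothetical, and the sketch assumes that `dbt docs generate` has written `manifest.json` and `catalog.json` into each directory.

```shell
# Hypothetical helper: verify that each given directory contains
# non-empty dbt artifacts. Returns non-zero if any file is missing.
check_artifacts() {
  local missing=0
  local dir file
  for dir in "$@"; do
    for file in manifest.json catalog.json; do
      if [ ! -s "$dir/$file" ]; then
        echo "missing or empty: $dir/$file" >&2
        missing=1
      fi
    done
  done
  return "$missing"
}
```

A step could then call `check_artifacts target-base target && recce run`, so that a failed download or dbt run surfaces before the comparison starts.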

Review the Recce State File

+

Once the Recce CI workflow is completed, you can download the Recce state file from the GitHub pull-request. The Recce state file contains the comparison results of the data models between the base and current environments.

+
recce server --review recce_state.json
+
+ +

In the Recce server review mode, you can review the comparison results of the data models between the base and current environments. It will contain the row counts of modified data models, and the query results of the Recce Preset Checks.
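
As an alternative to downloading the artifact from the browser, the state file can be fetched from the command line. The following is a minimal sketch that assumes the GitHub CLI (`gh`) is installed and authenticated, and that the artifact is named `recce-state-file` as in the workflow above. The helper name `review_run`, the default download directory, and the assumption that the file lands directly in that directory are illustrative.

```shell
# Hypothetical helper: download the state file artifact of a workflow run
# and open it in Recce review mode.
# Arguments: $1 = workflow run ID
#            $2 = download directory (default: recce-review)
review_run() {
  local run_id="$1"
  local dest="${2:-recce-review}"

  # Fetch only the artifact uploaded by the "Archive Recce State File" step.
  gh run download "$run_id" --name recce-state-file --dir "$dest" || return 1

  if [ ! -s "$dest/recce_state.json" ]; then
    echo "state file not found in $dest" >&2
    return 1
  fi

  recce server --review "$dest/recce_state.json"
}
```

The run ID can be found with `gh run list` or copied from the URL of the workflow run.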

\ No newline at end of file diff --git a/docs/docs/guides/scenario-ci.md b/docs/docs/guides/scenario-ci.md index 8c6e6af7..1d15dab0 100644 --- a/docs/docs/guides/scenario-ci.md +++ b/docs/docs/guides/scenario-ci.md @@ -5,29 +5,34 @@ icon: octicons/play-16 # Recce CI integration with GitHub Action -Recce provides the `recce run` command for CI/CD pipeline. You can integrate Recce with GitHub Action to compare the data models between two environments when a new pull-request is created. +Recce provides the `recce run` command for CI/CD pipeline. You can integrate Recce with GitHub Actions (or other CI tools) to compare the data models between two environments when a new pull-request is created. + +The following guide demonstrates how to configure Recce in GitHub Actions. ## Prerequisites -Before you start integrating Recce with GitHub Action, you need to have the following prerequisites: +Before integrating Recce with GitHub Actions, you will need to configure the following items: + +- Set up **two environments** in your data warehouse. For example, one for production and another for development. -- Set up two environments in your data warehouse. For example, one for production and another for development. +- Provide the **credentials profile** for both environments in your `profiles.yml` so that Recce can access your data warehouse. You can put the credentials in a `profiles.yml` file, or use environment variables. -- Provide the credentials profile for both environments in your `profiles.yml` file to let Recce access your data warehouse. You can put the credentials in the `profiles.yml` file. Or you can use the environment variables to provide the credentials. +- Set up the **data warehouse credentials** in your [GitHub repository secrets](https://docs.github.com/en/actions/reference/encrypted-secrets). -- Set up the data warehouse credentials in the GitHub repository secrets. 
You can set up the credentials in the GitHub repository secrets by following the steps mentioned in the [GitHub documentation](https://docs.github.com/en/actions/reference/encrypted-secrets). +## Set up Recce with GitHub Actions -## Set up Recce with GitHub Action +We suggest setting up two GitHub Actions workflows in your GitHub repository. One for the production environment and another for the development environment. -We will suggest setting up two GitHub Actions workflows in your GitHub repository. One for the production environment and another for the development environment. +- **Production environment workflow**: Triggered on every merge to the `main` branch. This ensures that production artifacts are readily available for use when a PR is opened. -For the production environment, it will be triggered on every merge to the main branch. +- **Development environment workflow**: Triggered on every push to the pull-request branch. This workflow will compare production models with the current development environment. -And for the development environment, it will be triggered on every push commits to the pull-request branch. +### Production Workflow (Main Branch) -### Base Workflow (Main Branch) +This workflow will perform the following actions: -In this workflow, we will set up the GitHub Action to run the dbt commands for the production environment. And then, it will package the dbt artifacts and upload them to the 3rd party storage system outside the GitHub. We will use the AWS S3 bucket to store the dbt artifacts here. +1. Run dbt on the production environment. +2. Upload the generated artifacts to S3 for later use. ```yaml name: Recce CI Base Branch @@ -87,9 +92,15 @@ jobs: AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }} ``` -### Current Workflow (Pull Request Branch) +### Development Workflow (Pull Request Branch) + +This workflow will perform the following actions: + +1. Run dbt on the development environment. +2. 
Download previously generated production artifacts from S3. +3. Use Recce to compare the current environment with the downloaded production artifacts. +4. Post the Recce [state file](../features/state-file.md) to a pull request comment. -In the current workflow, we will set up the GitHub Action to run the dbt commands for the development environment. And then, download the dbt artifacts built in the base environment from the 3rd party storage system. After that, it will compare the data models between the base and current environments using Recce. ```yaml name: Recce CI Current Branch @@ -176,12 +187,15 @@ jobs: ARTIFACT_URL: ${{ steps.recce-artifact-uploader.outputs.artifact-url }} ``` + ## Review the Recce State File -Once the Recce CI workflow is completed, you can download the [Recce state file](../features/state-file.md) from the GitHub pull-request. The Recce state file contains the comparison results of the data models between the base and current environments. +Review the downloaded Recce [state file](../features/state-file.md) with the following command: ```bash recce server --review recce_state.json ``` -In the Recce server review mode, you can review the comparison results of the data models between the base and current environments. It will contain the row counts of modified data models, and the query results of the Recce Preset Checks. +In the Recce server `--review` mode, you can review the comparison results of the data models between the base and current environments. It will contain the row counts of modified data models, and the results of any Recce [Preset Checks](../../features/preset-checks/). +