blog section and first blog
DaveFlynn committed Mar 4, 2024
1 parent 641ea34 commit 9f1f57b
Showing 39 changed files with 217 additions and 8 deletions.
Binary file added .cache/plugin/social/Roboto-Black.ttf
Binary file not shown.
Binary file added .cache/plugin/social/Roboto-BlackItalic.ttf
Binary file not shown.
Binary file added .cache/plugin/social/Roboto-Bold.ttf
Binary file not shown.
Binary file added .cache/plugin/social/Roboto-BoldItalic.ttf
Binary file not shown.
Binary file added .cache/plugin/social/Roboto-Italic.ttf
Binary file not shown.
Binary file added .cache/plugin/social/Roboto-Light.ttf
Binary file not shown.
Binary file added .cache/plugin/social/Roboto-LightItalic.ttf
Binary file not shown.
Binary file added .cache/plugin/social/Roboto-Medium.ttf
Binary file not shown.
Binary file added .cache/plugin/social/Roboto-MediumItalic.ttf
Binary file not shown.
Binary file added .cache/plugin/social/Roboto-Regular.ttf
Binary file not shown.
Binary file added .cache/plugin/social/Roboto-Thin.ttf
Binary file not shown.
Binary file added .cache/plugin/social/Roboto-ThinItalic.ttf
Binary file not shown.
7 changes: 7 additions & 0 deletions docs/blog/index.md
@@ -0,0 +1,7 @@
---
title: Blog
icon: material/newspaper-variant-outline
---

# Recce Blog

132 changes: 132 additions & 0 deletions docs/blog/posts/2024-04-04_recce-introduction.md
@@ -0,0 +1,132 @@
---
title: Data Validation Toolkit for dbt Data Projects
date: 2024-04-04
slug: data-validaton-toolkit-for-dbt-data-projects
cover_image: all-signal-no-noise-medium.png
description: >
  Validate modeling changes and create "all-signal" PR comments
categories:
- General
---

# Next-Level Data Validation Toolkit for dbt Data Projects — Introducing Recce

<figure markdown="span">
![Build the ultimate PR comment to validate your data modeling changes](../assets/images/data-validaton-toolkit-for-dbt-data-projects/all-signal-no-noise-medium.png)
<figcaption>Recce: Data Validation Toolkit for dbt</figcaption>
</figure>

Validating data modeling changes and reviewing pull requests for dbt projects can be challenging. Because both the code and the data need review, a proper ‘code review’ is hard to perform, so the data validation stage is often omitted, poorly implemented, or [drastically slows down time-to-merge](https://medium.com/inthepipeline/use-this-updated-pull-request-comment-template-for-your-dbt-data-projects-de06f12fc38d) for your time-sensitive data updates.

How can you maintain data best practices, but speed up the validation and review process?

<!-- more -->

## Recce — The data validation toolkit for dbt projects

Recce (short for reconnaissance) is a data modeling validation toolkit with a focus on environment diffing. Take two dbt environments, such as dev and prod, and compare them using the suite of diff tools in Recce.

## Your Diffing Toolkit

With Recce you’re able to validate your data modeling changes against a known-good baseline. The only real way to verify modeling changes is to check them against historical/production data.

### Lineage DAG Diff

Start from the zone of impact of your changes, and see which models have been modified, added, and removed. The dbt docs lineage DAG only shows you the current state of the DAG; Recce shows you how the DAG differs from before you made any changes.

<figure markdown="span">
![Lineage DAG Diff](../assets/images/data-validaton-toolkit-for-dbt-data-projects/lineage-diff.gif)
<figcaption>Lineage DAG Diff in Recce</figcaption>
</figure>
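
To make the idea of a lineage diff concrete, here is a minimal sketch in Python. It is not how Recce is implemented; it assumes each environment can be reduced to a mapping of model name to a content checksum, and the model names and checksum values below are made up for illustration.

```python
# Hypothetical model-name -> checksum maps for two environments.
# In a real dbt project these might be derived from each environment's
# artifacts; the values here are invented for the example.
base = {"stg_orders": "a1", "stg_payments": "b2", "customers": "c3"}
current = {"stg_orders": "a1", "customers": "c9", "customer_segments": "d4"}


def lineage_diff(base: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Classify models as added, removed, or modified between two environments."""
    added = sorted(current.keys() - base.keys())
    removed = sorted(base.keys() - current.keys())
    # A model counts as modified when it exists in both but its checksum differs.
    modified = sorted(
        name for name in base.keys() & current.keys() if base[name] != current[name]
    )
    return {"added": added, "removed": removed, "modified": modified}


print(lineage_diff(base, current))
```

The three lists correspond to the added, removed, and modified states described above; anything not listed is unchanged.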

### Data Profile Diff & Value Diff

Perform holistic checks by diffing the data profile stats for your development branch, then check the percentage of values matching for each column in a model.

<figure markdown="span">
![Data Profile Diff](../assets/images/data-validaton-toolkit-for-dbt-data-projects/profile-diff.png)
<figcaption>Data Profile in Recce</figcaption>
</figure>
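
As a rough illustration of what "percentage of values matching for each column" means, the sketch below compares two small row sets joined on a key column. It is an assumption-laden example rather than Recce's own logic: the function name `value_match_rates`, the sample rows, and the choice to compare only rows whose key exists in both environments are all hypothetical.

```python
def value_match_rates(base_rows, current_rows, key):
    """Return {column: fraction of matching values} over keys present in both sets."""
    base_by_key = {row[key]: row for row in base_rows}
    current_by_key = {row[key]: row for row in current_rows}
    shared_keys = base_by_key.keys() & current_by_key.keys()
    if not shared_keys:
        return {}
    columns = [c for c in base_rows[0] if c != key]
    rates = {}
    for column in columns:
        matches = sum(
            1
            for k in shared_keys
            if base_by_key[k].get(column) == current_by_key[k].get(column)
        )
        rates[column] = matches / len(shared_keys)
    return rates


# Made-up rows standing in for the same model in prod (base) and dev (current).
base_rows = [
    {"customer_id": 1, "first_name": "Ana", "customer_lifetime_value": 30},
    {"customer_id": 2, "first_name": "Bo", "customer_lifetime_value": 50},
    {"customer_id": 3, "first_name": "Cy", "customer_lifetime_value": 20},
    {"customer_id": 4, "first_name": "Di", "customer_lifetime_value": 10},
]
current_rows = [
    {"customer_id": 1, "first_name": "Ana", "customer_lifetime_value": 30},
    {"customer_id": 2, "first_name": "Bo", "customer_lifetime_value": 40},
    {"customer_id": 3, "first_name": "Cy", "customer_lifetime_value": 20},
    {"customer_id": 4, "first_name": "Di", "customer_lifetime_value": 0},
]

rates = value_match_rates(base_rows, current_rows, key="customer_id")
for column, rate in rates.items():
    print(f"{column}: {rate:.0%} matching")
```

A column with a low match rate is a good candidate for the kind of drill-down query described in the next section.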

### Query Diff

If something needs further investigation, drill down and query the data. One query will run on both environments, and you’ll be able to see the difference on a row-by-row basis. Enable change-only view to see just what’s changed.

<figure markdown="span">
![SQL Query Diff](../assets/images/data-validaton-toolkit-for-dbt-data-projects/query-diff.png)
<figcaption>SQL Query Diff in Recce</figcaption>
</figure>
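
The pattern of running one query against both environments and comparing the results row by row can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: two in-memory databases stand in for prod and dev, rows are paired on the first selected column, and `make_env` and `query_diff` are hypothetical helpers, not part of Recce.

```python
import sqlite3


def make_env(rows):
    """Create an in-memory database standing in for one dbt environment."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers ("
        "customer_id INTEGER PRIMARY KEY, customer_lifetime_value INTEGER)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    return conn


def query_diff(base_conn, current_conn, sql, changed_only=False):
    """Run the same query in both environments and pair rows by first column."""
    base = {row[0]: row for row in base_conn.execute(sql)}
    current = {row[0]: row for row in current_conn.execute(sql)}
    diff = []
    for key in sorted(base.keys() | current.keys()):
        before, after = base.get(key), current.get(key)
        if changed_only and before == after:
            continue  # skip rows that are identical in both environments
        diff.append((key, before, after))
    return diff


# Made-up data: customer 2 changes, customer 3 disappears, customer 4 is new.
prod = make_env([(1, 30), (2, 50), (3, 20)])
dev = make_env([(1, 30), (2, 40), (4, 15)])
sql = "SELECT customer_id, customer_lifetime_value FROM customers"

for key, before, after in query_diff(prod, dev, sql, changed_only=True):
    print(key, before, after)
```

Passing `changed_only=True` mirrors the change-only view: identical rows are dropped, and a `None` on one side marks a row that exists in only one environment.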

### Schema and Row Count

In addition to the above diffs, you can also check the schema and row count, just to be sure you didn’t lose any data, or an important column.

<figure markdown="span">
![Schema and Row Count Diff](../assets/images/data-validaton-toolkit-for-dbt-data-projects/schema-diff.png)
<figcaption>Schema and Row Count in Recce</figcaption>
</figure>
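
A schema and row count check boils down to comparing column lists, column types, and `COUNT(*)` between two environments. The sketch below shows one way to do that with Python's built-in `sqlite3` module; the helper names and sample tables are hypothetical, and SQLite is used only because it runs anywhere, not because Recce works this way.

```python
import sqlite3


def describe(conn, table):
    """Return ({column: declared type}, row count) for one table."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}
    (row_count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    return columns, row_count


def schema_and_row_count_diff(base_conn, current_conn, table):
    """Summarize column and row count differences for one table."""
    base_cols, base_rows = describe(base_conn, table)
    cur_cols, cur_rows = describe(current_conn, table)
    return {
        "added_columns": sorted(cur_cols.keys() - base_cols.keys()),
        "removed_columns": sorted(base_cols.keys() - cur_cols.keys()),
        "type_changes": {
            name: (base_cols[name], cur_cols[name])
            for name in base_cols.keys() & cur_cols.keys()
            if base_cols[name] != cur_cols[name]
        },
        "row_count": (base_rows, cur_rows),
    }


# Two in-memory databases stand in for prod and dev (made-up data).
prod = sqlite3.connect(":memory:")
prod.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER, first_name TEXT, customer_lifetime_value INTEGER)"
)
prod.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ana", 30), (2, "Bo", 50), (3, "Cy", 20)],
)

dev = sqlite3.connect(":memory:")
dev.execute("CREATE TABLE customers (customer_id INTEGER, customer_lifetime_value REAL)")
dev.executemany("INSERT INTO customers VALUES (?, ?)", [(1, 30.0), (2, 40.0)])

summary = schema_and_row_count_diff(prod, dev, "customers")
print(summary)
```

In this made-up case the summary flags a dropped column, a changed column type, and a lost row, which are the kinds of accidents this check is meant to catch.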


### Create your checklist

As you create validations, add them to your checklist with notes about what you found, and re-run checks if the data changes. When you’re ready, export the checks to your PR comment.

<figure markdown="span">
![PR Checklist](../assets/images/data-validaton-toolkit-for-dbt-data-projects/checklist.png)
<figcaption>Data Project PR Checklist in Recce</figcaption>
</figure>


### Create your <em>all signal, no noise</em> PR comment

When you’re ready to share your validations as proof-of-correctness for your work, you can export checks into your [PR comment template](https://medium.com/inthepipeline/use-this-updated-pull-request-comment-template-for-your-dbt-data-projects-de06f12fc38d). You can copy your notes, and export a screenshot of the check as it appears in Recce. By curating the validations for your PR comment, you can create an ‘all-signal, no noise’ comment with the validations that are relevant to the context of your changes.

<figure markdown="span">
![PR Validations](../assets/images/data-validaton-toolkit-for-dbt-data-projects/pr-validations.png)
<figcaption>Data Modeling Validations in PR Comment</figcaption>
</figure>

The reviewer will be able to see the query and results of your data spot-checks and have the comprehensive information required to request further investigation, or sign-off on your changes.

## Why Recce?

As mentioned above, whether you’re the pull request author or the reviewer, you’ve got the difficult task of understanding what is going on and verifying that the intentions of the PR were realized without screwing up production data. Here are some common issues we’ve heard about working on large or business-critical dbt data projects:

- QA for pull requests takes too long — stakeholders want to merge new data features faster.
- dbt build worked, but the data was actually wrong.
- I’m sick of downtime from silent errors making it into prod.
- CI takes too long and current data quality tools are costly to run.
- I just want to see a summary of what changed for modified models.

If any of these pain points ring true, Recce can help with the code review on your data project.

## Open-source and available now

Recce OSS is available on GitHub now. Follow the instructions in our Getting Started guide to start using Recce to validate your data modeling changes.

- GitHub: [DataRecce/Recce](https://github.com/datarecce/recce)
- Docs: [DataRecce.io/docs](https://datarecce.io/docs)
- Discord: [Recce Community](https://discord.gg/bP2Yfk9KEA)

## Try Recce Online

If you want to try Recce out without having to install, check out the demo instance below.

### Demo

The [demo PR](https://github.com/DataRecce/jaffle_shop_duckdb/pull/1) makes a simple change to dbt’s Jaffle Shop project: it changes how customer_lifetime_value (CLV) is calculated so that only completed orders are counted.

<figure markdown="span">
![Model Code Diff](../assets/images/data-validaton-toolkit-for-dbt-data-projects/model-code-diff.png)
<figcaption>Can you validate this code change using Recce?</figcaption>
</figure>

The expectation from this change is that CLV will be reduced overall, and that this will also impact the downstream customer segments model. With that in mind, see if you can determine whether the PR has any issues by checking the data in Recce:

- <strong>The PR:</strong> [https://github.com/DataRecce/jaffle_shop_duckdb/pull/1](https://github.com/DataRecce/jaffle_shop_duckdb/pull/1)
- <strong>Recce Demo instance:</strong> [https://pr1.cloud.datarecce.io/](https://pr1.cloud.datarecce.io/)

<em>Hint: Run a Profile Diff, then a Query Diff, on the customers model, then check for downstream impact.</em>


<script src="https://gist.github.com/DaveFlynn/4135bc92aea95227939e3db03cf479a5.js"></script>

62 changes: 55 additions & 7 deletions docs/styles/extra.css
@@ -11,13 +11,7 @@
border-radius: 3px;
}

.md-typeset a {

}
.md-typeset a:hover {
border-bottom: solid #ff6e42 1px;
color: #ff6e42;
}
/**** Nav ****/

.md-nav {
font-size: 0.8rem;
@@ -35,3 +29,57 @@
.md-nav__link[href]:hover {
color: #ff6e42;
}

/**** Blog ****/

.md-post--excerpt .md-post__content h2 {
line-height: 1.2;
}

.md-post__action a {
background-color: #ff6e42;
color: #fff !important;
padding: .3rem 1rem;
border-radius: 1rem;
text-transform: uppercase;
font-weight: bold;
display: inline-block;
margin-top: 1rem;
}
.md-post__action a:hover {
border-bottom: none !important;
}


.md-post--excerpt {
padding-bottom: 2.5rem;
margin-bottom: 2.5rem;
border-bottom: dashed black 1px;
}

.md-post--excerpt:last-of-type {
border-bottom: none;
}


/**** Links ****/

.md-typeset a {

}
.md-typeset a:hover, .md-typeset a:focus,
.md-post .md-post__meta a:hover, .md-post .md-post__meta a:focus {
border-bottom: solid #ff6e42 1px;
color: #ff6e42;
}

h1 a:hover, h2 a:hover, h3 a:hover {
border-bottom: none;
/* background-color: #ff6e42;*/
/* color: #fff;*/
}





1 change: 1 addition & 0 deletions material/overrides/home.html
@@ -49,6 +49,7 @@ <h1>Recce</h1>

<ul class="nav">
<li class="nav-item"><a href="docs" class="nav-link">Docs</a></li>
<li class="nav-item"><a href="blog" class="nav-link">Blog</a></li>
<li class="nav-item btn btn-primary"><a href="#signup" class="nav-link">Sign Up</a></li>
</ul>

20 changes: 20 additions & 0 deletions mkdocs.yml
@@ -7,6 +7,8 @@ site_description: >-
repo_name: datarecce/recce
repo_url: https://github.com/datarecce/recce

site_url: https://datarecce.io

extra_css:
- styles/extra.css

@@ -17,6 +19,7 @@ extra:

nav:
- index.md
- blog/index.md
- docs/index.md
- docs/installation.md
- docs/get-started.md
@@ -30,6 +33,9 @@ nav:
- docs/features/value-diff.md
- docs/features/checklist.md

#- Recce Blog:
#- blog/index.md

markdown_extensions:
- pymdownx.highlight:
anchor_linenums: true
@@ -40,6 +46,8 @@ markdown_extensions:
- pymdownx.superfences
- attr_list
- md_in_html
#- toc:
# title: On this page

theme:
name: material
@@ -76,6 +84,18 @@ theme:
plugins:
  - search
  - glightbox
  - social:
      cards_layout_options:
        background_color: "#ff6e42"
  - blog:
      blog_dir: blog
      archive: true
      post_dir: "{blog}/posts"
      categories_name: Categories
      post_url_format: "{slug}" # Removed date from URL "{date}/{slug}"
      blog_toc: false # enable for blog, archive and category index pages
      #archive_toc: false # enable for archive index pages
      # categories_toc: true # enable for category index pages

watch:
- material
3 changes: 2 additions & 1 deletion requirements.txt
@@ -1,2 +1,3 @@
mkdocs-material
mkdocs-glightbox
mkdocs-glightbox
mkdocs-material[imaging]
