Skip to content

Commit

Permalink
Merge pull request dbt-labs#235 from dbt-labs/more-primary-key-testing
Browse files Browse the repository at this point in the history
allow for unique combo of columns
  • Loading branch information
dave-connors-3 authored Dec 6, 2022
2 parents 471e716 + 8773632 commit 33e0a59
Show file tree
Hide file tree
Showing 9 changed files with 77 additions and 14 deletions.
32 changes: 28 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -486,7 +486,12 @@ or any other nested information.

### Testing
#### Missing Primary Key Tests
`fct_missing_primary_key_tests` ([source](models/marts/tests/fct_missing_primary_key_tests.sql)) lists every model that does not meet the minimum testing requirement of testing primary keys. Any models that does not have both a `not_null` and `unique` test configured will be highlighted in this model.
`fct_missing_primary_key_tests` ([source](models/marts/tests/fct_missing_primary_key_tests.sql)) lists every model that does not meet the minimum testing requirement of testing primary keys. Any model that does not have either

1. a `not_null` test and a `unique` test applied to a single column OR
2. a `dbt_utils.unique_combination_of_columns` test applied to a set of columns

will be flagged by this model.

<details>
<summary><b>Reason to Flag</b></summary>
Expand All @@ -496,15 +501,16 @@ Tests are assertions you make about your models and other resources in your dbt
<details>
<summary><b>How to Remediate</b></summary>

Apply a [uniqueness test](https://docs.getdbt.com/reference/resource-properties/tests#unique) and a [not null test](https://docs.getdbt.com/reference/resource-properties/tests#not_null) to the column that represents the grain of your model in its schema entry. For models that are unique across a combination of columns, we recommend adding a surrogate key column to your model, then applying these tests to that new model. See the [`surrogate_key`](https://github.com/dbt-labs/dbt-utils#surrogate_key-source) macro from dbt_utils for more info!
Apply a [uniqueness test](https://docs.getdbt.com/reference/resource-properties/tests#unique) and a [not null test](https://docs.getdbt.com/reference/resource-properties/tests#not_null) to the column that represents the grain of your model in its schema entry. For models that are unique across a combination of columns, we recommend adding a surrogate key column to your model, then applying these tests to that new model. See the [`surrogate_key`](https://github.com/dbt-labs/dbt-utils#surrogate_key-source) macro from dbt_utils for more info! Alternatively, you can use the [`dbt_utils.unique_combination_of_columns`](<https://github.com/dbt-labs/dbt-utils#unique_combination_of_columns-source>) test from `dbt_utils`. Check out the [overriding variables section](#overriding-variables) to read more about configuring other primary key tests for your project!

Additional tests can be configured by applying a [generic test](https://docs.getdbt.com/docs/building-a-dbt-project/tests#generic-tests) in the model's `.yml` entry or by creating a [singular test](https://docs.getdbt.com/docs/building-a-dbt-project/tests#singular-tests)
in the `tests` directory of you project.
in the `tests` directory of you project.
</details>

#### Test Coverage
`fct_test_coverage` ([source](models/marts/tests/fct_test_coverage.sql)) contains metrics pertaining to project-wide test coverage.
Specifically, this models measures:

1. `test_coverage_pct`: the percentage of your models that have minimum 1 test applied.
2. `test_to_model_ratio`: the ratio of the number of tests in your dbt project to the number of models in your dbt project
3. `< model_type >_test_coverage_pct`: the percentage of each of your model types that have minimum 1 test applied.
Expand Down Expand Up @@ -892,21 +898,39 @@ models:
Currently, this package uses different variables to adapt the models to your objectives and naming conventions. They can all be updated directly in `dbt_project.yml`

<details>
<summary><b>Coverage Variables</b></summary>
<summary><b>Testing and Documentation Variables</b></summary>

| variable | description | default |
| ----------- | ----------- | ----------- |
| `test_coverage_pct` | the minimum acceptable test coverage percentage | 100% |
| `documentation_coverage_pct` | the minimum acceptable documentation coverage percentage | 100% |
| `primary_key_test_macros` | the set(s) of dbt tests used to check validity of a primary key | [["dbt.test_unique", "dbt.test_not_null"], ["dbt_utils.test_unique_combination_of_columns"]] |

**Usage notes for `primary_key_test_macros:`**

The `primary_key_test_macros` variable determines how the `fct_missing_primary_key_tests` ([source](models/marts/tests/fct_missing_primary_key_tests.sql)) model evaluates whether the models in your project are properly tested for their grain. This variable is a list and each entry **must be a list of test names in `project_name.test_macro_name` format**.

For each entry in the parent list, the logic in `int_model_test_summary` will evaluate whether each model has all of the tests in that entry applied. If a model meets the criteria of any of the entries in the parent list, it will be considered a pass. The default behavior for this package will check for whether each model has either:

1. __Both__ the `not_null` and `unique` tests applied to a single column OR
2. The `dbt_utils.unique_combination_of_columns` applied to the model.

Each set of test(s) that define a primary key requirement must be grouped together in a sub-list to ensure they are evaluated together (e.g. [`dbt.test_unique`, `dbt.test_not_null`] ).

*While it's not explicitly tested in this package, we strongly encourage adding a `not_null` test on each of the columns listed in the `dbt_utils.unique_combination_of_columns` tests.*


```yml
# dbt_project.yml
# set your test and doc coverage to 75% instead
# use the dbt_constraints.test_primary_key test to check for validity of your primary keys

vars:
dbt_project_evaluator:
documentation_coverage_target: 75
test_coverage_target: 75
primary_key_test_macros: [["dbt_constraints.test_primary_key"]]

```
</details>
Expand Down
2 changes: 2 additions & 0 deletions dbt_project.yml
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,8 @@ vars:
documentation_coverage_target: 100
test_coverage_target: 100

primary_key_test_macros: [["dbt.test_unique", "dbt.test_not_null"], ["dbt_utils.test_unique_combination_of_columns"]]

# -- DAG variables --
models_fanout_threshold: 3

Expand Down
6 changes: 6 additions & 0 deletions integration_tests/models/staging/source_1/schema.yml
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,12 @@ models:
description: hocus pocus
tests:
- unique
- name: stg_model_3
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- id
- color
- name: stg_model_2
columns:
- name: id
Expand Down
4 changes: 3 additions & 1 deletion integration_tests/models/staging/source_1/stg_model_3.sql
Original file line number Diff line number Diff line change
@@ -1,2 +1,4 @@
-- depends on: {{ source('source_2', 'table_3') }}
select 1 as id
select 1 as id, 'blue' as color
union all
select 1 as id, 'red' as color
Original file line number Diff line number Diff line change
Expand Up @@ -8,5 +8,4 @@ report_1,FALSE,0
report_2,FALSE,0
report_3,FALSE,0
stg_model_1,FALSE,1
stg_model_3,FALSE,0
stg_model_5,FALSE,0
2 changes: 1 addition & 1 deletion integration_tests/seeds/tests/test_fct_test_coverage.csv
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
total_models,total_tests,tested_models,test_coverage_pct,staging_test_coverage_pct,intermediate_test_coverage_pct,marts_test_coverage_pct,other_test_coverage_pct,test_to_model_ratio
14,9,4,28.57,60.00,50.00,0.00,0.00,0.6429
14,10,5,35.71,80.00,50.00,0.00,0.00,0.7143
7 changes: 4 additions & 3 deletions macros/unpack/get_nodes.sql
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,6 @@
{%- set values = [] -%}

{%- for node in nodes_list -%}

{%- set values_line =
[
wrap_string_with_quotes(node.unique_id),
Expand All @@ -25,7 +24,8 @@
wrap_string_with_quotes(node.alias),
"cast(" ~ dbt_project_evaluator.is_not_empty_string(node.description) | trim ~ " as boolean)",
"''" if not node.column_name else wrap_string_with_quotes(dbt.escape_single_quotes(node.column_name)),
wrap_string_with_quotes(node.meta | tojson)
wrap_string_with_quotes(node.meta | tojson),
wrap_string_with_quotes(node.depends_on.macros | tojson)
]
%}

Expand All @@ -51,7 +51,8 @@
'alias',
('is_described', 'boolean'),
'column_name',
'meta'
'meta',
'macro_dependencies'
]
)
) }}
Expand Down
15 changes: 13 additions & 2 deletions models/marts/core/int_all_graph_resources.sql
Original file line number Diff line number Diff line change
@@ -1,4 +1,13 @@
-- one row for each resource in the graph

{# flatten the sets of permissable primary key test sets to one level for later iteration #}
{%- set test_macro_list = [] %}
{%- for test_set in var('primary_key_test_macros') -%}
{%- for test in test_set %}
{%- do test_macro_list.append(test) -%}
{%- endfor %}
{%- endfor -%}

with unioned as (

{{ dbt_utils.union_relations([
Expand Down Expand Up @@ -56,8 +65,9 @@ joined as (
end as model_type_folder,
{{ dbt.position('naming_convention_folders.folder_name_value','unioned_with_calc.directory_path') }} as position_folder,
nullif(unioned_with_calc.column_name, '') as column_name,
unioned_with_calc.resource_name like 'unique%' and unioned_with_calc.resource_type = 'test' as is_not_null_test,
unioned_with_calc.resource_name like 'not_null%' and unioned_with_calc.resource_type = 'test' as is_unique_test,
{% for test in test_macro_list %}
unioned_with_calc.macro_dependencies like '%macro.{{ test }}%' and unioned_with_calc.resource_type = 'test' as is_{{ test.split('.')[1] }},
{% endfor %}
unioned_with_calc.is_enabled,
unioned_with_calc.materialized,
unioned_with_calc.on_schema_change,
Expand All @@ -72,6 +82,7 @@ joined as (
unioned_with_calc.owner_name,
unioned_with_calc.owner_email,
unioned_with_calc.meta,
unioned_with_calc.macro_dependencies,
unioned_with_calc.metric_type,
unioned_with_calc.model,
unioned_with_calc.label,
Expand Down
22 changes: 20 additions & 2 deletions models/marts/tests/intermediate/int_model_test_summary.sql
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,15 @@ count_column_tests as (
select
relationships.direct_parent_id,
all_graph_resources.column_name,
count(distinct case when all_graph_resources.is_unique_test or all_graph_resources.is_not_null_test then relationships.resource_id else null end) primary_key_tests_count,
{%- for test_set in var('primary_key_test_macros') %}
{%- set outer_loop = loop -%}
count(distinct case when
{%- for test in test_set %}
all_graph_resources.is_{{ test.split('.')[1] }} {%- if not loop.last %} or {% endif %}
{%- endfor %}
then relationships.resource_id else null end
) as primary_key_method_{{ outer_loop.index }}_count,
{%- endfor %}
count(distinct relationships.resource_id) as tests_count
from all_graph_resources
left join relationships
Expand All @@ -27,7 +35,17 @@ agg_test_relationships as (

select
direct_parent_id,
sum(case when primary_key_tests_count = 2 then 1 else 0 end) >= 1 as is_primary_key_tested,
sum(case
when (
{%- for test_set in var('primary_key_test_macros') %}
{%- set compare_value = test_set | length %}
primary_key_method_{{ loop.index }}_count = {{ compare_value}}
{%- if not loop.last %} or {% endif %}
{%- endfor %}
) then 1
else 0
end
) >= 1 as is_primary_key_tested,
sum(tests_count) as number_of_tests_on_model
from count_column_tests
group by 1
Expand Down

0 comments on commit 33e0a59

Please sign in to comment.