diff --git a/README.md b/README.md
index b920ff32..1884d313 100644
--- a/README.md
+++ b/README.md
@@ -486,7 +486,12 @@ or any other nested information.
 ### Testing
 #### Missing Primary Key Tests
-`fct_missing_primary_key_tests` ([source](models/marts/tests/fct_missing_primary_key_tests.sql)) lists every model that does not meet the minimum testing requirement of testing primary keys. Any models that does not have both a `not_null` and `unique` test configured will be highlighted in this model.
+`fct_missing_primary_key_tests` ([source](models/marts/tests/fct_missing_primary_key_tests.sql)) lists every model that does not meet the minimum testing requirement of testing primary keys. Any model that does not have either
+
+1. a `not_null` test and a `unique` test applied to a single column OR
+2. a `dbt_utils.unique_combination_of_columns` test applied to a set of columns
+
+will be flagged by this model.
 Reason to Flag
@@ -496,15 +501,16 @@ Tests are assertions you make about your models and other resources in your dbt
 How to Remediate
-Apply a [uniqueness test](https://docs.getdbt.com/reference/resource-properties/tests#unique) and a [not null test](https://docs.getdbt.com/reference/resource-properties/tests#not_null) to the column that represents the grain of your model in its schema entry. For models that are unique across a combination of columns, we recommend adding a surrogate key column to your model, then applying these tests to that new model. See the [`surrogate_key`](https://github.com/dbt-labs/dbt-utils#surrogate_key-source) macro from dbt_utils for more info!
+Apply a [uniqueness test](https://docs.getdbt.com/reference/resource-properties/tests#unique) and a [not null test](https://docs.getdbt.com/reference/resource-properties/tests#not_null) to the column that represents the grain of your model in its schema entry. For models that are unique across a combination of columns, we recommend adding a surrogate key column to your model, then applying these tests to that new model. See the [`surrogate_key`](https://github.com/dbt-labs/dbt-utils#surrogate_key-source) macro from dbt_utils for more info! Alternatively, you can use the [`dbt_utils.unique_combination_of_columns`]() test from `dbt_utils`. Check out the [overriding variables section](#overriding-variables) to read more about configuring other primary key tests for your project!
 Additional tests can be configured by applying a [generic test](https://docs.getdbt.com/docs/building-a-dbt-project/tests#generic-tests) in the model's `.yml` entry or by creating a [singular test](https://docs.getdbt.com/docs/building-a-dbt-project/tests#singular-tests)
-in the `tests` directory of you project.
+in the `tests` directory of you project.
 #### Test Coverage
 `fct_test_coverage` ([source](models/marts/tests/fct_test_coverage.sql)) contains metrics pertaining to project-wide test coverage. Specifically, this models measures:
+
 1. `test_coverage_pct`: the percentage of your models that have minimum 1 test applied.
 2. `test_to_model_ratio`: the ratio of the number of tests in your dbt project to the number of models in your dbt project
 3. `< model_type >_test_coverage_pct`: the percentage of each of your model types that have minimum 1 test applied.
@@ -892,21 +898,39 @@ models:
 Currently, this package uses different variables to adapt the models to your objectives and naming conventions. They can all be updated directly in `dbt_project.yml`
-Coverage Variables
+Testing and Documentation Variables
 | variable | description | default |
 | ----------- | ----------- | ----------- |
 | `test_coverage_pct` | the minimum acceptable test coverage percentage | 100% |
 | `documentation_coverage_pct` | the minimum acceptable documentation coverage percentage | 100% |
+| `primary_key_test_macros` | the set(s) of dbt tests used to check validity of a primary key | [["dbt.test_unique", "dbt.test_not_null"], ["dbt_utils.test_unique_combination_of_columns"]] |
+
+**Usage notes for `primary_key_test_macros:`**
+
+The `primary_key_test_macros` variable determines how the `fct_missing_primary_key_tests` ([source](models/marts/tests/fct_missing_primary_key_tests.sql)) model evaluates whether the models in your project are properly tested for their grain. This variable is a list and each entry **must be a list of test names in `project_name.test_macro_name` format**.
+
+For each entry in the parent list, the logic in `int_model_test_summary` will evaluate whether each model has all of the tests in that entry applied. If a model meets the criteria of any of the entries in the parent list, it will be considered a pass. The default behavior for this package will check for whether each model has either:
+
+1. __Both__ the `not_null` and `unique` tests applied to a single column OR
+2. The `dbt_utils.unique_combination_of_columns` applied to the model.
+
+Each set of test(s) that define a primary key requirement must be grouped together in a sub-list to ensure they are evaluated together (e.g. [`dbt.test_unique`, `dbt.test_not_null`] ).
+
+*While it's not explicitly tested in this package, we strongly encourage adding a `not_null` test on each of the columns listed in the `dbt_utils.unique_combination_of_columns` tests.*
+
 ```yml
 # dbt_project.yml
 # set your test and doc coverage to 75% instead
+# use the dbt_constraints.test_primary_key test to check for validity of your primary keys
 vars:
   dbt_project_evaluator:
     documentation_coverage_target: 75
     test_coverage_target: 75
+    primary_key_test_macros: [["dbt_constraints.test_primary_key"]]
+
 ```
diff --git a/dbt_project.yml b/dbt_project.yml
index e39c10eb..18112403 100644
--- a/dbt_project.yml
+++ b/dbt_project.yml
@@ -57,6 +57,8 @@ vars:
   documentation_coverage_target: 100
   test_coverage_target: 100
+  primary_key_test_macros: [["dbt.test_unique", "dbt.test_not_null"], ["dbt_utils.test_unique_combination_of_columns"]]
+
   # -- DAG variables --
   models_fanout_threshold: 3
diff --git a/integration_tests/models/staging/source_1/schema.yml b/integration_tests/models/staging/source_1/schema.yml
index 532d7c96..a5d77632 100644
--- a/integration_tests/models/staging/source_1/schema.yml
+++ b/integration_tests/models/staging/source_1/schema.yml
@@ -8,6 +8,12 @@ models:
         description: hocus pocus
         tests:
           - unique
+  - name: stg_model_3
+    tests:
+      - dbt_utils.unique_combination_of_columns:
+          combination_of_columns:
+            - id
+            - color
   - name: stg_model_2
     columns:
       - name: id
diff --git a/integration_tests/models/staging/source_1/stg_model_3.sql b/integration_tests/models/staging/source_1/stg_model_3.sql
index 29b375cf..d68d7bb3 100644
--- a/integration_tests/models/staging/source_1/stg_model_3.sql
+++ b/integration_tests/models/staging/source_1/stg_model_3.sql
@@ -1,2 +1,4 @@
 -- depends on: {{ source('source_2', 'table_3') }}
-select 1 as id
\ No newline at end of file
+select 1 as id, 'blue' as color
+union all
+select 1 as id, 'red' as color
\ No newline at end of file
diff --git a/integration_tests/seeds/tests/test_fct_missing_primary_key_tests.csv b/integration_tests/seeds/tests/test_fct_missing_primary_key_tests.csv
index c6bd5a0c..2cedcd54 100644
--- a/integration_tests/seeds/tests/test_fct_missing_primary_key_tests.csv
+++ b/integration_tests/seeds/tests/test_fct_missing_primary_key_tests.csv
@@ -8,5 +8,4 @@ report_1,FALSE,0
 report_2,FALSE,0
 report_3,FALSE,0
 stg_model_1,FALSE,1
-stg_model_3,FALSE,0
 stg_model_5,FALSE,0
diff --git a/integration_tests/seeds/tests/test_fct_test_coverage.csv b/integration_tests/seeds/tests/test_fct_test_coverage.csv
index 15b48525..62196693 100644
--- a/integration_tests/seeds/tests/test_fct_test_coverage.csv
+++ b/integration_tests/seeds/tests/test_fct_test_coverage.csv
@@ -1,2 +1,2 @@
 total_models,total_tests,tested_models,test_coverage_pct,staging_test_coverage_pct,intermediate_test_coverage_pct,marts_test_coverage_pct,other_test_coverage_pct,test_to_model_ratio
-14,9,4,28.57,60.00,50.00,0.00,0.00,0.6429
+14,10,5,35.71,80.00,50.00,0.00,0.00,0.7143
diff --git a/macros/unpack/get_nodes.sql b/macros/unpack/get_nodes.sql
index 7e85c967..ebbb1698 100644
--- a/macros/unpack/get_nodes.sql
+++ b/macros/unpack/get_nodes.sql
@@ -9,7 +9,6 @@
     {%- set values = [] -%}
     {%- for node in nodes_list -%}
-
         {%- set values_line = [
             wrap_string_with_quotes(node.unique_id),
@@ -25,7 +24,8 @@
             wrap_string_with_quotes(node.alias),
             "cast(" ~ dbt_project_evaluator.is_not_empty_string(node.description) | trim ~ " as boolean)",
             "''" if not node.column_name else wrap_string_with_quotes(dbt.escape_single_quotes(node.column_name)),
-            wrap_string_with_quotes(node.meta | tojson)
+            wrap_string_with_quotes(node.meta | tojson),
+            wrap_string_with_quotes(node.depends_on.macros | tojson)
         ]
         %}
@@ -51,7 +51,8 @@
             'alias',
             ('is_described', 'boolean'),
             'column_name',
-            'meta'
+            'meta',
+            'macro_dependencies'
         ]
     )
 ) }}
diff --git a/models/marts/core/int_all_graph_resources.sql b/models/marts/core/int_all_graph_resources.sql
index b65898b7..b64fae08 100644
--- a/models/marts/core/int_all_graph_resources.sql
+++ b/models/marts/core/int_all_graph_resources.sql
@@ -1,4 +1,13 @@
 -- one row for each resource in the graph
+
+{# flatten the sets of permissable primary key test sets to one level for later iteration #}
+{%- set test_macro_list = [] %}
+{%- for test_set in var('primary_key_test_macros') -%}
+    {%- for test in test_set %}
+        {%- do test_macro_list.append(test) -%}
+    {%- endfor %}
+{%- endfor -%}
+
 with unioned as (
     {{ dbt_utils.union_relations([
@@ -56,8 +65,9 @@ joined as (
         end as model_type_folder,
         {{ dbt.position('naming_convention_folders.folder_name_value','unioned_with_calc.directory_path') }} as position_folder,
         nullif(unioned_with_calc.column_name, '') as column_name,
-        unioned_with_calc.resource_name like 'unique%' and unioned_with_calc.resource_type = 'test' as is_not_null_test,
-        unioned_with_calc.resource_name like 'not_null%' and unioned_with_calc.resource_type = 'test' as is_unique_test,
+        {% for test in test_macro_list %}
+        unioned_with_calc.macro_dependencies like '%macro.{{ test }}%' and unioned_with_calc.resource_type = 'test' as is_{{ test.split('.')[1] }},
+        {% endfor %}
         unioned_with_calc.is_enabled,
         unioned_with_calc.materialized,
         unioned_with_calc.on_schema_change,
@@ -72,6 +82,7 @@ joined as (
         unioned_with_calc.owner_name,
         unioned_with_calc.owner_email,
         unioned_with_calc.meta,
+        unioned_with_calc.macro_dependencies,
         unioned_with_calc.metric_type,
         unioned_with_calc.model,
         unioned_with_calc.label,
diff --git a/models/marts/tests/intermediate/int_model_test_summary.sql b/models/marts/tests/intermediate/int_model_test_summary.sql
index 5847f41d..80ede484 100644
--- a/models/marts/tests/intermediate/int_model_test_summary.sql
+++ b/models/marts/tests/intermediate/int_model_test_summary.sql
@@ -13,7 +13,15 @@ count_column_tests as (
     select
         relationships.direct_parent_id,
         all_graph_resources.column_name,
-        count(distinct case when all_graph_resources.is_unique_test or all_graph_resources.is_not_null_test then relationships.resource_id else null end) primary_key_tests_count,
+        {%- for test_set in var('primary_key_test_macros') %}
+            {%- set outer_loop = loop -%}
+        count(distinct case when
+                {%- for test in test_set %}
+                all_graph_resources.is_{{ test.split('.')[1] }} {%- if not loop.last %} or {% endif %}
+                {%- endfor %}
+            then relationships.resource_id else null end
+        ) as primary_key_method_{{ outer_loop.index }}_count,
+        {%- endfor %}
         count(distinct relationships.resource_id) as tests_count
     from all_graph_resources
     left join relationships
@@ -27,7 +35,17 @@ agg_test_relationships as (
     select
         direct_parent_id,
-        sum(case when primary_key_tests_count = 2 then 1 else 0 end) >= 1 as is_primary_key_tested,
+        sum(case
+                when (
+                    {%- for test_set in var('primary_key_test_macros') %}
+                        {%- set compare_value = test_set | length %}
+                    primary_key_method_{{ loop.index }}_count = {{ compare_value}}
+                        {%- if not loop.last %} or {% endif %}
+                    {%- endfor %}
+                ) then 1
+                else 0
+            end
+        ) >= 1 as is_primary_key_tested,
         sum(tests_count) as number_of_tests_on_model
     from count_column_tests
     group by 1
diff --git a/packages.yml b/packages.yml
index 5fda0482..659747e7 100644
--- a/packages.yml
+++ b/packages.yml
@@ -1,3 +1,3 @@
 packages:
   - package: dbt-labs/dbt_utils
-    version: [">1.0.0", "<2.0.0"]
\ No newline at end of file
+    version: [">1.0.0", "<2.0.0"]
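
The pass/fail rule that the `int_model_test_summary.sql` hunks implement can be hard to read through the Jinja loops, so here is a minimal Python sketch of it. It is an illustration only, not code from this change. It assumes the counts are taken per model and per column (as the selected columns in `count_column_tests` suggest), and it matches macro names by equality where the SQL uses a `like` on `macro_dependencies`. The function name and the `(test_id, column_name, macro_name)` tuple shape are hypothetical.

```python
from collections import defaultdict

# Default value of `primary_key_test_macros` as set in dbt_project.yml by this diff.
DEFAULT_PRIMARY_KEY_TEST_MACROS = [
    ["dbt.test_unique", "dbt.test_not_null"],
    ["dbt_utils.test_unique_combination_of_columns"],
]


def is_primary_key_tested(tests, test_sets=DEFAULT_PRIMARY_KEY_TEST_MACROS):
    """Return True if any column group of one model satisfies any test set.

    `tests` is a hypothetical list of (test_id, column_name, macro_name)
    tuples for a single model; column_name is None for model-level tests.
    """
    # Group tests by column, standing in for the per-column counts
    # computed in `count_column_tests`.
    by_column = defaultdict(set)
    for test_id, column_name, macro_name in tests:
        by_column[column_name].add((test_id, macro_name))

    for column_tests in by_column.values():
        for test_set in test_sets:
            # Distinct tests whose macro belongs to this set (the `or` chain
            # inside `count(distinct case when ...)`).
            matching = {tid for tid, macro in column_tests if macro in test_set}
            # The SQL compares the count to the length of the set.
            if len(matching) == len(test_set):
                return True
    return False


# stg_model_3 from the integration tests: one model-level combination test.
stg_model_3 = [("t1", None, "dbt_utils.test_unique_combination_of_columns")]
print(is_primary_key_tested(stg_model_3))  # True

# A model with only a `unique` test on `id` still fails the check.
only_unique = [("t1", "id", "dbt.test_unique")]
print(is_primary_key_tested(only_unique))  # False
```

Because the comparison is an equality on the count, a column carrying more matching tests than the set has entries would not pass under this reading, which may be worth confirming against the compiled SQL.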