From 37199d3901a97e813899dc25c1f72601714c2c3c Mon Sep 17 00:00:00 2001 From: Joel Labes Date: Fri, 8 Apr 2022 10:44:47 +1200 Subject: [PATCH] dbt utils 0.8.3 (#534) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Fix/timestamp withought timezone (#458) * timestamp and changelog updates * changelog fix * Add context for why change to no timezone Co-authored-by: Joel Labes * also ignore dbt_packages (#463) * also ignore dbt_packages * Update CHANGELOG.md * Update CHANGELOG.md Co-authored-by: Joel Labes * date_spine: transform comment to jinja (#462) * Have union_relations raise exception when include parameter results in no columns (#473) * Raise exception if no columns in column_superset * Add relation names to compiler error message * Add `union_relations` fix to changelog * Added case for handling postgres foreign tables... (#476) * Add link for fewer_rows_than schema test in docs (#465) * Added case for handling postgres foreign tables (tables which are external to the current database and are imported into the current database from remote data stores by using Foreign Data Wrappers functionality). * Reworked getting of postgres table_type. * Added needed changes to CHANGELOG. Co-authored-by: José Coto Co-authored-by: Taras Stetsiak * Enhance usability of star macro by only generating column aliases when prefix and/or suffix is specified (#468) * The star macro should only produce column aliases when there is either a prefix or suffix specified. * Enhanced the readme for the star macro. 
* Add new integration test Co-authored-by: Nick Perrott Co-authored-by: Josh Elston-Green Co-authored-by: Joel Labes * fix: extra brace typo in insert_by_period_materialization (#480) * Support quoted column names in sequential_values test (#479) * Add any value (#501) * Add link for fewer_rows_than schema test in docs (#465) * Update get_query_results_as_dict example to demonstrate accessing columnar results as dictionary values (#474) * Update get_query_results_as_dict example to demonstrate accessing columnar results as dictionary values * Use slugify in example * Fix slugify example with dbt_utils. package prefix Co-authored-by: Elize Papineau * Add note about not_null_where deprecation to Readme (#477) * Add note about not_null_where deprecation to Readme * Add docs to unique_where test * Update pull_request_template.md to reference `main` vs `master` (#496) * Correct coalesce -> concatenation typo (#495) * add any_value cross-db macro * Missing colon in test * Update CHANGELOG.md Co-authored-by: José Coto Co-authored-by: Elize Papineau Co-authored-by: Elize Papineau Co-authored-by: Joe Ste.Marie Co-authored-by: Niall Woodward * Fix changelog * Second take at fixing pivot to allow single quotes (#503) * fix pivot : in pivoted column value, single quote must be escaped (on postgresql) else ex. 
syntax error near : when color = 'blue's' * patched expected * single quote escape : added dispatched version of the macro to support bigquery & snowflake * second backslash to escape in Jinja, change case of test file columns Let's see if other databases allow this * explicitly list columns to compare * different tests for snowflake and others * specific comparison seed * Don't quote identifiers for apostrophe, to avoid BQ and SF problems * Whitespace management for macros * Update CHANGELOG.md Co-authored-by: Marc Dutoo * Add bool or cross db (#504) * Create bool_or cross-db func * Forgot a comma * Update CHANGELOG.md * Code review tweaks * Fix union_relations error when no include/exclude provided (#509) * Update CHANGELOG.md * Add _is_ephemeral test to get_column_values (#518) * Add _is_ephemeral test Co-authored-by: Elize Papineau * Add deduplication macro (#512) * Update README.md * Mutually excl range examples in disclosure triangle * Fix union_relations error when no include/exclude provided * Fix union_relations error when no include/exclude provided (#509) * Update CHANGELOG.md * Add dedupe macro * Add test for dedupe macro * Add documentation to README * Add entry to CHANGELOG * Implement review * Typed materialized views as views (#525) * Typed materialized views as views * Update get_relations_by_pattern.sql * Moving fix from get_tables_by_pattern_sql reverting changes to this file to add a fix to the macro get_tables_by_pattern_sql * removing quoting from table_type removing quoting from table_type as this was causing an error when calling this macro within get_tables_by_pattern_sql * calling get_table_types_sql for materialized views calling get_table_types_sql macro to handle materialized views in sources. 
* Add `alias` argument to `deduplicate` macro (#526) * Add `alias` argument to `deduplicate` * Test `alias` argument * Rename `alias` to `relation_alias` * Fix/use generic test naming style instead of schema test (#521) * Updated references to 'schema test' in README along with small improvements to test descriptions. Updates were also carried out in folder structure and integration README * Updated references to 'schema test' in Changelog * updated changelog with changes to documentation and project file structure * Apply suggestions from code review Update macro descriptions to be "asserts that" * Update CHANGELOG.md * Update README.md Co-authored-by: Joel Labes * Remove extraneous whitespace (#529) * rm whitespace from date_trunc * datediff * rm unnecessary whitespace control * change log * fix CHANGELOG * address comments * Feature/add listagg macro (#530) * Update README.md * Mutually excl range examples in disclosure triangle * Fix union_relations error when no include/exclude provided * Fix union_relations error when no include/exclude provided (#509) * Update CHANGELOG.md * Add to_condition to relationships where * very minor nit - update "an new" to "a new" (#519) * add quoting to split_part (#528) * add quoting to split_part * update docs for split_part * typo * corrected readme syntax * revert and update to just documentation * add new line * Update README.md * Update README.md * Update README.md Co-authored-by: Joel Labes * add macro to get columns (#516) * add macro to get columns * star macro should use get_columns * add adapter. * swap adapter for dbt_utils Co-authored-by: Joel Labes * update documentation * add output_lower arg * update name to get_filtered_columns_in_relation from get_columns * add tests * forgot args * too much whitespace removal ----------- Actual: ----------- --->"field_3"as "test_field_3"<--- ----------- Expected: ----------- --->"field_3" as "test_field_3"<--- * didn't mean to move a file that I did not create. 
moving things back. * remove lowercase logic * limit_zero Co-authored-by: Joel Labes * Add listagg macro and integration test * remove type in listagg macro * updated integration test * Add redshift to listagg macro * remove redshift listagg * explicitly named group by column * updated default values * Updated example to use correct double vs. single quotes * whitespace control * Added redshift specific macro * Remove documentation * Update integration test so less likely to accidentally work Co-authored-by: Joel Labes * default everything but measure to none * added limit functionality for other dbs * syntax bug for postgres * update redshift macro * fixed block def control * Fixed bug in redshift * Bug fix redshift * remove unused group_by arg * Added additional test without order by col * updated to regex replace * typo * added more integration_tests * attempt to make redshift less complicated * typo * update redshift * replace to substr * More explicit versions with added complexity * handle special characters Co-authored-by: Joel Labes Co-authored-by: Jamie Rosenberg Co-authored-by: Pat Kearns * patch default behaviour in get_column_values (#533) * Update changelog, add missing quotes around get_table_types_sql * rm whitespace Co-authored-by: Joe Markiewicz <74217849+fivetran-joemarkiewicz@users.noreply.github.com> Co-authored-by: Anders Co-authored-by: Mikaël Simarik Co-authored-by: Graham Wetzler Co-authored-by: Taras <32882370+Aesthet@users.noreply.github.com> Co-authored-by: José Coto Co-authored-by: Taras Stetsiak Co-authored-by: nickperrott <46330920+nickperrott@users.noreply.github.com> Co-authored-by: Nick Perrott Co-authored-by: Ted Conbeer Co-authored-by: Armand Duijn Co-authored-by: Elize Papineau Co-authored-by: Elize Papineau Co-authored-by: Joe Ste.Marie Co-authored-by: Niall Woodward Co-authored-by: Marc Dutoo Co-authored-by: Judah Rand <17158624+judahrand@users.noreply.github.com> Co-authored-by: Luis Leon 
<98919783+luisleon90@users.noreply.github.com> Co-authored-by: Brid Moynihan Co-authored-by: SunriseLong <44146580+SunriseLong@users.noreply.github.com> Co-authored-by: Grace Goheen <53586774+graciegoheen@users.noreply.github.com> Co-authored-by: Jamie Rosenberg Co-authored-by: Pat Kearns Co-authored-by: James McNeill <55981540+jpmmcneill@users.noreply.github.com> --- CHANGELOG.md | 47 ++++++-- README.md | 96 +++++++++------- integration_tests/README.md | 4 +- .../data/cross_db/data_listagg.csv | 10 ++ .../data/cross_db/data_listagg_output.csv | 10 ++ .../data/sql/data_deduplicate.csv | 4 + .../data/sql/data_deduplicate_expected.csv | 2 + .../models/cross_db_utils/schema.yml | 6 + .../models/cross_db_utils/test_listagg.sql | 69 ++++++++++++ .../models/datetime/test_date_spine.sql | 2 +- .../schema.yml | 0 .../test_equal_column_subset.sql | 0 .../test_equal_rowcount.sql | 0 .../test_fewer_rows_than.sql | 0 .../test_recency.sql | 0 integration_tests/models/sql/schema.yml | 9 +- .../models/sql/test_deduplicate.sql | 22 ++++ .../models/sql/test_generate_series.sql | 2 +- macros/cross_db_utils/date_trunc.sql | 8 +- macros/cross_db_utils/datediff.sql | 16 +-- macros/cross_db_utils/listagg.sql | 104 ++++++++++++++++++ .../accepted_range.sql | 0 .../at_least_one.sql | 0 .../cardinality_equality.sql | 0 .../equal_rowcount.sql | 0 .../equality.sql | 0 .../expression_is_true.sql | 0 .../fewer_rows_than.sql | 0 .../mutually_exclusive_ranges.sql | 0 .../not_accepted_values.sql | 0 .../not_constant.sql | 0 .../not_null_proportion.sql | 0 .../recency.sql | 0 .../relationships_where.sql | 0 .../sequential_values.sql | 0 .../test_not_null_where.sql | 0 .../test_unique_where.sql | 0 .../unique_combination_of_columns.sql | 0 macros/sql/deduplicate.sql | 46 ++++++++ macros/sql/get_column_values.sql | 6 +- macros/sql/get_tables_by_pattern_sql.sql | 5 +- 41 files changed, 394 insertions(+), 74 deletions(-) create mode 100644 integration_tests/data/cross_db/data_listagg.csv create mode 
100644 integration_tests/data/cross_db/data_listagg_output.csv create mode 100644 integration_tests/data/sql/data_deduplicate.csv create mode 100644 integration_tests/data/sql/data_deduplicate_expected.csv create mode 100644 integration_tests/models/cross_db_utils/test_listagg.sql rename integration_tests/models/{schema_tests => generic_tests}/schema.yml (100%) rename integration_tests/models/{schema_tests => generic_tests}/test_equal_column_subset.sql (100%) rename integration_tests/models/{schema_tests => generic_tests}/test_equal_rowcount.sql (100%) rename integration_tests/models/{schema_tests => generic_tests}/test_fewer_rows_than.sql (100%) rename integration_tests/models/{schema_tests => generic_tests}/test_recency.sql (100%) create mode 100644 integration_tests/models/sql/test_deduplicate.sql create mode 100644 macros/cross_db_utils/listagg.sql rename macros/{schema_tests => generic_tests}/accepted_range.sql (100%) rename macros/{schema_tests => generic_tests}/at_least_one.sql (100%) rename macros/{schema_tests => generic_tests}/cardinality_equality.sql (100%) rename macros/{schema_tests => generic_tests}/equal_rowcount.sql (100%) rename macros/{schema_tests => generic_tests}/equality.sql (100%) rename macros/{schema_tests => generic_tests}/expression_is_true.sql (100%) rename macros/{schema_tests => generic_tests}/fewer_rows_than.sql (100%) rename macros/{schema_tests => generic_tests}/mutually_exclusive_ranges.sql (100%) rename macros/{schema_tests => generic_tests}/not_accepted_values.sql (100%) rename macros/{schema_tests => generic_tests}/not_constant.sql (100%) rename macros/{schema_tests => generic_tests}/not_null_proportion.sql (100%) rename macros/{schema_tests => generic_tests}/recency.sql (100%) rename macros/{schema_tests => generic_tests}/relationships_where.sql (100%) rename macros/{schema_tests => generic_tests}/sequential_values.sql (100%) rename macros/{schema_tests => generic_tests}/test_not_null_where.sql (100%) rename 
macros/{schema_tests => generic_tests}/test_unique_where.sql (100%) rename macros/{schema_tests => generic_tests}/unique_combination_of_columns.sql (100%) create mode 100644 macros/sql/deduplicate.sql diff --git a/CHANGELOG.md b/CHANGELOG.md index f346c55b..e8e4bbe8 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,9 +1,35 @@ +# dbt-utils v0.8.3 +## New features +- A macro for deduplicating data, `deduplicate()` ([#335](https://github.com/dbt-labs/dbt-utils/issues/335), [#512](https://github.com/dbt-labs/dbt-utils/pull/512)) +- A cross-database implementation of `listagg()` ([#530](https://github.com/dbt-labs/dbt-utils/pull/530)) +- A new macro to get the columns in a relation as a list, `get_filtered_columns_in_relation()`. This is similar to the `star()` macro, but creates a Jinja list instead of a comma-separated string. ([#516](https://github.com/dbt-labs/dbt-utils/pull/516)) + +## Fixes +- `get_column_values()` once more raises an error when the model doesn't exist and there is no default provided ([#531](https://github.com/dbt-labs/dbt-utils/issues/531), [#533](https://github.com/dbt-labs/dbt-utils/pull/533)) +- `get_column_values()` raises an error when used with an ephemeral model, instead of getting stuck in a compilation loop ([#358](https://github.com/dbt-labs/dbt-utils/issues/358), [#518](https://github.com/dbt-labs/dbt-utils/pull/518)) +- BigQuery materialized views work correctly with `get_relations_by_pattern()` ([#525](https://github.com/dbt-labs/dbt-utils/pull/525)) + +## Quality of life +- Updated references to 'schema test' in project file structure and documentation ([#485](https://github.com/dbt-labs/dbt-utils/issues/485), [#521](https://github.com/dbt-labs/dbt-utils/pull/521)) +- `date_trunc()` and `datediff()` default macros now have whitespace control to assist with linting and readability [#529](https://github.com/dbt-labs/dbt-utils/pull/529) +- `star()` no longer raises an error during SQLFluff linting 
([#506](https://github.com/dbt-labs/dbt-utils/issues/506), [#532](https://github.com/dbt-labs/dbt-utils/pull/532)) + +## Contributors: +- [@judahrand](https://github.com/judahrand) (#512) +- [@b-moynihan](https://github.com/b-moynihan) (#521) +- [@sunriselong](https://github.com/sunriselong) (#529) +- [@jpmmcneill](https://github.com/jpmmcneill) (#533) +- [@KamranAMalik](https://github.com/KamranAMalik) (#532) +- [@graciegoheen](https://github.com/graciegoheen) (#530) +- [@luisleon90](https://github.com/luisleon90) (#525) +- [@epapineau](https://github.com/epapineau) (#518) +- [@patkearns10](https://github.com/patkearns10) (#516) + # dbt-utils v0.8.2 ## Fixes - Fix union_relations error from [#473](https://github.com/dbt-labs/dbt-utils/pull/473) when no include/exclude parameters are provided ([#505](https://github.com/dbt-labs/dbt-utils/issues/505), [#509](https://github.com/dbt-labs/dbt-utils/pull/509)) # dbt-utils v0.8.1 - ## New features - A cross-database implementation of `any_value()` ([#497](https://github.com/dbt-labs/dbt-utils/issues/497), [#501](https://github.com/dbt-labs/dbt-utils/pull/501)) - A cross-database implementation of `bool_or()` ([#504](https://github.com/dbt-labs/dbt-utils/pull/504)) @@ -29,10 +55,9 @@ - [armandduijn](https://github.com/armandduijn) (#479) - [mdutoo](https://github.com/mdutoo) (#503) - # dbt-utils v0.8.0 ## 🚨 Breaking changes -- dbt ONE POINT OH is here! This version of dbt-utils requires _any_ version (minor and patch) of v1, which means far less need for compatibility releases in the future. +- dbt ONE POINT OH is here! This version of dbt-utils requires _any_ version (minor and patch) of v1, which means far less need for compatibility releases in the future. - The partition column in the `mutually_exclusive_ranges` test is now always called `partition_by_col`. This enables compatibility with `--store-failures` when multiple columns are concatenated together. 
If you have models built on top of the failures table, update them to reflect the new column name. ([#423](https://github.com/dbt-labs/dbt-utils/issues/423), [#430](https://github.com/dbt-labs/dbt-utils/pull/430)) ## Contributors: @@ -87,12 +112,12 @@ ## Features -- Add `not_null_proportion` schema test that allows the user to specify the minimum (`at_least`) tolerated proportion (e.g., `0.95`) of non-null values ([#411](https://github.com/dbt-labs/dbt-utils/pull/411)) +- Add `not_null_proportion` generic test that allows the user to specify the minimum (`at_least`) tolerated proportion (e.g., `0.95`) of non-null values ([#411](https://github.com/dbt-labs/dbt-utils/pull/411)) ## Under the hood - Allow user to provide any case type when defining the `exclude` argument in `dbt_utils.star()` ([#403](https://github.com/dbt-labs/dbt-utils/pull/403)) -- Log whole row instead of just column name in 'accepted_range' schema test to allow better visibility into failures ([#413](https://github.com/dbt-labs/dbt-utils/pull/413)) +- Log whole row instead of just column name in 'accepted_range' generic test to allow better visibility into failures ([#413](https://github.com/dbt-labs/dbt-utils/pull/413)) - Use column name to group in 'get_column_values ' to allow better cross db functionality ([#407](https://github.com/dbt-labs/dbt-utils/pull/407)) # dbt-utils v0.7.1 @@ -149,7 +174,7 @@ If you were relying on the position to match up your optional arguments, this ma ## Features * Add new argument, `order_by`, to `get_column_values` (code originally in [#289](https://github.com/fishtown-analytics/dbt-utils/pull/289/) from [@clausherther](https://github.com/clausherther), merged via [#349](https://github.com/fishtown-analytics/dbt-utils/pull/349/)) * Add `slugify` macro, and use it in the pivot macro. :rotating_light: This macro uses the `re` module, which is only available in dbt v0.19.0+. As a result, this feature introduces a breaking change. 
([#314](https://github.com/fishtown-analytics/dbt-utils/pull/314)) -* Add `not_null_proportion` schema test that allows the user to specify the minimum (`at_least`) tolerated proportion (e.g., `0.95`) of non-null values +* Add `not_null_proportion` generic test that allows the user to specify the minimum (`at_least`) tolerated proportion (e.g., `0.95`) of non-null values ## Under the hood * Update the default implementation of concat macro to use `||` operator ([#373](https://github.com/fishtown-analytics/dbt-utils/pull/314) from [@ChristopheDuong](https://github.com/ChristopheDuong)). Note this may be a breaking change for adapters that support `concat()` but not `||`, such as Apache Spark. @@ -160,18 +185,18 @@ If you were relying on the position to match up your optional arguments, this ma ## Fixes -- make `sequential_values` schema test use `dbt_utils.type_timestamp()` to allow for compatibility with db's without timestamp data type. [#376](https://github.com/fishtown-analytics/dbt-utils/pull/376) from [@swanderz](https://github.com/swanderz) +- make `sequential_values` generic test use `dbt_utils.type_timestamp()` to allow for compatibility with db's without timestamp data type. 
[#376](https://github.com/fishtown-analytics/dbt-utils/pull/376) from [@swanderz](https://github.com/swanderz) # dbt-utils v0.6.5 ## Features * Add new `accepted_range` test ([#276](https://github.com/fishtown-analytics/dbt-utils/pull/276) [@joellabes](https://github.com/joellabes)) * Make `expression_is_true` work as a column test (code originally in [#226](https://github.com/fishtown-analytics/dbt-utils/pull/226/) from [@elliottohara](https://github.com/elliottohara), merged via [#313](https://github.com/fishtown-analytics/dbt-utils/pull/313/)) -* Add new schema test, `not_accepted_values` ([#284](https://github.com/fishtown-analytics/dbt-utils/pull/284) [@JavierMonton](https://github.com/JavierMonton)) +* Add new generic test, `not_accepted_values` ([#284](https://github.com/fishtown-analytics/dbt-utils/pull/284) [@JavierMonton](https://github.com/JavierMonton)) * Support a new argument, `zero_length_range_allowed` in the `mutually_exclusive_ranges` test ([#307](https://github.com/fishtown-analytics/dbt-utils/pull/307) [@zemekeneng](https://github.com/zemekeneng)) -* Add new schema test, `sequential_values` ([#318](https://github.com/fishtown-analytics/dbt-utils/pull/318), inspired by [@hundredwatt](https://github.com/hundredwatt)) +* Add new generic test, `sequential_values` ([#318](https://github.com/fishtown-analytics/dbt-utils/pull/318), inspired by [@hundredwatt](https://github.com/hundredwatt)) * Support `quarter` in the `postgres__last_day` macro ([#333](https://github.com/fishtown-analytics/dbt-utils/pull/333/files) [@seunghanhong](https://github.com/seunghanhong)) * Add new argument, `unit`, to `haversine_distance` ([#340](https://github.com/fishtown-analytics/dbt-utils/pull/340) [@bastienboutonnet](https://github.com/bastienboutonnet)) -* Add new schema test, `fewer_rows_than` (code originally in [#221](https://github.com/fishtown-analytics/dbt-utils/pull/230/) from [@dmarts](https://github.com/dmarts), merged via 
[#343](https://github.com/fishtown-analytics/dbt-utils/pull/343/)) +* Add new generic test, `fewer_rows_than` (code originally in [#221](https://github.com/fishtown-analytics/dbt-utils/pull/230/) from [@dmarts](https://github.com/dmarts), merged via [#343](https://github.com/fishtown-analytics/dbt-utils/pull/343/)) ## Fixes * Handle booleans gracefully in the unpivot macro ([#305](https://github.com/fishtown-analytics/dbt-utils/pull/305) [@avishalom](https://github.com/avishalom)) @@ -245,7 +270,7 @@ enabling users of community-supported database plugins to add or override macro specific to their database ([#267](https://github.com/fishtown-analytics/dbt-utils/pull/267)) * Use `add_ephemeral_prefix` instead of hard-coding a string literal, to support database adapters that use different prefixes ([#267](https://github.com/fishtown-analytics/dbt-utils/pull/267)) -* Implement a quote_columns argument in the unique_combination_of_columns schema test ([#270](https://github.com/fishtown-analytics/dbt-utils/pull/270) [@JoshuaHuntley](https://github.com/JoshuaHuntley)) +* Implement a quote_columns argument in the unique_combination_of_columns generic test ([#270](https://github.com/fishtown-analytics/dbt-utils/pull/270) [@JoshuaHuntley](https://github.com/JoshuaHuntley)) ## Quality of life * Remove deprecated macros `get_tables_by_prefix` and `union_tables` ([#268](https://github.com/fishtown-analytics/dbt-utils/pull/268)) diff --git a/README.md b/README.md index 324d4091..6502b2b9 100644 --- a/README.md +++ b/README.md @@ -9,7 +9,7 @@ For compatibility details between versions of dbt-core and dbt-utils, [see this ---- ## Contents -**[Schema tests](#schema-tests)** +**[Generic tests](#generic-tests)** - [equal_rowcount](#equal_rowcount-source) - [fewer_rows_than](#fewer_rows_than-source) - [equality](#equality-source) @@ -37,6 +37,7 @@ For compatibility details between versions of dbt-core and dbt-utils, [see this - [SQL generators](#sql-generators) - 
[date_spine](#date_spine-source) + - [deduplicate](#deduplicate) - [haversine_distance](#haversine_distance-source) - [group_by](#group_by-source) - [star](#star-source) @@ -59,6 +60,7 @@ For compatibility details between versions of dbt-core and dbt-utils, [see this - [split_part](#split_part-source) - [last_day](#last_day-source) - [width_bucket](#width_bucket-source) + - [listagg](#listagg) - [Jinja Helpers](#jinja-helpers) - [pretty_time](#pretty_time-source) @@ -69,9 +71,9 @@ For compatibility details between versions of dbt-core and dbt-utils, [see this - [insert_by_period](#insert_by_period-source) ---- -### Schema Tests -#### equal_rowcount ([source](macros/schema_tests/equal_rowcount.sql)) -This schema test asserts the that two relations have the same number of rows. +### Generic Tests +#### equal_rowcount ([source](macros/generic_tests/equal_rowcount.sql)) +Asserts that two relations have the same number of rows. **Usage:** ```yaml @@ -85,8 +87,8 @@ models: ``` -#### fewer_rows_than ([source](macros/schema_tests/fewer_rows_than.sql)) -This schema test asserts that this model has fewer rows than the referenced model. +#### fewer_rows_than ([source](macros/generic_tests/fewer_rows_than.sql)) +Asserts that the respective model has fewer rows than the model being compared. Usage: ```yaml @@ -99,8 +101,8 @@ models: compare_model: ref('other_table_name') ``` -#### equality ([source](macros/schema_tests/equality.sql)) -This schema test asserts the equality of two relations. Optionally specify a subset of columns to compare. +#### equality ([source](macros/generic_tests/equality.sql)) +Asserts the equality of two relations. Optionally specify a subset of columns to compare. **Usage:** ```yaml @@ -116,8 +118,13 @@ models: - second_column ``` -#### expression_is_true ([source](macros/schema_tests/expression_is_true.sql)) -This schema test asserts that a valid sql expression is true for all records. 
This is useful when checking integrity across columns, for example, that a total is equal to the sum of its parts, or that at least one column is true. +#### expression_is_true ([source](macros/generic_tests/expression_is_true.sql)) +Asserts that a valid SQL expression is true for all records. This is useful when checking integrity across columns. +Examples: + +- Verify an outcome based on the application of basic algebraic operations between columns. +- Verify the length of a column. +- Verify the truth value of a column. **Usage:** ```yaml @@ -164,8 +171,8 @@ models: condition: col_a = 1 ``` -#### recency ([source](macros/schema_tests/recency.sql)) -This schema test asserts that there is data in the referenced model at least as recent as the defined interval prior to the current timestamp. +#### recency ([source](macros/generic_tests/recency.sql)) +Asserts that a timestamp column in the reference model contains data that is at least as recent as the defined date interval. **Usage:** ```yaml @@ -180,8 +187,8 @@ models: interval: 1 ``` -#### at_least_one ([source](macros/schema_tests/at_least_one.sql)) -This schema test asserts if column has at least one value. +#### at_least_one ([source](macros/generic_tests/at_least_one.sql)) +Asserts that a column has at least one value. **Usage:** ```yaml @@ -195,8 +202,8 @@ models: - dbt_utils.at_least_one ``` -#### not_constant ([source](macros/schema_tests/not_constant.sql)) -This schema test asserts if column does not have same value in all rows. +#### not_constant ([source](macros/generic_tests/not_constant.sql)) +Asserts that a column does not have the same value in all rows. **Usage:** ```yaml @@ -210,8 +217,8 @@ models: - dbt_utils.not_constant ``` -#### cardinality_equality ([source](macros/schema_tests/cardinality_equality.sql)) -This schema test asserts if values in a given column have exactly the same cardinality as values from a different column in a different model. 
+#### cardinality_equality ([source](macros/generic_tests/cardinality_equality.sql)) +Asserts that values in a given column have exactly the same cardinality as values from a different column in a different model. **Usage:** ```yaml @@ -227,8 +234,8 @@ models: to: ref('other_model_name') ``` -#### unique_where ([source](macros/schema_tests/test_unique_where.sql)) -This test validates that there are no duplicate values present in a field for a subset of rows by specifying a `where` clause. +#### unique_where ([source](macros/generic_tests/test_unique_where.sql)) +Asserts that there are no duplicate values present in a field for a subset of rows by specifying a `where` clause. *Warning*: This test is no longer supported. Starting in dbt v0.20.0, the built-in `unique` test supports a `where` config. [See the dbt docs for more details](https://docs.getdbt.com/reference/resource-configs/where). @@ -245,8 +252,8 @@ models: where: "_deleted = false" ``` -#### not_null_where ([source](macros/schema_tests/test_not_null_where.sql)) -This test validates that there are no null values present in a column for a subset of rows by specifying a `where` clause. +#### not_null_where ([source](macros/generic_tests/test_not_null_where.sql)) +Asserts that there are no null values present in a column for a subset of rows by specifying a `where` clause. *Warning*: This test is no longer supported. Starting in dbt v0.20.0, the built-in `not_null` test supports a `where` config. [See the dbt docs for more details](https://docs.getdbt.com/reference/resource-configs/where). @@ -263,8 +270,8 @@ models: where: "_deleted = false" ``` -#### not_null_proportion ([source](macros/schema_tests/not_null_proportion.sql)) -This test validates that the proportion of non-null values present in a column is between a specified range [`at_least`, `at_most`] where `at_most` is an optional argument (default: `1.0`). 
+#### not_null_proportion ([source](macros/generic_tests/not_null_proportion.sql)) +Asserts that the proportion of non-null values present in a column is between a specified range [`at_least`, `at_most`] where `at_most` is an optional argument (default: `1.0`). **Usage:** ```yaml @@ -279,8 +286,8 @@ models: at_least: 0.95 ``` -#### not_accepted_values ([source](macros/schema_tests/not_accepted_values.sql)) -This test validates that there are no rows that match the given values. +#### not_accepted_values ([source](macros/generic_tests/not_accepted_values.sql)) +Asserts that there are no rows that match the given values. Usage: ```yaml @@ -295,8 +302,8 @@ models: values: ['Barcelona', 'New York'] ``` -#### relationships_where ([source](macros/schema_tests/relationships_where.sql)) -This test validates the referential integrity between two relations (same as the core relationships schema test) with an added predicate to filter out some rows from the test. This is useful to exclude records such as test entities, rows created in the last X minutes/hours to account for temporary gaps due to ETL limitations, etc. +#### relationships_where ([source](macros/generic_tests/relationships_where.sql)) +Asserts the referential integrity between two relations (same as the core relationships assertions) with an added predicate to filter out some rows from the test. This is useful to exclude records such as test entities, rows created in the last X minutes/hours to account for temporary gaps due to ETL limitations, etc. 
**Usage:** ```yaml @@ -314,9 +321,9 @@ models: to_condition: created_date >= '2020-01-01' ``` -#### mutually_exclusive_ranges ([source](macros/schema_tests/mutually_exclusive_ranges.sql)) -This test confirms that for a given lower_bound_column and upper_bound_column, -the ranges of between the lower and upper bounds do not overlap with the ranges +#### mutually_exclusive_ranges ([source](macros/generic_tests/mutually_exclusive_ranges.sql)) +Asserts that for a given lower_bound_column and upper_bound_column, +the ranges between the lower and upper bounds do not overlap with the ranges of another row. **Usage:** @@ -381,7 +388,6 @@ models: ```
Additional `gaps` and `zero_length_range_allowed` examples - **Understanding the `gaps` argument:** Here are a number of examples for each allowed `gaps` argument. @@ -429,10 +435,9 @@ models: | 0 | 1 | | 2 | 2 | | 3 | 4 | -
-#### sequential_values ([source](macros/schema_tests/sequential_values.sql)) +#### sequential_values ([source](macros/generic_tests/sequential_values.sql)) This test confirms that a column contains sequential values. It can be used for both numeric values, and datetime values, as follows: ```yml @@ -460,8 +465,8 @@ seeds: * `interval` (default=1): The gap between two sequential values * `datepart` (default=None): Used when the gaps are a unit of time. If omitted, the test will check for a numeric gap. -#### unique_combination_of_columns ([source](macros/schema_tests/unique_combination_of_columns.sql)) -This test confirms that the combination of columns is unique. For example, the +#### unique_combination_of_columns ([source](macros/generic_tests/unique_combination_of_columns.sql)) +Asserts that the combination of columns is unique. For example, the combination of month and product is unique, however neither column is unique in isolation. @@ -496,8 +501,8 @@ An optional `quote_columns` argument (`default=false`) can also be used if a col ``` -#### accepted_range ([source](macros/schema_tests/accepted_range.sql)) -This test checks that a column's values fall inside an expected range. Any combination of `min_value` and `max_value` is allowed, and the range can be inclusive or exclusive. Provide a `where` argument to filter to specific records only. +#### accepted_range ([source](macros/generic_tests/accepted_range.sql)) +Asserts that a column's values fall inside an expected range. Any combination of `min_value` and `max_value` is allowed, and the range can be inclusive or exclusive. Provide a `where` argument to filter to specific records only. In addition to comparisons to a scalar value, you can also compare to another column's values. Any data type that supports the `>` or `<` operators can be compared, so you could also run tests like checking that all order dates are in the past. @@ -729,6 +734,21 @@ This macro returns the sql required to build a date spine. 
The spine will include }} ``` +#### deduplicate ([source](macros/sql/deduplicate.sql)) +This macro returns the sql required to remove duplicate rows from a model or source. + +**Usage:** + +``` +{{ dbt_utils.deduplicate( + relation=source('my_source', 'my_table'), + group_by="user_id, cast(timestamp as day)", + order_by="timestamp desc", + relation_alias="my_cte" + ) +}} +``` + #### haversine_distance ([source](macros/sql/haversine_distance.sql)) This macro calculates the [haversine distance](http://daynebatten.com/2015/09/latitude-longitude-distance-sql/) between a pair of x/y coordinates. diff --git a/integration_tests/README.md b/integration_tests/README.md index 243af411..4f9f0131 100644 --- a/integration_tests/README.md +++ b/integration_tests/README.md @@ -26,14 +26,14 @@ Where possible, targets are being run in docker containers (this works for Postg ### Creating a new integration test -This directory contains an example dbt project which tests the macros in the `dbt-utils` package. An integration test typically involves making 1) a new seed file 2) a new model file 3) a schema test. +This directory contains an example dbt project which tests the macros in the `dbt-utils` package. An integration test typically involves making 1) a new seed file 2) a new model file 3) a generic test to assert anticipated behaviour. For an example integration test, check out the tests for the `get_url_parameter` macro: 1. [Macro definition](https://github.com/fishtown-analytics/dbt-utils/blob/master/macros/web/get_url_parameter.sql) 2. [Seed file with fake data](https://github.com/fishtown-analytics/dbt-utils/blob/master/integration_tests/data/web/data_urls.csv) 3. [Model to test the macro](https://github.com/fishtown-analytics/dbt-utils/blob/master/integration_tests/models/web/test_urls.sql) -4.
[A generic test to assert the macro works as expected](https://github.com/fishtown-analytics/dbt-utils/blob/master/integration_tests/models/web/schema.yml#L2) Once you've added all of these files, you should be able to run: diff --git a/integration_tests/data/cross_db/data_listagg.csv b/integration_tests/data/cross_db/data_listagg.csv new file mode 100644 index 00000000..ee5083ba --- /dev/null +++ b/integration_tests/data/cross_db/data_listagg.csv @@ -0,0 +1,10 @@ +group_col,string_text,order_col +1,a,1 +1,b,2 +1,c,3 +2,a,2 +2,1,1 +2,p,3 +3,g,1 +3,g,2 +3,g,3 \ No newline at end of file diff --git a/integration_tests/data/cross_db/data_listagg_output.csv b/integration_tests/data/cross_db/data_listagg_output.csv new file mode 100644 index 00000000..a7e1c6c4 --- /dev/null +++ b/integration_tests/data/cross_db/data_listagg_output.csv @@ -0,0 +1,10 @@ +group_col,expected,version +1,"a_|_b_|_c",bottom_ordered +2,"1_|_a_|_p",bottom_ordered +3,"g_|_g_|_g",bottom_ordered +1,"a_|_b",bottom_ordered_limited +2,"1_|_a",bottom_ordered_limited +3,"g_|_g",bottom_ordered_limited +3,"g, g, g",comma_whitespace_unordered +3,"g",distinct_comma +3,"g,g,g",no_params \ No newline at end of file diff --git a/integration_tests/data/sql/data_deduplicate.csv b/integration_tests/data/sql/data_deduplicate.csv new file mode 100644 index 00000000..7e06170a --- /dev/null +++ b/integration_tests/data/sql/data_deduplicate.csv @@ -0,0 +1,4 @@ +user_id,event,version +1,play,1 +1,play,2 +2,pause,1 diff --git a/integration_tests/data/sql/data_deduplicate_expected.csv b/integration_tests/data/sql/data_deduplicate_expected.csv new file mode 100644 index 00000000..de5e204d --- /dev/null +++ b/integration_tests/data/sql/data_deduplicate_expected.csv @@ -0,0 +1,2 @@ +user_id,event,version +1,play,2 diff --git a/integration_tests/models/cross_db_utils/schema.yml b/integration_tests/models/cross_db_utils/schema.yml index dbe7a8f4..e1473c9f 100644 --- a/integration_tests/models/cross_db_utils/schema.yml +++ 
b/integration_tests/models/cross_db_utils/schema.yml @@ -58,6 +58,12 @@ models: - assert_equal: actual: actual expected: expected + + - name: test_listagg + tests: + - assert_equal: + actual: actual + expected: expected - name: test_safe_cast tests: diff --git a/integration_tests/models/cross_db_utils/test_listagg.sql b/integration_tests/models/cross_db_utils/test_listagg.sql new file mode 100644 index 00000000..006948de --- /dev/null +++ b/integration_tests/models/cross_db_utils/test_listagg.sql @@ -0,0 +1,69 @@ +with data as ( + + select * from {{ ref('data_listagg') }} + +), + +data_output as ( + + select * from {{ ref('data_listagg_output') }} + +), + +calculate as ( + + select + group_col, + {{ dbt_utils.listagg('string_text', "'_|_'", "order by order_col") }} as actual, + 'bottom_ordered' as version + from data + group by group_col + + union all + + select + group_col, + {{ dbt_utils.listagg('string_text', "'_|_'", "order by order_col", 2) }} as actual, + 'bottom_ordered_limited' as version + from data + group by group_col + + union all + + select + group_col, + {{ dbt_utils.listagg('string_text', "', '") }} as actual, + 'comma_whitespace_unordered' as version + from data + where group_col = 3 + group by group_col + + union all + + select + group_col, + {{ dbt_utils.listagg('DISTINCT string_text', "','") }} as actual, + 'distinct_comma' as version + from data + where group_col = 3 + group by group_col + + union all + + select + group_col, + {{ dbt_utils.listagg('string_text') }} as actual, + 'no_params' as version + from data + where group_col = 3 + group by group_col + +) + +select + calculate.actual, + data_output.expected +from calculate +left join data_output +on calculate.group_col = data_output.group_col +and calculate.version = data_output.version \ No newline at end of file diff --git a/integration_tests/models/datetime/test_date_spine.sql b/integration_tests/models/datetime/test_date_spine.sql index 93cd07f1..fa4ae52b 100644 --- 
a/integration_tests/models/datetime/test_date_spine.sql +++ b/integration_tests/models/datetime/test_date_spine.sql @@ -1,6 +1,6 @@ -- snowflake doesn't like this as a view because the `generate_series` --- call creates a CTE called `unioned`, as does the `equality` schema test. +-- call creates a CTE called `unioned`, as does the `equality` generic test. -- Ideally, Snowflake would be smart enough to know that these CTE names are -- different, as they live in different relations. TODO: use a less common cte name diff --git a/integration_tests/models/schema_tests/schema.yml b/integration_tests/models/generic_tests/schema.yml similarity index 100% rename from integration_tests/models/schema_tests/schema.yml rename to integration_tests/models/generic_tests/schema.yml diff --git a/integration_tests/models/schema_tests/test_equal_column_subset.sql b/integration_tests/models/generic_tests/test_equal_column_subset.sql similarity index 100% rename from integration_tests/models/schema_tests/test_equal_column_subset.sql rename to integration_tests/models/generic_tests/test_equal_column_subset.sql diff --git a/integration_tests/models/schema_tests/test_equal_rowcount.sql b/integration_tests/models/generic_tests/test_equal_rowcount.sql similarity index 100% rename from integration_tests/models/schema_tests/test_equal_rowcount.sql rename to integration_tests/models/generic_tests/test_equal_rowcount.sql diff --git a/integration_tests/models/schema_tests/test_fewer_rows_than.sql b/integration_tests/models/generic_tests/test_fewer_rows_than.sql similarity index 100% rename from integration_tests/models/schema_tests/test_fewer_rows_than.sql rename to integration_tests/models/generic_tests/test_fewer_rows_than.sql diff --git a/integration_tests/models/schema_tests/test_recency.sql b/integration_tests/models/generic_tests/test_recency.sql similarity index 100% rename from integration_tests/models/schema_tests/test_recency.sql rename to 
integration_tests/models/generic_tests/test_recency.sql diff --git a/integration_tests/models/sql/schema.yml b/integration_tests/models/sql/schema.yml index e136f127..a78e5e1b 100644 --- a/integration_tests/models/sql/schema.yml +++ b/integration_tests/models/sql/schema.yml @@ -90,7 +90,7 @@ models: tests: - dbt_utils.equality: compare_model: ref('data_pivot_expected') - + - name: test_pivot_apostrophe tests: - dbt_utils.equality: @@ -147,8 +147,13 @@ models: tests: - dbt_utils.equality: compare_model: ref('data_union_expected') - + - name: test_get_relations_by_pattern tests: - dbt_utils.equality: compare_model: ref('data_union_events_expected') + + - name: test_deduplicate + tests: + - dbt_utils.equality: + compare_model: ref('data_deduplicate_expected') diff --git a/integration_tests/models/sql/test_deduplicate.sql b/integration_tests/models/sql/test_deduplicate.sql new file mode 100644 index 00000000..81fe81e7 --- /dev/null +++ b/integration_tests/models/sql/test_deduplicate.sql @@ -0,0 +1,22 @@ +with + +source as ( + select * + from {{ ref('data_deduplicate') }} + where user_id = 1 +), + +deduped as ( + + {{ + dbt_utils.deduplicate( + ref('data_deduplicate'), + group_by='user_id', + order_by='version desc', + relation_alias="source" + ) | indent + }} + +) + +select * from deduped diff --git a/integration_tests/models/sql/test_generate_series.sql b/integration_tests/models/sql/test_generate_series.sql index a943cf6c..11370b7b 100644 --- a/integration_tests/models/sql/test_generate_series.sql +++ b/integration_tests/models/sql/test_generate_series.sql @@ -1,6 +1,6 @@ -- snowflake doesn't like this as a view because the `generate_series` --- call creates a CTE called `unioned`, as does the `equality` schema test. +-- call creates a CTE called `unioned`, as does the `equality` generic test. -- Ideally, Snowflake would be smart enough to know that these CTE names are -- different, as they live in different relations.
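The new `test_deduplicate.sql` model above keeps, for each `user_id`, the row with the highest `version`. As a sanity check on those semantics (not part of the patch; the function and variable names below are illustrative only), the keep-one-row-per-group logic that `default__deduplicate` implements with `row_number() over (partition by ... order by ...) = 1` can be sketched in Python:

```python
# Hypothetical sketch: emulate
#   row_number() over (partition by user_id order by version desc) = 1
# on the data_deduplicate seed rows.
from itertools import groupby

def deduplicate(rows, group_key, sort_key):
    """Keep one row per group: the first row under 'order by sort_key desc'."""
    rows = sorted(rows, key=group_key)   # groupby requires contiguous groups
    return [max(grp, key=sort_key)       # highest version wins within each group
            for _, grp in groupby(rows, key=group_key)]

seed = [
    {"user_id": 1, "event": "play", "version": 1},
    {"user_id": 1, "event": "play", "version": 2},
    {"user_id": 2, "event": "pause", "version": 1},
]
deduped = deduplicate(seed, lambda r: r["user_id"], lambda r: r["version"])
# user 1 keeps the version-2 row (matching data_deduplicate_expected.csv);
# user 2 keeps its only row
```

As with `row_number()` under a non-deterministic `order_by`, ties on `version` would be broken arbitrarily.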
TODO: use a less common cte name diff --git a/macros/cross_db_utils/date_trunc.sql b/macros/cross_db_utils/date_trunc.sql index cba3346b..f9d0364b 100644 --- a/macros/cross_db_utils/date_trunc.sql +++ b/macros/cross_db_utils/date_trunc.sql @@ -2,14 +2,14 @@ {{ return(adapter.dispatch('date_trunc', 'dbt_utils') (datepart, date)) }} {%- endmacro %} -{% macro default__date_trunc(datepart, date) %} +{% macro default__date_trunc(datepart, date) -%} date_trunc('{{datepart}}', {{date}}) -{% endmacro %} +{%- endmacro %} -{% macro bigquery__date_trunc(datepart, date) %} +{% macro bigquery__date_trunc(datepart, date) -%} timestamp_trunc( cast({{date}} as timestamp), {{datepart}} ) -{% endmacro %} +{%- endmacro %} diff --git a/macros/cross_db_utils/datediff.sql b/macros/cross_db_utils/datediff.sql index 42dd738e..2b5d6613 100644 --- a/macros/cross_db_utils/datediff.sql +++ b/macros/cross_db_utils/datediff.sql @@ -3,7 +3,7 @@ {% endmacro %} -{% macro default__datediff(first_date, second_date, datepart) %} +{% macro default__datediff(first_date, second_date, datepart) -%} datediff( {{ datepart }}, @@ -11,10 +11,10 @@ {{ second_date }} ) -{% endmacro %} +{%- endmacro %} -{% macro bigquery__datediff(first_date, second_date, datepart) %} +{% macro bigquery__datediff(first_date, second_date, datepart) -%} datetime_diff( cast({{second_date}} as datetime), @@ -22,9 +22,9 @@ {{datepart}} ) -{% endmacro %} +{%- endmacro %} -{% macro postgres__datediff(first_date, second_date, datepart) %} +{% macro postgres__datediff(first_date, second_date, datepart) -%} {% if datepart == 'year' %} (date_part('year', ({{second_date}})::date) - date_part('year', ({{first_date}})::date)) @@ -55,12 +55,12 @@ {{ exceptions.raise_compiler_error("Unsupported datepart for macro datediff in postgres: {!r}".format(datepart)) }} {% endif %} -{% endmacro %} +{%- endmacro %} {# redshift should use default instead of postgres #} -{% macro redshift__datediff(first_date, second_date, datepart) %} +{% macro 
redshift__datediff(first_date, second_date, datepart) -%} {{ return(dbt_utils.default__datediff(first_date, second_date, datepart)) }} -{% endmacro %} +{%- endmacro %} diff --git a/macros/cross_db_utils/listagg.sql b/macros/cross_db_utils/listagg.sql new file mode 100644 index 00000000..1d19a54f --- /dev/null +++ b/macros/cross_db_utils/listagg.sql @@ -0,0 +1,104 @@ +{% macro listagg(measure, delimiter_text="','", order_by_clause=none, limit_num=none) -%} + {{ return(adapter.dispatch('listagg', 'dbt_utils') (measure, delimiter_text, order_by_clause, limit_num)) }} +{%- endmacro %} + +{% macro default__listagg(measure, delimiter_text, order_by_clause, limit_num) -%} + + {% if limit_num -%} + array_to_string( + array_slice( + array_agg( + {{ measure }} + ){% if order_by_clause -%} + within group ({{ order_by_clause }}) + {%- endif %} + ,0 + ,{{ limit_num }} + ), + {{ delimiter_text }} + ) + {%- else %} + listagg( + {{ measure }}, + {{ delimiter_text }} + ) + {% if order_by_clause -%} + within group ({{ order_by_clause }}) + {%- endif %} + {%- endif %} + +{%- endmacro %} + +{% macro bigquery__listagg(measure, delimiter_text, order_by_clause, limit_num) -%} + + string_agg( + {{ measure }}, + {{ delimiter_text }} + {% if order_by_clause -%} + {{ order_by_clause }} + {%- endif %} + {% if limit_num -%} + limit {{ limit_num }} + {%- endif %} + ) + +{%- endmacro %} + +{% macro postgres__listagg(measure, delimiter_text, order_by_clause, limit_num) -%} + + {% if limit_num -%} + array_to_string( + (array_agg( + {{ measure }} + {% if order_by_clause -%} + {{ order_by_clause }} + {%- endif %} + ))[1:{{ limit_num }}], + {{ delimiter_text }} + ) + {%- else %} + string_agg( + {{ measure }}, + {{ delimiter_text }} + {% if order_by_clause -%} + {{ order_by_clause }} + {%- endif %} + ) + {%- endif %} + +{%- endmacro %} + +{# if there are instances of delimiter_text within your measure, you cannot include a limit_num #} +{% macro redshift__listagg(measure, delimiter_text, 
order_by_clause, limit_num) -%} + + {% if limit_num -%} + {% set ns = namespace() %} + {% set ns.delimiter_text_regex = delimiter_text|trim("'") %} + {% set special_chars %}\,^,$,.,|,?,*,+,(,),[,],{,}{% endset %} + {%- for char in special_chars.split(',') -%} + {% set escape_char %}\\{{ char }}{% endset %} + {% set ns.delimiter_text_regex = ns.delimiter_text_regex|replace(char,escape_char) %} + {%- endfor -%} + + {% set regex %}'([^{{ ns.delimiter_text_regex }}]+{{ ns.delimiter_text_regex }}){1,{{ limit_num - 1}}}[^{{ ns.delimiter_text_regex }}]+'{% endset %} + regexp_substr( + listagg( + {{ measure }}, + {{ delimiter_text }} + ) + {% if order_by_clause -%} + within group ({{ order_by_clause }}) + {%- endif %} + ,{{ regex }} + ) + {%- else %} + listagg( + {{ measure }}, + {{ delimiter_text }} + ) + {% if order_by_clause -%} + within group ({{ order_by_clause }}) + {%- endif %} + {%- endif %} + +{%- endmacro %} \ No newline at end of file diff --git a/macros/schema_tests/accepted_range.sql b/macros/generic_tests/accepted_range.sql similarity index 100% rename from macros/schema_tests/accepted_range.sql rename to macros/generic_tests/accepted_range.sql diff --git a/macros/schema_tests/at_least_one.sql b/macros/generic_tests/at_least_one.sql similarity index 100% rename from macros/schema_tests/at_least_one.sql rename to macros/generic_tests/at_least_one.sql diff --git a/macros/schema_tests/cardinality_equality.sql b/macros/generic_tests/cardinality_equality.sql similarity index 100% rename from macros/schema_tests/cardinality_equality.sql rename to macros/generic_tests/cardinality_equality.sql diff --git a/macros/schema_tests/equal_rowcount.sql b/macros/generic_tests/equal_rowcount.sql similarity index 100% rename from macros/schema_tests/equal_rowcount.sql rename to macros/generic_tests/equal_rowcount.sql diff --git a/macros/schema_tests/equality.sql b/macros/generic_tests/equality.sql similarity index 100% rename from macros/schema_tests/equality.sql rename to 
macros/generic_tests/equality.sql diff --git a/macros/schema_tests/expression_is_true.sql b/macros/generic_tests/expression_is_true.sql similarity index 100% rename from macros/schema_tests/expression_is_true.sql rename to macros/generic_tests/expression_is_true.sql diff --git a/macros/schema_tests/fewer_rows_than.sql b/macros/generic_tests/fewer_rows_than.sql similarity index 100% rename from macros/schema_tests/fewer_rows_than.sql rename to macros/generic_tests/fewer_rows_than.sql diff --git a/macros/schema_tests/mutually_exclusive_ranges.sql b/macros/generic_tests/mutually_exclusive_ranges.sql similarity index 100% rename from macros/schema_tests/mutually_exclusive_ranges.sql rename to macros/generic_tests/mutually_exclusive_ranges.sql diff --git a/macros/schema_tests/not_accepted_values.sql b/macros/generic_tests/not_accepted_values.sql similarity index 100% rename from macros/schema_tests/not_accepted_values.sql rename to macros/generic_tests/not_accepted_values.sql diff --git a/macros/schema_tests/not_constant.sql b/macros/generic_tests/not_constant.sql similarity index 100% rename from macros/schema_tests/not_constant.sql rename to macros/generic_tests/not_constant.sql diff --git a/macros/schema_tests/not_null_proportion.sql b/macros/generic_tests/not_null_proportion.sql similarity index 100% rename from macros/schema_tests/not_null_proportion.sql rename to macros/generic_tests/not_null_proportion.sql diff --git a/macros/schema_tests/recency.sql b/macros/generic_tests/recency.sql similarity index 100% rename from macros/schema_tests/recency.sql rename to macros/generic_tests/recency.sql diff --git a/macros/schema_tests/relationships_where.sql b/macros/generic_tests/relationships_where.sql similarity index 100% rename from macros/schema_tests/relationships_where.sql rename to macros/generic_tests/relationships_where.sql diff --git a/macros/schema_tests/sequential_values.sql b/macros/generic_tests/sequential_values.sql similarity index 100% rename from 
macros/schema_tests/sequential_values.sql rename to macros/generic_tests/sequential_values.sql diff --git a/macros/schema_tests/test_not_null_where.sql b/macros/generic_tests/test_not_null_where.sql similarity index 100% rename from macros/schema_tests/test_not_null_where.sql rename to macros/generic_tests/test_not_null_where.sql diff --git a/macros/schema_tests/test_unique_where.sql b/macros/generic_tests/test_unique_where.sql similarity index 100% rename from macros/schema_tests/test_unique_where.sql rename to macros/generic_tests/test_unique_where.sql diff --git a/macros/schema_tests/unique_combination_of_columns.sql b/macros/generic_tests/unique_combination_of_columns.sql similarity index 100% rename from macros/schema_tests/unique_combination_of_columns.sql rename to macros/generic_tests/unique_combination_of_columns.sql diff --git a/macros/sql/deduplicate.sql b/macros/sql/deduplicate.sql new file mode 100644 index 00000000..9a3571a2 --- /dev/null +++ b/macros/sql/deduplicate.sql @@ -0,0 +1,46 @@ +{%- macro deduplicate(relation, group_by, order_by=none, relation_alias=none) -%} + {{ return(adapter.dispatch('deduplicate', 'dbt_utils')(relation, group_by, order_by=order_by, relation_alias=relation_alias)) }} +{% endmacro %} + +{%- macro default__deduplicate(relation, group_by, order_by=none, relation_alias=none) -%} + + select + {{ dbt_utils.star(relation, relation_alias='deduped') | indent }} + from ( + select + _inner.*, + row_number() over ( + partition by {{ group_by }} + {% if order_by is not none -%} + order by {{ order_by }} + {%- endif %} + ) as rn + from {{ relation if relation_alias is none else relation_alias }} as _inner + ) as deduped + where deduped.rn = 1 + +{%- endmacro -%} + +{# +-- It is more performant to deduplicate using `array_agg` with a limit +-- clause in BigQuery: +-- https://github.com/dbt-labs/dbt-utils/issues/335#issuecomment-788157572 +#} +{%- macro bigquery__deduplicate(relation, group_by, order_by=none, relation_alias=none) -%} + 
+ select + {{ dbt_utils.star(relation, relation_alias='deduped') | indent }} + from ( + select + array_agg ( + original + {% if order_by is not none -%} + order by {{ order_by }} + {%- endif %} + limit 1 + )[offset(0)] as deduped + from {{ relation if relation_alias is none else relation_alias }} as original + group by {{ group_by }} + ) + +{%- endmacro -%} diff --git a/macros/sql/get_column_values.sql b/macros/sql/get_column_values.sql index 57b150a6..f70890e2 100644 --- a/macros/sql/get_column_values.sql +++ b/macros/sql/get_column_values.sql @@ -3,14 +3,14 @@ {% endmacro %} {% macro default__get_column_values(table, column, order_by='count(*) desc', max_records=none, default=none) -%} -{% if default is none %} - {% set default = [] %} -{% endif %} {#-- Prevent querying of db in parsing mode. This works because this macro does not create any new refs. #} {%- if not execute -%} + {% set default = [] if not default %} {{ return(default) }} {% endif %} + {%- do dbt_utils._is_ephemeral(table, 'get_column_values') -%} + {# Not all relations are tables. Renaming for internal clarity without breaking functionality for anyone using named arguments #} {# TODO: Change the method signature in a future 0.x.0 release #} {%- set target_relation = table -%} diff --git a/macros/sql/get_tables_by_pattern_sql.sql b/macros/sql/get_tables_by_pattern_sql.sql index 93f3c6a6..4d5a8fc9 100644 --- a/macros/sql/get_tables_by_pattern_sql.sql +++ b/macros/sql/get_tables_by_pattern_sql.sql @@ -30,10 +30,7 @@ select distinct table_schema, table_name, - case table_type - when 'BASE TABLE' then 'table' - else lower(table_type) - end as table_type + {{ dbt_utils.get_table_types_sql() }} from {{ adapter.quote(database) }}.{{ schema }}.INFORMATION_SCHEMA.TABLES where lower(table_name) like lower ('{{ table_pattern }}')
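The `redshift__listagg` branch earlier in this patch emulates `limit_num` by running `regexp_substr` over the fully aggregated string, matching the first `limit_num` delimited items. A rough Python sketch of that trick, assuming `re.escape` stands in for the macro's hand-rolled escaping of `\ ^ $ . | ? * + ( ) [ ] { }` (the `listagg_with_limit` name is illustrative, not part of the package):

```python
import re

def listagg_with_limit(values, delimiter, limit_num):
    """Emulate redshift__listagg: aggregate, then regex-extract the first limit_num items."""
    aggregated = delimiter.join(values)        # what listagg() would return
    d = re.escape(delimiter)                   # the macro escapes regex metacharacters by hand
    # Same shape as the macro's regex: (item + delimiter) repeated limit_num - 1
    # times, then one final item. Note that, like the macro, a multi-character
    # delimiter becomes a character *set* inside [^...].
    pattern = rf"([^{d}]+{d}){{1,{limit_num - 1}}}[^{d}]+"
    match = re.match(pattern, aggregated)      # regexp_substr() equivalent
    return match.group(0) if match else aggregated

listagg_with_limit(["a", "b", "c"], "_|_", 2)  # -> 'a_|_b' (the bottom_ordered_limited case)
```

The results line up with the `bottom_ordered_limited` rows in `data_listagg_output.csv`, e.g. group 1 yields `a_|_b` and group 2 yields `1_|_a`.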