Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbt_utils.unique_combination_of_columns test produces false failures in Redshift #979

Open
ClauPet opened this issue Jan 10, 2025 · 0 comments
Labels
bug Something isn't working triage

Comments

@ClauPet
Copy link

ClauPet commented Jan 10, 2025

Describe the bug

dbt_utils.unique_combination_of_columns test produces false failures testing for uniqueness of col_a and col_b in a model in Redshift consisting of only these two columns. While this error shows up when running the dbt_utils.unique_combination_of_columns test I get the same false failures when I run the complied test code directly in Redshift.

Steps to reproduce

I was unable to reproduce this error with an example model. I can only describe the scenario that I am seeing.

I have the following model:

          Column           |         Type          |   Collation    | Nullable | Default Value
---------------------------+-----------------------+----------------+----------+---------------
 col_a                     | character varying(20) | case_sensitive | YES      |
 col_b                     | character(4)          | case_sensitive | YES      |

I run the following test:

  - name: model_name
    data_tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - col_a
            - col_b

The compiled code looks like this:

select
      count(*) as failures,
      count(*) != 0 as should_warn,
      count(*) != 0 as should_error
    from (
with validation_errors as (
    select
        col_a, col_b
    from schema.model_name
    group by col_a, col_b
    having count(*) > 1
)
select *
from validation_errors
    ) dbt_internal_test

Expected results

I would expect no test errors. When checking the model for duplicates using sql or when using dbt core functionality to test for uniqueness I get no failing records.

The following test results in no failing records:

  - name: model_name
    data_tests:
      - unique:
          column_name: "(col_a|| '-' || col_b)"
```

### Actual results
I get 805 test errors, which would be the number of test errors if I had omitted col_b from the dbt_utils.unique_combination_of_columns and only tested for uniqueness of col_a

When I run the inner query of the compiled test code directly in Redshift:

```
with validation_errors as (
    select
        col_a, col_b
    from schema.model_name
    group by col_a, col_b
    having count(*) > 1
)
select *
from validation_errors
```

I get the following result:

```
 col_a | col_b
-------+------
(0 rows)
```
The result of the total query in Redshift is as follows:

```
 failures | should_warn | should_error
----------+-------------+--------------
      805 | t           | t
```


### System information
**The contents of your `packages.yml` file:**

**Which database are you using dbt with?**
- [ ] postgres
- [x] redshift
- [ ] bigquery
- [ ] snowflake
- [ ] other (specify: ____________)


**The output of `dbt --version`:**
```
Core:
  - installed: 1.8.8
  - latest:    1.9.1 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - postgres: 1.8.2 - Update available!
  - redshift: 1.8.1 - Update available!

  At least one plugin is out of date or incompatible with dbt-core.
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation
```

### Are you interested in contributing the fix?
Depends on what issue this can be tracked down to
@ClauPet ClauPet added bug Something isn't working triage labels Jan 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage
Projects
None yet
Development

No branches or pull requests

1 participant