Every single test should return all columns in case of --store-failures & collating them should be easier #8479
adamcunnington-mlg
started this conversation in
Ideas
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
I have a use case which I know has come up a few times in various discussions / github issues over the last couple of years.
When my tests run, I want to collate the results and join them together so I can put errors in-line against my data rows. This can be achieved logically as follows:
The first stumbling block here is that there's no such enforced standard requiring tests to output all columns. I don't know how you'd do this in practice but I suggest that dbt-labs/dbt-core and dbt-labs/dbt-utils at minimum should ensure this, and the docs should advocate STRONGLY for test package developers.
P.s. whilst we are on the topic, docs should also advocate strongly for package developers to not create tests that don't make use of adapter.dispatch because it significantly affects portability and customisability of those tests (there's many such examples (e.g. this) in dbt-expectations where it's impossible to override functionality and any change requires a completely alternative custom test).
Some dbt-core tests return all columns (e.g. this) in the case of
should_store_failures()
being true but it's very inconsistent.It's not always easy but every test, regardless of grouping, can achieve this. It requires thinking outside the box and making use of window functions over grouping, and the occasional join, but I wager that it can always be done. We've actually re-implemented 5 very common tests that we use to work in this way and have achieved feature parity in each case (and taken opportunity to improve on the original SQL/jinja in many cases).
The next step would be to provide some more-easily-accessible way to do the final collation. This could perhaps just be some macro that is given the name of the consistent unique column in each test output (e.g.
row_id
) that can be used to do the joining. This macro could take care of the iteration, dynamic SQL construction with the joins & string aggregation - and possibly the final bit; doing something with the consolidated data.Keen to hear other's thoughts on the 3 main aspects of the discussion:
Happy to share examples once we've finished our current implementation.
Beta Was this translation helpful? Give feedback.
All reactions