[Feature] Support on_config_change for incremental #9850

benc-db · 2024-04-02T15:06:21Z

benc-db
Apr 2, 2024

Is this your first time submitting a feature request?

I have read the expectations for open source contributors
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Much like materialized views, incremental tables have a set of configuration changes that may occur between dbt runs that should be respected. Today this is done in an ad hoc manner, with, for example, grants having special logic to determine what they should do on first runs vs subsequent runs. In the dbt-databricks adapter, we also have similar logic for persisting column comments, where we diff the existing comments against the configured ones before issuing the alter statements (since we don't have a bulk update for column comments, we're trying to reduce the number of statements issued). It would be nice to unify these behaviors, in which we check state before applying updates, under the concept of a config change.
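A minimal sketch of the "diff before alter" idea for column comments, not the dbt-databricks implementation. The function name `changed_column_comments` and the dict-based inputs are hypothetical, chosen only for illustration.

```python
from typing import Dict


def changed_column_comments(
    existing: Dict[str, str], desired: Dict[str, str]
) -> Dict[str, str]:
    """Return only the columns whose desired comment differs from the existing one."""
    return {
        column: comment
        for column, comment in desired.items()
        # Assumption: a column with no comment yet is treated as an empty comment.
        if existing.get(column, "") != comment
    }


existing = {"id": "Primary key", "email": ""}
desired = {"id": "Primary key", "email": "Contact address", "name": "Display name"}

# Only two alter statements would be needed here instead of three.
print(changed_column_comments(existing, desired))
```

The adapter would then issue one alter statement per entry in the returned dict, and none at all when nothing changed.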

Describe alternatives you've considered

We've already implemented our own logic for on_config_change for our streaming table implementation, so it's not really a stretch for the dbt-databricks adapter to just do this for incremental tables without core support. The primary argument against this is just that the behavior will be inconsistent with core documentation, which describes on_config_change as an MV feature. I'm currently mulling this as I implement support for Databricks tags, which require checking metadata to ensure we're syncing to what is specified in the dbt project.

Who will this benefit?

While this opens the possibility of users specifying whether they want changes to apply, fail, or ignore on incremental run, I think the biggest win is uniformity of implementation for adapters.

Are you interested in contributing this feature?

No response

Anything else?

No response

dbeatty10 · 2024-04-02T16:54:59Z

dbeatty10
Apr 2, 2024
Maintainer

Thanks for opening this @benc-db !

It sounds like you are proposing that we add the on_configuration_change config from the materialized_view materialization into the incremental materialization.

(And that this would be separate from and complement the pre-existing on_schema_change config.)

You shared a couple potential use cases (grants, column comments). In light of those use cases, could you share more about how you'd envision the following?

What the valid values for on_configuration_change would be.
How an incremental materialization would behave for each of those values.

benc-db · 2024-04-02T17:20:41Z

benc-db
Apr 2, 2024
Author

I would expect it to behave no differently from how it works for MVs. In other words: apply, continue (ignore), fail. I'm not really sure what the point of 'continue' is for MVs, but for simplicity of mental model, I see no reason to have different on_configuration_change behavior between MVs and incrementals. Honestly, it's not clear to me why dbt views MVs differently from incremental, aside from needing to give a materialization type that informs the adapters to switch to MV-based create/alter statements. Both require different statements for full refresh vs successive calls; both reference a persistent materialization that needs to reflect config from a dbt project that could change between runs.

In dbt-databricks, I can't really take advantage of the structure that was made in dbt-core for MVs, primarily because I can't issue multiple statements in one call, and thus had to write my own materializations; I still benefit from users having the conceptual model of 'here's how I want dbt to respond to changes between runs', and the set of functional tests that ensure my adapter is responding appropriately to the different config values. I think there's a similar benefit available for incremental.

dbeatty10 · 2024-04-02T21:50:35Z

dbeatty10
Apr 2, 2024
Maintainer

`on_configuration_change` for MVs

Here's the documentation for MVs:

The on_configuration_change config has three settings:

apply (default) — Attempt to update the existing database object if possible, avoiding a complete rebuild.

Note: If any individual configuration change requires a full refresh, a full refresh is performed in lieu of individual alter statements.

continue — Allow runs to continue while also providing a warning that the object was left untouched.

Note: This could result in downstream failures as those models may depend on these unimplemented changes.

fail — Force the entire run to fail if a change is detected.
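To make the three settings concrete, here is a rough sketch of how a materialization could branch on them. All names (`OnConfigurationChange`, `ConfigurationChangeError`, `plan_action`) are hypothetical and are not dbt-core's API; the returned strings are placeholders for whatever the adapter would actually do.

```python
import warnings
from enum import Enum
from typing import List


class OnConfigurationChange(str, Enum):
    APPLY = "apply"
    CONTINUE = "continue"
    FAIL = "fail"


class ConfigurationChangeError(Exception):
    """Raised when a change is detected and the setting is `fail`."""


def plan_action(
    setting: OnConfigurationChange,
    changes: List[str],
    requires_full_refresh: bool,
) -> str:
    """Decide what to do with the detected config changes."""
    if not changes:
        return "none"
    if setting == OnConfigurationChange.FAIL:
        raise ConfigurationChangeError(f"Configuration changes detected: {changes}")
    if setting == OnConfigurationChange.CONTINUE:
        # Leave the object untouched, but tell the user about it.
        warnings.warn(f"Configuration changes were not applied: {changes}")
        return "skip"
    # apply: fall back to a full refresh if any single change requires one.
    return "full_refresh" if requires_full_refresh else "alter"
```

For example, `plan_action(OnConfigurationChange.APPLY, ["grants"], False)` returns `"alter"`, while the same call with `requires_full_refresh=True` returns `"full_refresh"`.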

Ideal behavior for incremental models

It seems like the ideal behavior for incremental models is to just "apply" any configuration changes that happen between runs rather than it being configurable.

Changes to grants? Apply them.
Changes to indexes? Apply them.
Changes to column comments? Apply them.
etc.

Am I missing a key insight here that would cause us to not want to just apply any configuration changes for incremental models?

Rationale for why MVs have `on_configuration_change`

Reading through #6911 gives some insight into why we added this for materialized views.

Applying new logic or configuration to materialized views isn't as straightforward as it is for regular views where the logic can just be overwritten.

Certain configuration changes may require a complete refresh of the MV, and the on_configuration_change setting allows users to weigh the trade-offs and choose how to behave in those scenarios.

But I don't see how those reasons would be relevant to incremental models. The closest thing I can think of is #7732, but I see that as a separate scenario.

Maybe you can give a concrete example where we'd want users to choose between apply, continue, and fail for incremental models when some configuration changes?

Overall, we'd rather not add a new configuration option unless there's a really compelling reason.

benc-db · 2024-04-03T16:18:26Z

benc-db
Apr 3, 2024
Author

For incremental we also have situations where a config change could require a full refresh, though maybe this is particular to Databricks: partition by. If a user changes their partition_by config, we will need to recreate the table in order to accomplish that. Today I think we would just fail, possibly with a cryptic error from Databricks (since I don't see the sort of logic I would expect for detecting this condition).
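A sketch of what detecting this condition might look like, so the user gets a clear message instead of a platform error. The names `PartitionChange`, `detect_partition_change`, and `check_partitioning` are hypothetical, and comparing column names case-insensitively and in order is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class PartitionChange:
    existing: Tuple[str, ...]
    desired: Tuple[str, ...]
    # Assumption: changing partition columns always needs a table rebuild.
    requires_full_refresh: bool = True


def detect_partition_change(
    existing: Sequence[str], desired: Sequence[str]
) -> Optional[PartitionChange]:
    """Return a change object if the partition columns differ, else None."""
    existing_cols = tuple(c.lower() for c in existing)
    desired_cols = tuple(c.lower() for c in desired)
    if existing_cols == desired_cols:
        return None
    return PartitionChange(existing_cols, desired_cols)


def check_partitioning(
    existing: Sequence[str], desired: Sequence[str], full_refresh: bool
) -> None:
    """Raise a readable error when a rebuild is needed but was not requested."""
    change = detect_partition_change(existing, desired)
    if change is not None and not full_refresh:
        raise RuntimeError(
            f"partition_by changed from {list(change.existing)} to "
            f"{list(change.desired)}; rerun with --full-refresh to rebuild the table."
        )
```

With an `on_configuration_change`-style setting, the same change object could instead feed the apply/continue/fail decision rather than always raising.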

benc-db · 2024-04-03T16:23:14Z

benc-db
Apr 3, 2024
Author

In general, my thinking about the existence of 'continue' and 'fail' as an option for MVs is the same as for incremental: your project is not in sync with what is present in the database, so make sure you really intend to overwrite this before proceeding. Maybe this doesn't come up much in the wild, but consider for example governance tags that were applied globally to tables by some policy; do you really want to blow those tags away because you haven't accounted for them in your dbt project? In Databricks I see a bunch of mixed-use: environments where some people are using dbt, but not all of the data objects/policy are created or enforced by dbt. For such environments, you may want to know that the config in the database has deviated from what you declared in your dbt project.
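The governance-tag case can be illustrated with a small diff that separates "tags the project wants to set" from "tags present in the database that the project never declared". This is only a sketch; `diff_tags` and the dict-based tag representation are hypothetical.

```python
from typing import Dict, List, Tuple


def diff_tags(
    existing: Dict[str, str], desired: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Return (tags to set or update, tags in the database but not in the project)."""
    to_set = {k: v for k, v in desired.items() if existing.get(k) != v}
    undeclared = sorted(k for k in existing if k not in desired)
    return to_set, undeclared


# "pii" was applied by some policy outside dbt; the project only declares "team".
existing = {"team": "analytics", "pii": "true"}
desired = {"team": "finance"}

to_set, undeclared = diff_tags(existing, desired)
print(to_set)       # tags dbt would write
print(undeclared)   # tags a blind sync would remove
```

A non-empty `undeclared` list is exactly the situation where a user might prefer `fail` or `continue` over `apply`, since applying would blow away tags the project did not account for.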

dbeatty10 · 2024-04-03T17:01:41Z

dbeatty10
Apr 3, 2024
Maintainer

I think the key difference between the materialized_view and incremental materializations here is that the former sometimes does a "full refresh" automatically, whereas the latter only does so when the user explicitly asks for it (via full_refresh config).

💡 Good point about how there are often mixed-use environments in which there can be a difference between the changes that dbt makes to a database object vs. changes made by other processes.

🧠 And also helpful that you pointed out cases in which dbt is specifying something like the partition config, but even then it may require a full rebuild of the database object in certain data platforms.

Overall, I don't see anything obvious here that we need/want to add to dbt-core, but I'm going to convert this to a Discussion so we can continue to discuss and gather feedback & additional use-cases.

benc-db · 2024-04-03T17:47:46Z

benc-db
Apr 3, 2024
Author

@dbeatty10 yes, I didn't expect this to be a feature that made it into the next release, but I was talking to Amy about it and she advised me to file a ticket to discuss. Part of why I bring it up is that I see the value in the Python structure that we use for detecting config changes, and like the thought of extending that structure to other materializations; however, using that approach does not actually require on_configuration_change at all.
