Replies: 7 comments
-
Thanks for opening this @benc-db ! It sounds like you are proposing that we add the (And that this would be separate from and complement the pre-existing You shared a couple potential use cases (grants, column comments). In light of those use cases, could you share more about how you'd envision the following?
|
Beta Was this translation helpful? Give feedback.
-
I would expect it to behave no differently from how it works for MVs. In other words: apply, continue (ignore), fail. I'm not really sure what the point of 'continue' is for MVs, but for simplicity of mental model, I see no reason to have different on_configuration_change behavior between MVs and incrementals. Honestly, it's not clear to me why dbt views MVs differently from incremental, aside from needing to give a materialization type that informs the adapters to switch to MV-based create/alter statements. Both require different statements for full refresh vs successive calls; both reference a persistent materialization that needs to reflect config from a dbt project that could change between runs. In dbt-databricks, I'm can't really take advantage of the structure that was made in dbt-core for MVs, primarily because I can't issue multiple statements in one call, and thus had to write my own materializations; I still benefit from users having the conceptual model of 'here's how I want dbt to respond to changes between runs', and the set of functional tests that ensure my adapter is responding appropriately to the different config values. I think there's a similar benefit available for incremental. |
Beta Was this translation helpful? Give feedback.
-
|
Beta Was this translation helpful? Give feedback.
-
For incremental we also have situations where a config change could require a full refresh, though maybe this is particular to Databricks: partition by. If a user changes their partition_by config, we will need to recreate the table in order to accomplish that. Today I think we would just fail, possibly with a cryptic error from Databricks (since I don't see the sort of logic I would expect to for detecting this condition). |
Beta Was this translation helpful? Give feedback.
-
In general, my thinking about the existence of 'continue' and 'fail' as an option for MVs is the same as for incremental: your project is not in sync with what is present in the database, so make sure you really intend to overwrite this before proceeding. Maybe this doesn't come up much in the wild, but consider for example governance tags that were applied globally to tables by some policy; do you really want to blow those tags away because you haven't accounted for them in your dbt project? In Databricks I see a bunch of mixed-use: environments where some people are using dbt, but not all of the data objects/policy are created or enforced by dbt. For such environments, you may want to know that the config in the database has deviated from what you declared in your dbt project. |
Beta Was this translation helpful? Give feedback.
-
I think the key difference between the 💡 Good point about how there are often mixed-use environments in which there can be a difference between the changes that dbt makes to a database object vs. changes made by other processes. 🧠 And also helpful that you pointed out cases in which dbt is specifying something like the partition config, but even then it may require a full rebuild of the database object in certain data platforms. Overall, I don't see anything obvious here that we need/want to add to dbt-core, but I'm going to convert this to a Discussion so we can continue to discuss and gather feedback & additional use-cases. |
Beta Was this translation helpful? Give feedback.
-
@dbeatty10 yes, I didn't expect this to be a feature that made it into next release, but was talking to Amy about it and she advised me to file a ticket to discuss. Part of why I bring it up is because I see the value in the python structure that we use for detecting config changes, and like the thought of extending that structure to other materializations; however, using that approach does not actually require on_configuration_change at all. |
Beta Was this translation helpful? Give feedback.
-
Is this your first time submitting a feature request?
Describe the feature
Much like materialized views, incremental tables have a set of configuration changes that may occur between dbt runs that should be respected. Today this is done in an ad hoc manner, with, for example, grants having special logic to determine what they should do on first runs vs subsequent runs. In the dbt-databricks adapter, we also have similar logic for persisting column comments, where we run a diff between existing comments before issuing the alter statements (since we don't have a bulk update for column comments, we're trying to reduce the number of statements issued). It would be nice to unify these behaviors where we check state before applying updates under the concept of a config change.
Describe alternatives you've considered
We've already implemented our own logic for on_config_change for our streaming table implementation, so it's not really a stretch for the dbt-databricks adapter to just do this for incremental tables without core support. The primary argument against this is just that the behavior will be inconsistent with core documentation, which describes on_config_change as an MV feature. I'm currently mulling this as I implement support for Databricks tags, which require checking metadata to ensure we're syncing to what is specified in the dbt project.
Who will this benefit?
While this opens the possibility of users specifying whether they want changes to apply, fail, or ignore on incremental run, I think the biggest win is uniformity of implementation for adapters.
Are you interested in contributing this feature?
No response
Anything else?
No response
Beta Was this translation helpful? Give feedback.
All reactions