This is the current schema for defining groups (example via docs):
```yaml
groups:
  - name: finance
    owner:
      # 'name' or 'email' is required; additional properties allowed
      email: [email protected]
      slack: finance-data
      github: finance-data-team
```
I would like to propose something like this:
```yaml
groups:
  - name: finance
    owner:
      # 'name' or 'email' is required; additional properties allowed
      email: [email protected]
      slack: finance-data
      github: finance-data-team
    allow_cycles: true
```
Where the `allow_cycles` boolean flag works like this:

- `allow_cycles: true` (default and existing behavior): nodes outside the group may be both dependents and dependencies of nodes within the group.
- `allow_cycles: false`: a node outside the group may be a dependent or a dependency of nodes in the group, but not both.
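A minimal sketch of the single-group `allow_cycles: false` check, assuming "dependent" and "dependency" are read transitively (an outside node counts as a dependency of the group if any member is downstream of it, directly or through other nodes). Edges are `(upstream, downstream)` pairs, matching the `->` arrows in the examples; `group_of`, `allow_cycles`, and the function names are hypothetical stand-ins, not dbt's manifest API.

```python
from __future__ import annotations

from collections import defaultdict


def reachable(start: str, adj: dict[str, set[str]]) -> set[str]:
    """Nodes reachable from `start` by repeatedly following `adj`."""
    seen: set[str] = set()
    stack = list(adj.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adj.get(node, ()))
    return seen


def single_group_violations(
    edges: list[tuple[str, str]],   # (upstream, downstream) pairs
    group_of: dict[str, str],       # node -> group name (hypothetical)
    allow_cycles: dict[str, bool],  # group -> flag; groups missing here are not checked
) -> list[tuple[str, str]]:
    """(group, outside_node) pairs where an outside node is both upstream
    and downstream of a group that sets allow_cycles: false."""
    down: dict[str, set[str]] = defaultdict(set)
    up: dict[str, set[str]] = defaultdict(set)
    for parent, child in edges:
        down[parent].add(child)
        up[child].add(parent)

    violations = []
    for group in sorted(allow_cycles):
        if allow_cycles[group]:
            continue  # existing behavior: no extra constraint
        members = {n for n, g in group_of.items() if g == group}
        ancestors = set().union(*(reachable(m, up) for m in members)) - members
        descendants = set().union(*(reachable(m, down) for m in members)) - members
        violations += [(group, node) for node in sorted(ancestors & descendants)]
    return violations


# B1 -> A1 -> A2 -> B2
EDGES = [("B1", "A1"), ("A1", "A2"), ("A2", "B2")]
GROUPS = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}
print(single_group_violations(EDGES, GROUPS, {"A": True, "B": False}))  # [('B', 'A1'), ('B', 'A2')]
print(single_group_violations(EDGES, GROUPS, {"A": False, "B": True}))  # []
```

Under this per-node reading, `B1 -> A1 -> A2 -> B2` is rejected when B has `allow_cycles: false` (A1 and A2 sit downstream of B1 and upstream of B2), but not when only A has it; matching the "either A or B" expectation in the examples would need a stricter, group-level reading.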
Additionally, when two groups A and B both have `allow_cycles: false`, group A as a whole must be either upstream or downstream of group B: nodes in A may be dependencies of nodes in B, or dependents of them, but not a mix of both.
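A minimal sketch of the pairwise rule, assuming it is read transitively: two `allow_cycles: false` groups conflict when each has a member somewhere downstream of the other. The data shapes (`(upstream, downstream)` edge pairs, `group_of`, `allow_cycles`) and function names are hypothetical, not part of dbt.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations


def downstream_of(members: set[str], down: dict[str, set[str]]) -> set[str]:
    """Every node reachable downstream from any node in `members`."""
    seen: set[str] = set()
    stack = [child for m in members for child in down.get(m, ())]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(down.get(node, ()))
    return seen


def pairwise_violations(
    edges: list[tuple[str, str]],   # (upstream, downstream) pairs
    group_of: dict[str, str],       # node -> group name (hypothetical)
    allow_cycles: dict[str, bool],  # group -> flag
) -> list[tuple[str, str]]:
    """Pairs of allow_cycles: false groups that each feed the other."""
    down: dict[str, set[str]] = defaultdict(set)
    for parent, child in edges:
        down[parent].add(child)
    members: dict[str, set[str]] = defaultdict(set)
    for node, group in group_of.items():
        members[group].add(node)

    strict = sorted(g for g, allowed in allow_cycles.items() if not allowed)
    reach = {g: downstream_of(members[g], down) for g in strict}
    violations = []
    for g, h in combinations(strict, 2):
        g_feeds_h = bool(reach[g] & members[h])
        h_feeds_g = bool(reach[h] & members[g])
        if g_feeds_h and h_feeds_g:  # g is both upstream and downstream of h
            violations.append((g, h))
    return violations


# A1 -> B1, plus B2 -> A2
EDGES = [("A1", "B1"), ("B2", "A2")]
GROUPS = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}
print(pairwise_violations(EDGES, GROUPS, {"A": False, "B": False}))  # [('A', 'B')]
print(pairwise_violations(EDGES, GROUPS, {"A": False, "B": True}))   # []
```

Only groups that both set the flag are ever paired, so a single strict group on its own never triggers this check.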
Examples (A1 and A2 belong to group A; B1 and B2 belong to group B):

`B1 -> A1 -> A2 -> B2`
- ... is a valid DAG when `allow_cycles: true`
- ... is invalid when either A or B has `allow_cycles: false`

`A1 -> B1`, plus `B2 -> A2`
- ... is a valid DAG when `allow_cycles: true`
- ... is a valid DAG when only one of A or B has `allow_cycles: false`
- ... is invalid when both A and B have `allow_cycles: false`
This proposal should be seen more as a general feature request for enforcing subgraphs; the `group:` feature may not be the best place for it. This could be a more general pattern, or this dependency-enforcement logic could get its own abstraction.
For example, imagine the following flow:

- Business vertical A has "staging", "intermediate", and "mart" tables.
- Business vertical B has "staging", "intermediate", and "mart" tables.
- There is a general, org-wide "mart".
The business rules we want to enforce are somewhat complex and look like this:

- Within each vertical, staging -> intermediate -> mart.
- Each vertical's mart can "diagonally" reference another vertical's "intermediate" sub-group, but cannot reference across "horizontally" (that is, another vertical's mart).
- Staging is private within each vertical: it cannot be referenced by other verticals, even diagonally.
- The org-wide mart can only pull from the vertical marts.
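One way to pin these rules down is as an allow-list over (upstream, downstream) layer pairs. This is a hypothetical sketch: `Node`, `edge_allowed`, and `policy_violations` are illustrative names, and it assumes that any edge the rules do not name (a staging -> mart shortcut, a same-layer reference, or a vertical reading the org-wide mart) is disallowed.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    vertical: Optional[str]  # "A", "B", ... or None for the org-wide mart
    layer: str               # "staging", "intermediate" or "mart"


def edge_allowed(parent: Node, child: Node) -> bool:
    """True if `child` may reference `parent` (assumed reading of the rules)."""
    if child.vertical is None:
        # The org-wide mart may only pull from the vertical marts.
        return parent.vertical is not None and parent.layer == "mart"
    if parent.vertical is None:
        # Assumption: nothing inside a vertical reads the org-wide mart.
        return False
    if parent.layer == "staging":
        # Staging is private: only the same vertical's intermediate layer may use it.
        return child.layer == "intermediate" and child.vertical == parent.vertical
    if parent.layer == "intermediate":
        # Any vertical's mart may read any vertical's intermediate layer,
        # straight down or "diagonally".
        return child.layer == "mart"
    # A vertical mart may only feed the org-wide mart (handled above), so
    # mart -> mart references, including "horizontal" ones, are rejected.
    return False


def policy_violations(edges: list[tuple[Node, Node]]) -> list[tuple[Node, Node]]:
    """Edges, given as (upstream, downstream), that break the rules."""
    return [(p, c) for p, c in edges if not edge_allowed(p, c)]


stg_a, int_a, mart_a = Node("A", "staging"), Node("A", "intermediate"), Node("A", "mart")
stg_b, int_b, mart_b = Node("B", "staging"), Node("B", "intermediate"), Node("B", "mart")
org_mart = Node(None, "mart")

ok = [(stg_a, int_a), (int_a, mart_a), (int_b, mart_a), (mart_a, org_mart)]
bad = [(mart_b, mart_a), (stg_b, int_a), (int_a, org_mart)]
print(policy_violations(ok))   # []
print(policy_violations(bad))  # all three edges
```

The point of the sketch is that these rules depend on which layer and vertical each side of an edge belongs to, which a per-group cycle flag does not express.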
It's not clear how this fits into the framework I'm proposing, or into groups in general. I think the best you can do is to have 7 groups (`staging_A`, `staging_B`, `intermediate_A`, `intermediate_B`, `mart_A`, `mart_B`, and `org_mart`), each with `allow_cycles: false`. That kind of works, but not fully: any org following the above rules will get a valid dbt parse, but not every deviation from these rules will fail to parse. In other words, relying solely on the parser gives no false negatives (no rule-following project is rejected) but does give false positives (some rule-breaking projects still parse). Better than the status quo, but not perfect.
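To make the false-positive case concrete, here is a small hypothetical project with one model per group. Under a transitive reading of both rules, `allow_cycles: false` on single-model groups only forbids cycles, which dbt already rejects, so the check reduces to a topological sort with Python's standard `graphlib` module. The `ALLOWED` set is an assumed, hand-written transcription of the business rules, used only to show which edges slip through.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical project, one model per group; edges are (upstream, downstream).
EDGES = [
    ("staging_A", "intermediate_A"),
    ("staging_B", "intermediate_B"),
    ("intermediate_A", "mart_A"),
    ("intermediate_B", "mart_B"),
    ("mart_A", "org_mart"),
    ("mart_B", "org_mart"),
    ("mart_B", "mart_A"),             # horizontal mart -> mart reference
    ("staging_B", "intermediate_A"),  # reads another vertical's private staging
]

# Assumed transcription of the business rules as allowed group-to-group edges.
ALLOWED = {
    ("staging_A", "intermediate_A"), ("staging_B", "intermediate_B"),
    ("intermediate_A", "mart_A"), ("intermediate_B", "mart_B"),
    ("intermediate_A", "mart_B"), ("intermediate_B", "mart_A"),  # diagonal
    ("mart_A", "org_mart"), ("mart_B", "org_mart"),
}


def passes_allow_cycles(edges: list[tuple[str, str]]) -> bool:
    """With one model per group, allow_cycles: false everywhere only rules out cycles."""
    predecessors: dict[str, set[str]] = {}
    for up, down in edges:
        predecessors.setdefault(down, set()).add(up)
        predecessors.setdefault(up, set())
    try:
        list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return False
    return True


def rule_breaks(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Edges the business rules forbid."""
    return [edge for edge in edges if edge not in ALLOWED]


print(passes_allow_cycles(EDGES))  # True: accepted under this reading
print(rule_breaks(EDGES))          # [('mart_B', 'mart_A'), ('staging_B', 'intermediate_A')]
```

Both rule-breaking edges pass the cycle check, which is the false-positive case described above.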