From 434430cd017b858e78dd84fa3ab9a4ac218a4110 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Mon, 10 Feb 2025 17:01:04 +0000 Subject: [PATCH 1/4] add new page --- .../node-selection/configure-state.md | 82 +++++++++++++++++ .../state-comparison-caveats.md | 6 ++ .../node-selection/state-selection.md | 21 +++++ .../docs/reference/node-selection/syntax.md | 90 ------------------- website/sidebars.js | 14 ++- 5 files changed, 122 insertions(+), 91 deletions(-) create mode 100644 website/docs/reference/node-selection/configure-state.md create mode 100644 website/docs/reference/node-selection/state-selection.md diff --git a/website/docs/reference/node-selection/configure-state.md b/website/docs/reference/node-selection/configure-state.md new file mode 100644 index 00000000000..42d8d6f303c --- /dev/null +++ b/website/docs/reference/node-selection/configure-state.md @@ -0,0 +1,82 @@ +--- +title: "Configure state selection" +description: "Learn how to configure state selection in dbt." +pagination_next: "reference/node-selection/state-comparison-caveats" +--- + +State and defer can be set by environment variables as well as CLI flags: + +- `--state` or `DBT_STATE`: file path +- `--defer` or `DBT_DEFER`: boolean +- `--defer-state` or `DBT_DEFER_STATE`: file path to use for deferral only (optional) + +If `--defer-state` is not specified, deferral will use the artifacts supplied by `--state`. This enables more granular control in cases where you want to compare against logical state from one environment or past point in time, and defer to applied state from a different environment or point in time. + +If both the flag and env var are provided, the flag takes precedence. + +#### Notes: +- The `--state` artifacts must be of schema versions that are compatible with the currently running dbt version. +- These are powerful, complex features. Read about [known caveats and limitations](/reference/node-selection/state-comparison-caveats) to state comparison. 
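As a rough illustration of the flag and environment-variable forms listed above, the following shell sketch expresses the same state settings both ways. The artifact path and the `state:modified+` selection are placeholders rather than values taken from this page, and the command is only assembled and printed, not executed, so nothing here depends on a dbt project being present.

```shell
# Hypothetical artifact location; replace with the directory that holds
# the artifacts from the run you want to compare against.
STATE_PATH="path/to/prod/artifacts"

# Environment-variable form: applies to every dbt command run in this shell.
export DBT_STATE="$STATE_PATH"
export DBT_DEFER="true"   # boolean; assumed here to be written as "true"

# Flag form of the same settings. If both the flag and the env var are
# provided, the flag takes precedence.
DBT_CMD="dbt run --select state:modified+ --defer --state $STATE_PATH"

# Print the command instead of running it, so the sketch is safe anywhere.
echo "$DBT_CMD"
```

Setting the variables once can be convenient in CI, where every step should compare against the same artifacts, while the flags are handy for a one-off override.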
+ +:::warning Syntax deprecated + +In [dbt v1.5](/docs/dbt-versions/core-upgrade/upgrading-to-v1.5#behavior-changes), we deprecated the original syntax for state (`DBT_ARTIFACT_STATE_PATH`) and defer (`DBT_DEFER_TO_STATE`). Although dbt supports backward compatibility with the old syntax, we will remove it in a future release that we have not yet determined. + +::: + +### The "result" status + +Another element of job state is the `result` of a prior dbt invocation. After executing a `dbt run`, for example, dbt creates the `run_results.json` artifact which contains execution times and success / error status for dbt models. You can read more about `run_results.json` on the ['run results'](/reference/artifacts/run-results-json) page. + +The following dbt commands produce `run_results.json` artifacts whose results can be referenced in subsequent dbt invocations: +- `dbt run` +- `dbt test` +- `dbt build` (new in dbt version v0.21.0) +- `dbt seed` + +After issuing one of the above commands, you can reference the results by adding a selector to a subsequent command as follows: + +```bash +# You can also set the DBT_STATE environment variable instead of the --state flag. +dbt run --select "result:<status>" --defer --state path/to/prod/artifacts +``` + +The available options depend on the resource (node) type: + +| `result:\<status>` | model | seed | snapshot | test | +|----------------|-------|------|------|----------| +| `result:error` | ✅ | ✅ | ✅ | ✅ | +| `result:success` | ✅ | ✅ | ✅ | | +| `result:skipped` | ✅ | | ✅ | ✅ | +| `result:fail` | | | | ✅ | +| `result:warn` | | | | ✅ | +| `result:pass` | | | | ✅ | + +### Combining `state` and `result` selectors + +The state and result selectors can also be combined in a single invocation of dbt to capture errors from a previous run OR any new or modified models. 
+ +```bash +dbt run --select "result:<status>+" state:modified+ --defer --state ./ +``` + +### The "source_status" status + +Another element of job state is the `source_status` of a prior dbt invocation. After executing `dbt source freshness`, for example, dbt creates the `sources.json` artifact which contains execution times and `max_loaded_at` dates for dbt sources. You can read more about `sources.json` on the ['sources'](/reference/artifacts/sources-json) page. + +The `dbt source freshness` command produces a `sources.json` artifact whose results can be referenced in subsequent dbt invocations. + +When a job is selected, dbt Cloud will surface the artifacts from that job's most recent successful run. dbt will then use those artifacts to determine the set of fresh sources. In your job commands, you can signal dbt to run and test only on the fresher sources and their children by including the `source_status:fresher+` argument. This requires both the previous and current states to have the `sources.json` artifact available. Or plainly said, both job states need to run `dbt source freshness`. + +After issuing the `dbt source freshness` command, you can reference the source freshness results by adding a selector to a subsequent command: + +```bash +# You can also set the DBT_STATE environment variable instead of the --state flag. +dbt source freshness # must be run again to compare current to previous state +dbt build --select "source_status:fresher+" --state path/to/prod/artifacts +``` +For more example commands, refer to [Pro-tips for workflows](/best-practices/best-practice-workflows#pro-tips-for-workflows). 
+ +## Related docs +- [About state in dbt](/reference/node-selection/state-selection) +- [State comparison caveats](/reference/node-selection/state-comparison-caveats) diff --git a/website/docs/reference/node-selection/state-comparison-caveats.md b/website/docs/reference/node-selection/state-comparison-caveats.md index f83a4f37c89..bf79634e178 100644 --- a/website/docs/reference/node-selection/state-comparison-caveats.md +++ b/website/docs/reference/node-selection/state-comparison-caveats.md @@ -1,5 +1,7 @@ --- title: "Caveats to state comparison" +description: "Learn about caveats to state comparison in dbt." +pagination_prev: "reference/node-selection/configure-state" --- import StateModified from '/snippets/_state-modified-compare.md'; @@ -89,3 +91,7 @@ That means the following config—functionally identical to the snippet above— ### Final note State comparison is complex. We hope to reach eventual consistency between all configuration options, as well as providing users with the control they need to reliably return all modified resources, and only the ones they expect. If you're interested in learning more, read [open issues tagged "state"](https://github.com/dbt-labs/dbt-core/issues?q=is%3Aopen+is%3Aissue+label%3Astate) in the dbt repository. + +## Related docs +- [About state in dbt](/reference/node-selection/state-selection) +- [Configure state selection](/reference/node-selection/configure-state) diff --git a/website/docs/reference/node-selection/state-selection.md b/website/docs/reference/node-selection/state-selection.md new file mode 100644 index 00000000000..5741400ceb1 --- /dev/null +++ b/website/docs/reference/node-selection/state-selection.md @@ -0,0 +1,21 @@ +--- +title: "About state in dbt" +description: "dbt operations are stateless and idempotent, but artifacts enable state-based features like slim CI and deferral." 
+pagination_next: "reference/node-selection/configure-state" +--- + +One of the greatest underlying assumptions about dbt is that its operations should be **stateless** and **idempotent**. That is, it doesn't matter how many times a model has been run before, or if it has ever been run before. It doesn't matter if you run it once or a thousand times. Given the same raw data, you can expect the same transformed result. A given run of dbt doesn't need to "know" about _any other_ run; it just needs to know about the code in the project and the objects in your database as they exist _right now_. + +That said, dbt does store "state" in the form of its [artifacts](/docs/deploy/artifacts): a detailed, point-in-time view of project resources (also referred to as nodes), database objects, and invocation results. If you choose, dbt can use these artifacts to inform certain operations. Crucially, the operations themselves are still stateless and idempotent: given the same manifest and the same raw data, dbt will produce the same transformed result. + +dbt can leverage artifacts from a prior invocation as long as their file path is passed to the `--state` flag. This is a prerequisite for: +- [The `state` selector](/reference/node-selection/methods#state), whereby dbt can identify resources that are new or modified +by comparing code in the current project against the state manifest. +- [Deferring](/reference/node-selection/defer) to another environment, whereby dbt can identify upstream, unselected resources that don't exist in your current environment and instead "defer" their references to the environment provided by the state manifest. +- The [`dbt clone` command](/reference/commands/clone), whereby dbt can clone nodes based on their location in the manifest provided to the `--state` flag. + +Together, the [`state`](/reference/node-selection/methods#state) selector and deferral enable ["slim CI"](/best-practices/best-practice-workflows#run-only-modified-models-to-test-changes-slim-ci). 
We expect to add more features in future releases that can leverage artifacts passed to the `--state` flag. + +## Related docs +- [Configure state selection](/reference/node-selection/configure-state) +- [State comparison caveats](/reference/node-selection/state-comparison-caveats) diff --git a/website/docs/reference/node-selection/syntax.md b/website/docs/reference/node-selection/syntax.md index a6c3e7eb81e..43e31d64d71 100644 --- a/website/docs/reference/node-selection/syntax.md +++ b/website/docs/reference/node-selection/syntax.md @@ -121,93 +121,3 @@ dbt ls --select "result:<status>+" state:modified+ --state ./ - - -## State selection - -One of the greatest underlying assumptions about dbt is that its operations should be **stateless** and **idempotent**. That is, it doesn't matter how many times a model has been run before, or if it has ever been run before. It doesn't matter if you run it once or a thousand times. Given the same raw data, you can expect the same transformed result. A given run of dbt doesn't need to "know" about _any other_ run; it just needs to know about the code in the project and the objects in your database as they exist _right now_. - -That said, dbt does store "state" in the form of its [artifacts](/docs/deploy/artifacts): a detailed, point-in-time view of project resources (also referred to as nodes), database objects, and invocation results. If you choose, dbt can use these artifacts to inform certain operations. Crucially, the operations themselves are still stateless and idempotent: given the same manifest and the same raw data, dbt will produce the same transformed result. - -dbt can leverage artifacts from a prior invocation as long as their file path is passed to the `--state` flag. This is a prerequisite for: -- [The `state` selector](/reference/node-selection/methods#state), whereby dbt can identify resources that are new or modified -by comparing code in the current project against the state manifest. 
-- [Deferring](/reference/node-selection/defer) to another environment, whereby dbt can identify upstream, unselected resources that don't exist in your current environment and instead "defer" their references to the environment provided by the state manifest. -- The [`dbt clone` command](/reference/commands/clone), whereby dbt can clone nodes based on their location in the manifest provided to the `--state` flag. - -Together, the [`state`](/reference/node-selection/methods#state) selector and deferral enable ["slim CI"](/best-practices/best-practice-workflows#run-only-modified-models-to-test-changes-slim-ci). We expect to add more features in future releases that can leverage artifacts passed to the `--state` flag. - -### Establishing state - -State and defer can be set by environment variables as well as CLI flags: - -- `--state` or `DBT_STATE`: file path -- `--defer` or `DBT_DEFER`: boolean -- `--defer-state` or `DBT_DEFER_STATE`: file path to use for deferral only (optional) - -If `--defer-state` is not specified, deferral will use the artifacts supplied by `--state`. This enables more granular control in cases where you want to compare against logical state from one environment or past point in time, and defer to applied state from a different environment or point in time. - -If both the flag and env var are provided, the flag takes precedence. - -#### Notes: -- The `--state` artifacts must be of schema versions that are compatible with the currently running dbt version. -- These are powerful, complex features. Read about [known caveats and limitations](/reference/node-selection/state-comparison-caveats) to state comparison. - -:::warning Syntax deprecated - -In [dbt v1.5](/docs/dbt-versions/core-upgrade/upgrading-to-v1.5#behavior-changes), we deprecated the original syntax for state (`DBT_ARTIFACT_STATE_PATH`) and defer (`DBT_DEFER_TO_STATE`). 
Although dbt supports backward compatibility with the old syntax, we will remove it in a future release that we have not yet determined. - -::: - -### The "result" status - -Another element of job state is the `result` of a prior dbt invocation. After executing a `dbt run`, for example, dbt creates the `run_results.json` artifact which contains execution times and success / error status for dbt models. You can read more about `run_results.json` on the ['run results'](/reference/artifacts/run-results-json) page. - -The following dbt commands produce `run_results.json` artifacts whose results can be referenced in subsequent dbt invocations: -- `dbt run` -- `dbt test` -- `dbt build` (new in dbt version v0.21.0) -- `dbt seed` - -After issuing one of the above commands, you can reference the results by adding a selector to a subsequent command as follows: - -```bash -# You can also set the DBT_STATE environment variable instead of the --state flag. -dbt run --select "result:<status>" --defer --state path/to/prod/artifacts -``` - -The available options depend on the resource (node) type: - -| `result:\<status>` | model | seed | snapshot | test | -|----------------|-------|------|------|----------| -| `result:error` | ✅ | ✅ | ✅ | ✅ | -| `result:success` | ✅ | ✅ | ✅ | | -| `result:skipped` | ✅ | | ✅ | ✅ | -| `result:fail` | | | | ✅ | -| `result:warn` | | | | ✅ | -| `result:pass` | | | | ✅ | - -### Combining `state` and `result` selectors - -The state and result selectors can also be combined in a single invocation of dbt to capture errors from a previous run OR any new or modified models. - -```bash -dbt run --select "result:<status>+" state:modified+ --defer --state ./ -``` - -### The "source_status" status - -Another element of job state is the `source_status` of a prior dbt invocation. After executing `dbt source freshness`, for example, dbt creates the `sources.json` artifact which contains execution times and `max_loaded_at` dates for dbt sources. 
You can read more about `sources.json` on the ['sources'](/reference/artifacts/sources-json) page. - -The `dbt source freshness` command produces a `sources.json` artifact whose results can be referenced in subsequent dbt invocations. - -When a job is selected, dbt Cloud will surface the artifacts from that job's most recent successful run. dbt will then use those artifacts to determine the set of fresh sources. In your job commands, you can signal dbt to run and test only on the fresher sources and their children by including the `source_status:fresher+` argument. This requires both the previous and current states to have the `sources.json` artifact available. Or plainly said, both job states need to run `dbt source freshness`. - -After issuing the `dbt source freshness` command, you can reference the source freshness results by adding a selector to a subsequent command: - -```bash -# You can also set the DBT_STATE environment variable instead of the --state flag. -dbt source freshness # must be run again to compare current to previous state -dbt build --select "source_status:fresher+" --state path/to/prod/artifacts -``` -For more example commands, refer to [Pro-tips for workflows](/best-practices/best-practice-workflows#pro-tips-for-workflows). 
diff --git a/website/sidebars.js b/website/sidebars.js index ef5a66c889b..0365e171d6f 100644 --- a/website/sidebars.js +++ b/website/sidebars.js @@ -1109,9 +1109,21 @@ const sidebarSettings = { "reference/node-selection/set-operators", "reference/node-selection/methods", "reference/node-selection/putting-it-together", - "reference/node-selection/state-comparison-caveats", "reference/node-selection/yaml-selectors", "reference/node-selection/test-selection-examples", + { + type: "category", + label: "About state selection", + link: { + type: "doc", + id: "reference/node-selection/state-selection", + }, + items: [ + "reference/node-selection/state-selection", + "reference/node-selection/configure-state", + "reference/node-selection/state-comparison-caveats", + ], + }, ], }, { From 749c4b60ce72794308a532d8a09b6b2419b36ba6 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Mon, 10 Feb 2025 17:03:53 +0000 Subject: [PATCH 2/4] upate links --- .../2025-01-21-wish-i-had-a-control-plane-for-my-renovation.md | 2 +- website/docs/docs/deploy/ci-jobs.md | 2 +- website/docs/reference/node-selection/methods.md | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/website/blog/2025-01-21-wish-i-had-a-control-plane-for-my-renovation.md b/website/blog/2025-01-21-wish-i-had-a-control-plane-for-my-renovation.md index 80db3f66de6..01d7fe037a1 100644 --- a/website/blog/2025-01-21-wish-i-had-a-control-plane-for-my-renovation.md +++ b/website/blog/2025-01-21-wish-i-had-a-control-plane-for-my-renovation.md @@ -47,7 +47,7 @@ Here’s the challenge: monitoring tools, by their nature, look backward. 
They [dbt Cloud](https://www.getdbt.com/product/dbt-cloud) unifies these perspectives into a single [control plane](https://www.getdbt.com/blog/data-control-plane-introduction), bridging proactive and retrospective capabilities: -- **Proactive planning**: In dbt, you declare the desired [state](https://docs.getdbt.com/reference/node-selection/syntax#state-selection) of your data before jobs even run — your architectural plans are baked into the pipeline. +- **Proactive planning**: In dbt, you declare the desired [state](https://docs.getdbt.com/reference/node-selection/state-selection) of your data before jobs even run — your architectural plans are baked into the pipeline. - **Retrospective insights**: dbt Cloud surfaces [job logs](https://docs.getdbt.com/docs/deploy/run-visibility), performance metrics, and test results, providing the same level of insight as traditional monitoring tools. But the real power lies in how dbt integrates these two perspectives. Transformation logic (the plans) and monitoring (the inspections) are tightly connected, creating a continuous feedback loop where issues can be identified and resolved faster, and pipelines can be optimized more effectively. 
diff --git a/website/docs/docs/deploy/ci-jobs.md b/website/docs/docs/deploy/ci-jobs.md index ff7c321282d..7d49c9d4e43 100644 --- a/website/docs/docs/deploy/ci-jobs.md +++ b/website/docs/docs/deploy/ci-jobs.md @@ -150,7 +150,7 @@ For semantic nodes and models that aren't downstream of modified models, dbt Clo -To only validate modified semantic nodes, use the following command (with [state selection](/reference/node-selection/syntax#state-selection)): +To only validate modified semantic nodes, use the following command (with [state selection](/reference/node-selection/state-selection)): ```bash dbt sl validate --select state:modified+ diff --git a/website/docs/reference/node-selection/methods.md b/website/docs/reference/node-selection/methods.md index 29eb79a9130..c7b5b8d810e 100644 --- a/website/docs/reference/node-selection/methods.md +++ b/website/docs/reference/node-selection/methods.md @@ -224,7 +224,7 @@ dbt build --select "source_status:fresher+" --state path/to/prod/artifacts ### state -**N.B.** State-based selection is a powerful, complex feature. Read about [known caveats and limitations](/reference/node-selection/state-comparison-caveats) to state comparison. +**N.B.** [State-based selection](/reference/node-selection/state-selection) is a powerful, complex feature. Read about [known caveats and limitations](/reference/node-selection/state-comparison-caveats) to state comparison. The `state` method is used to select nodes by comparing them against a previous version of the same project, which is represented by a [manifest](/reference/artifacts/manifest-json). The file path of the comparison manifest _must_ be specified via the `--state` flag or `DBT_STATE` environment variable. 
From 9ceecad56df20850a3c81a9cac41cb297e9fe0ef Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Mon, 10 Feb 2025 17:10:10 +0000 Subject: [PATCH 3/4] Update configure-state.md --- website/docs/reference/node-selection/configure-state.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/website/docs/reference/node-selection/configure-state.md b/website/docs/reference/node-selection/configure-state.md index 42d8d6f303c..ad7b4feaf6a 100644 --- a/website/docs/reference/node-selection/configure-state.md +++ b/website/docs/reference/node-selection/configure-state.md @@ -14,7 +14,7 @@ If `--defer-state` is not specified, deferral will use the artifacts supplied by If both the flag and env var are provided, the flag takes precedence. -#### Notes: +#### Notes - The `--state` artifacts must be of schema versions that are compatible with the currently running dbt version. - These are powerful, complex features. Read about [known caveats and limitations](/reference/node-selection/state-comparison-caveats) to state comparison. @@ -31,7 +31,7 @@ Another element of job state is the `result` of a prior dbt invocation. 
After ex The following dbt commands produce `run_results.json` artifacts whose results can be referenced in subsequent dbt invocations: - `dbt run` - `dbt test` -- `dbt build` (new in dbt version v0.21.0) +- `dbt build` - `dbt seed` After issuing one of the above commands, you can reference the results by adding a selector to a subsequent command as follows: From db68ebccba09c61cc10b31bcf81f531e4553c633 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 11 Feb 2025 10:14:22 +0000 Subject: [PATCH 4/4] Update configure-state.md --- website/docs/reference/node-selection/configure-state.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/node-selection/configure-state.md b/website/docs/reference/node-selection/configure-state.md index ad7b4feaf6a..f125be3070c 100644 --- a/website/docs/reference/node-selection/configure-state.md +++ b/website/docs/reference/node-selection/configure-state.md @@ -4,7 +4,7 @@ description: "Learn how to configure state selection in dbt." pagination_next: "reference/node-selection/state-comparison-caveats" --- -State and defer can be set by environment variables as well as CLI flags: +State and [defer](/reference/node-selection/defer) can be set by environment variables as well as CLI flags: - `--state` or `DBT_STATE`: file path - `--defer` or `DBT_DEFER`: boolean