refactor state selection section #6894

Merged
merged 7 commits into from
Feb 11, 2025
@@ -47,7 +47,7 @@ Here’s the challenge: monitoring tools, by their nature, look backward. They

[dbt Cloud](https://www.getdbt.com/product/dbt-cloud) unifies these perspectives into a single [control plane](https://www.getdbt.com/blog/data-control-plane-introduction), bridging proactive and retrospective capabilities:

- **Proactive planning**: In dbt, you declare the desired [state](https://docs.getdbt.com/reference/node-selection/syntax#state-selection) of your data before jobs even run — your architectural plans are baked into the pipeline.
- **Proactive planning**: In dbt, you declare the desired [state](https://docs.getdbt.com/reference/node-selection/state-selection) of your data before jobs even run — your architectural plans are baked into the pipeline.
- **Retrospective insights**: dbt Cloud surfaces [job logs](https://docs.getdbt.com/docs/deploy/run-visibility), performance metrics, and test results, providing the same level of insight as traditional monitoring tools.

But the real power lies in how dbt integrates these two perspectives. Transformation logic (the plans) and monitoring (the inspections) are tightly connected, creating a continuous feedback loop where issues can be identified and resolved faster, and pipelines can be optimized more effectively.
2 changes: 1 addition & 1 deletion website/docs/docs/deploy/ci-jobs.md
@@ -150,7 +150,7 @@ For semantic nodes and models that aren't downstream of modified models, dbt Clo

<Expandable alt_header="Semantic nodes that are modified or affected by downstream modified nodes.">

To only validate modified semantic nodes, use the following command (with [state selection](/reference/node-selection/syntax#state-selection)):
To only validate modified semantic nodes, use the following command (with [state selection](/reference/node-selection/state-selection)):

```bash
dbt sl validate --select state:modified+
82 changes: 82 additions & 0 deletions website/docs/reference/node-selection/configure-state.md
@@ -0,0 +1,82 @@
---
title: "Configure state selection"
description: "Learn how to configure state selection in dbt."
pagination_next: "reference/node-selection/state-comparison-caveats"
---

State and [defer](/reference/node-selection/defer) can be set by environment variables as well as CLI flags:

- `--state` or `DBT_STATE`: file path
- `--defer` or `DBT_DEFER`: boolean
- `--defer-state` or `DBT_DEFER_STATE`: file path to use for deferral only (optional)

If `--defer-state` is not specified, deferral uses the artifacts supplied by `--state`. Specifying `--defer-state` separately gives you more granular control when you want to compare against logical state from one environment or past point in time while deferring to applied state from a different environment or point in time.

If both the flag and env var are provided, the flag takes precedence.
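The precedence rules above can be sketched as two small shell functions. This is only an illustrative model, not dbt's implementation: `resolve_state` and `resolve_defer_state` are hypothetical names, and the order in which `DBT_DEFER_STATE` and the `--state` fallback are consulted is an assumption based on the rules stated here.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of dbt) that model the precedence rules above.

# Prints the path used for state comparison.
# $1: value given to --state, or empty if the flag was omitted.
resolve_state() {
  if [ -n "$1" ]; then
    printf '%s\n' "$1"               # the flag wins over the env var
  else
    printf '%s\n' "${DBT_STATE:-}"   # otherwise fall back to DBT_STATE
  fi
}

# Prints the path used for deferral.
# $1: value given to --state, $2: value given to --defer-state (either may be empty).
resolve_defer_state() {
  if [ -n "$2" ]; then
    printf '%s\n' "$2"               # explicit --defer-state flag
  elif [ -n "${DBT_DEFER_STATE:-}" ]; then
    printf '%s\n' "$DBT_DEFER_STATE" # assumed: env var is checked next
  else
    resolve_state "$1"               # no deferral path given: reuse the --state artifacts
  fi
}

export DBT_STATE=env/artifacts
unset DBT_DEFER_STATE
resolve_state ""                                  # prints env/artifacts
resolve_state "cli/artifacts"                     # prints cli/artifacts
resolve_defer_state "cli/artifacts" ""            # prints cli/artifacts
resolve_defer_state "cli/artifacts" "prod/state"  # prints prod/state
```

The paths (`env/artifacts`, `cli/artifacts`, `prod/state`) are placeholders chosen for the example.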

#### Notes
- The `--state` artifacts must be of schema versions that are compatible with the currently running dbt version.
- These are powerful, complex features. Read about [known caveats and limitations](/reference/node-selection/state-comparison-caveats) to state comparison.

:::warning Syntax deprecated

In [dbt v1.5](/docs/dbt-versions/core-upgrade/upgrading-to-v1.5#behavior-changes), we deprecated the original syntax for state (`DBT_ARTIFACT_STATE_PATH`) and defer (`DBT_DEFER_TO_STATE`). Although dbt supports backward compatibility with the old syntax, we will remove it in a future release that we have not yet determined.

:::

### The "result" status

Another element of job state is the `result` of a prior dbt invocation. After executing a `dbt run`, for example, dbt creates the `run_results.json` artifact which contains execution times and success / error status for dbt models. You can read more about `run_results.json` on the ['run results'](/reference/artifacts/run-results-json) page.

The following dbt commands produce `run_results.json` artifacts whose results can be referenced in subsequent dbt invocations:
- `dbt run`
- `dbt test`
- `dbt build`
- `dbt seed`

After issuing one of the above commands, you can reference the results by adding a selector to a subsequent command as follows:

```bash
# You can also set the DBT_STATE environment variable instead of the --state flag.
dbt run --select "result:<status>" --defer --state path/to/prod/artifacts
```

The available options depend on the resource (node) type:

| `result:\<status>` | model | seed | snapshot | test |
|----------------|-------|------|------|----------|
| `result:error` | ✅ | ✅ | ✅ | ✅ |
| `result:success` | ✅ | ✅ | ✅ | |
| `result:skipped` | ✅ | | ✅ | ✅ |
| `result:fail` | | | | ✅ |
| `result:warn` | | | | ✅ |
| `result:pass` | | | | ✅ |
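
As a rough aid for scripts that assemble selectors, the table can be encoded as a lookup. The function name `result_status_applies` is hypothetical and not part of dbt. It only mirrors the rows above and could be used to check a status before passing it to `--select`.

```bash
# Hypothetical helper (not part of dbt): returns 0 when the table above lists
# the given result status for the given resource type, and 1 otherwise.
# $1: status (error, success, skipped, fail, warn, pass)
# $2: resource type (model, seed, snapshot, test)
result_status_applies() {
  case "$1:$2" in
    error:model|error:seed|error:snapshot|error:test) return 0 ;;
    success:model|success:seed|success:snapshot)      return 0 ;;
    skipped:model|skipped:snapshot|skipped:test)      return 0 ;;
    fail:test|warn:test|pass:test)                    return 0 ;;
    *)                                                return 1 ;;
  esac
}

if result_status_applies fail model; then
  echo "result:fail can match models"
else
  echo "result:fail only applies to tests"
fi
```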

### Combining `state` and `result` selectors

The state and result selectors can also be combined in a single invocation of dbt to capture errors from a previous run OR any new or modified models.

```bash
dbt run --select "result:<status>+" state:modified+ --defer --state ./<dbt-artifact-path>
```

### The "source_status" status

Another element of job state is the `source_status` of a prior dbt invocation. After executing `dbt source freshness`, for example, dbt creates the `sources.json` artifact which contains execution times and `max_loaded_at` dates for dbt sources. You can read more about `sources.json` on the ['sources'](/reference/artifacts/sources-json) page.

The `dbt source freshness` command produces a `sources.json` artifact whose results can be referenced in subsequent dbt invocations.

When a job is selected, dbt Cloud surfaces the artifacts from that job's most recent successful run, and dbt uses those artifacts to determine the set of fresh sources. In your job commands, you can tell dbt to run and test only the fresher sources and their children by including the `source_status:fresher+` argument. This requires the `sources.json` artifact to be available for both the previous and the current state. Put plainly, `dbt source freshness` must have run in both the previous job and the current one.

After issuing the `dbt source freshness` command, you can reference the source freshness results by adding a selector to a subsequent command:

```bash
# You can also set the DBT_STATE environment variable instead of the --state flag.
dbt source freshness # must be run again to compare current to previous state
dbt build --select "source_status:fresher+" --state path/to/prod/artifacts
```
For more example commands, refer to [Pro-tips for workflows](/best-practices/best-practice-workflows#pro-tips-for-workflows).
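
When running this workflow outside dbt Cloud, the previous artifacts need to live somewhere the next invocation will not overwrite. The following is a minimal sketch, assuming the previous run wrote its artifacts to dbt's default `target/` directory. The helper `stash_previous_artifacts` and the `previous-state` directory name are hypothetical.

```bash
# Hypothetical helper (not part of dbt): copy the artifacts from the previous
# invocation into a separate directory so the next run does not overwrite them.
# $1: directory holding the previous artifacts (assumed here to be `target/`)
# $2: directory to pass to --state afterwards
stash_previous_artifacts() {
  mkdir -p "$2" || return 1
  for artifact in manifest.json run_results.json sources.json; do
    # Copy only the artifacts that the previous run actually produced.
    if [ -f "$1/$artifact" ]; then
      cp "$1/$artifact" "$2/$artifact" || return 1
    fi
  done
}

# Intended usage (requires a dbt project, so it is left commented out):
#   stash_previous_artifacts target previous-state
#   dbt source freshness
#   dbt build --select "source_status:fresher+" --state previous-state
```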

## Related docs
- [About state in dbt](/reference/node-selection/state-selection)
- [State comparison caveats](/reference/node-selection/state-comparison-caveats)
2 changes: 1 addition & 1 deletion website/docs/reference/node-selection/methods.md
@@ -224,7 +224,7 @@ dbt build --select "source_status:fresher+" --state path/to/prod/artifacts

### state

**N.B.** State-based selection is a powerful, complex feature. Read about [known caveats and limitations](/reference/node-selection/state-comparison-caveats) to state comparison.
**N.B.** [State-based selection](/reference/node-selection/state-selection) is a powerful, complex feature. Read about [known caveats and limitations](/reference/node-selection/state-comparison-caveats) to state comparison.

The `state` method is used to select nodes by comparing them against a previous version of the same project, which is represented by a [manifest](/reference/artifacts/manifest-json). The file path of the comparison manifest _must_ be specified via the `--state` flag or `DBT_STATE` environment variable.

@@ -1,5 +1,7 @@
---
title: "Caveats to state comparison"
description: "Learn about caveats to state comparison in dbt."
pagination_prev: "reference/node-selection/configure-state"
---

import StateModified from '/snippets/_state-modified-compare.md';
@@ -89,3 +91,7 @@ That means the following config—functionally identical to the snippet above—
### Final note

State comparison is complex. We hope to reach eventual consistency between all configuration options, as well as providing users with the control they need to reliably return all modified resources, and only the ones they expect. If you're interested in learning more, read [open issues tagged "state"](https://github.com/dbt-labs/dbt-core/issues?q=is%3Aopen+is%3Aissue+label%3Astate) in the dbt repository.

## Related docs
- [About state in dbt](/reference/node-selection/state-selection)
- [Configure state selection](/reference/node-selection/configure-state)
21 changes: 21 additions & 0 deletions website/docs/reference/node-selection/state-selection.md
@@ -0,0 +1,21 @@
---
title: "About state in dbt"
description: "dbt operations are stateless and idempotent, but artifacts enable state-based features like slim CI and deferral."
pagination_next: "reference/node-selection/configure-state"
---

One of the greatest underlying assumptions about dbt is that its operations should be **stateless** and **<Term id="idempotent" />**. That is, it doesn't matter how many times a model has been run before, or if it has ever been run before. It doesn't matter if you run it once or a thousand times. Given the same raw data, you can expect the same transformed result. A given run of dbt doesn't need to "know" about _any other_ run; it just needs to know about the code in the project and the objects in your database as they exist _right now_.

That said, dbt does store "state" &mdash; a detailed, point-in-time view of project resources (also referred to as nodes), database objects, and invocation results &mdash; in the form of its [artifacts](/docs/deploy/artifacts). If you choose, dbt can use these artifacts to inform certain operations. Crucially, the operations themselves are still stateless and <Term id="idempotent" />: given the same manifest and the same raw data, dbt will produce the same transformed result.

dbt can leverage artifacts from a prior invocation as long as their file path is passed to the `--state` flag. This is a prerequisite for:
- [The `state` selector](/reference/node-selection/methods#state), whereby dbt can identify resources that are new or modified
by comparing code in the current project against the state manifest.
- [Deferring](/reference/node-selection/defer) to another environment, whereby dbt can identify upstream, unselected resources that don't exist in your current environment and instead "defer" their references to the environment provided by the state manifest.
- The [`dbt clone` command](/reference/commands/clone), whereby dbt can clone nodes based on their location in the manifest provided to the `--state` flag.

Together, the [`state`](/reference/node-selection/methods#state) selector and deferral enable ["slim CI"](/best-practices/best-practice-workflows#run-only-modified-models-to-test-changes-slim-ci). We expect to add more features in future releases that can leverage artifacts passed to the `--state` flag.
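
As an illustration of how the `state` selector and deferral fit together, here is a minimal sketch of a slim CI invocation. The `run` and `slim_ci` wrappers and the `DRY_RUN` variable are hypothetical conveniences so the example can be executed without a dbt project. Only the `dbt build` command line itself reflects dbt's flags.

```bash
# Hypothetical wrapper (not part of dbt): prints the command by default and
# only executes it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build only new or modified resources and their children, deferring
# references to unselected upstream resources to the state manifest.
# $1: path to the artifacts of the comparison environment.
slim_ci() {
  run dbt build --select "state:modified+" --defer --state "$1"
}

slim_ci path/to/prod/artifacts
```

With `DRY_RUN` left at its default, the call prints the command it would run. Setting `DRY_RUN=0` inside a real project executes it.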

## Related docs
- [Configure state selection](/reference/node-selection/configure-state)
- [State comparison caveats](/reference/node-selection/state-comparison-caveats)
90 changes: 0 additions & 90 deletions website/docs/reference/node-selection/syntax.md
@@ -121,93 +121,3 @@ dbt ls --select "result:<status>+" state:modified+ --state ./<dbt-artifact-path>

<Snippet path="discourse-help-feed-header" />
<DiscourseHelpFeed tags="node-selection"/>


## State selection

One of the greatest underlying assumptions about dbt is that its operations should be **stateless** and **<Term id="idempotent" />**. That is, it doesn't matter how many times a model has been run before, or if it has ever been run before. It doesn't matter if you run it once or a thousand times. Given the same raw data, you can expect the same transformed result. A given run of dbt doesn't need to "know" about _any other_ run; it just needs to know about the code in the project and the objects in your database as they exist _right now_.

That said, dbt does store "state" &mdash; a detailed, point-in-time view of project resources (also referred to as nodes), database objects, and invocation results &mdash; in the form of its [artifacts](/docs/deploy/artifacts). If you choose, dbt can use these artifacts to inform certain operations. Crucially, the operations themselves are still stateless and <Term id="idempotent" />: given the same manifest and the same raw data, dbt will produce the same transformed result.

dbt can leverage artifacts from a prior invocation as long as their file path is passed to the `--state` flag. This is a prerequisite for:
- [The `state` selector](/reference/node-selection/methods#state), whereby dbt can identify resources that are new or modified
by comparing code in the current project against the state manifest.
- [Deferring](/reference/node-selection/defer) to another environment, whereby dbt can identify upstream, unselected resources that don't exist in your current environment and instead "defer" their references to the environment provided by the state manifest.
- The [`dbt clone` command](/reference/commands/clone), whereby dbt can clone nodes based on their location in the manifest provided to the `--state` flag.

Together, the [`state`](/reference/node-selection/methods#state) selector and deferral enable ["slim CI"](/best-practices/best-practice-workflows#run-only-modified-models-to-test-changes-slim-ci). We expect to add more features in future releases that can leverage artifacts passed to the `--state` flag.

### Establishing state

State and defer can be set by environment variables as well as CLI flags:

- `--state` or `DBT_STATE`: file path
- `--defer` or `DBT_DEFER`: boolean
- `--defer-state` or `DBT_DEFER_STATE`: file path to use for deferral only (optional)

If `--defer-state` is not specified, deferral will use the artifacts supplied by `--state`. This enables more granular control in cases where you want to compare against logical state from one environment or past point in time, and defer to applied state from a different environment or point in time.

If both the flag and env var are provided, the flag takes precedence.

#### Notes:
- The `--state` artifacts must be of schema versions that are compatible with the currently running dbt version.
- These are powerful, complex features. Read about [known caveats and limitations](/reference/node-selection/state-comparison-caveats) to state comparison.

:::warning Syntax deprecated

In [dbt v1.5](/docs/dbt-versions/core-upgrade/upgrading-to-v1.5#behavior-changes), we deprecated the original syntax for state (`DBT_ARTIFACT_STATE_PATH`) and defer (`DBT_DEFER_TO_STATE`). Although dbt supports backward compatibility with the old syntax, we will remove it in a future release that we have not yet determined.

:::

### The "result" status

Another element of job state is the `result` of a prior dbt invocation. After executing a `dbt run`, for example, dbt creates the `run_results.json` artifact which contains execution times and success / error status for dbt models. You can read more about `run_results.json` on the ['run results'](/reference/artifacts/run-results-json) page.

The following dbt commands produce `run_results.json` artifacts whose results can be referenced in subsequent dbt invocations:
- `dbt run`
- `dbt test`
- `dbt build` (new in dbt version v0.21.0)
- `dbt seed`

After issuing one of the above commands, you can reference the results by adding a selector to a subsequent command as follows:

```bash
# You can also set the DBT_STATE environment variable instead of the --state flag.
dbt run --select "result:<status>" --defer --state path/to/prod/artifacts
```

The available options depend on the resource (node) type:

| `result:\<status>` | model | seed | snapshot | test |
|----------------|-------|------|------|----------|
| `result:error` | ✅ | ✅ | ✅ | ✅ |
| `result:success` | ✅ | ✅ | ✅ | |
| `result:skipped` | ✅ | | ✅ | ✅ |
| `result:fail` | | | | ✅ |
| `result:warn` | | | | ✅ |
| `result:pass` | | | | ✅ |

### Combining `state` and `result` selectors

The state and result selectors can also be combined in a single invocation of dbt to capture errors from a previous run OR any new or modified models.

```bash
dbt run --select "result:<status>+" state:modified+ --defer --state ./<dbt-artifact-path>
```

### The "source_status" status

Another element of job state is the `source_status` of a prior dbt invocation. After executing `dbt source freshness`, for example, dbt creates the `sources.json` artifact which contains execution times and `max_loaded_at` dates for dbt sources. You can read more about `sources.json` on the ['sources'](/reference/artifacts/sources-json) page.

The `dbt source freshness` command produces a `sources.json` artifact whose results can be referenced in subsequent dbt invocations.

When a job is selected, dbt Cloud will surface the artifacts from that job's most recent successful run. dbt will then use those artifacts to determine the set of fresh sources. In your job commands, you can signal dbt to run and test only on the fresher sources and their children by including the `source_status:fresher+` argument. This requires both the previous and current states to have the `sources.json` artifact available. Or plainly said, both job states need to run `dbt source freshness`.

After issuing the `dbt source freshness` command, you can reference the source freshness results by adding a selector to a subsequent command:

```bash
# You can also set the DBT_STATE environment variable instead of the --state flag.
dbt source freshness # must be run again to compare current to previous state
dbt build --select "source_status:fresher+" --state path/to/prod/artifacts
```
For more example commands, refer to [Pro-tips for workflows](/best-practices/best-practice-workflows#pro-tips-for-workflows).
14 changes: 13 additions & 1 deletion website/sidebars.js
@@ -1109,9 +1109,21 @@ const sidebarSettings = {
"reference/node-selection/set-operators",
"reference/node-selection/methods",
"reference/node-selection/putting-it-together",
"reference/node-selection/state-comparison-caveats",
"reference/node-selection/yaml-selectors",
"reference/node-selection/test-selection-examples",
{
type: "category",
label: "About state selection",
link: {
type: "doc",
id: "reference/node-selection/state-selection",
},
items: [
"reference/node-selection/state-selection",
"reference/node-selection/configure-state",
"reference/node-selection/state-comparison-caveats",
],
},
],
},
{