Dataset only emit for specific models #1442

Open
1 task
DanMawdsleyBA opened this issue Jan 8, 2025 · 2 comments
Labels
area:datasets (Related to the Airflow datasets feature/module), enhancement (New feature or request), triage-needed (Items need to be reviewed / assigned to milestone)

Comments

@DanMawdsleyBA
Contributor

Description

We're looking into leveraging datasets, but one problem we're having is that we can't control which models produce datasets. Would it be possible to implement a way to control this, perhaps using a list or regex of models that should emit datasets?
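To make the request concrete, here is a minimal sketch of what regex-based selection could look like. It is not part of Cosmos: `should_emit_dataset`, `EMIT_PATTERNS`, and the model names are all hypothetical and only illustrate the idea of an allow-list of patterns.

```python
import re
from typing import Iterable


def should_emit_dataset(model_name: str, patterns: Iterable[str]) -> bool:
    """Return True when the model name fully matches any allow-list regex.

    Hypothetical helper for illustration; Cosmos does not expose this function.
    """
    return any(re.fullmatch(pattern, model_name) for pattern in patterns)


# Hypothetical allow-list: emit for mart models and one named model,
# and skip staging and working tables.
EMIT_PATTERNS = [r"mart_.*", r"customers"]

models = ["stg_orders", "tmp_orders_dedup", "mart_revenue", "customers"]
emitting = [name for name in models if should_emit_dataset(name, EMIT_PATTERNS)]
print(emitting)
```

A plain list of model names is covered by the same helper, since a name without regex metacharacters only matches itself.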

Use case/motivation

In our dbt project we use a lot of working tables for which we don't want to emit datasets. We also want to cut down on the number of datasets overall, as the Airflow graph can get hard to use.

Related issues

No response

Are you willing to submit a PR?

  • Yes, I am willing to submit a PR!
@DanMawdsleyBA DanMawdsleyBA added enhancement New feature or request triage-needed Items need to be reviewed / assigned to milestone labels Jan 8, 2025
@dosubot dosubot bot added the area:datasets Related to the Airflow datasets feature/module label Jan 8, 2025
@tatiana
Collaborator

tatiana commented Jan 8, 2025

@DanMawdsleyBA I haven't tried this, but it may be possible with Cosmos 1.8, thanks to @wornjs' #1339 contribution:
https://astronomer.github.io/astronomer-cosmos/getting_started/custom-airflow-properties.html

You could try setting RenderConfig(emit_datasets=False) at a DbtDag or DbtTaskGroup level, and for the models that should emit datasets, you could try configuring:

version: 2

models:
  - name: name
    description: description
    meta:
      owner: '[email protected]'
      cosmos:
        operator_kwargs:
          emit_datasets: True

Since this parameter is set at an operator level:

:param emit_datasets: Enable emitting inlets and outlets during task execution

Could you please try it and let us know how it goes?

@DanMawdsleyBA
Contributor Author

DanMawdsleyBA commented Jan 9, 2025

I tried this, and it doesn't seem to work. I also tried setting emit_datasets to False through operator_args:

        operator_args={
            "dbt_cmd_flags": ["--exclude", "test_type:unit"],
            "full_refresh": True,
            "emit_datasets": False,
        }

This doesn't work either and still emits the datasets. The only way I have managed to turn them off is through the render_config:

        render_config=RenderConfig(
            emit_datasets=False)

But this turns them off for the entire DAG.

I wonder if this is because of the behavior in converter.py:

        task_args = {
            **operator_args,
            "project_dir": execution_config.project_path,
            "partial_parse": project_config.partial_parse,
            "profile_config": profile_config,
            "emit_datasets": render_config.emit_datasets,
            "env": env_vars,
            "vars": dbt_vars,
        }

For emit_datasets, it only uses the value defined in the render config.
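The override can be reproduced with plain dictionaries, independent of Cosmos. The sketch below copies only the shape of the quoted snippet: in a dict literal, a key written after `**operator_args` replaces a same-named key that came from `operator_args`. Both function names are hypothetical, and the second function is just one possible ordering that would let the operator-level value win, not a confirmed fix.

```python
def build_task_args(operator_args: dict, render_emit_datasets: bool) -> dict:
    # Same ordering as the quoted snippet: the explicit "emit_datasets" key
    # comes after **operator_args, so it overwrites the operator-level value.
    return {
        **operator_args,
        "emit_datasets": render_emit_datasets,
    }


def build_task_args_operator_wins(operator_args: dict, render_emit_datasets: bool) -> dict:
    # Hypothetical alternative: the render config supplies the default and
    # operator_args, unpacked last, can override it.
    return {
        "emit_datasets": render_emit_datasets,
        **operator_args,
    }


operator_args = {"full_refresh": True, "emit_datasets": False}

current = build_task_args(operator_args, render_emit_datasets=True)
alternative = build_task_args_operator_wins(operator_args, render_emit_datasets=True)

print(current["emit_datasets"])      # True: the render config value wins
print(alternative["emit_datasets"])  # False: the operator_args value wins
```

This would account for operator_args being ignored. Whether it also explains the per-model `meta` configuration having no effect depends on how those values are merged elsewhere, which the snippet above does not show.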
