Conditional Task Execution Based on Model Metadata #1416
Comments
Hello @itestyoy! I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this together! I wasn't able to find specific information in the Cosmos repository about using metadata fields like …
Hi @itestyoy, could you please explain your use case a bit more so I can understand it? Cosmos does support some parameters via model meta. See the docs: https://astronomer.github.io/astronomer-cosmos/getting_started/custom-airflow-properties.html
@pankajastro Hi! I am considering using metadata (e.g., "airflow_schedule": "@daily") directly in the model definition to reduce DAG execution time. Specifically, certain models need to be updated only once a day, while others can be refreshed more frequently. To achieve this, metadata like airflow_schedule can be checked for each task, and task executions that do not align with the current run can be selectively skipped. For instance, if the current execution time does not match the airflow_schedule specified in the metadata, the task should raise a skip exception (Airflow's AirflowSkipException) to avoid unnecessary execution.
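A minimal sketch of the check described above, assuming the DAG itself runs hourly on the hour and that only Airflow's preset schedules are used in `meta`. The helper names `is_due` and `skip_unless_due` are hypothetical and not part of Cosmos or Airflow; the fallback exception class exists only so the snippet runs without Airflow installed.

```python
from datetime import datetime

try:
    from airflow.exceptions import AirflowSkipException
except ImportError:  # fallback so the sketch runs without Airflow installed
    class AirflowSkipException(Exception):
        """Stand-in for airflow.exceptions.AirflowSkipException."""


def is_due(airflow_schedule: str, logical_date: datetime) -> bool:
    """Return True if logical_date falls on the given preset schedule.

    Hypothetical helper: only a few presets are handled, and the DAG is
    assumed to be triggered hourly, on the hour.
    """
    at_midnight = logical_date.hour == 0 and logical_date.minute == 0
    if airflow_schedule == "@hourly":
        return logical_date.minute == 0
    if airflow_schedule == "@daily":
        return at_midnight
    if airflow_schedule == "@weekly":
        # Airflow's @weekly preset is "0 0 * * 0", i.e. Sunday at midnight.
        return at_midnight and logical_date.weekday() == 6
    if airflow_schedule == "@monthly":
        return at_midnight and logical_date.day == 1
    raise ValueError(f"Unsupported schedule preset: {airflow_schedule!r}")


def skip_unless_due(meta: dict, logical_date: datetime) -> None:
    """Raise AirflowSkipException when the model is not due for this run."""
    schedule = meta.get("airflow_schedule")
    if schedule is not None and not is_due(schedule, logical_date):
        raise AirflowSkipException(
            f"{schedule} does not match run at {logical_date.isoformat()}"
        )
```

Models without an `airflow_schedule` entry are never skipped in this sketch, so they would run on every DAG run.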
This is a very valid use case, and we've also considered it. Unfortunately, Airflow 2.x does not have built-in support for different schedules in the same DAG. If Cosmos 1.8 users want to handle this use case, they need to:
We logged a task that could simplify step (2) by allowing Cosmos users to add Dataset-based schedules to DbtDags automatically: The implementation of this task is relatively close in our roadmap.
Moving forward, a few things could be done to improve this experience. One of them would be for Airflow to have built-in support for different schedules within the same DAG. I believe there was an intention of accomplishing this as part of https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-73+Expanded+Data+Awareness. @cmarteepants and @uranusjr may be able to give additional context and information on this.
Another possibility would be for Cosmos to handle this Airflow 2.x limitation by adding conditional operators that decide when subparts of the DAG should run on only a subset of a more granular schedule. As an example, assuming we had the dbt models a, b, c, d with the following schedules:
If all these models were in the same DAG, the DAG could be run hourly
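To illustrate the idea of one hourly DAG deciding which models run on each tick, here is a rough sketch. The per-model intervals below are hypothetical placeholders chosen for illustration (they are not the schedules from the comment above), and `models_due` is an invented helper, not a Cosmos API. It assumes every interval divides 24 hours evenly.

```python
from datetime import datetime

# Hypothetical refresh intervals in hours, for illustration only.
MODEL_INTERVAL_HOURS = {"a": 1, "b": 1, "c": 24, "d": 24}


def models_due(logical_date: datetime, intervals: dict = MODEL_INTERVAL_HOURS) -> list:
    """Return the models that should run for an hourly DAG run at logical_date.

    A model with interval n (a divisor of 24) is due when the hour of day is
    a multiple of n, so an interval of 24 means midnight only.
    """
    return sorted(
        name for name, every in intervals.items()
        if logical_date.hour % every == 0
    )


# At midnight every model is due; at 05:00 only the hourly ones are.
print(models_due(datetime(2024, 1, 1, 0)))
print(models_due(datetime(2024, 1, 1, 5)))
```

In a real DAG, the tasks for models not returned by such a check would be skipped rather than removed, so the graph stays the same on every run.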
@itestyoy @pankajastro any thoughts on these ideas?
@tatiana Hi! Airflow provides @skip_if and @run_if callback decorators in newer versions. More details here: Another approach is to wrap any operator with a callback function and raise an
In general, you don’t need to set a schedule for individual tasks in the graph (Airflow 2.x does not have built-in support for different schedules in the same DAG). Instead, tasks can be conditionally skipped based on a function that encodes their own schedule logic. The DAG itself should be triggered on a schedule that fits your needs and is consistent with the scheduling logic of its tasks.
Using built-in task option
Here’s an example implementation:
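As a minimal sketch of the callback approach (not the commenter's original snippet), the factory below builds a function that receives the Airflow task context and raises `AirflowSkipException` when a condition on the run's `logical_date` is not met. The name `skip_unless` is hypothetical, and the fallback exception class is only there so the code runs without Airflow installed.

```python
from datetime import datetime
from typing import Any, Callable, Mapping

try:
    from airflow.exceptions import AirflowSkipException
except ImportError:  # fallback so the sketch runs without Airflow installed
    class AirflowSkipException(Exception):
        """Stand-in for airflow.exceptions.AirflowSkipException."""


def skip_unless(
    condition: Callable[[datetime], bool],
) -> Callable[[Mapping[str, Any]], None]:
    """Build a callback that skips the task unless condition(logical_date) holds.

    Hypothetical helper: the returned function expects a context mapping
    with a "logical_date" entry, as found in the Airflow task context.
    """
    def _callback(context: Mapping[str, Any]) -> None:
        logical_date = context["logical_date"]
        if not condition(logical_date):
            raise AirflowSkipException(
                f"Not scheduled for {logical_date.isoformat()}"
            )
    return _callback


# Callback for a model that should only refresh on the midnight run.
daily_only = skip_unless(lambda d: d.hour == 0 and d.minute == 0)
```

Airflow's `BaseOperator` accepts a `pre_execute` callable that is called with the context right before execution, so a callback like `daily_only` could be attached that way. Whether and how Cosmos forwards such an argument to the tasks it generates is an assumption that would need to be checked.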
Hello!
I would like to know if it’s possible to run a task based on model metadata, for example:
meta: { "airflow_schedule": "@daily" }
or something like this, without post-rendering the DAG.