[Bug] The key has to be less than 250 characters #1440
Comments
To address the issue of long task names generated by … Regarding the …
After upgrading Cosmos from 1.7.1 to 1.8.1, I also started experiencing problems caused by excessively long task names, though in my case it's another part of Airflow that's complaining about the 250-character limit:
Thank you for reporting the issue. @ms32035 @jtrobles-cdd, I’m curious: did you only upgrade Cosmos, or did you also upgrade Airflow? Additionally, since Airflow limits task keys to fewer than 250 characters, do you have any suggestions on how we could shorten the task keys while still keeping it clear which node each one represents? Any ideas here would help us address the issue in a more user-friendly way. @ms32035, regarding your comment:
This is expected behavior, as outlined in the documentation: https://github.com/astronomer/astronomer-cosmos/blob/main/docs/configuration/testing-behavior.rst. You can refer to the section titled
@pankajkoti, it was just a Cosmos upgrade. One workaround idea I have, which is not exactly user-friendly, is to use the hash of the test that dbt generates. On the …
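The hash-based workaround could look roughly like the sketch below. `shorten_task_id` is a hypothetical helper, not part of Cosmos or Airflow: names that already fit are left alone, and longer names keep a readable prefix followed by a short digest of the full name, so the result is deterministic, stays under the limit, and two different long names are very unlikely to collide. The 249-character target, the 8-character digest, and the use of SHA-256 (rather than the hash dbt generates for the test) are all assumptions.

```python
import hashlib

# The error says keys must be "less than 250 characters"; staying at 249 or
# fewer satisfies that wording however the check is implemented.
MAX_KEY_LENGTH = 249


def shorten_task_id(name: str, max_length: int = MAX_KEY_LENGTH, digest_length: int = 8) -> str:
    """Hypothetical helper: shorten `name` to at most `max_length` characters.

    Names that already fit are returned unchanged. Longer names keep a readable
    prefix and end with a short SHA-256 digest of the full original name.
    """
    if len(name) <= max_length:
        return name
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()[:digest_length]
    # Reserve room for the "_" separator plus the digest.
    prefix = name[: max_length - digest_length - 1]
    return f"{prefix}_{digest}"


# Usage: an artificially long test name, as produced by tests on long table names.
long_name = "dbt_utils_relationships_where_" + "very_long_table_name_" * 15
short = shorten_task_id(long_name)
print(len(long_name), len(short))  # prints: 345 249
```

The prefix keeps the task recognisable in the Airflow UI, while the digest carries the uniqueness; if the task ends up inside a TaskGroup, the group prefix would presumably also count towards the limit, so `max_length` may need to be reduced accordingly.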
Only Cosmos. I've been using Airflow 2.10.4 (with Astro Runtime 12.6.0) for many weeks without issues.
Running into this too since cosmos==1.8.1. @pankajkoti, we could group these tests by joint parents. E.g. for …
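One way to read the "group by joint parents" suggestion: tests that depend on exactly the same set of models share a single task, so the number of extra tasks grows with the number of distinct parent combinations rather than with the number of tests, and the task name no longer has to contain the (often very long) test name. The sketch below is illustrative only. It assumes a simplified input mapping each test to its parent models (roughly what dbt's manifest exposes under `depends_on.nodes`), assumes single-parent tests stay in their model's TaskGroup, and uses a hypothetical naming scheme that is not how Cosmos names tasks.

```python
from collections import defaultdict


def group_tests_by_parents(test_parents: dict[str, list[str]]) -> dict[tuple[str, ...], list[str]]:
    """Group multi-parent tests by the exact set of models they depend on.

    `test_parents` maps a test name to the names of its parent models.
    Single-parent tests are skipped, on the assumption that they already run
    inside their model's TaskGroup.
    """
    groups: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for test, parents in test_parents.items():
        # Sort so that ["orders", "customers"] and ["customers", "orders"] match.
        unique_parents = tuple(sorted(set(parents)))
        if len(unique_parents) > 1:
            groups[unique_parents].append(test)
    return dict(groups)


def group_task_id(parents: tuple[str, ...]) -> str:
    """Hypothetical task ID for a group: built from parent names, not test names."""
    return "test_" + "__".join(parents)


tests = {
    "relationships_where_orders_customer_id": ["orders", "customers"],
    "custom_test_combined_model": ["customers", "orders"],
    "not_null_orders_id": ["orders"],
}
for parents, names in group_tests_by_parents(tests).items():
    # Both multi-parent tests land in one group named after their shared parents.
    print(group_task_id(parents), sorted(names))
```

A group name built from model names can still grow long when a test has many parents, so it might need to be combined with some form of truncation.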
@ms32035 @jtrobles-cdd @internetcoffeephone thanks a lot for reporting and discussing this issue. I'm sorry the fix I made to solve another problem had these unintended side effects. It feels like we have two bugs:
I liked @internetcoffeephone's suggestion for (1); how do the others feel about it? Can we log a separate ticket for (2)? @ms32035, could you do this? Would anyone be interested in working on these this week? If not, the Astronomer team can take over these two issues starting next week. In the meantime, to avoid these problems, we recommend staying on 1.8.0 until we've released 1.8.2 with the fixes.
@tatiana @pankajkoti I've given it some more thought, and I think there are some issues with my previous suggestion. This abstraction is important because it makes it easier to reason about the DAG state: if you make an external DAG depend on a specific model, you don't want to hunt down all downstream tests that may or may not be defined in the model YAML. You just want to create a sensor for the model's TaskGroup, which doesn't work if tests for that model exist outside its TaskGroup. It's unclear to me why, before #1433 was implemented, … Since this test is defined under a specific model in the YAML file, even if it has other dependencies, shouldn't it be included only in that model's TaskGroup? Or am I missing an edge case where you could define out-of-order tests? E.g. adding the test in the YAML file under …
@internetcoffeephone #1433 was implemented because tests that depend on more than one dbt node could fail. This happened because they were executed every time any one of their parent models ran, regardless of whether all of their other dependencies had already been executed.
Since the test macro itself depends on both models, both … (the macro is defined in astronomer-cosmos/dev/dags/dbt/multiple_parents_test/macros/custom_test_combined_model.sql, lines 2 to 7 at commit 184e45f).
Please feel free to try it yourself. As part of #1433, we created a straightforward dbt project and a DAG that uses it, so we could reproduce the original problem and make sure there would be no regressions once it was solved. While this is just an example, the problem was reported by several companies that use Cosmos with real-world dbt projects.
It would be great if this were the case, but that's not what dbt-core currently does. We could not find a way to keep these multiple-parent tests in the task groups of their individual parent models while guaranteeing that all of their dependencies had already been resolved.
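The failure mode described above can be simulated without Airflow or dbt. The sketch below assumes a hypothetical project with two models (`model_a`, `model_b`) that run in a fixed linear order, plus one test that reads both; the `simulate` helper and all names are illustrative. Firing the test after each parent (the behaviour described for versions before #1433) runs it twice, and the first run happens before `model_b` has been built. Firing it only once every parent has finished runs it exactly once, with all dependencies in place.

```python
def simulate(model_order: list[str], test_parents: set[str], wait_for_all: bool) -> list[tuple[str, bool]]:
    """Return (event, ok) pairs for one multi-parent test during a single run.

    wait_for_all=False: the test fires after each of its parent models finishes.
    wait_for_all=True:  the test fires once, after its last parent finishes.
    `ok` is False when the test runs before all of its parents have been built.
    """
    built: set[str] = set()
    test_runs: list[tuple[str, bool]] = []
    for model in model_order:
        built.add(model)
        if model not in test_parents:
            continue  # unrelated model; the test is not triggered by it
        if wait_for_all and not test_parents <= built:
            continue  # some parents are still missing, so keep waiting
        test_runs.append((f"test after {model}", test_parents <= built))
    return test_runs


order = ["model_a", "model_b"]
parents = {"model_a", "model_b"}
print(simulate(order, parents, wait_for_all=False))
# [('test after model_a', False), ('test after model_b', True)]
print(simulate(order, parents, wait_for_all=True))
# [('test after model_b', True)]
```

A real DAG may run independent TaskGroups in parallel rather than in a fixed order, but the condition is the same: the test should only become runnable once the set of built models contains all of its parents.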
@tatiana Thank you for your explanation; I think I understand now. I tried flipping the dependencies in the YAML file and running … If we naively add the test to the Airflow test task of the model's TaskGroup wherever it is declared in the YAML file, it may fail. E.g. if you move the test to …
and change the test to:
then we get an almost identical output, except for the test name (and a few similar name changes): instead of … So yes, out-of-order tests can be defined, and we cannot blindly trust the YAML structure. In fact, the YAML structure doesn't even explicitly appear in the JSON output. The question then becomes: what behavior do we want? What we definitely want is for a test to execute only once per run, and only after all of its upstream models have run. For any given test with …
For case 2, the problem is that a "most downstream" TaskGroup may not explicitly exist, e.g. in a setup where … For both cases 2 and 3, we would break the abstraction that a model's TaskGroup fails if any of its tests fail. Therefore I'm inclined towards option 1. The main objection to option 1 is that it adds more complexity to the DAG. However, it is feasible, and it preserves the TaskGroup abstraction. E.g. for models … The above approach fails if dependencies within the Airflow DAG are specified at the TaskGroup level; we'd have to make the dependency explicit on either the … Happy to hear your opinion here. I'm not 100% sure whether this is what we want: as we're veering outside of regular dbt functionality (better functionality, I might add), we can't simply replicate existing dbt behavior.
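To make the case 2 problem concrete: among a test's parents, the candidates for a "most downstream" TaskGroup are the parents that are not upstream of any other parent. The sketch below computes that set with a plain graph walk. The graph format (model name mapped to its direct upstream models) and the model names are assumptions for illustration. When one parent depends on the other, a single candidate remains; when the parents are independent, both remain, which is exactly the situation where no single TaskGroup can host the test.

```python
def ancestors(model: str, upstream: dict[str, list[str]]) -> set[str]:
    """All models that `model` transitively depends on."""
    seen: set[str] = set()
    stack = list(upstream.get(model, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(upstream.get(parent, []))
    return seen


def most_downstream(parents: set[str], upstream: dict[str, list[str]]) -> set[str]:
    """Parents of a test that are not an ancestor of any other parent."""
    upstream_of_others: set[str] = set()
    for parent in parents:
        upstream_of_others |= ancestors(parent, upstream)
    return parents - upstream_of_others


# Hypothetical model graph: model -> direct upstream models.
graph = {
    "stg_orders": [],
    "stg_customers": [],
    "orders": ["stg_orders"],
    "customer_orders": ["orders", "stg_customers"],
}
# One parent is upstream of the other: a unique "most downstream" group exists.
print(most_downstream({"orders", "customer_orders"}, graph))  # {'customer_orders'}
# Independent parents: two candidates, so there is no single group to pick.
print(sorted(most_downstream({"orders", "stg_customers"}, graph)))  # ['orders', 'stg_customers']
```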
Astronomer Cosmos Version
1.8.1
dbt-core version
1.9.0
Versions of dbt adapters
No response
LoadMode
DBT_LS_MANIFEST
ExecutionMode
KUBERNETES
InvocationMode
None
airflow version
2.10.4
Operating System
Debian GNU/Linux 12 (bookworm)
If you think it's a UI issue, what browsers are you seeing the problem on?
No response
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
What happened?
It looks like, since #1433, the names generated for some tests provided by packages such as dbt_utils or dbt_expectations are absurdly long, and they result in:
Additionally, it seems that tests are generated as their own tasks regardless of the test_behaviour setting. Specifically, these are generated when the value is AFTER_ALL.
Relevant log output
How to reproduce
Create a test using dbt_utils.relationships_where where the table and column names are long.
Anything else :)?
No response
Are you willing to submit a PR?
Contact Details
No response