-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[airflow] Add lint rule to show error for removed context variables in airflow #15144
base: main
Are you sure you want to change the base?
[airflow] Add lint rule to show error for removed context variables in airflow #15144
Conversation
|
code | total | + violation | - violation | + fix | - fix |
---|---|---|---|---|---|
AIR302 | 107 | 107 | 0 | 0 | 0 |
89ed2ce
to
c3fb996
Compare
...rc/rules/airflow/snapshots/ruff_linter__rules__airflow__tests__AIR302_AIR302_context.py.snap
Outdated
Show resolved
Hide resolved
@task | ||
def print_config(**context): | ||
# This should not throw an error as logical_date is part of airflow context. | ||
logical_date = context["logical_date"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sunank200 and I discussed this earlier. What we're trying to check is whether there's a variable named as context
in a function (most commonly seen in taskflow and python operator) and whether it's can be accessed like a dict with the keys we want to check. I think it's unlikely users are using something like this out of the airflow context. But would like to know whether there's any concern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added logic for other ways to access context value as well. It is part of tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s probably better to detect
- Arguments of a function decorated with
@task
(either**
or simple named arguments). (As a follow-up, any functions called by such a function) - The
execute
function of a BaseOperator subclass (As a follow-up, any functions called byexecute
) - The dict returned by
get_current_context
.
This should be better than detecting with variable name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about python_callable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think python_callable
takes the context though? It only accepts things you provide in self.op_args
and self.op_kwargs
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it'll be useful to guard this check by first verifying that the parameter is coming from a function which is decorated with a @task
.
I think this can be done as a pre-check for context variables by using the checker.semantic().current_statements()
method to traverse up the AST to find the function definition node and checking whether the function has a @task
decorator that originates from the airflow
module.
ruff/crates/ruff_python_semantic/src/model.rs
Lines 1232 to 1239 in 9fd4eb8
/// Returns an [`Iterator`] over the current statement hierarchy, from the current [`Stmt`] | |
/// through to any parents. | |
pub fn current_statements(&self) -> impl Iterator<Item = &'a Stmt> + '_ { | |
let id = self.node_id.expect("No current node"); | |
self.nodes | |
.ancestor_ids(id) | |
.filter_map(move |id| self.nodes[id].as_statement()) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think
python_callable
takes the context though? It only accepts things you provide inself.op_args
andself.op_kwargs
.
I though we can still get it in the python_callable? https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/python.html#pythonoperator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm OK I didn’t even realise you can do that… yeah in that case it’s probably a good idea to also detect python_callable
arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the logic for named argument and function decorated with @task
5c96f89
to
5103ef7
Compare
@task | ||
def print_config(**context): | ||
# This should not throw an error as logical_date is part of airflow context. | ||
logical_date = context["logical_date"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about python_callable
?
crates/ruff_linter/resources/test/fixtures/airflow/AIR302_context.py
Outdated
Show resolved
Hide resolved
d580a4b
to
c0a34d3
Compare
pub(crate) fn removed_context_variable(checker: &mut Checker, expr: &Expr) { | ||
const REMOVED_CONTEXT_KEYS: [&str; 12] = [ | ||
"conf", | ||
"execution_date", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For execution_date
there is actually a replacement - in the docs: https://airflow.apache.org/docs/apache-airflow/stable/templates-ref.html#deprecated-variables - can you add this?
(Same for next_execution_date, prev_execution_date)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have renamed execution_date to logical_date at places but we have removed them as well:apache/airflow#42404
add lint rule to show error for removed context variables in airflow
c0a34d3
to
72a9dd3
Compare
62de813
to
fb8b300
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a case where a function should not raise a warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added two of them but i can add more.
pub(crate) fn removed_context_variable(checker: &mut Checker, expr: &Expr) { | ||
if let Expr::Subscript(ExprSubscript { value, slice, .. }) = expr { | ||
if let Expr::Name(ExprName { id, .. }) = &**value { | ||
if id.as_str() == "context" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to support code like this?
c = get_current_context()
c["execution_date"]
We should probably not hard code the variable name.
|
||
if !value | ||
.as_name_expr() | ||
.is_some_and(|name| matches!(name.id.as_str(), "context" | "kwargs")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, I think we should just check for all names (as long as it’s **
).
Going to focus on reviewing this PR instead of #15240 for now as I think this one supersedes the other one but please correct me if I'm wrong. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this. I've a couple of doubts which I've highlighted in the review comments and #15240 (comment).
let parents: Vec<_> = checker.semantic().current_statements().collect(); | ||
|
||
for stmt in parents { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid allocating a vector here as we can directly iterate over the statement tree like so:
for stmt in checker.semantic().current_statements() {
// ...
}
if let Stmt::FunctionDef(function_def) = stmt { | ||
if is_decorated_with(checker, function_def) { | ||
let arguments = extract_task_function_arguments(function_def); | ||
|
||
for deprecated_arg in REMOVED_CONTEXT_KEYS { | ||
if arguments.contains(&deprecated_arg.to_string()) { | ||
checker.diagnostics.push(Diagnostic::new( | ||
Airflow3Removal { | ||
deprecated: deprecated_arg.to_string(), | ||
replacement: Replacement::None, | ||
}, | ||
expr.range(), | ||
)); | ||
return true; | ||
} | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can avoid multiple indentation levels by using let ... else ...
which improves readability like so:
let Stmt::FunctionDef(function_def) = stmt else {
continue;
};
if !is_decorated_with(checker, function_def) {
continue;
}
let arguments = extract_task_function_arguments(function_def);
for deprecated_arg in REMOVED_CONTEXT_KEYS {
if arguments.contains(&deprecated_arg.to_string()) {
checker.diagnostics.push(Diagnostic::new(
Airflow3Removal {
deprecated: deprecated_arg.to_string(),
replacement: Replacement::None,
},
expr.range(),
));
return true;
}
}
@@ -849,3 +954,55 @@ fn is_airflow_builtin_or_provider(segments: &[&str], module: &str, symbol_suffix | |||
_ => false, | |||
} | |||
} | |||
|
|||
fn is_task_context_referenced(checker: &mut Checker, expr: &Expr) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify what this function is suppose to do? I'm confused as to why this function is also looping over the REMOVED_CONTEXT_ARGS
and adding the diagnostics when the callee also does something similar. Additionally, the name and the return type of this function suggests that it checks for a condition but it's adding diagnostics as well.
fn is_decorated_with(checker: &mut Checker, stmt: &StmtFunctionDef) -> bool { | ||
stmt.decorator_list.iter().any(|decorator| { | ||
checker | ||
.semantic() | ||
.resolve_qualified_name(map_callable(&decorator.expression)) | ||
.is_some_and(|qualified_name| { | ||
matches!(qualified_name.segments(), ["airflow", "decorators", "task"]) | ||
}) | ||
}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the name of the function, did you mean to make this a generic function over any decorator originating from airflow.decorators.*
? If not, can you rename this to has_task_decorator
?
fn extract_task_function_arguments(stmt: &StmtFunctionDef) -> Vec<String> { | ||
let mut arguments = Vec::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could avoid this allocation but I would like to first understand what does is_task_context_referenced
function is suppose to do.
Summary
Airflow 3.0 removes following deprecated Airflow context variables:
They have been deprecated in 2.x, but the removal causes incompatibilities that we want to detect.
related: apache/airflow#44409, apache/airflow#41641
Test Plan
A test fixture is included in the PR.