[airflow] Add lint rule to show error for removed context variables in airflow #15144
Original file line number | Diff line number | Diff line change | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,124 @@ | ||||||||||||||||||
import pendulum | ||||||||||||||||||
from airflow.models import DAG | ||||||||||||||||||
from airflow.operators.dummy import DummyOperator | ||||||||||||||||||
from datetime import datetime | ||||||||||||||||||
from airflow.plugins_manager import AirflowPlugin | ||||||||||||||||||
from airflow.decorators import task, get_current_context | ||||||||||||||||||
from airflow.models.baseoperator import BaseOperator | ||||||||||||||||||
from airflow.decorators import dag, task | ||||||||||||||||||
from airflow.providers.standard.operators.python import PythonOperator | ||||||||||||||||||
|
||||||||||||||||||
|
||||||||||||||||||
def access_invalid_key_in_context(**context): | ||||||||||||||||||
print("access invalid key", context["conf"]) | ||||||||||||||||||
|
||||||||||||||||||
|
||||||||||||||||||
@task | ||||||||||||||||||
def access_invalid_key_task_out_of_dag(**context): | ||||||||||||||||||
print("access invalid key", context.get("conf")) | ||||||||||||||||||
|
||||||||||||||||||
|
||||||||||||||||||
|
||||||||||||||||||
@dag( | ||||||||||||||||||
schedule=None, | ||||||||||||||||||
start_date=pendulum.datetime(2021, 1, 1, tz="UTC"), | ||||||||||||||||||
catchup=False, | ||||||||||||||||||
tags=[""], | ||||||||||||||||||
) | ||||||||||||||||||
def invalid_dag(): | ||||||||||||||||||
@task() | ||||||||||||||||||
def access_invalid_key_task(**context): | ||||||||||||||||||
print("access invalid key", context.get("conf")) | ||||||||||||||||||
|
||||||||||||||||||
task1 = PythonOperator( | ||||||||||||||||||
task_id="task1", | ||||||||||||||||||
python_callable=access_invalid_key_in_context, | ||||||||||||||||||
) | ||||||||||||||||||
access_invalid_key_task() >> task1 | ||||||||||||||||||
access_invalid_key_task_out_of_dag() | ||||||||||||||||||
|
||||||||||||||||||
|
||||||||||||||||||
invalid_dag() | ||||||||||||||||||
|
||||||||||||||||||
@task | ||||||||||||||||||
def print_config(**context): | ||||||||||||||||||
|
||||||||||||||||||
# This should not throw an error as logical_date is part of airflow context. | ||||||||||||||||||
logical_date = context["logical_date"] | ||||||||||||||||||
@sunank200 and I discussed this earlier. What we're trying to check is whether there's a variable named as

I have added logic for other ways to access context values as well. It is part of the tests.

It’s probably better to detect
This should be better than detecting with variable name.

What about

I don’t think

I agree that it'll be useful to guard this check by first verifying that the parameter is coming from a function which is decorated with a
I think this can be done as a pre-check for context variables by using the
ruff/crates/ruff_python_semantic/src/model.rs Lines 1232 to 1239 in 9fd4eb8

I thought we could still get it in the python_callable? https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/python.html#pythonoperator

Hmm OK I didn’t even realise you can do that… yeah in that case it’s probably a good idea to also detect

I have updated the logic for named arguments and functions decorated with @task |
||||||||||||||||||
|
||||||||||||||||||
|
||||||||||||||||||
# Removed usage - should trigger violations | ||||||||||||||||||
execution_date = context["execution_date"] | ||||||||||||||||||
next_ds = context["next_ds"] | ||||||||||||||||||
next_ds_nodash = context["next_ds_nodash"] | ||||||||||||||||||
next_execution_date = context["next_execution_date"] | ||||||||||||||||||
prev_ds = context["prev_ds"] | ||||||||||||||||||
prev_ds_nodash = context["prev_ds_nodash"] | ||||||||||||||||||
prev_execution_date = context["prev_execution_date"] | ||||||||||||||||||
prev_execution_date_success = context["prev_execution_date_success"] | ||||||||||||||||||
tomorrow_ds = context["tomorrow_ds"] | ||||||||||||||||||
yesterday_ds = context["yesterday_ds"] | ||||||||||||||||||
yesterday_ds_nodash = context["yesterday_ds_nodash"] | ||||||||||||||||||
|
||||||||||||||||||
with DAG( | ||||||||||||||||||
dag_id="example_dag", | ||||||||||||||||||
schedule_interval="@daily", | ||||||||||||||||||
start_date=datetime(2023, 1, 1), | ||||||||||||||||||
template_searchpath=["/templates"], | ||||||||||||||||||
) as dag: | ||||||||||||||||||
task1 = DummyOperator( | ||||||||||||||||||
task_id="task1", | ||||||||||||||||||
params={ | ||||||||||||||||||
# Removed variables in template | ||||||||||||||||||
"execution_date": "{{ execution_date }}", | ||||||||||||||||||
"next_ds": "{{ next_ds }}", | ||||||||||||||||||
"prev_ds": "{{ prev_ds }}" | ||||||||||||||||||
}, | ||||||||||||||||||
) | ||||||||||||||||||
|
||||||||||||||||||
class CustomMacrosPlugin(AirflowPlugin): | ||||||||||||||||||
name = "custom_macros" | ||||||||||||||||||
macros = { | ||||||||||||||||||
"execution_date_macro": lambda context: context["execution_date"], | ||||||||||||||||||
"next_ds_macro": lambda context: context["next_ds"] | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
@task | ||||||||||||||||||
def print_config(): | ||||||||||||||||||
context = get_current_context() | ||||||||||||||||||
execution_date = context["execution_date"] | ||||||||||||||||||
next_ds = context["next_ds"] | ||||||||||||||||||
next_ds_nodash = context["next_ds_nodash"] | ||||||||||||||||||
next_execution_date = context["next_execution_date"] | ||||||||||||||||||
prev_ds = context["prev_ds"] | ||||||||||||||||||
prev_ds_nodash = context["prev_ds_nodash"] | ||||||||||||||||||
prev_execution_date = context["prev_execution_date"] | ||||||||||||||||||
prev_execution_date_success = context["prev_execution_date_success"] | ||||||||||||||||||
tomorrow_ds = context["tomorrow_ds"] | ||||||||||||||||||
yesterday_ds = context["yesterday_ds"] | ||||||||||||||||||
yesterday_ds_nodash = context["yesterday_ds_nodash"] | ||||||||||||||||||
|
||||||||||||||||||
class CustomOperator(BaseOperator): | ||||||||||||||||||
def execute(self, context): | ||||||||||||||||||
execution_date = context["execution_date"] | ||||||||||||||||||
next_ds = context["next_ds"] | ||||||||||||||||||
next_ds_nodash = context["next_ds_nodash"] | ||||||||||||||||||
next_execution_date = context["next_execution_date"] | ||||||||||||||||||
prev_ds = context["prev_ds"] | ||||||||||||||||||
prev_ds_nodash = context["prev_ds_nodash"] | ||||||||||||||||||
prev_execution_date = context["prev_execution_date"] | ||||||||||||||||||
prev_execution_date_success = context["prev_execution_date_success"] | ||||||||||||||||||
tomorrow_ds = context["tomorrow_ds"] | ||||||||||||||||||
yesterday_ds = context["yesterday_ds"] | ||||||||||||||||||
yesterday_ds_nodash = context["yesterday_ds_nodash"] | ||||||||||||||||||
|
||||||||||||||||||
@task | ||||||||||||||||||
def access_invalid_argument_task_out_of_dag(execution_date, **context): | ||||||||||||||||||
print("execution date", execution_date) | ||||||||||||||||||
print("access invalid key", context.get("conf")) | ||||||||||||||||||
|
||||||||||||||||||
@task(task_id="print_the_context") | ||||||||||||||||||
def print_context(ds=None, **kwargs): | ||||||||||||||||||
"""Print the Airflow context and ds variable from the context.""" | ||||||||||||||||||
print(ds) | ||||||||||||||||||
print(kwargs.get("tomorrow_ds")) | ||||||||||||||||||
|
||||||||||||||||||
run_this = print_context() |
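The fixture above exercises subscript access on the task context. As a rough illustration of what the rule is meant to flag, here is a minimal sketch using Python's standard `ast` module. It is not Ruff's implementation: the helper name `find_removed_subscripts` is hypothetical, and it matches any subscripted name instead of resolving whether the value really is an Airflow context.

```python
import ast

# Keys listed in REMOVED_CONTEXT_KEYS in this PR.
REMOVED_CONTEXT_KEYS = {
    "conf", "execution_date", "next_ds", "next_ds_nodash",
    "next_execution_date", "prev_ds", "prev_ds_nodash",
    "prev_execution_date", "prev_execution_date_success",
    "tomorrow_ds", "yesterday_ds", "yesterday_ds_nodash",
}


def find_removed_subscripts(source: str) -> list[tuple[int, str]]:
    """Return (line, key) for every `name["removed_key"]` in `source`.

    Hypothetical helper; assumes Python 3.9+, where `Subscript.slice`
    is the key expression itself.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Subscript):
            continue
        # Only plain names such as `context[...]` or `c[...]` are considered.
        if not isinstance(node.value, ast.Name):
            continue
        key = node.slice
        if isinstance(key, ast.Constant) and isinstance(key.value, str):
            if key.value in REMOVED_CONTEXT_KEYS:
                hits.append((node.lineno, key.value))
    return sorted(hits)
```

Because the sketch ignores the variable name, it would also report `c["execution_date"]` after `c = get_current_context()`, at the cost of false positives on unrelated dictionaries.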
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,17 +1,17 @@ | ||
use crate::checkers::ast::Checker; | ||
use ruff_diagnostics::{Diagnostic, Edit, Fix, FixAvailability, Violation}; | ||
use ruff_macros::{derive_message_formats, ViolationMetadata}; | ||
use ruff_python_ast::helpers::map_callable; | ||
use ruff_python_ast::{ | ||
name::QualifiedName, Arguments, Expr, ExprAttribute, ExprCall, ExprContext, ExprName, | ||
StmtClassDef, | ||
ExprStringLiteral, ExprSubscript, Stmt, StmtClassDef, StmtFunctionDef, | ||
}; | ||
use ruff_python_semantic::analyze::typing; | ||
use ruff_python_semantic::Modules; | ||
use ruff_python_semantic::ScopeKind; | ||
use ruff_text_size::Ranged; | ||
use ruff_text_size::TextRange; | ||
|
||
use crate::checkers::ast::Checker; | ||
|
||
/// ## What it does | ||
/// Checks for uses of deprecated Airflow functions and values. | ||
/// | ||
|
@@ -71,6 +71,63 @@ impl Violation for Airflow3Removal { | |
} | ||
} | ||
|
||
const REMOVED_CONTEXT_KEYS: [&str; 12] = [ | ||
"conf", | ||
"execution_date", | ||
"next_ds", | ||
"next_ds_nodash", | ||
"next_execution_date", | ||
"prev_ds", | ||
"prev_ds_nodash", | ||
"prev_execution_date", | ||
"prev_execution_date_success", | ||
"tomorrow_ds", | ||
"yesterday_ds", | ||
"yesterday_ds_nodash", | ||
]; | ||
|
||
fn extract_name_from_slice(slice: &Expr) -> Option<String> { | ||
match slice { | ||
Expr::StringLiteral(ExprStringLiteral { value, .. }) => Some(value.to_string()), | ||
_ => None, | ||
} | ||
} | ||
|
||
pub(crate) fn removed_context_variable(checker: &mut Checker, expr: &Expr) { | ||
if let Expr::Subscript(ExprSubscript { value, slice, .. }) = expr { | ||
if let Expr::Name(ExprName { id, .. }) = &**value { | ||
if id.as_str() == "context" { | ||
Is it possible to support code like this?
c = get_current_context()
c["execution_date"]
We should probably not hard-code the variable name. |
||
if let Some(key) = extract_name_from_slice(slice) { | ||
if REMOVED_CONTEXT_KEYS.contains(&key.as_str()) { | ||
checker.diagnostics.push(Diagnostic::new( | ||
Airflow3Removal { | ||
deprecated: key, | ||
replacement: Replacement::None, | ||
}, | ||
slice.range(), | ||
)); | ||
} | ||
} | ||
} | ||
} | ||
} | ||
|
||
if let Expr::StringLiteral(ExprStringLiteral { value, .. }) = expr { | ||
let value_str = value.to_string(); | ||
for key in REMOVED_CONTEXT_KEYS { | ||
if value_str.contains(&format!("{{{{ {key} }}}}")) { | ||
checker.diagnostics.push(Diagnostic::new( | ||
Airflow3Removal { | ||
deprecated: key.to_string(), | ||
replacement: Replacement::None, | ||
}, | ||
expr.range(), | ||
)); | ||
} | ||
} | ||
} | ||
} | ||
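The string-literal branch above tests for the exact text `{{ key }}`, with one space on each side of the key. A minimal Python sketch of the same substring test, using a hypothetical helper name and only a subset of the keys, shows what that does and does not match:

```python
# Subset of REMOVED_CONTEXT_KEYS, for illustration only.
REMOVED_CONTEXT_KEYS = ("execution_date", "next_ds", "prev_ds")


def removed_keys_in_template(template: str) -> list[str]:
    """Return the removed keys that appear as the literal text `{{ key }}`.

    Hypothetical helper mirroring the substring check in the diff;
    it does not parse Jinja.
    """
    # In an f-string, `{{{{` renders as `{{` and `}}}}` renders as `}}`.
    return [key for key in REMOVED_CONTEXT_KEYS if f"{{{{ {key} }}}}" in template]
```

Under this test, `"{{ next_ds }}"` matches, while `"{{next_ds}}"` (no spaces) and `"{{ next_ds_nodash }}"` do not match the `next_ds` pattern. A check that tolerates other spacing or filters would need a real template parser or a regular expression.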
|
||
/// AIR302 | ||
pub(crate) fn removed_in_3(checker: &mut Checker, expr: &Expr) { | ||
if !checker.semantic().seen_module(Modules::AIRFLOW) { | ||
|
@@ -87,6 +144,7 @@ pub(crate) fn removed_in_3(checker: &mut Checker, expr: &Expr) { | |
check_call_arguments(checker, &qualname, arguments); | ||
}; | ||
check_method(checker, call_expr); | ||
check_context_get(checker, call_expr); | ||
} | ||
Expr::Attribute(attribute_expr @ ExprAttribute { attr, .. }) => { | ||
check_name(checker, expr, attr.range()); | ||
|
@@ -100,6 +158,9 @@ pub(crate) fn removed_in_3(checker: &mut Checker, expr: &Expr) { | |
} | ||
} | ||
} | ||
Expr::Subscript(_) => { | ||
removed_context_variable(checker, expr); | ||
} | ||
_ => {} | ||
} | ||
} | ||
|
@@ -247,6 +308,50 @@ fn check_class_attribute(checker: &mut Checker, attribute_expr: &ExprAttribute) | |
} | ||
} | ||
|
||
/// Check whether a removed context key is accessed through context.get("key"). | ||
/// | ||
/// ```python | ||
/// from airflow.decorators import task | ||
/// | ||
/// | ||
/// @task | ||
/// def access_invalid_key_task_out_of_dag(**context): | ||
/// print("access invalid key", context.get("conf")) | ||
/// ``` | ||
fn check_context_get(checker: &mut Checker, call_expr: &ExprCall) { | ||
if is_task_context_referenced(checker, &call_expr.func) { | ||
return; | ||
} | ||
|
||
let Expr::Attribute(ExprAttribute { value, attr, .. }) = &*call_expr.func else { | ||
return; | ||
}; | ||
|
||
if !value | ||
.as_name_expr() | ||
.is_some_and(|name| matches!(name.id.as_str(), "context" | "kwargs")) | ||
Same, I think we should just check for all names (as long as it’s |
||
{ | ||
return; | ||
} | ||
|
||
if attr.as_str() != "get" { | ||
return; | ||
} | ||
|
||
for removed_key in REMOVED_CONTEXT_KEYS { | ||
if let Some(argument) = call_expr.arguments.find_argument_value(removed_key, 0) { | ||
checker.diagnostics.push(Diagnostic::new( | ||
Airflow3Removal { | ||
deprecated: removed_key.to_string(), | ||
replacement: Replacement::None, | ||
}, | ||
argument.range(), | ||
)); | ||
return; | ||
} | ||
} | ||
} | ||
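For comparison, the intended `.get("key")` detection can be sketched in a few lines with Python's `ast` module. This is an illustrative approximation, not the Rust logic above: `find_removed_get_calls` is a hypothetical name, the key set is a subset, and the receiver names default to the same `context` and `kwargs` pair that the diff hard-codes.

```python
import ast

# Subset of REMOVED_CONTEXT_KEYS, for illustration only.
REMOVED_CONTEXT_KEYS = {"conf", "execution_date", "tomorrow_ds"}


def find_removed_get_calls(source, names=("context", "kwargs")):
    """Return sorted (receiver, key) pairs for calls like `context.get("conf")`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # The callee must be `<name>.get`.
        if not (isinstance(func, ast.Attribute) and func.attr == "get"):
            continue
        if not (isinstance(func.value, ast.Name) and func.value.id in names):
            continue
        if not node.args:
            continue
        first = node.args[0]
        # Compare the literal value of the first positional argument.
        if (
            isinstance(first, ast.Constant)
            and isinstance(first.value, str)
            and first.value in REMOVED_CONTEXT_KEYS
        ):
            hits.append((func.value.id, first.value))
    return sorted(hits)
```

Passing a different `names` tuple is one way to experiment with the reviewers' suggestion to avoid fixing the receiver name, although a real rule would resolve the binding instead of matching on spelling.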
|
||
/// Check whether a removed Airflow class method is called. | ||
/// | ||
/// For example: | ||
|
@@ -849,3 +954,55 @@ fn is_airflow_builtin_or_provider(segments: &[&str], module: &str, symbol_suffix | |
_ => false, | ||
} | ||
} | ||
|
||
fn is_task_context_referenced(checker: &mut Checker, expr: &Expr) -> bool { | ||
Can you clarify what this function is supposed to do? I'm confused as to why this function is also looping over the |
||
let parents: Vec<_> = checker.semantic().current_statements().collect(); | ||
|
||
for stmt in parents { | ||
Comment on lines +959 to +961
We should avoid allocating a vector here as we can directly iterate over the statement tree like so:
for stmt in checker.semantic().current_statements() {
    // ...
} |
||
if let Stmt::FunctionDef(function_def) = stmt { | ||
if is_decorated_with(checker, function_def) { | ||
let arguments = extract_task_function_arguments(function_def); | ||
|
||
for deprecated_arg in REMOVED_CONTEXT_KEYS { | ||
if arguments.contains(&deprecated_arg.to_string()) { | ||
checker.diagnostics.push(Diagnostic::new( | ||
Airflow3Removal { | ||
deprecated: deprecated_arg.to_string(), | ||
replacement: Replacement::None, | ||
}, | ||
expr.range(), | ||
)); | ||
return true; | ||
} | ||
} | ||
} | ||
} | ||
Comment on lines +962 to +979
We can avoid multiple indentation levels by using
let Stmt::FunctionDef(function_def) = stmt else {
    continue;
};
if !is_decorated_with(checker, function_def) {
    continue;
}
let arguments = extract_task_function_arguments(function_def);
for deprecated_arg in REMOVED_CONTEXT_KEYS {
    if arguments.contains(&deprecated_arg.to_string()) {
        checker.diagnostics.push(Diagnostic::new(
            Airflow3Removal {
                deprecated: deprecated_arg.to_string(),
                replacement: Replacement::None,
            },
            expr.range(),
        ));
        return true;
    }
} |
||
} | ||
|
||
false | ||
} | ||
|
||
fn extract_task_function_arguments(stmt: &StmtFunctionDef) -> Vec<String> { | ||
let mut arguments = Vec::new(); | ||
Comment on lines +985 to +986
I think we could avoid this allocation but I would like to first understand what does |
||
|
||
for param in &stmt.parameters.args { | ||
arguments.push(param.parameter.name.to_string()); | ||
} | ||
|
||
if let Some(vararg) = &stmt.parameters.kwarg { | ||
arguments.push(format!("**{}", vararg.name)); | ||
} | ||
|
||
arguments | ||
} | ||
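The fixture case `access_invalid_argument_task_out_of_dag(execution_date, **context)` relies on comparing declared parameter names against the removed keys. A small Python sketch of that comparison follows. It is an assumption-laden illustration: `removed_parameters` is a hypothetical helper, the key set is a subset, and unlike the Rust function it also looks at positional-only and keyword-only parameters.

```python
import ast

# Subset of REMOVED_CONTEXT_KEYS, for illustration only.
REMOVED_CONTEXT_KEYS = {"execution_date", "next_ds", "tomorrow_ds"}


def removed_parameters(source: str) -> dict[str, list[str]]:
    """Map each function name to the removed keys it declares as parameters."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        params = node.args
        # `**context` itself is not inspected: its name is not a context key.
        names = [
            p.arg
            for p in params.posonlyargs + params.args + params.kwonlyargs
        ]
        removed = [name for name in names if name in REMOVED_CONTEXT_KEYS]
        if removed:
            result[node.name] = removed
    return result
```

Building the name list directly from the parameter nodes avoids formatting the `**kwargs` name into a string that can never equal a context key.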
|
||
fn is_decorated_with(checker: &mut Checker, stmt: &StmtFunctionDef) -> bool { | ||
stmt.decorator_list.iter().any(|decorator| { | ||
checker | ||
.semantic() | ||
.resolve_qualified_name(map_callable(&decorator.expression)) | ||
.is_some_and(|qualified_name| { | ||
matches!(qualified_name.segments(), ["airflow", "decorators", "task"]) | ||
}) | ||
}) | ||
} | ||
Comment on lines +999 to +1008
Based on the name of the function, did you mean to make this a generic function over any decorator originating from |
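`is_decorated_with` passes each decorator through `map_callable` before resolving it, so both `@task` and `@task(task_id="...")` are considered. The Python sketch below imitates only that unwrapping step. It is name-based and does not resolve imports the way `resolve_qualified_name` does, and both helper names are hypothetical.

```python
import ast


def is_task_decorated(func: ast.FunctionDef) -> bool:
    """True if any decorator is spelled `task` or `task(...)`."""
    for decorator in func.decorator_list:
        # Unwrap `@task(...)` to `task`, similar in spirit to `map_callable`.
        target = decorator.func if isinstance(decorator, ast.Call) else decorator
        if isinstance(target, ast.Name) and target.id == "task":
            return True
    return False


def task_decorated_functions(source: str) -> list[str]:
    """Return the sorted names of all functions decorated with `task`."""
    tree = ast.parse(source)
    return sorted(
        node.name
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef) and is_task_decorated(node)
    )
```

Since the match is on spelling alone, an unrelated decorator that happens to be named `task` would also pass. That is the gap that qualified-name resolution closes in the Rust version.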
Do we have a case where a function should not raise a warning?
I have added two of them but I can add more.