[SPARK-50694][SQL] Support withColumns / withColumnsRenamed in subqueries #49386
base: master
cc @cloud-fan too

    newChild: LogicalPlan): UnresolvedWithColumns = copy(child = newChild)
}

case class UnresolvedWithColumnsRenamed(
I don't think we need new logical plans. See how we implement a similar feature for SQL pipe: 68be1da
We can reuse `Project` with `UnresolvedStarExceptOrReplace`.
Oh, `df.withColumn` can append or replace columns, so we should probably extend `UnresolvedStarExceptOrReplace` for this case.
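To make the "append or replace" behaviour concrete, here is a minimal toy model in plain Scala. It is not Spark code: `WithColumnModel`, `Schema`, and the string "expressions" are hypothetical names used only for illustration, and the model matches names exactly, whereas Spark resolves column names through its configured resolver.

```scala
// Toy model only: a schema is an ordered list of (column name, expression) pairs.
object WithColumnModel {
  type Schema = Vector[(String, String)]

  // If `name` already exists, replace that column in place (position is kept);
  // otherwise append the new column at the end.
  def withColumn(schema: Schema, name: String, expr: String): Schema = {
    val idx = schema.indexWhere(_._1 == name)
    if (idx >= 0) schema.updated(idx, name -> expr)
    else schema :+ (name -> expr)
  }
}
```

A replace-only star expansion covers only the first branch; the second branch (append) is the case an extended expression would have to handle.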
cc @dtenedor
There seem to be some semantic differences even with `withColumnsRenamed`:
- If the specified column names are missing in the `df`, `df.withColumnsRenamed` ignores them, whereas `UnresolvedStarExceptOrReplace` throws an exception.
- `df.withColumnsRenamed` respects the argument order, e.g.,

      test("SPARK-46260: withColumnsRenamed should respect the Map ordering") {
        val df = spark.range(10).toDF()
        assert(df.withColumnsRenamed(ListMap("id" -> "a", "a" -> "b")).columns === Array("b"))
        assert(df.withColumnsRenamed(ListMap("a" -> "b", "id" -> "a")).columns === Array("a"))
      }

  whereas `UnresolvedStarExceptOrReplace` throws an exception.

I guess we should keep the new plans to be safer?
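The two differences above can be modelled without Spark. The sketch below is a plain-Scala approximation, not the actual implementation: `RenameModel` is a hypothetical name, and it assumes renames are applied one at a time in map order, with names that match no column silently skipped.

```scala
import scala.collection.immutable.ListMap

// Toy model only: columns are just names, and renames are applied sequentially.
object RenameModel {
  def withColumnsRenamed(columns: Seq[String], renames: ListMap[String, String]): Seq[String] =
    // ListMap iterates in insertion order, so each rename sees the result of the previous one.
    renames.foldLeft(columns) { case (cols, (existing, newName)) =>
      // A name that matches no column leaves `cols` unchanged (no exception).
      cols.map(c => if (c == existing) newName else c)
    }
}
```

Under this model, `Seq("id")` with `ListMap("id" -> "a", "a" -> "b")` ends up as `Seq("b")`, while the reversed map gives `Seq("a")`, which agrees with the SPARK-46260 test quoted above.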
I see. One question is: shall we add new logical plans or new star-like expressions? Star-like expressions are more flexible as they work for function input as well (e.g. `struct(*)`). And we might be able to unify them with the SQL pipe one with extra flags.
Moved to the star-like expressions.
What changes were proposed in this pull request?
Supports `withColumns` / `withColumnsRenamed` in subqueries.
Why are the changes needed?
When the query is used as a subquery by adding `col.outer()`, `withColumns` or `withColumnsRenamed` doesn't work because they need analyzed plans.
Does this PR introduce any user-facing change?
Yes, those APIs are available in subqueries.
How was this patch tested?
Added the related tests.
Was this patch authored or co-authored using generative AI tooling?
No.