
[SPARK-49025] Make Column implementation agnostic #3913

Open · wants to merge 17 commits into base: master
Conversation

hvanhovell (Contributor)

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

This PR ports the changes made in SPARK-49025 to Delta.

How was this patch tested?

Existing tests.

Does this PR introduce any user-facing changes?

No.

@hvanhovell (Contributor, Author)

Waiting for apache/spark#49038 to get merged.

@hvanhovell hvanhovell marked this pull request as ready for review December 3, 2024 21:05
@hvanhovell (Contributor, Author)

I am not sure why DSM is failing on expr/expression not found. The latest spark master should have the functionality I am using. I will fix the other DSM issues in a follow-up (this is tracked in https://issues.apache.org/jira/browse/SPARK-49709).

@allisonport-db (Collaborator)

Where do you see "expr/expression not found"? In a filename, maybe? I don't see it anywhere.

@hvanhovell (Contributor, Author)

@allisonport-db the error is gone now.

@allisonport-db (Collaborator)

I still see one Column compilation error in the latest run:

[error] /home/runner/work/delta/delta/spark/src/main/scala/org/apache/spark/sql/delta/stats/DeletedRecordCountsHistogramUtils.scala:156:64: value expr is not a member of org.apache.spark.sql.Column
[error]       Column(DeletedRecordCountsHistogramAgg(dvCardinalityExpr.expr).toAggregateExpression())
[error]                                                                ^
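For context, this error is the expected fallout of SPARK-49025: the implementation-agnostic Column no longer exposes a public `.expr` member, so every call site needs to go through a shim. A minimal sketch of the two adaptations discussed later in this PR (the `expression(...)` helper and the `RichColumn` implicit in `ClassicColumnConversions`); the exact package paths and helper names here are assumptions, not verified against the final Delta source:

```scala
// Sketch only; helper names and import paths are assumptions based on the
// discussion in this PR thread.

// Before (no longer compiles once Column is implementation-agnostic):
//   Column(DeletedRecordCountsHistogramAgg(dvCardinalityExpr.expr)
//     .toAggregateExpression())

// Option 1: an explicit helper that unwraps the Catalyst Expression.
//   Column(DeletedRecordCountsHistogramAgg(expression(dvCardinalityExpr))
//     .toAggregateExpression())

// Option 2: an implicit conversion (RichColumn) restoring the `.expr` syntax.
//   import ...ClassicColumnConversions._  // hypothetical import
//   Column(DeletedRecordCountsHistogramAgg(dvCardinalityExpr.expr)
//     .toAggregateExpression())
```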

Comment on lines -167 to +169

-  val ifSourceRowNull = col(SOURCE_ROW_PRESENT_COL).isNull.expr
-  val ifTargetRowNull = col(TARGET_ROW_PRESENT_COL).isNull.expr
+  val ifSourceRowNull = expression(col(SOURCE_ROW_PRESENT_COL).isNull)
+  val ifTargetRowNull = expression(col(TARGET_ROW_PRESENT_COL).isNull)
Collaborator:

Why do we make this change in some places (using expression(..)), while in other places we seem to use the implicit conversion to RichColumn in ClassicColumnConversions?

Collaborator:

Is there a reason we need expression(..) at all? Could we not just leave all these instances as they were?

Contributor (Author):

expression(...) is much cheaper, and it does not rely on the SparkSession.active thread-local. The downside is that it can only be used in scenarios where you know the resulting Expression will be used to build a Column.
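To unpack that trade-off, here is a sketch in Scala (names and signatures are hypothetical, not the actual Delta/Spark source): the explicit helper boils down to unwrapping a Column into its Catalyst Expression with no session lookup, whereas the implicit RichColumn route keeps the old `.expr` syntax but is tied to the classic, session-bound Column implementation.

```scala
// Sketch only: illustrates why an explicit expression(...) helper can be
// cheaper than restoring `.expr` via an implicit. Names are hypothetical.
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.Expression

// Hypothetical helper: directly unwraps the Catalyst node carried by a
// classic Column. No SparkSession.active lookup involved.
def expression(c: Column): Expression =
  ??? // e.g. extract the classic Column's internal expression node

// Valid only where the resulting Expression is fed back into a Column:
//   val ifSourceRowNull = expression(col(SOURCE_ROW_PRESENT_COL).isNull)

// Alternative: an implicit RichColumn (in ClassicColumnConversions) restores
// `col(...).isNull.expr`, but it resolves through the active session
// (the SparkSession.active thread-local), which is heavier and depends on a
// session being available on the current thread.
```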

Collaborator:

Gotcha. Could you add this to the docs for ColumnConversionShims? It would be good to have this information somewhere, since I don't think it will be obvious to most developers.

2 participants