Skip to content
This repository has been archived by the owner on Jun 14, 2024. It is now read-only.

[Gold Standard] Enable stats with spark 2.4 with a sample query #429

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

apoorvedave1
Copy link
Contributor

What is the context for this pull request?

As discussed previously, this PR enables stats on spark's query plans

What changes were proposed in this pull request?

Enable stats in native spark's gold standard test cases. The stats are picked directly from apache spark 3.0+ codebase from this file https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/TPCDSTableStats.scala

This ensures that for joining large tables, Broadcast join is replaced by SortMergeJoin.

Does this PR introduce any user-facing change?

No

How was this patch tested?

unit test

== Physical Plan ==
CollectLimit 100
+- *(5) Project [cs_bill_customer_sk#1, ss_customer_sk#2]
+- *(5) SortMergeJoin [cs_sold_date_sk#3], [ss_sold_date_sk#4], Inner
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: SortMergeJoin is used here because stats are enabled.

@@ -554,17 +553,35 @@ trait TPCDSBase extends SparkFunSuite with SparkInvolvedSuite {
""".stripMargin)
}

private val originalCBCEnabled = conf.cboEnabled
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all changes below this line onward are directly picked from spark codebase. Refer https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala#L609

Copy link
Collaborator

@sezruby sezruby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @apoorvedave1!

))
)
// scalastyle:on line.size.limit
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: new line

Copy link
Contributor

@andrei-ionescu andrei-ionescu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants