[SPARK-50753][PYTHON] Add CachedAccessor for PySpark Plotting #49394

Draft
wants to merge 2 commits into
base: master
from

Conversation

xinrong-meng
Member

@xinrong-meng xinrong-meng commented Jan 7, 2025

What changes were proposed in this pull request?

Add CachedAccessor for PySpark Plotting.

Why are the changes needed?

Previously, plot was defined as a property:

>>> from pyspark.sql import DataFrame
>>> DataFrame.plot
<property object at 0x10543fd60>

Because a property accessed on the class returns only the bare property object, Sphinx, the documentation generator, could not recognize the methods chained off plot (e.g., bar, barh). As a result, it failed to populate the documentation for the available plot methods.
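A minimal sketch of the problem, using hypothetical stand-in classes (`Frame` and `PlotAccessor` are illustrative names, not PySpark code). It shows that a property works on an instance but exposes none of the accessor's methods when looked up on the class, which is where class-level introspection would look:

```python
class PlotAccessor:
    """Hypothetical stand-in for the plot accessor class."""

    def __init__(self, data):
        self._data = data

    def bar(self):
        return f"bar chart of {self._data}"


class Frame:
    """Hypothetical stand-in for DataFrame, with plot defined as a property."""

    def __init__(self, data):
        self._data = data

    @property
    def plot(self):
        return PlotAccessor(self._data)


# Instance access works, because the property getter runs.
instance_result = Frame([1, 2, 3]).plot.bar()

# Class access returns the bare property object, which has no `bar` attribute.
class_attr = Frame.plot
has_bar_on_class = hasattr(class_attr, "bar")
```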

It should reach parity with the Pandas API on Spark, where plot resolves to the accessor class when accessed on DataFrame:

>>> import pyspark.pandas as ps
>>> ps.DataFrame.plot
<class 'pyspark.pandas.plot.core.PandasOnSparkPlotAccessor'>

Does this PR introduce any user-facing change?

No user-facing changes.

From

>>> from pyspark.sql import DataFrame
>>> DataFrame.plot
<property object at 0x10543fd60>

To

>>> from pyspark.sql import DataFrame
>>> DataFrame.plot
<class 'pyspark.sql.plot.core.PySparkPlotAccessor'>
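For readers unfamiliar with the pattern, here is a minimal sketch of a cached accessor descriptor. It follows the general pattern used by pandas-style accessors and is not the code in this PR; `Frame`, `PlotAccessor`, and the caching detail are assumptions made for illustration. Class-level access returns the accessor class itself, and instance-level access builds the accessor once and caches it on the instance:

```python
class PlotAccessor:
    """Hypothetical accessor; stands in for PySparkPlotAccessor."""

    def __init__(self, data):
        self._data = data

    def bar(self):
        return f"bar chart of {self._data._values}"


class CachedAccessor:
    """Sketch of a non-data descriptor that exposes an accessor class."""

    def __init__(self, name, accessor):
        self._name = name
        self._accessor = accessor

    def __get__(self, obj, cls):
        if obj is None:
            # Accessed on the class: return the accessor class so that
            # its methods are visible to introspection.
            return self._accessor
        accessor_obj = self._accessor(obj)
        # Cache on the instance. With no __set__ defined, the instance
        # attribute shadows this descriptor on later lookups.
        object.__setattr__(obj, self._name, accessor_obj)
        return accessor_obj


class Frame:
    """Hypothetical stand-in for DataFrame."""

    plot = CachedAccessor("plot", PlotAccessor)

    def __init__(self, values):
        self._values = values


frame = Frame([1, 2, 3])
```

With this sketch, `Frame.plot` is the `PlotAccessor` class, so `Frame.plot.bar` is discoverable, while `frame.plot.bar()` still works on an instance.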

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

zhengruifeng
zhengruifeng previously approved these changes Jan 7, 2025
@xinrong-meng xinrong-meng changed the title [WIP] Make DataFrame.plot a class variable [WIP] Make DataFrame.plot a class property Jan 7, 2025
@xinrong-meng xinrong-meng changed the title [WIP] Make DataFrame.plot a class property [WIP] Make PySpark plot accessor a class property Jan 7, 2025
@xinrong-meng xinrong-meng changed the title [WIP] Make PySpark plot accessor a class property [SPARK-50753][PYTHON] Make PySpark plot accessor a class property Jan 7, 2025
@xinrong-meng xinrong-meng marked this pull request as ready for review January 7, 2025 08:57
@xinrong-meng xinrong-meng changed the title [SPARK-50753][PYTHON] Make PySpark plot accessor a class property [WIP][SPARK-50753][PYTHON] Make PySpark plot accessor a class property Jan 7, 2025
@xinrong-meng xinrong-meng marked this pull request as draft January 7, 2025 08:59
@xinrong-meng xinrong-meng changed the title [WIP][SPARK-50753][PYTHON] Make PySpark plot accessor a class property [SPARK-50753][PYTHON] Add CachedAccessor for PySpark Plotting Jan 8, 2025
T = TypeVar("T")


class CachedAccessor(Generic[T]):
Member Author

There will be a follow-up to consolidate this CachedAccessor with the one in the Pandas API on Spark.

>>> df.plot(kind="line", x="category", y=["int_val", "float_val"]) # doctest: +SKIP
"""
...
plot: CachedAccessor = CachedAccessor("plot", "pyspark.sql.plot.core.PySparkPlotAccessor")
Member Author

This lazy import (passing the accessor as a dotted string path) fails doc generation:

Warning, treated as error:
autodoc: failed to import class 'plot' from module 'DataFrame'; the following exception was raised:
No module named 'DataFrame'
make: *** [Makefile:35: html] Error 2

@HyukjinKwon @itholic do you happen to know how to work around this? Thank you!
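For context, a minimal sketch of what resolving an accessor from a dotted string path can look like. This is an assumption about the general technique, not the PR's implementation: `resolve_dotted_path`, `LazyAccessor`, and `Frame` are hypothetical names, and the stdlib class `weakref.ref` stands in for the real accessor path only so the example runs without PySpark. It does not reproduce or explain the Sphinx failure above.

```python
import importlib
import weakref


def resolve_dotted_path(path):
    """Import `pkg.module.Name` and return the attribute `Name`."""
    module_name, _, attr_name = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


class LazyAccessor:
    """Hypothetical descriptor that imports its accessor on first use."""

    def __init__(self, name, accessor_path):
        self._name = name
        self._accessor_path = accessor_path

    def __get__(self, obj, cls):
        # The import happens here, at attribute access time,
        # rather than when the owning class is defined.
        accessor_cls = resolve_dotted_path(self._accessor_path)
        if obj is None:
            return accessor_cls
        return accessor_cls(obj)


class Frame:
    """Hypothetical stand-in for DataFrame."""

    # "weakref.ref" is a stdlib stand-in for the real accessor path.
    handle = LazyAccessor("handle", "weakref.ref")


frame = Frame()
```

Deferring the import this way avoids importing the accessor module when the owning class is defined, at the cost of the class attribute being a descriptor rather than the accessor class until it is accessed.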
