From 4f59e1b663812a47ec1906b40dc59f6ed5342e50 Mon Sep 17 00:00:00 2001
From: Hyukjin Kwon
Date: Tue, 28 Nov 2023 10:46:28 +0900
Subject: [PATCH] [SPARK-46126][PYTHON][TESTS] Fix the doctest in pyspark.pandas.frame.DataFrame.to_dict (Python 3.12)

### What changes were proposed in this pull request?

This PR proposes to fix the doctest in `pyspark.pandas.frame.DataFrame.to_dict` so that it is compatible with Python 3.12.

```
File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 2515, in pyspark.pandas.frame.DataFrame.to_dict
Failed example:
    df.to_dict(into=OrderedDict)
Expected:
    OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])), ('col2', OrderedDict([('row1', 0.5), ('row2', 0.75)]))])
Got:
    OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}), 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})
```

### Why are the changes needed?

To test properly with Python 3.12. Python 3.12 changed the `repr` of `collections.OrderedDict` to use dict-style formatting, so the expected output in the doctest no longer matches and the test fails; see https://github.com/apache/spark/actions/runs/7006848931/job/19059702970

### Does this PR introduce _any_ user-facing change?

No, apart from a very minor change to the example in the user-facing docstring.

### How was this patch tested?

Fixed the unit tests. Manually tested via:

```bash
python/run-tests --python-executable=python3 --testnames 'pyspark.pandas.frame'
...
Tests passed in 721 seconds
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44042 from HyukjinKwon/SPARK-46126.

Authored-by: Hyukjin Kwon
Signed-off-by: Hyukjin Kwon
---
 python/pyspark/pandas/frame.py | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/python/pyspark/pandas/frame.py b/python/pyspark/pandas/frame.py
index 4ecc85ce8f795..b53f5adfbaa81 100644
--- a/python/pyspark/pandas/frame.py
+++ b/python/pyspark/pandas/frame.py
@@ -2512,9 +2512,8 @@ def to_dict(self, orient: str = "dict", into: Type = dict) -> Union[List, Mappin
         You can also specify the mapping type.
 
         >>> from collections import OrderedDict, defaultdict
-        >>> df.to_dict(into=OrderedDict)
-        OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])), \
-('col2', OrderedDict([('row1', 0.5), ('row2', 0.75)]))])
+        >>> df.to_dict(into=OrderedDict) # doctest: +ELLIPSIS
+        OrderedDict(...)
 
         If you want a `defaultdict`, you need to initialize it:
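The fix relies on doctest's `ELLIPSIS` option: with `# doctest: +ELLIPSIS`, a `...` in the expected output matches any substring, so `OrderedDict(...)` accepts both the pre-3.12 list-of-pairs repr and the dict-style repr printed by Python 3.12. The standalone sketch below illustrates this with the standard-library `doctest` module only. It does not use pyspark, the sample data is made up, and the names `PINNED`, `WITH_ELLIPSIS`, and `run_doctest` are hypothetical helpers for illustration.

```python
import doctest
import sys
from collections import OrderedDict

# Hypothetical sample data (not the pyspark DataFrame from the docstring).
# Expected output pinned to the pre-3.12 repr, i.e. a list of key/value pairs.
PINNED = """
>>> OrderedDict([('row1', 1), ('row2', 2)])
OrderedDict([('row1', 1), ('row2', 2)])
"""

# Same example, but the expected output only fixes the type name; with the
# ELLIPSIS directive, "..." matches whatever the interpreter prints inside.
WITH_ELLIPSIS = """
>>> OrderedDict([('row1', 1), ('row2', 2)])  # doctest: +ELLIPSIS
OrderedDict(...)
"""


def run_doctest(source: str, name: str) -> doctest.TestResults:
    """Parse and run a docstring-style snippet; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(source, {"OrderedDict": OrderedDict}, name, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; callers inspect the returned counts instead.
    return runner.run(test, out=lambda _text: None)


if __name__ == "__main__":
    print(sys.version_info[:2], "pinned:", run_doctest(PINNED, "pinned"))
    print(sys.version_info[:2], "ellipsis:", run_doctest(WITH_ELLIPSIS, "with_ellipsis"))
```

Under these assumptions, the `PINNED` snippet should report one failure on Python 3.12 and later and none on earlier versions, while the `WITH_ELLIPSIS` variant should pass on both. The trade-off, as in the patch, is that the loosened doctest no longer checks the dictionary contents, only that an `OrderedDict` is returned.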