-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bump pandas from 1.5.3 to the latest stable version #422
Changes from 78 commits
fb4f0d6
06076d9
3ef1e2c
da14ee5
c68b7f8
0084cce
12bacb7
77ebf1c
45e793a
c4455e0
7135507
d2203be
c55639d
b1f2319
77c56a7
25905b6
feeb0b5
1162608
dce7c74
cd5ff41
87411e3
8665e59
a02af31
101ead9
7b1d24d
25cf1b5
8bfe735
f7746cb
cf9363a
ce6554c
ff469e2
c58cc67
8bf6835
6485228
dc4a5bf
612f262
83c0a24
4f2d71a
563488c
1589de9
36570c1
2cc883d
b6f3433
5bfb86f
f3337bf
41f7b17
77e4326
78e530d
ebea62e
c073e96
fdaf8df
427a4aa
20ec7e4
f4aa0df
707eb1f
3bf0816
3230f1a
6b4f3a7
e53a0e3
deecdbc
a386bf4
8bf7df9
11aecc5
3f1834d
cc0d810
3bdfb6a
87ec2ff
d45da1a
49198f1
73bc70c
e9ec7fc
c9be924
b20c31c
c26e5e1
7522433
93f491a
c4ca7ee
be44c48
1c9159c
d5f365c
6d1dd9e
4fc4a88
933498b
cfa0e3d
1d4691b
240a902
209a646
0efc20d
2e516f2
5129800
9f67ddb
e77e8c5
df05824
ad535f1
02b7fa2
c30064c
31b12f6
3bfa8b7
55faaf6
1ec27c2
88d38cd
d959396
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -55,14 +55,33 @@ | |
|
||
|
||
def build_pd_series( | ||
data: Dict[str, Any], dtype: Optional["DTypeLike"] = None, **kwargs: Any | ||
data: Dict[str, Any], | ||
dtype: Optional["DTypeLike"] = None, | ||
index_name: Optional[str] = None, | ||
**kwargs: Any, | ||
) -> pd.Series: | ||
"""Builds a pd.Series while squelching the warning | ||
for unspecified dtype on empty series | ||
""" | ||
Builds a pandas Series from a dictionary, optionally setting an index name. | ||
|
||
Parameters: | ||
data : Dict[str, Any] | ||
The data to build the Series from, with keys as the index. | ||
dtype : Optional[DTypeLike] | ||
The desired data type of the Series. If not specified, uses EMPTY_SERIES_DTYPE if data is empty. | ||
index_name : Optional[str] | ||
Name to assign to the Series index, similar to `index_name` in `value_counts`. | ||
|
||
Returns: | ||
pd.Series | ||
A pandas Series constructed from the given data, with the specified dtype and index name. | ||
""" | ||
|
||
dtype = dtype or (EMPTY_SERIES_DTYPE if not data else dtype) | ||
if dtype is not None: | ||
kwargs["dtype"] = dtype | ||
if index_name is not None: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we add a comment why do we need this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done |
||
index = pd.Index(data.keys(), name=index_name) | ||
kwargs["index"] = index | ||
return pd.Series(data, **kwargs) | ||
|
||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -47,14 +47,17 @@ | |
from opensearch_py_ml.groupby import DataFrameGroupBy | ||
from opensearch_py_ml.ndframe import NDFrame | ||
from opensearch_py_ml.series import Series | ||
from opensearch_py_ml.utils import is_valid_attr_name | ||
from opensearch_py_ml.utils import is_valid_attr_name, to_list_if_needed | ||
|
||
if TYPE_CHECKING: | ||
from opensearchpy import OpenSearch | ||
|
||
from .query_compiler import QueryCompiler | ||
|
||
|
||
PANDAS_MAJOR_VERSION = int(pd.__version__.split(".")[0]) | ||
|
||
|
||
class DataFrame(NDFrame): | ||
""" | ||
Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes | ||
|
@@ -424,9 +427,14 @@ def drop( | |
axis = pd.DataFrame._get_axis_name(axis) | ||
axes = {axis: labels} | ||
elif index is not None or columns is not None: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Kind of confused here the parent branch is checking that if one of them is not None but inside its checking again There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed |
||
axes, _ = pd.DataFrame()._construct_axes_from_arguments( | ||
(index, columns), {} | ||
) | ||
axes = { | ||
"index": to_list_if_needed(index), | ||
"columns": ( | ||
pd.Index(to_list_if_needed(columns)) | ||
if columns is not None | ||
else None | ||
Comment on lines
+424
to
+426
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I noticed that in the implementation of opensearch_py_ml.utils that when the value is None that you would return None already maybe we dont need a ternary operation here since its already doing that? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hey Brian, We can’t remove the ternary operation at this point because calling |
||
), | ||
} | ||
else: | ||
raise ValueError( | ||
"Need to specify at least one of 'labels', 'index' or 'columns'" | ||
|
@@ -440,7 +448,7 @@ def drop( | |
axes["index"] = [axes["index"]] | ||
if errors == "raise": | ||
# Check if axes['index'] values exists in index | ||
count = self._query_compiler._index_matches_count(axes["index"]) | ||
count = self._query_compiler._index_matches_count(list(axes["index"])) | ||
if count != len(axes["index"]): | ||
raise ValueError( | ||
f"number of labels {count}!={len(axes['index'])} not contained in axis" | ||
|
@@ -1341,6 +1349,10 @@ def to_csv( | |
-------- | ||
:pandas_api_docs:`pandas.DataFrame.to_csv` | ||
""" | ||
if PANDAS_MAJOR_VERSION < 2: | ||
line_terminator_keyword = "line_terminator" | ||
else: | ||
line_terminator_keyword = "lineterminator" | ||
kwargs = { | ||
"path_or_buf": path_or_buf, | ||
"sep": sep, | ||
|
@@ -1355,7 +1367,7 @@ def to_csv( | |
"compression": compression, | ||
"quoting": quoting, | ||
"quotechar": quotechar, | ||
"line_terminator": line_terminator, | ||
line_terminator_keyword: line_terminator, | ||
"chunksize": chunksize, | ||
"date_format": date_format, | ||
"doublequote": doublequote, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious what are the issues with these versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In pandas 2.1.0, there is a lot of failures because it introduced changes that were later reverted in subsequent versions.
However, I think we can fix failures for 2.0.0 and 2.0.1 versions. They have the same doctest failure:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can fix failures for 2.0.0 and 2.0.1 versions, that would be great!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it, already working on that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ready
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. Thanks for your contribution. Approved.