-
-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Account for .loc[Scalar, list]
in DataFrame
#1109
base: main
Are you sure you want to change the base?
Conversation
As of now, the syntax: ```python df.loc[0, ["a", "b"]] ``` Yields the following typing error: ``` reportArgumentType: Argument of type "tuple[Literal[1], list[str]]" cannot be assigned to parameter "idx" of type "tuple[Scalar, slice]" in function "__getitem__" "tuple[Literal[1], list[str]]" is not assignable to "tuple[Scalar, slice]" Tuple entry 2 is incorrect type "list[str]" is not assignable to "slice" ``` I need help determining what the appropriate contents of the `list` should be (`Hashable`?) so I hope this can open discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test for this in tests/test_frame.py
?
Make sure to use the check(assert_type(
pattern
I attempted to fix the wrong overload; really just needed to add `int` to the union of row indexers
@Dr-Irv I added tests and corrected my change. My original change was incorrect and I added what I believe the correct change is to the correct overload. It was just adding |
def test_loc_int_row_strlist_col() -> None: | ||
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}) | ||
check(assert_type(df.loc[0, ["a"]], pd.DataFrame), pd.DataFrame) | ||
check(assert_type(df.loc[0, ["a", "b"]], pd.DataFrame), pd.DataFrame) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be this:
def test_loc_int_row_strlist_col() -> None:
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
check(assert_type(df.loc[0, ["a"]], pd.Series), pd.Series)
check(assert_type(df.loc[0, ["a", "b"]], pd.Series), pd.Series)
df.index = pd.Index(["x", "y"])
check(assert_type(df.loc["x", ["a"]], pd.Series), pd.Series)
check(assert_type(df.loc["x", ["a", "b"]], pd.Series), pd.Series)
The results of these operations are Series
not DataFrame
. You also have to test a string as the first index as well.
int | ||
| IndexType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the wrong place to make this change. You want it in the overload below that returns Series
. You would have to add an overload in there for tuple[int | str, list[HashableT]
to make it work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. I was testing a different case and got confused. Thanks for the redirect. Will update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think that tuple[Scalar, list[HashableT]]
would be more appropriate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that should be fine.
Also, you should set up the pre-commit so things are properly formatted. See the output here:
https://github.com/pandas-dev/pandas-stubs/actions/runs/13202706637/job/36858930373?pr=1106
And instructions here for setting up your local testing environment:
https://github.com/pandas-dev/pandas-stubs/blob/main/docs/setup.md
then you can make sure that the changes you make don't break any existing tests
Turns out that this issue appeared in some code from my team. And after investigation, you have to realize that the result that could be returned could be a >>> df = pd.DataFrame({"a":["x", "x", "y"], "b":[3,4,5], "c": [100, 200, 300]}).set_index(["a", "b"])
>>> df
c
a b
x 3 100
4 200
y 5 300
>>> df.loc["x", ["c"]]
c
b
3 100
4 200
>>> dfnob = df.reset_index("b", drop=True)
>>> dfnob
c
a
x 100
x 200
y 300
>>> dfnob.loc["x", ["c"]]
c
a
x 100
x 200
>>> dfnob.loc["y", ["c"]]
c 300
Name: y, dtype: int64 Note how the expressions You could change this overload in @overload
def __getitem__(self, idx: tuple[Scalar, slice]) -> Series | _T: ... to @overload
def __getitem__(self, idx: tuple[Scalar, MaskType | list[HashableT] | IndexType]) -> Series | _T: ... to catch your use case. so it matches something, and have the tests match |
As of now, the syntax:
Yields the following typing error:
I need help determining what the appropriate contents of the
list
should be (Hashable
?) so I hope this can open discussion.assert_type()
to assert the type of any return value