Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Account for .loc[Scalar, list] in DataFrame #1109

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

carschandler
Copy link

As of now, the syntax:

df.loc[0, ["a", "b"]]

Yields the following typing error:

reportArgumentType: Argument of type "tuple[Literal[1], list[str]]" cannot be assigned to parameter "idx" of type "tuple[Scalar, slice]" in function "__getitem__"
  "tuple[Literal[1], list[str]]" is not assignable to "tuple[Scalar, slice]"
    Tuple entry 2 is incorrect type
      "list[str]" is not assignable to "slice"

I need help determining what the appropriate contents of the list should be (Hashable?) so I hope this can open discussion.

  • Closes #xxxx (Replace xxxx with the Github issue number)
  • Tests added: Please use assert_type() to assert the type of any return value

As of now, the syntax:

```python
df.loc[0, ["a", "b"]]
```

Yields the following typing error:
```
reportArgumentType: Argument of type "tuple[Literal[1], list[str]]" cannot be assigned to parameter "idx" of type "tuple[Scalar, slice]" in function "__getitem__"
  "tuple[Literal[1], list[str]]" is not assignable to "tuple[Scalar, slice]"
    Tuple entry 2 is incorrect type
      "list[str]" is not assignable to "slice"
```

I need help determining what the appropriate contents of the `list` should be (`Hashable`?) so I hope this can open discussion.
Copy link
Collaborator

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test for this in tests/test_frame.py ?

Make sure to use the check(assert_type( pattern

I attempted to fix the wrong overload; really  just needed to add `int`
to the union of row indexers
@carschandler
Copy link
Author

@Dr-Irv I added tests and corrected my change. My original change was incorrect and I added what I believe the correct change is to the correct overload. It was just adding int to the possible "row" indexer types. Fixes the pyright error.

Comment on lines +3059 to +3062
def test_loc_int_row_strlist_col() -> None:
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
check(assert_type(df.loc[0, ["a"]], pd.DataFrame), pd.DataFrame)
check(assert_type(df.loc[0, ["a", "b"]], pd.DataFrame), pd.DataFrame)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be this:

def test_loc_int_row_strlist_col() -> None:
    df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
    check(assert_type(df.loc[0, ["a"]], pd.Series), pd.Series)
    check(assert_type(df.loc[0, ["a", "b"]], pd.Series), pd.Series)

    df.index = pd.Index(["x", "y"])
    check(assert_type(df.loc["x", ["a"]], pd.Series), pd.Series)
    check(assert_type(df.loc["x", ["a", "b"]], pd.Series), pd.Series)

The results of these operations are Series not DataFrame. You also have to test a string as the first index as well.

Comment on lines +185 to +186
int
| IndexType
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the wrong place to make this change. You want it in the overload below that returns Series. You would have to add an overload in there for tuple[int | str, list[HashableT] to make it work.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I was testing a different case and got confused. Thanks for the redirect. Will update.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think that tuple[Scalar, list[HashableT]] would be more appropriate?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that should be fine.

Also, you should set up the pre-commit so things are properly formatted. See the output here:
https://github.com/pandas-dev/pandas-stubs/actions/runs/13202706637/job/36858930373?pr=1106

And instructions here for setting up your local testing environment:
https://github.com/pandas-dev/pandas-stubs/blob/main/docs/setup.md

then you can make sure that the changes you make don't break any existing tests

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Feb 11, 2025

Turns out that this issue appeared in some code from my team. And after investigation, you have to realize that the result that could be returned could be a Series or DataFrame, especially when there are possible duplicate indices or a MultiIndex getting involved:

>>> df = pd.DataFrame({"a":["x", "x", "y"], "b":[3,4,5], "c": [100, 200, 300]}).set_index(["a", "b"])
>>> df
       c
a b
x 3  100
  4  200
y 5  300
>>> df.loc["x", ["c"]]
     c
b
3  100
4  200
>>> dfnob = df.reset_index("b", drop=True)
>>> dfnob
     c
a
x  100
x  200
y  300
>>> dfnob.loc["x", ["c"]]
     c
a
x  100
x  200
>>> dfnob.loc["y", ["c"]]
c    300
Name: y, dtype: int64

Note how the expressions dfnob.loc["x", ["c"]] and dfnob.loc["y", ["c"]] have the same arguments, but different types of results.

You could change this overload in frame.pyi:

    @overload
    def __getitem__(self, idx: tuple[Scalar, slice]) -> Series | _T: ...

to

    @overload
    def __getitem__(self, idx: tuple[Scalar, MaskType | list[HashableT] | IndexType]) -> Series | _T: ...

to catch your use case. so it matches something, and have the tests match Series | DataFrame . You might want to include examples like the above in your tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants