Account for `.loc[Scalar, list]` in DataFrame #1109

carschandler · 2025-02-07T17:35:52Z

As of now, the syntax:

df.loc[0, ["a", "b"]]

yields the following typing error:

reportArgumentType: Argument of type "tuple[Literal[1], list[str]]" cannot be assigned to parameter "idx" of type "tuple[Scalar, slice]" in function "__getitem__"
  "tuple[Literal[1], list[str]]" is not assignable to "tuple[Scalar, slice]"
    Tuple entry 2 is incorrect type
      "list[str]" is not assignable to "slice"

I need help determining what the appropriate contents of the list should be (Hashable?) so I hope this can open discussion.

Closes #xxxx (Replace xxxx with the Github issue number)
Tests added: Please use assert_type() to assert the type of any return value


Dr-Irv

Can you add a test for this in `tests/test_frame.py`?

Make sure to use the `check(assert_type(...))` pattern.
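For readers unfamiliar with the pattern: `assert_type` is checked statically by the type checker, while `check` verifies the runtime type, so a single line tests both the stub and the actual behavior. Below is a minimal sketch, assuming a simplified `check` helper; the real helper in the pandas-stubs test suite may accept more parameters, and the version written here is hypothetical.

```python
import sys
from typing import TypeVar

if sys.version_info >= (3, 11):
    from typing import assert_type
else:
    # Fallback for older interpreters: a runtime no-op, like the real one.
    def assert_type(val, typ):
        return val

T = TypeVar("T")


def check(actual: T, klass: type) -> T:
    """Hypothetical, simplified helper: verify the runtime type and pass the value through."""
    if not isinstance(actual, klass):
        raise RuntimeError(f"Expected type '{klass}' but got '{type(actual)}'")
    return actual


# Static check (assert_type) and runtime check (check) on one line.
value = check(assert_type(len("ab"), int), int)
```

If the stub declared the wrong return type, the type checker would flag the `assert_type` call; if the runtime type differed, `check` would raise when the test runs.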

I attempted to fix the wrong overload; really just needed to add `int` to the union of row indexers

carschandler · 2025-02-10T16:32:39Z

@Dr-Irv I added tests and corrected my change. My original change was wrong; I have now made what I believe is the correct change, in the correct overload: adding int to the possible "row" indexer types. This fixes the pyright error.

Dr-Irv · 2025-02-10T16:40:05Z

tests/test_frame.py

+def test_loc_int_row_strlist_col() -> None:
+    df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
+    check(assert_type(df.loc[0, ["a"]], pd.DataFrame), pd.DataFrame)
+    check(assert_type(df.loc[0, ["a", "b"]], pd.DataFrame), pd.DataFrame)


Should be this:

def test_loc_int_row_strlist_col() -> None:
    df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
    check(assert_type(df.loc[0, ["a"]], pd.Series), pd.Series)
    check(assert_type(df.loc[0, ["a", "b"]], pd.Series), pd.Series)
    df.index = pd.Index(["x", "y"])
    check(assert_type(df.loc["x", ["a"]], pd.Series), pd.Series)
    check(assert_type(df.loc["x", ["a", "b"]], pd.Series), pd.Series)

The results of these operations are Series, not DataFrame. You also have to test a string as the first index.

Dr-Irv · 2025-02-10T16:42:30Z

pandas-stubs/core/frame.pyi

+                int
+                | IndexType


This is the wrong place to make this change. You want it in the overload below that returns Series. You would have to add an overload there for tuple[int | str, list[HashableT]] to make it work.

You're right. I was testing a different case and got confused. Thanks for the redirect. Will update.

Do you think that tuple[Scalar, list[HashableT]] would be more appropriate?

Yes, that should be fine.

Also, you should set up the pre-commit so things are properly formatted. See the output here:
https://github.com/pandas-dev/pandas-stubs/actions/runs/13202706637/job/36858930373?pr=1106

And instructions here for setting up your local testing environment:
https://github.com/pandas-dev/pandas-stubs/blob/main/docs/setup.md

Then you can make sure that the changes you make don't break any existing tests.

Dr-Irv · 2025-02-11T20:28:48Z

It turns out that this issue appeared in some code from my team. After investigation, you have to realize that the result could be either a Series or a DataFrame, especially when duplicate index labels or a MultiIndex are involved:

>>> df = pd.DataFrame({"a":["x", "x", "y"], "b":[3,4,5], "c": [100, 200, 300]}).set_index(["a", "b"])
>>> df
       c
a b
x 3  100
  4  200
y 5  300
>>> df.loc["x", ["c"]]
     c
b
3  100
4  200
>>> dfnob = df.reset_index("b", drop=True)
>>> dfnob
     c
a
x  100
x  200
y  300
>>> dfnob.loc["x", ["c"]]
     c
a
x  100
x  200
>>> dfnob.loc["y", ["c"]]
c    300
Name: y, dtype: int64

Note how the expressions dfnob.loc["x", ["c"]] and dfnob.loc["y", ["c"]] have the same arguments, but different types of results.
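Because the same call shape can return either type, calling code has to narrow the result itself. The following is a rough sketch of one way to do that, reusing the `dfnob` data from the session above; the helper name `rows_for` is hypothetical and not part of pandas or pandas-stubs.

```python
from __future__ import annotations

import pandas as pd


def rows_for(frame: pd.DataFrame, label: str, columns: list[str]) -> pd.DataFrame:
    """Hypothetical helper: always return a DataFrame for a row label."""
    result = frame.loc[label, columns]
    if isinstance(result, pd.Series):
        # Unique label: the single row came back as a Series,
        # so lift it into a one-row DataFrame.
        return result.to_frame().T
    return result


# Same shape as dfnob in the session above: "x" is duplicated, "y" is unique.
dfnob = pd.DataFrame({"a": ["x", "x", "y"], "c": [100, 200, 300]}).set_index("a")

dup = rows_for(dfnob, "x", ["c"])   # two rows, already a DataFrame
uniq = rows_for(dfnob, "y", ["c"])  # one row, converted from a Series
```

The `isinstance` branch is also what lets a type checker narrow a `Series | DataFrame` return type inside the helper.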

You could change this overload in frame.pyi:

    @overload
    def __getitem__(self, idx: tuple[Scalar, slice]) -> Series | _T: ...

to

    @overload
    def __getitem__(self, idx: tuple[Scalar, MaskType | list[HashableT] | IndexType]) -> Series | _T: ...

to catch your use case, so that it matches something, and have the tests match Series | DataFrame. You might want to include examples like the ones above in your tests.
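To illustrate the shape of the suggested overload without depending on pandas, here is a toy sketch. `FakeSeries`, `FakeFrame`, and `FakeLocIndexer` are hypothetical stand-ins, and `Hashable` is used in place of the stubs' `Scalar` and `HashableT`; this is not the actual `frame.pyi` code, only an illustration of how widening the second tuple element and returning a union lets a `(scalar, list)` index match.

```python
from __future__ import annotations

from collections.abc import Hashable
from typing import overload


class FakeSeries:
    """Stand-in for pandas.Series (hypothetical)."""


class FakeFrame:
    """Stand-in for pandas.DataFrame (hypothetical)."""


class FakeLocIndexer:
    """Toy indexer mirroring only the overload shape under discussion."""

    def __init__(self, duplicated: set[Hashable]) -> None:
        # Row labels that occur more than once in the imaginary index.
        self._duplicated = duplicated

    @overload
    def __getitem__(
        self, idx: tuple[Hashable, slice | list[Hashable]]
    ) -> FakeSeries | FakeFrame: ...
    @overload
    def __getitem__(self, idx: list[Hashable]) -> FakeFrame: ...

    def __getitem__(self, idx):
        if isinstance(idx, tuple):
            row, _cols = idx
            # A duplicated row label yields several rows (a frame);
            # a unique label yields a single row (a series).
            return FakeFrame() if row in self._duplicated else FakeSeries()
        return FakeFrame()


loc = FakeLocIndexer(duplicated={"x"})
dup_result = loc["x", ["c"]]
unique_result = loc["y", ["c"]]
```

The union return type is the trade-off: the stub can no longer promise a `Series`, so callers narrow with `isinstance` where the exact type matters.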

Dr-Irv requested changes Feb 7, 2025

Correct fix of df.loc[int, [list, str]]

0809031

Dr-Irv requested changes Feb 10, 2025
