fix(typing): Resolve all `mypy` & `pyright` errors for `_arrow` #2007
base: main
Will close #1961
Still have ~50 errors for `mypy` to review
@MarcoGorelli Almost out of the rabbit hole on this! I've found some more places where (#1657 (comment)) would be pretty helpful: narwhals/narwhals/_arrow/series.py Lines 126 to 160 in 1d24dec
narwhals/narwhals/_arrow/series.py Lines 549 to 552 in 1d24dec
Essentially anywhere that […] So for […]
My brain has fully melted working on this, hope the above made sense.
thanks for working on this. is it necessary to make […]
@MarcoGorelli 100% needed to resolve the issues, I'm afraid. Without the […]
Having read through a lot of the code (but not using […]
It seems to be available in our min version (https://arrow.apache.org/docs/11.0/python/generated/pyarrow.dataset.Expression.html).
For some context on the kinds of errors, see (#1961 (comment)).
Causing issues in CI, but not locally? https://github.com/narwhals-dev/narwhals/actions/runs/13313655502/job/37182229033?pr=2007
Started at something like 150-300 errors. Down to 32 errors!
- Update 1: Now 23 errors (https://github.com/narwhals-dev/narwhals/actions/runs/13315369207/job/37187931923?pr=2007)
- Update 2: 21 errors (https://github.com/narwhals-dev/narwhals/actions/runs/13315749955/job/37189182650?pr=2007)
- Update 3: Now with 17 errors (https://github.com/narwhals-dev/narwhals/actions/runs/13317234868/job/37194138357?pr=2007)
- Update 4: Down to 10 errors (https://github.com/narwhals-dev/narwhals/actions/runs/13328559038/job/37227189197?pr=2007). All remaining errors are in […]
- Update 5: Only 1 error left (https://github.com/narwhals-dev/narwhals/actions/runs/13328999125/job/37228573119?pr=2007)
- Update 6
…) [attr-defined]
`__iter__` doesn't seem to be defined in the docs or stubs: https://arrow.apache.org/docs/python/generated/pyarrow.Table.html
- The stubs are overly strict for `Table.from_arrays`
- It can accept `Sequence[Array[T] | ChunkedArray[T]]`
…s" cannot be "object" [type-var]
- Had 2 of these
- Used a different path that preserved the generic type
Incomplete: TypeAlias = Any  # pragma: no cover
"""
Marker for working code that fails on the stubs.

Common issues:
- Annotated for `Array`, but not `ChunkedArray`
- Relies on typing information that the stubs don't provide statically
- Missing attributes
- Incorrect return types
- Inconsistent use of generic/concrete types
- `_clone_signature` used on signatures that are not identical
"""
Note: for more context, see https://github.com/python/typeshed/blob/a410f251ee535950eb4440ad41ae4c00c87e6a67/stdlib/_typeshed/__init__.pyi#L45-L50
I've been sprinkling these in with a comment when all else fails, e.g:
narwhals/narwhals/_arrow/series.py
Lines 486 to 490 in 1b537e7
def diff(self: ArrowSeries[_NumericOrTemporalT]) -> ArrowSeries[_NumericOrTemporalT]:
    # NOTE: stub only permits `ChunkedArray[TemporalScalar]`
    # (https://github.com/zen-xu/pyarrow-stubs/blob/d97063876720e6a5edda7eb15f4efe07c31b8296/pyarrow-stubs/compute.pyi#L145-L148)
    diff: Incomplete = pc.pairwise_diff
    return self._from_native_series(diff(self._native_series.combine_chunks()))
If the stub issues get resolved in the future, this will be a lot easier to fix than just using `Any` directly.
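For anyone skimming the thread, here is a minimal sketch of the marker pattern described above. It uses only the standard library: `pairwise_diff_stub_like` and the `diff` wrapper are hypothetical stand-ins (not narwhals or pyarrow code), and the "too strict" annotation is simulated by passing `list[int]` where `list[float]` is declared.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from typing import TypeAlias

# Greppable marker: behaves like `Any`, but documents *why* typing was dropped.
Incomplete: TypeAlias = Any


def pairwise_diff_stub_like(values: list[float]) -> list[float]:
    """Hypothetical stand-in for a call whose annotation is too strict."""
    return [b - a for a, b in zip(values, values[1:])]


def diff(values: list[int]) -> list[int]:
    # NOTE: pretend the annotation only permits `list[float]`.
    # `list` is invariant, so a checker would reject `list[int]` here.
    # Rebinding through `Incomplete` opts this single call out of checking.
    _diff: Incomplete = pairwise_diff_stub_like
    result: list[int] = _diff(values)
    return result


print(diff([1, 4, 9]))
```

Searching the codebase for `Incomplete` then lists every workaround that can be revisited once the upstream annotations improve, which a bare `Any` would not allow.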
Want to start a thread, but haven't changed the code here yet
`pyright` doesn't need this, `mypy` infers this as `str` - which is too wide
> narwhals/_arrow/namespace.py:372: error: No overload variant of "binary_join_element_wise" matches argument types "Generator[ChunkedArray[StringScalar], None, None]", "str" [call-overload]
> narwhals/_arrow/namespace.py:372: note: Possible overload variants:
# NOTE: stubs leave unannotated
if_else: Incomplete = pc.if_else
Note: stubs introduce an `Unknown` here (https://github.com/zen-xu/pyarrow-stubs/blob/d97063876720e6a5edda7eb15f4efe07c31b8296/pyarrow-stubs/compute.pyi#L1829)
# empty bin intervals should have a 0 count
counts_coalesce = cast(
    "pa.Array[Any]",
    pc.coalesce(cast("pa.Array[Any]", counts.column("counts")), lit(0)),
- def _hist_from_bin_count(
-     bin_count: int,
- ) -> tuple[Sequence[int], Sequence[int | float], Sequence[int | float]]:
+ def _hist_from_bin_count(bin_count: int):  # type: ignore[no-untyped-def] # noqa: ANN202
The whole of `ArrowSeries.hist` is too complex to bother with typing nested inner-method return types.
Falling back to inference is fine.
Kinda concerned though that `bin_left` is returned and never used?
narwhals/narwhals/_arrow/series.py
Lines 1214 to 1224 in 3bdf0e8
if bins is not None:
    if len(bins) < 2:
        counts, bin_left, bin_right = [], [], []
    else:
        counts, bin_left, bin_right = _hist_from_bins(bins)
elif bin_count is not None:
    if bin_count == 0:
        counts, bin_left, bin_right = [], [], []
    else:
        counts, bin_left, bin_right = _hist_from_bin_count(bin_count)
- Initially was just trying to clean up `__invert__`
- Then `pyright` pointed me towards this route of annotating type-transforming methods
@MarcoGorelli had a bit of an interesting development with this. I haven't gone too deep into that, but have made a start with:
narwhals/_arrow/utils.py
Outdated
if TYPE_CHECKING:
    return pa.repeat(None, n).cast(series._type)
return pa.nulls(n, series._type)
same comment, can we `type: ignore` and report upstream if there's a stubs error?
@MarcoGorelli would you be okay with (adef6a0)?
I prefer that, since we can easily find all refs like:
I would've added as a suggestion, but the import was outside the range
Some more context in (#2007 (comment))
Part of #2007 (comment)
I'm expecting this to report in CI if not available in some version.
The previous fix was to resolve `pd.Series` not annotated as accepting `pa.ChunkedArray`.
@property
def _type(self: ArrowSeries[pa.Scalar[DataTypeT_co]]) -> DataTypeT_co:
    if TYPE_CHECKING:
        return self._native_series[0].type
    return self._native_series.type
I don't think there's a way to get this working without the `TYPE_CHECKING` block.
I'm using this in a few places to resolve `ChunkedArray` erasing its `type` property:
def type(self) -> DataType: ...
The type is preserved for `Array`, so here I'm stealing the generic from that `type` property:
def type(self: Array[Scalar[_DataType_CoT]]) -> _DataType_CoT: ...
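To make the trick above easier to follow, here is a self-contained sketch with hypothetical classes (`Scalar`, `Chunked`, `Series` are illustrative stand-ins, not the pyarrow or narwhals types). It assumes the container's own `type` is annotated loosely while each element keeps a precise generic, which is the situation described for `ChunkedArray` versus `Array`.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Generic, TypeVar

DataTypeT = TypeVar("DataTypeT")
ScalarT = TypeVar("ScalarT")


class Scalar(Generic[DataTypeT]):
    """Hypothetical scalar that carries a precisely typed `type`."""

    def __init__(self, value: object, dtype: DataTypeT) -> None:
        self.value = value
        self.type = dtype


class Chunked(Generic[ScalarT]):
    """Hypothetical container whose own `type` annotation erases the generic."""

    def __init__(self, items: list[ScalarT], dtype: object) -> None:
        self._items = items
        self._dtype = dtype

    @property
    def type(self) -> object:  # too wide: the data type parameter is lost
        return self._dtype

    def __getitem__(self, index: int) -> ScalarT:
        return self._items[index]


class Series(Generic[ScalarT]):
    def __init__(self, native: Chunked[ScalarT]) -> None:
        self._native = native

    @property
    def _type(self: Series[Scalar[DataTypeT]]) -> DataTypeT:
        if TYPE_CHECKING:
            # Checker-only branch: borrow the precise type from an element.
            return self._native[0].type
        # Runtime branch: never indexes, so empty containers are fine.
        return self._native.type


print(Series(Chunked([Scalar(1, "int64")], "int64"))._type)
```

The checker only ever sees the first `return`, while the interpreter only ever runs the second, so the indexing that would fail on an empty container never happens at runtime.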
def maybe_extract_py_scalar(value: Any, return_py_scalar: bool) -> Any:  # noqa: FBT001
    if TYPE_CHECKING:
        return value.as_py()
There are some similarities between this one and (#2007 (comment)).
Part of this is recreating a subset of the `@overload`(s) in https://github.com/zen-xu/pyarrow-stubs/blob/d97063876720e6a5edda7eb15f4efe07c31b8296/pyarrow-stubs/__lib_pxi/scalar.pyi#L62-L105
But I'm also needing to lie, since `.as_py()` isn't available in all versions.
For a lot of the cases where `maybe_extract_py_scalar` is used, this avoids needing to do a `[no-any-return]`, since we have `pa.Scalar[_BasicDataType[_AsPyType]]` provided by #2007 (comment)
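A rough illustration of why a generic scalar helps with `[no-any-return]`, using a hypothetical `Scalar` wrapper and `extract_py` helper rather than the real pyarrow types or the actual `maybe_extract_py_scalar` implementation. The assumption is simply that the scalar is generic over the Python type its `as_py()` returns.

```python
from __future__ import annotations

from typing import Generic, TypeVar

PyT = TypeVar("PyT")


class Scalar(Generic[PyT]):
    """Hypothetical scalar wrapper, generic over its Python value type."""

    def __init__(self, value: PyT) -> None:
        self._value = value

    def as_py(self) -> PyT:
        return self._value


def extract_py(value: Scalar[PyT]) -> PyT:
    # The return type follows the scalar's type parameter instead of `Any`.
    return value.as_py()


def total(a: Scalar[int], b: Scalar[int]) -> int:
    # No `type: ignore[no-any-return]` needed: both operands are known `int`.
    return extract_py(a) + extract_py(b)


print(total(Scalar(2), Scalar(3)))
```

If `extract_py` were annotated as returning `Any`, a strict `mypy` configuration would flag `total` for returning `Any` from a function declared to return `int`.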
first, thanks a tonne for your efforts, really appreciate it
second, I think this might be doing too much - mainly i'm concerned about there being both logic changes (e.g. `to_pandas`) and typing changes. can we keep them to separate PRs? i'm concerned about missing things with too large PRs
@MarcoGorelli the ignore(s) will show up for `mypy` after #2008 (I assume) #2007 (review)
What type of PR is this? (check all applicable)
Related issues
- `mypy` passing with `pyarrow-stubs` installed #1961
Checklist
If you have comments or can explain your changes, please do so below
Planning to finish getting the `mypy` errors to 0, then tidying things up