GroupBy(chunked-array) #9522

Merged (20 commits, Nov 4, 2024)

@dcherian (Contributor) commented Sep 19, 2024

This came together quickly last night ;)

TODO:

  • decide on backwards compatibility: we used to eagerly compute dask arrays used as the group variable; now this raises an error.

cc @bradyrx

dcherian marked this pull request as draft September 19, 2024 16:44
@dcherian (Contributor Author) commented Sep 20, 2024

This is ready for review. It is backwards-incompatible: previously, when grouping by a dask array, we would simply compute it eagerly. It has been like that for a very long time, so perhaps a deprecation cycle is needed. Thoughts?
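The underlying difficulty is that the unique labels of a lazy, chunked group array are unknown until the whole array has been read, which is why grouping used to trigger an eager compute. A minimal sketch in plain Python, assuming the labels are instead declared up front (the names `chunked_groupby_sum` and `expected_labels` are hypothetical, and this is not xarray's or flox's implementation): the output size and order are fixed before any chunk is touched, so each chunk can be folded into a preallocated accumulator without first scanning the others.

```python
from typing import Dict, Hashable, Iterable, List, Sequence


def chunked_groupby_sum(
    value_chunks: Iterable[Sequence[float]],
    label_chunks: Iterable[Sequence[Hashable]],
    expected_labels: Sequence[Hashable],
) -> Dict[Hashable, float]:
    """Sum values per group, one chunk at a time, using labels declared up front."""
    # The declared labels fix both the size and the order of the output,
    # so no pass over the group array is needed to discover unique labels.
    position = {label: i for i, label in enumerate(expected_labels)}
    totals: List[float] = [0.0] * len(expected_labels)

    for values, labels in zip(value_chunks, label_chunks):
        # Each chunk only updates the preallocated totals; it never needs
        # to see the other chunks.
        for value, label in zip(values, labels):
            if label not in position:
                raise ValueError(f"unexpected group label {label!r}")
            totals[position[label]] += value

    return dict(zip(expected_labels, totals))


values = [[1.0, 2.0, 3.0], [4.0, 5.0]]  # two "chunks" of data
labels = [["a", "b", "a"], ["c", "b"]]  # matching chunks of group labels
print(chunked_groupby_sum(values, labels, expected_labels=["c", "b", "a"]))
# {'c': 4.0, 'b': 7.0, 'a': 4.0}
```

Declaring the labels also pins the output order (`c`, `b`, `a` above), which relates to the linked "Ordered Groupby Keys" issue. This sketch chooses to raise on labels outside `expected_labels`; a real implementation might drop them instead.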

dcherian marked this pull request as ready for review September 20, 2024 02:57
@@ -190,8 +192,8 @@ def values(self) -> range:
         return range(self.size)

     @property
-    def data(self) -> range:
-        return range(self.size)
+    def data(self) -> np.ndarray:
Review comment (Contributor Author): for typing

* main: (63 commits)
  Add close() method to DataTree and use it to clean-up open files in tests (pydata#9651)
  Change URL for pydap test (pydata#9655)
  Fix multiple grouping with missing groups (pydata#9650)
  flox: Properly propagate multiindex (pydata#9649)
  Update Datatree html repr to indicate inheritance (pydata#9633)
  Re-implement map_over_datasets using group_subtrees (pydata#9636)
  fix zarr intersphinx (pydata#9652)
  Replace black and blackdoc with ruff-format (pydata#9506)
  Fix error and missing code cell in io.rst (pydata#9641)
  Support alternative names for the root node in DataTree.from_dict (pydata#9638)
  Updates to DataTree.equals and DataTree.identical (pydata#9627)
  DOC: Clarify error message in open_dataarray (pydata#9637)
  Add zip_subtrees for paired iteration over DataTrees (pydata#9623)
  Type check datatree tests (pydata#9632)
  Add missing `memo` argument to DataTree.__deepcopy__ (pydata#9631)
  Bug fixes for DataTree indexing and aggregation (pydata#9626)
  Add inherit=False option to DataTree.copy() (pydata#9628)
  docs(groupby): mention deprecation of `squeeze` kwarg (pydata#9625)
  Migration guide for users of old datatree repo (pydata#9598)
  Reimplement Datatree typed ops (pydata#9619)
  ...
dcherian marked this pull request as draft October 22, 2024 15:12
* main:
  Add `DataTree.persist` (pydata#9682)
  Typing annotations for arithmetic overrides (e.g., DataArray + Dataset) (pydata#9688)
  Raise `ValueError` for unmatching chunks length in `DataArray.chunk()` (pydata#9689)
  Fix inadvertent deep-copying of child data in DataTree (pydata#9684)
  new blank whatsnew (pydata#9679)
  v2024.10.0 release summary (pydata#9678)
  drop the length from `numpy`'s fixed-width string dtypes (pydata#9586)
  fixing behaviour for group parameter in `open_datatree` (pydata#9666)
  Use zarr v3 dimension_names (pydata#9669)
  fix(zarr): use inplace array.resize for zarr 2 and 3 (pydata#9673)
  implement `dask` methods on `DataTree` (pydata#9670)
  support `chunks` in `open_groups` and `open_datatree` (pydata#9660)
  Compatibility for zarr-python 3.x (pydata#9552)
  Update to_dataframe doc to match current behavior (pydata#9662)
  Reduce graph size through writing indexes directly into graph for ``map_blocks`` (pydata#9658)
dcherian marked this pull request as ready for review October 29, 2024 23:26
@dcherian (Contributor Author) commented Oct 29, 2024

This should be backwards compatible now, and it raises clear warnings. I'd like to merge this soon; it's been around for a while...
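One common way to keep such a change backwards compatible is a deprecation shim: keep computing the group array eagerly when no labels are given, but warn, and take the lazy path when labels are supplied. The sketch below assumes that design; `LazyGroupArray`, `resolve_group_labels`, the `labels` argument, and the warning text are all hypothetical and are not xarray's API.

```python
import warnings
from typing import Callable, Hashable, List, Optional, Sequence


class LazyGroupArray:
    """Hypothetical stand-in for a chunked (e.g. dask-backed) group array."""

    def __init__(self, compute: Callable[[], List[Hashable]]) -> None:
        self._compute = compute

    def compute(self) -> List[Hashable]:
        # Materialise the group values; for a real chunked array this is the costly step.
        return self._compute()


def resolve_group_labels(
    group, labels: Optional[Sequence[Hashable]] = None
) -> List[Hashable]:
    """Return the group labels, keeping the old eager path behind a warning."""
    if not isinstance(group, LazyGroupArray):
        # In-memory group: find unique labels directly (first-seen order here).
        return list(dict.fromkeys(group))
    if labels is not None:
        # New path: labels are declared, so the lazy group is never computed.
        return list(labels)
    # Old path: compute eagerly as before, but warn that this is deprecated.
    warnings.warn(
        "Grouping by a chunked array without explicit labels computes it eagerly; "
        "this is deprecated. Pass `labels`, or compute the array first.",
        DeprecationWarning,
        stacklevel=2,
    )
    return list(dict.fromkeys(group.compute()))


group = LazyGroupArray(lambda: ["b", "a", "b"])
print(resolve_group_labels(group, labels=["a", "b"]))  # lazy path, no warning
print(resolve_group_labels(group))  # eager path, emits a DeprecationWarning
```

The point of this design is that existing code keeps working (it still gets its labels, just with a warning), while users who pass labels never trigger a compute.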

Review thread on xarray/core/groupby.py: resolved
Review thread on xarray/tests/test_groupby.py (outdated): resolved
* main:
  Refactor out utility functions from to_zarr (pydata#9695)
  Use the same function to floatize coords in polyfit and polyval (pydata#9691)
@dcherian (Contributor Author) commented Nov 1, 2024

I'll merge on Monday if there are no comments.

dcherian added the "plan to merge" label (Final call for comments) and removed the "needs review" label Nov 1, 2024
dcherian merged commit a00bc91 into pydata:main Nov 4, 2024 (33 of 35 checks passed)
dcherian added a commit to dcherian/xarray that referenced this pull request Nov 4, 2024
* main:
  GroupBy(chunked-array) (pydata#9522)
  DOC: mention attribute peculiarities in docs/docstrings (pydata#9700)
  add pydap-server dependencies to environment.yml (pydata#9709)
dcherian deleted the groupby-dask branch November 12, 2024 16:07
Linked issues that merging this pull request may close:
  • Allow grouping by dask variables
  • Ordered Groupby Keys

3 participants