-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Children proxy interface #9477
base: main
Are you sure you want to change the base?
Children proxy interface #9477
Changes from 9 commits
320a8a2
1383211
e61112f
2c1b13f
03c56ea
0ca7f67
860764d
51c198e
e0358aa
08f83f8
7f87d08
1dace08
d6de4a1
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
|
@@ -5,11 +5,12 @@ | |||||||||
from pathlib import PurePosixPath | ||||||||||
from typing import ( | ||||||||||
TYPE_CHECKING, | ||||||||||
Any, | ||||||||||
Generic, | ||||||||||
TypeVar, | ||||||||||
) | ||||||||||
|
||||||||||
from xarray.core.utils import Frozen, is_dict_like | ||||||||||
from xarray.core.utils import is_dict_like | ||||||||||
|
||||||||||
if TYPE_CHECKING: | ||||||||||
from xarray.core.types import T_DataArray | ||||||||||
|
@@ -44,6 +45,76 @@ def __init__(self, *pathsegments): | |||||||||
Tree = TypeVar("Tree", bound="TreeNode") | ||||||||||
|
||||||||||
|
||||||||||
class Children(Mapping[str, Tree], Generic[Tree]): | ||||||||||
""" | ||||||||||
Dictionary-like container for the immediate children of a single DataTree node. | ||||||||||
|
||||||||||
This collection can be passed directly to the :py:class:`~xarray.DataTree` constructor via its `children` argument. | ||||||||||
""" | ||||||||||
|
||||||||||
_treenode: Tree | ||||||||||
|
||||||||||
# TODO add slots? | ||||||||||
# __slots__ = ("_data",) | ||||||||||
|
||||||||||
def __init__(self, treenode: Tree): | ||||||||||
self._treenode = treenode | ||||||||||
|
||||||||||
@property | ||||||||||
def _names(self) -> list[str]: | ||||||||||
return list(self._treenode._children.keys()) | ||||||||||
Comment on lines
+64
to
+65
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not directly use the set-like DictKeys object?
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can probably use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was thinking of |
||||||||||
|
||||||||||
def __iter__(self) -> Iterator[str]: | ||||||||||
return iter(self._names) | ||||||||||
|
||||||||||
def __len__(self) -> int: | ||||||||||
return len(self._names) | ||||||||||
|
||||||||||
def __contains__(self, key: str) -> bool: | ||||||||||
return key in self._names | ||||||||||
|
||||||||||
def __repr__(self) -> str: | ||||||||||
# TODO | ||||||||||
from xarray.core import formatting | ||||||||||
|
||||||||||
return formatting.children_repr(self) | ||||||||||
|
||||||||||
def __getitem__(self, key: str) -> Tree: | ||||||||||
return self._treenode._children[key] | ||||||||||
|
||||||||||
def __delitem__(self, key: str) -> None: | ||||||||||
if key in self._names: | ||||||||||
child = self._treenode._children[key] | ||||||||||
del self._treenode._children[key] | ||||||||||
child.orphan() | ||||||||||
else: | ||||||||||
raise KeyError(key) | ||||||||||
|
||||||||||
def __setitem__(self, key: str, value: Any) -> None: | ||||||||||
self.update({key: value}) | ||||||||||
|
||||||||||
def update(self, other: Mapping[str, Tree]) -> None: | ||||||||||
"""Update with other child nodes.""" | ||||||||||
|
||||||||||
# TODO forbid strings with `/` in here? | ||||||||||
|
||||||||||
if not len(other): | ||||||||||
return | ||||||||||
Comment on lines
+98
to
+99
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's slightly more canonical Python to avoid the call to
Suggested change
|
||||||||||
|
||||||||||
children = self._treenode._children.copy() | ||||||||||
children.update(other) | ||||||||||
self._treenode.children = children | ||||||||||
Comment on lines
+101
to
+103
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just to verify, this is a no-op if children have incompatible indexes, right? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The Footnotes
|
||||||||||
|
||||||||||
def _ipython_key_completions_(self): | ||||||||||
"""Provide method for the key-autocompletions in IPython.""" | ||||||||||
# TODO | ||||||||||
return [ | ||||||||||
key | ||||||||||
for key in self._treenode._ipython_key_completions_() | ||||||||||
if key not in self._treenode.variables | ||||||||||
] | ||||||||||
|
||||||||||
|
||||||||||
class TreeNode(Generic[Tree]): | ||||||||||
""" | ||||||||||
Base class representing a node of a tree, with methods for traversing and altering the tree. | ||||||||||
|
@@ -160,9 +231,9 @@ def orphan(self) -> None: | |||||||||
self._set_parent(new_parent=None) | ||||||||||
|
||||||||||
@property | ||||||||||
def children(self: Tree) -> Mapping[str, Tree]: | ||||||||||
def children(self: Tree) -> Children[str, Tree]: | ||||||||||
"""Child nodes of this node, stored under a mapping via their names.""" | ||||||||||
return Frozen(self._children) | ||||||||||
return Children(self) | ||||||||||
|
||||||||||
@children.setter | ||||||||||
def children(self: Tree, children: Mapping[str, Tree]) -> None: | ||||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,7 +6,13 @@ | |
import pytest | ||
|
||
from xarray.core.iterators import LevelOrderIter | ||
from xarray.core.treenode import InvalidTreeError, NamedNode, NodePath, TreeNode | ||
from xarray.core.treenode import ( | ||
Children, | ||
InvalidTreeError, | ||
NamedNode, | ||
NodePath, | ||
TreeNode, | ||
) | ||
|
||
|
||
class TestFamilyTree: | ||
|
@@ -224,6 +230,56 @@ def test_overwrite_child(self): | |
assert marys_evil_twin.parent is john | ||
|
||
|
||
class TestChildren: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's a bit awkward to know whether to put tests for this class in On the one hand the concept of access to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a lot less awkward after #9482. |
||
def test_properties(self): | ||
sue: TreeNode = TreeNode() | ||
mary: TreeNode = TreeNode(children={"Sue": sue}) | ||
kate: TreeNode = TreeNode() | ||
john = TreeNode(children={"Mary": mary, "Kate": kate}) | ||
|
||
children = john.children | ||
assert isinstance(children, Children) | ||
|
||
# len | ||
assert len(children) == 2 | ||
|
||
# iter | ||
assert list(children) == ["Mary", "Kate"] | ||
|
||
assert john.children["Mary"] is mary | ||
assert john.children["Kate"] is kate | ||
|
||
assert "Mary" in john.children | ||
assert "Kate" in john.children | ||
assert 0 not in john.children | ||
assert "foo" not in john.children | ||
|
||
# only immediate children should be accessible | ||
assert "sue" not in john.children | ||
|
||
with pytest.raises(KeyError): | ||
children["foo"] | ||
with pytest.raises(KeyError): | ||
children[0] | ||
|
||
# TODO not sure what this should look like... | ||
# repr | ||
# expected = dedent( | ||
# """\ | ||
# Children: | ||
# * x (x) int64 16B -1 -2 | ||
# * y (y) int64 24B 0 1 2 | ||
# a (x) int64 16B 4 5 | ||
# b int64 8B -10""" | ||
# ) | ||
# actual = repr(coords) | ||
# assert expected == actual | ||
|
||
def test_modify(self): | ||
# TODO | ||
... | ||
|
||
|
||
class TestPruning: | ||
def test_del_child(self): | ||
john: TreeNode = TreeNode() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure whether this class should allow path-like access to grandchildren or only allow access to immediate children.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question also for setting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thoughts I think this class should only allow getting/setting immediate children (because it's basically a glorified
dict
).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This relates to #9485 though - why exactly do we want to support full paths for
.coords
but only local access for.children
? We could do whatever, but what's the rationale for them being different?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you're basically thinking of
.coords
/.children
as meaning "access to all coords/children defined anywhere on the subtree". I originally thought of them as "access to just coords/children defined on this node".FYI path-like access like this means that
list(dt.children)
will return a subset of what can be accessed via.__getitem__
, (and__contains__
could go either way, see #9354)).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One compromise option of sorts (similar to what is done for Dataset for derived variables like
time.year
, or for coordinates inDataset.__getitem__
) is to only list immediate children in__iter__
and__contains__
but to support accessing them in__getitem__
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a detailed discussion about this on the old repo (xarray-contrib/datatree#240 (comment)). There the conclusion was that
__contains__
should support paths if__getitem__
does, but it's okay for.keys()
/.values()
/.items()
to be local only.To play devil's advocate a bit: If
.coords
can access the whole tree then why not also.variables
? I think they should all be consistent.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that seems to be how Dataset works for coordinates.
Sure, we could do that. Hopefully it's all very straightforwardly reusing the same machinery?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean. You're talking about virtual vs non-virtual coordinates on
Dataset
? That they are all in__contains__
and__getitem__
, but only a subset (the non-virtual ones) are listed by.keys()
?I actually meant that as a example of counter-intuitive behaviour 😅 . But maybe it's fine too? That would mean we have really done a 180 degree turn in this issue... It would imply changing the current behaviour of
dt.data_vars
too.It's not right now (local vs path-supporting have different codepaths), but it could be made to: everything could go through some version of using
TreeNode._set_item
xarray/xarray/core/treenode.py
Line 550 in 8bd456c
or
._walk_to
and we could add a parameter liketraverse_paths
to generalize it for both cases. Generalizing.update
to support paths might be the approach that is easiest to integrate with theCoordinates
base class.Also one final thing: We are talking here about supporting paths to descendant nodes only? Or upwards via
dt['../../path/to/node/']
too?