Merge pull request #235 from shoyer/better-format
Better formatting for coordinates, getting rid of "index coordinates" (and assorted doc improvements)
shoyer committed Sep 22, 2014
2 parents fbff4a7 + 4a06024 commit 5df6bdd
Showing 12 changed files with 329 additions and 174 deletions.
19 changes: 18 additions & 1 deletion doc/combining.rst
@@ -21,16 +21,33 @@ that dimension:
    arr = xray.DataArray(np.random.randn(2, 3),
                         [('x', ['a', 'b']), ('y', [10, 20, 30])])
    arr[:, :1]
    # this resembles how you would use np.concatenate
    xray.concat([arr[:, :1], arr[:, 1:]], dim='y')

In addition to combining along an existing dimension, ``concat`` can create a
new dimension by stacking lower dimension arrays together:
new dimension by stacking lower dimensional arrays together:

.. ipython:: python

    arr[0]
    # to combine these 1d arrays into a 2d array in numpy, you would use np.array
    xray.concat([arr[0], arr[1]], 'x')

If the second argument to ``concat`` is a new dimension name, the arrays will
be concatenated along that new dimension, which is always inserted as the first
dimension:

.. ipython:: python

    xray.concat([arr[0], arr[1]], 'new_dim')

This is actually the default behavior for ``concat``:

.. ipython:: python

    xray.concat([arr[0], arr[1]])

The second argument to ``concat`` can also be an :py:class:`~pandas.Index` or
:py:class:`~xray.DataArray` object as well as a string, in which case it is
used to label the values along the new dimension:
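
The two ``concat`` modes described in the combining.rst changes above (joining along an existing dimension, and stacking along a new dimension that is inserted first) can be modeled with plain Python lists. This is a minimal sketch, not xray's implementation: the dict layout and the helper names ``concat_existing`` and ``stack_new_dim`` are hypothetical, and the sketch assumes the inputs already agree on every other dimension.

```python
def concat_existing(arrays, dim):
    """Join 2-D toy arrays along an existing dimension (like np.concatenate)."""
    first = arrays[0]
    dims = list(first["dims"])
    axis = dims.index(dim)  # position of the named dimension
    coords = {d: list(lbls) for d, lbls in first["coords"].items()}
    # Labels along `dim` are chained together; other labels are assumed shared.
    coords[dim] = [label for arr in arrays for label in arr["coords"][dim]]
    if axis == 0:
        # Put the rows of each array one after another.
        values = [list(row) for arr in arrays for row in arr["values"]]
    else:
        # Extend each row with the matching row from every array.
        values = [
            [v for arr in arrays for v in arr["values"][i]]
            for i in range(len(first["values"]))
        ]
    return {"dims": dims, "coords": coords, "values": values}


def stack_new_dim(arrays, new_dim, labels=None):
    """Stack 1-D toy arrays along a new dimension, inserted as the first one."""
    first = arrays[0]
    for arr in arrays[1:]:
        if arr["dims"] != first["dims"] or arr["coords"] != first["coords"]:
            raise ValueError("arrays must share dimensions and coordinates")
    if labels is None:
        # This sketch falls back to positional labels 0, 1, ...
        labels = list(range(len(arrays)))
    coords = {new_dim: list(labels)}
    coords.update({d: list(lbls) for d, lbls in first["coords"].items()})
    return {
        "dims": [new_dim] + list(first["dims"]),
        "coords": coords,
        "values": [list(arr["values"]) for arr in arrays],
    }
```

With two 1-D rows whose only dimension is ``y``, ``stack_new_dim(rows, 'x', ['a', 'b'])`` returns dims ``['x', 'y']``, mirroring how ``xray.concat([arr[0], arr[1]], 'x')`` rebuilds the original 2-D array.
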
28 changes: 19 additions & 9 deletions doc/computation.rst
@@ -127,11 +127,14 @@ This means, for example, that you always subtract an array from its transpose:
    c - c.T

.. _alignment and coordinates:

Alignment and coordinates
=========================

For now, performing most binary operations on xray objects requires that
all *index* coordinates have the same values:
all *index* :ref:`coordinates` (that is, coordinates with the same name as a
dimension) have the same values:

.. ipython::

@@ -157,18 +160,25 @@ See :ref:`align and reindex` for more details.
expect to default to ``join='inner'``.

Although index coordinates are required to match exactly, other coordinates are
not. Still, xray will persist other coordinates in arithmetic, as long as there
not, and if their values conflict, they will be dropped. This is necessary,
for example, because indexing turns 1D coordinates into scalars:

.. ipython:: python

    arr[0]
    arr[1]
    # notice that the scalar coordinate 'x' is silently dropped
    arr[1] - arr[0]

Still, xray will persist other coordinates in arithmetic, as long as there
are no conflicting values:

.. ipython:: python

    a.coords['z'] = -1
    b.coords['z'] = 999
    # notice that 'z' is silently dropped
    a + b
    b.coords['z'] = -1
    # now 'z' is persisted, because it has a unique value
    a + b
    # only one argument has the 'x' coordinate
    arr[0] + 1
    # both arguments have the same 'x' coordinate
    arr[0] - arr[0]

Math with Datasets
==================
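
The coordinate rules in the computation.rst changes above can be summarized in a short plain-Python sketch: labels along shared dimensions must match exactly, while other coordinates are kept unless both operands hold conflicting values. The function names and dict inputs are hypothetical, and raising ``ValueError`` on mismatched labels is this sketch's own choice; it illustrates the rule as the text states it rather than reproducing xray's code.

```python
def check_dimension_coords(left, right):
    """Raise if a dimension shared by both operands has different labels.

    `left` and `right` map dimension names to their label sequences.
    """
    for dim in left.keys() & right.keys():
        if list(left[dim]) != list(right[dim]):
            raise ValueError(f"labels along {dim!r} do not match")


def merge_other_coords(left, right):
    """Keep non-dimension coordinates unless the two operands disagree.

    `left` and `right` map coordinate names to (scalar) values.
    """
    merged = {}
    for name in list(left) + [n for n in right if n not in left]:
        if name in left and name in right:
            if left[name] == right[name]:
                merged[name] = left[name]
            # Conflicting values: the coordinate is silently dropped.
        else:
            # Present on only one side: it is persisted unchanged.
            merged[name] = left[name] if name in left else right[name]
    return merged
```

For example, ``merge_other_coords({'z': -1}, {'z': 999})`` returns an empty dict, matching the ``a + b`` example where ``'z'`` is silently dropped.
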
92 changes: 45 additions & 47 deletions doc/data-structures.rst
@@ -1,20 +1,16 @@
.. _data structures:

Data Structures
===============

.. ipython:: python
    :suppress:

    import numpy as np
    np.random.seed(123456)
    np.set_printoptions(threshold=10)

To get started, we will import numpy, pandas and xray:

.. ipython:: python

    import numpy as np
    import pandas as pd
    import xray
    np.random.seed(123456)
    np.set_printoptions(threshold=10)

DataArray
---------
@@ -31,10 +27,9 @@ multi-dimensional array. It has several key properties:

xray uses ``dims`` and ``coords`` to enable its core metadata aware operations.
Dimensions provide names that xray uses instead of the ``axis`` argument found
in many numpy functions. Coordinates (particularly "index coordinates") enable
fast label based indexing and alignment, building on the functionality of the
``index`` found on a pandas :py:class:`~pandas.DataFrame` or
:py:class:`~pandas.Series`.
in many numpy functions. Coordinates enable fast label based indexing and
alignment, building on the functionality of the ``index`` found on a pandas
:py:class:`~pandas.DataFrame` or :py:class:`~pandas.Series`.
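
To make "names instead of ``axis``" concrete, here is a plain-Python sketch that translates a dimension name into a positional axis before reducing a 2-D nested list. ``mean_over`` is a hypothetical helper for illustration only, not part of xray.

```python
def mean_over(values, dims, dim):
    """Average a 2-D nested list over the dimension named `dim`."""
    if len(dims) != 2:
        raise ValueError("this sketch only handles 2-D data")
    axis = dims.index(dim)  # translate the name into a positional axis
    if axis == 0:
        # Reduce over the first dimension: average down each column.
        return [sum(col) / len(col) for col in zip(*values)]
    # Reduce over the second dimension: average across each row.
    return [sum(row) / len(row) for row in values]
```

``mean_over(values, ('x', 'y'), 'x')`` plays the role of numpy's ``values.mean(axis=0)``, except that the caller never needs to remember which position ``x`` occupies.
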

DataArray objects also can have a ``name`` and can hold arbitrary metadata in
the form of their ``attrs`` property (an ordered dictionary). Names and
@@ -66,9 +61,9 @@ in with default values:
    xray.DataArray(data)

As you can see, dimension names and index coordinates, which label tick marks
along each dimension, are always present. This behavior is similar to pandas,
which fills in index values in the same way.
As you can see, dimensions and coordinate arrays corresponding to each
dimension are always present. This behavior is similar to pandas, which fills
in index values in the same way.
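
The default filling described above can be sketched as follows. The ``dim_0``, ``dim_1``, ... names match what xray's constructor uses when no dimensions are given; the integer labels ``0, 1, ...`` are an assumption modeled on pandas' default index, and ``default_dims_and_coords`` is a hypothetical helper.

```python
def default_dims_and_coords(shape):
    """Fill in dimension names and integer labels for an unlabeled array."""
    dims = tuple(f"dim_{i}" for i in range(len(shape)))
    # Like pandas' default integer index: labels 0, 1, ..., n - 1 per dimension.
    coords = {dim: list(range(n)) for dim, n in zip(dims, shape)}
    return dims, coords
```
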

The data array constructor also supports supplying ``coords`` as a list of
``(dim, ticks[, attrs])`` pairs with length equal to the number of dimensions:
@@ -80,7 +75,7 @@ The data array constructor also supports supplying ``coords`` as a list of
Yet another option is to supply ``coords`` in the form of a dictionary where
the values are scalar values, 1D arrays or tuples (in the same form as the
`dataarray constructor`_). This form lets you supply coordinates other than
those used for indexing (more on these later):
those corresponding to dimensions (more on these later):

.. ipython:: python
@@ -214,16 +209,14 @@ variables. Dictionary like access on a dataset will supply arrays found in
either category. However, the distinction does have important implications for
indexing and computation.

Here is an example how we might structure a dataset for a weather forecast:
Here is an example of how we might structure a dataset for a weather forecast:

.. image:: _static/dataset-diagram.png

In this example, it would be natural to call ``temperature`` and
``precipitation`` "variables" and all the other arrays "coordinates" because
they label the points along the dimensions. ``x``, ``y`` and ``time`` are
index coordinates (used for alignment purposes), and ``latitude``,
``longitude`` and ``reference_time`` are other coordinates, not used for
indexing (see [1]_ for more background on this example).
they label the points along the dimensions (see [1]_ for more background on
this example).

.. _dataarray constructor:

@@ -383,40 +376,46 @@ Another useful option is the ability to rename the variables in a dataset:
    ds.rename({'temperature': 'temp', 'precipitation': 'precip'})

.. _coordinates:

Coordinates
-----------

``DataArray`` and ``Dataset`` objects store two types of arrays in their
``coords`` attribute:
Coordinates are ancillary arrays stored for ``DataArray`` and ``Dataset``
objects in the ``coords`` attribute:

.. ipython:: python

    ds.coords

* "Index" coordinates are used for label based indexing and alignment, like the
  ``index`` found on a pandas :py:class:`~pandas.DataFrame` or
  :py:class:`~pandas.Series`. Index coordinates must be one-dimensional, and
  are (automatically) identified by arrays with a name equal to their (single)
  dimension.
* "Other" coordinates are also intended to be descriptive of points along
  dimensions, but xray makes no direct use of them, beyond persisting
  through operations when it can be done unambiguously. These coordinates can
  have any number of dimensions.

Unlike attributes, xray *does* interpret and persist coordinates in
operations that transform xray objects.

.. note::

    One-dimensional coordinates with a name equal to their sole dimension (marked
    by ``*`` when printing a dataset or data array) take on a special meaning in
    xray. They are used for label based indexing and alignment,
    like the ``index`` found on a pandas :py:class:`~pandas.DataFrame` or
    :py:class:`~pandas.Series`. Indeed, these "dimension" coordinates use a
    :py:class:`pandas.Index` internally to store their values.

    You cannot yet use a :py:class:`pandas.MultiIndex` as an xray index
    coordinate (:issue:`164`).

Other than for indexing, xray does not make any direct use of the values
associated with coordinates. Coordinates with names not matching a dimension
are not used for alignment or indexing, nor are they required to match when
doing arithmetic (see :ref:`alignment and coordinates`).
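
The test for a dimension coordinate described above (one-dimensional, and named after its sole dimension) can be written in a few lines. This sketch applies it to the weather-forecast example; the dimensions listed for ``latitude``, ``longitude`` and ``reference_time`` are assumptions for illustration, and ``split_coords`` is a hypothetical helper.

```python
def split_coords(coord_dims):
    """Split {name: dims} into dimension coordinates and other coordinates."""
    dimension_coords, other_coords = [], []
    for name, dims in coord_dims.items():
        # A dimension coordinate is 1-D and named after its only dimension.
        if tuple(dims) == (name,):
            dimension_coords.append(name)
        else:
            other_coords.append(name)
    return dimension_coords, other_coords


# Dimensions of each coordinate in the weather example (assumed for illustration).
weather = {
    "x": ("x",),
    "y": ("y",),
    "time": ("time",),
    "latitude": ("x", "y"),
    "longitude": ("x", "y"),
    "reference_time": (),
}
```

Here ``x``, ``y`` and ``time`` come out as dimension coordinates, while the 2-D ``latitude``/``longitude`` and the scalar ``reference_time`` are other coordinates.
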

Converting to ``pandas.Index``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To convert an index coordinate into an actual :py:class:`pandas.Index`, use
the :py:meth:`~xray.DataArray.to_index` method:
To convert a coordinate (or any ``DataArray``) into an actual
:py:class:`pandas.Index`, use the :py:meth:`~xray.DataArray.to_index` method:

.. ipython:: python

    ds['time'].to_index()

A useful shortcut is the ``indexes`` property (on both ``DataArray`` and
``Dataset``), which lazily constructs a dictionary where the values are
``Index`` objects:
``Dataset``), which lazily constructs a dictionary whose keys are given by each
dimension and whose values are ``Index`` objects:

.. ipython:: python
@@ -436,18 +435,17 @@ variables, use the :py:meth:`~xray.Dataset.set_coords` and
    ds.set_coords(['temperature', 'precipitation'])
    ds['temperature'].reset_coords(drop=True)

Notice that these operations skip index coordinates.

.. note::

    We do not yet have a ``set_index`` method like pandas for manipulating
    indexes. This is planned.

Notice that these operations skip coordinates with names given by dimensions,
as used for indexing. This is mostly because we are not entirely sure how to
design the interface around the fact that xray cannot store a coordinate and a
variable with the same name but different values in the same dictionary. But we
do recognize that supporting something like this would be useful.
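
The name bookkeeping behind ``set_coords`` and ``reset_coords`` can be sketched with plain sets. These are hypothetical stand-ins that only track variable names, not the real methods: they skip dimension coordinates silently, whereas xray may raise an error if you explicitly ask to reset one.

```python
def reset_coords(coords, data_vars, dims, names=None, drop=False):
    """Turn coordinates into data variables, skipping dimension coordinates.

    `coords`, `data_vars` and `dims` are collections of names. Returns the
    new (coords, data_vars) pair as sets.
    """
    candidates = set(coords) if names is None else set(names)
    # Dimension coordinates are skipped rather than moved.
    movable = {n for n in candidates if n in coords and n not in dims}
    new_coords = set(coords) - movable
    # With drop=True the reset coordinates are discarded instead of kept.
    new_vars = set(data_vars) if drop else set(data_vars) | movable
    return new_coords, new_vars


def set_coords(coords, data_vars, names):
    """Turn the named data variables into coordinates."""
    moving = set(names) & set(data_vars)
    return set(coords) | moving, set(data_vars) - moving
```
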

Converting into datasets
~~~~~~~~~~~~~~~~~~~~~~~~

Coordinate objects also have a few useful methods, mostly for converting them
into dataset objects:
``Coordinates`` objects also have a few useful methods, mostly for converting
them into dataset objects:

.. ipython:: python
2 changes: 2 additions & 0 deletions doc/groupby.rst
@@ -1,3 +1,5 @@
.. _groupby:

GroupBy: split-apply-combine
----------------------------

1 change: 1 addition & 0 deletions doc/index.rst
@@ -26,6 +26,7 @@ Documentation

why-xray
installing
quickstart
data-structures
indexing
computation
15 changes: 9 additions & 6 deletions doc/indexing.rst
@@ -1,3 +1,5 @@
.. _indexing:

Indexing and selecting data
===========================

@@ -76,12 +78,12 @@ and :py:meth:`~xray.DataArray.isel` methods:
    # index by integer array indices
    arr.isel(space=0, time=slice(None, 2))
    # index by index coordinate labels
    # index by dimension coordinate labels
    arr.sel(time=slice('2000-01-01', '2000-01-02'))

The arguments to these methods can be any objects that could index the array
along that dimension, e.g., labels for an individual value, Python ``slice``
objects or 1-dimensional arrays.
along the dimension given by the keyword, e.g., labels for an individual value,
Python :py:func:`slice` objects or 1-dimensional arrays.
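
Here is a plain-Python sketch contrasting positional ``isel`` with label-based ``sel`` along a single dimension. The helpers are hypothetical and accept a single position or label, a ``slice``, or a list. Following pandas' label-based slicing, the label slice includes its end point; the sketch ignores any step on a label slice.

```python
def isel(labels, values, key):
    """Select by integer position: an int, a slice or a list of ints."""
    if isinstance(key, list):
        return [labels[i] for i in key], [values[i] for i in key]
    return labels[key], values[key]


def sel(labels, values, key):
    """Select by label: a label, a slice of labels or a list of labels."""
    if isinstance(key, slice):
        start = 0 if key.start is None else labels.index(key.start)
        # Label slices include their end point, as in pandas.
        stop = len(labels) if key.stop is None else labels.index(key.stop) + 1
        return labels[start:stop], values[start:stop]
    if isinstance(key, list):
        idx = [labels.index(k) for k in key]
        return [labels[i] for i in idx], [values[i] for i in idx]
    i = labels.index(key)  # raises ValueError for an unknown label
    return labels[i], values[i]
```
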

.. note::

@@ -170,9 +172,10 @@ Align and reindex
-----------------

xray's ``reindex``, ``reindex_like`` and ``align`` impose a ``DataArray`` or
``Dataset`` onto a new set of index coordinates. The original values are subset
to the index labels still found in the new labels, and values corresponding to
new labels not found in the original object are in-filled with `NaN`.
``Dataset`` onto a new set of coordinates corresponding to dimensions. The
original values are subset to the index labels still found in the new labels,
and values corresponding to new labels not found in the original object are
in-filled with ``NaN``.
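
A one-dimensional sketch of the reindexing rule described above: labels found in both the old and new index keep their values, and new labels are filled with ``NaN``. ``reindex_1d`` and ``align_inner`` are hypothetical names; ``align_inner`` uses the intersection of labels (an inner join) and assumes unique labels.

```python
import math


def reindex_1d(labels, values, new_labels):
    """Conform values to new_labels, filling labels not found with NaN."""
    lookup = dict(zip(labels, values))
    return list(new_labels), [lookup.get(label, math.nan) for label in new_labels]


def align_inner(a, b):
    """Reindex two (labels, values) pairs onto the labels they share."""
    b_labels = set(b[0])
    shared = [label for label in a[0] if label in b_labels]
    return reindex_1d(*a, shared), reindex_1d(*b, shared)
```
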

To reindex a particular dimension, use :py:meth:`~xray.DataArray.reindex`:

8 changes: 4 additions & 4 deletions doc/installing.rst
@@ -16,13 +16,13 @@ Optional dependencies:
The easiest way to get all these dependencies installed is to use the
`Anaconda python distribution <https://store.continuum.io/cshop/anaconda/>`__.

To install xray, use pip:

::
To install xray, use pip::

    pip install xray

.. warning::

    If you don't already have recent versions of numpy and pandas installed,
    installing xray will automatically update them.
    installing xray will attempt to automatically update them. This may or may
    not succeed: you probably want to ensure you have up-to-date installations
    of numpy and pandas before attempting to install xray.