Commit cb87da7

Up docs
1 parent 7ae1df7 commit cb87da7

2 files changed (+94 -54 lines)


docs/source/accessor.rst

Lines changed: 91 additions & 51 deletions
@@ -7,9 +7,8 @@ Accessor
 An :external+xarray:doc:`accessor <internals/extending-xarray>` is provided to
 ease manipulation and analysis of the histogram outputs. Simply import
 :mod:`xarray_histogram.accessor` to register it. It will then be available for
-all DataArrays that meet some conditions (:ref:`see below
-<accessor-conditions>`), under the ``hist`` attribute. It gives access to a
-number of methods. ::
+all DataArrays that meet some conditions (see below), under the ``hist``
+attribute. It gives access to a number of methods. ::

     import xarray_histogram as xh
     import xarray_histogram.accessor
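As background for the import above: xarray accessors are registered with ``xarray.register_dataarray_accessor``, and xarray caches the accessor object on each DataArray after its first access. The sketch below registers a toy accessor under the hypothetical name ``hist_sketch`` (chosen so it cannot clash with the real ``hist`` accessor) and checks the naming convention from "Conditions of accessibility" when it is created. It is a simplified illustration under those assumptions, not the implementation of ``xarray_histogram.accessor``.

```python
import numpy as np
import xarray as xr


# Hypothetical accessor name, chosen so it does not clash with the real ``hist``.
@xr.register_dataarray_accessor("hist_sketch")
class HistSketchAccessor:
    """Toy accessor that only checks the naming conventions of ``hist``."""

    def __init__(self, da: xr.DataArray):
        name = str(da.name or "")
        # '<var1>_<var2>_..._<histogram or pdf>' -> stem and kind
        stem, _, kind = name.rpartition("_")
        if not stem or kind not in ("histogram", "pdf"):
            # Avoid AttributeError here: xarray would re-raise it as a RuntimeError.
            raise ValueError(f"array name {name!r} does not follow the convention")
        self.variables = stem.split("_")
        self.is_normalized = kind == "pdf"
        # Each variable needs a '<variable>_bins' coordinate.
        missing = [v for v in self.variables if f"{v}_bins" not in da.coords]
        if missing:
            raise ValueError(f"missing bins coordinates for {missing}")
        self._obj = da


# Usage: a 1D histogram built by hand with NumPy, following the conventions.
counts, edges = np.histogram(np.random.default_rng(0).normal(size=1000), bins=10)
h = xr.DataArray(counts, dims=["x_bins"], coords={"x_bins": edges[:-1]},
                 name="x_histogram")
print(h.hist_sketch.variables)          # ['x']
print(h.hist_sketch is h.hist_sketch)   # True: the accessor is cached per array
```

Because the variables are recovered by splitting on underscores, variable names that themselves contain underscores cannot be parsed by this naive sketch.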
@@ -18,39 +17,85 @@ number of methods. ::

     h.hist.median()

-Operations are vectorized, so that you can apply them to entire arrays of
-histograms. For instance for data defined along time, latitude and longitude,
-we can compute one histogram per time-step::
+Operations are vectorized [#vector]_, so that you can apply them to entire
+arrays of histograms. For instance, for data defined along time, latitude and
+longitude, we can compute one histogram per time step::

     >>> h = xh.histogram(data, dims=["lon", "lat"])
     >>> h.hist.median()
     will be of dimensions ("time",)

+.. [#vector] Computations are vectorized by looping in Python through
+   :func:`xarray.apply_ufunc`, which is not efficient for a large number of
+   histograms.
+
+
+Conditions of accessibility
+===========================
+
+Once registered, an accessor is a cached property that can be accessed on any
+DataArray. There are some conditions for the *hist* accessor to be created
+successfully:
+
+* The coordinates of the bins must be named ``<variable>_bins``.
+* The array must be named ``<variable(s)_name>_<histogram or pdf>``:
+  *histogram* if it is not normalized, and *pdf* if it is normalized as a
+  probability density function. If the histogram is multi-dimensional, the
+  variable names must be separated by underscores. For instance:
+  ``Temp_Sal_histogram``.
+
+Each bins coordinate may contain the following attributes:
+
+* ``bin_type``: the class name of the Boost axis type that was used. If
+  absent, the accessor assumes the bins are regularly spaced and tries to
+  infer the rightmost edge.
+* ``right_edge``: the rightmost edge position, only necessary for Regular and
+  Variable bins.
+* ``underflow`` and ``overflow``: booleans indicating whether the corresponding
+  flow bins are present. If absent, the accessor assumes there are no flow bins.
+
+These conventions are consistent with the output of
+``xarray_histogram.histogram*``, so if you use this package's functions you
+should not have to worry. The names of the array and coordinates are also
+consistent with those of :external+xhistogram:doc:`xhistogram <index>`
+(although the coordinate attributes will be missing).
+
 Computations
 ============

 Bins
 ----

-The accessor provides the bins edges as a DataArray of size N+1 (it includes the
-last bins right edge) for a given variable:
-:meth:`~.HistDataArrayAccessor.edges`. Similarly, it provides the bins
-:meth:`~.HistDataArrayAccessor.centers`, :meth:`~.HistDataArrayAccessor.widths`,
-and :meth:`~.HistDataArrayAccessor.areas`.
+The accessor provides a number of methods that return bin-related values for a
+given variable. If the histogram is uni-dimensional (*i.e.* for a single
+variable), the variable name can be omitted. By default, flow bins are kept,
+but they can be excluded by passing ``flow=False``.

-Normalization
--------------
+* :meth:`~.HistDataArrayAccessor.bins` returns the corresponding coordinate;
+  this is essentially ``h.hist.coords["var_bins"]``.

-.. important::
+* :meth:`~.HistDataArrayAccessor.edges` returns the N+1 edges (including the
+  rightmost edge). Edges are not available for the discrete bin types
+  "IntCategory" and "StrCategory".
+
+* :meth:`~.HistDataArrayAccessor.widths` returns the widths of the bins.
+  The widths of flow bins and of StrCategory bins are always 1.

-   The accessor considers the histogram normalized or not given the name of its
-   DataArray: normalized if named ``<variables>_pdf`` and non-normalized
-   if ``<variables>_histogram``. This is consistent with the output of
-   :func:`~.core.histogram`.
+* :meth:`~.HistDataArrayAccessor.centers` returns the center position of the
+  bins. The centers of overflow bins are the same as their coordinate value
+  (``np.inf`` for instance).

-The histogram can be normalized if not already, using
-:meth:`~.HistDataArrayAccessor.normalize`. Note that for a N-dimensional
-histogram, this function can normalize only some variables.
+* :meth:`~.HistDataArrayAccessor.areas` returns the areas of multidimensional
+  bins, which is the product of the bin widths along each dimension. A subset of
+  the variables can be specified. Any point that corresponds to a flow bin in
+  at least one dimension has an area of 1. For instance, for a 2D histogram
+  with underflow and overflow bins, all the borders of the areas array are 1.
+
+To remove flow bins, :meth:`~.HistDataArrayAccessor.remove_flow` returns a
+new histogram DataArray without the flow bins of the given variables (by default
+all of them). This simply does a ``.isel`` operation based on the ``underflow``
+and ``overflow`` attributes of the specified coordinates. It also sets those
+attributes to False in the output.

 Bins transform
 --------------
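The bin quantities described in this hunk can be illustrated with plain NumPy on the left-edge coordinates, following the rules stated in the text: flow bins get a width of 1, keep their coordinate value as center, and give an area of 1 in any row or column they occupy. The helpers ``core_slice``, ``bin_edges``, ``bin_widths``, ``bin_centers`` and ``bin_areas`` are hypothetical stand-ins written for this sketch, not the accessor's methods, and they assume by default that both an underflow and an overflow bin are present.

```python
import numpy as np


def core_slice(n, underflow=True, overflow=True):
    """Slice selecting the regular (non-flow) bins among ``n`` bins."""
    return slice(1 if underflow else 0, n - 1 if overflow else n)


def bin_edges(coord, right_edge, underflow=True, overflow=True):
    """N+1 edges of the regular bins: their left edges plus the rightmost edge."""
    core = core_slice(len(coord), underflow, overflow)
    return np.append(np.asarray(coord, dtype=float)[core], right_edge)


def bin_widths(coord, right_edge, underflow=True, overflow=True):
    """Widths of all bins; flow bins are given a width of 1."""
    widths = np.ones(len(coord))
    core = core_slice(len(coord), underflow, overflow)
    widths[core] = np.diff(bin_edges(coord, right_edge, underflow, overflow))
    return widths


def bin_centers(coord, right_edge, underflow=True, overflow=True):
    """Centers of all bins; flow bins keep their coordinate value (e.g. +/-inf)."""
    centers = np.array(coord, dtype=float)  # a copy of the coordinate
    edges = bin_edges(coord, right_edge, underflow, overflow)
    core = core_slice(len(coord), underflow, overflow)
    centers[core] = 0.5 * (edges[:-1] + edges[1:])
    return centers


def bin_areas(widths_x, widths_y, core_x, core_y):
    """2D areas; any cell in a flow row or column gets an area of 1."""
    areas = np.ones((len(widths_x), len(widths_y)))
    areas[core_x, core_y] = np.outer(widths_x[core_x], widths_y[core_y])
    return areas


# Left edges with an underflow (-inf) and an overflow (+inf) bin.
x = np.array([-np.inf, 0.0, 1.0, 3.0, np.inf])
y = np.array([-np.inf, 0.0, 10.0, np.inf])
wx = bin_widths(x, right_edge=6.0)   # [1, 1, 2, 3, 1]
cx = bin_centers(x, right_edge=6.0)  # [-inf, 0.5, 2, 4.5, inf]
wy = bin_widths(y, right_edge=20.0)  # [1, 10, 10, 1]
areas = bin_areas(wx, wy, core_slice(len(x)), core_slice(len(y)))

# Dropping the flow bins (what remove_flow does with .isel) is a plain slice.
counts = np.arange(20.0).reshape(5, 4)
core_counts = counts[core_slice(5), core_slice(4)]  # shape (3, 2)
```

The areas are filled with 1 first and only the regular block receives the product of widths, which is how the "borders equal to 1" rule differs from a plain outer product.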
@@ -64,15 +109,39 @@ the *right_edge* attribute.
 For instance, :meth:`~.HistDataArrayAccessor.scale` scales bins by a given
 factor. It essentially does ``hist.apply_func(lambda edges: edges * factor)``.

+
+Normalization
+-------------
+
+The histogram can be normalized to a probability density function if it is not
+already, using :meth:`~.HistDataArrayAccessor.normalize`. Note that for an
+N-dimensional histogram, this function can normalize along only some variables.
+
+The accessor infers whether the histogram is normalized from the name of its
+DataArray: normalized if named ``<variables>_pdf`` and non-normalized
+if named ``<variables>_histogram``. This is consistent with the output of
+:func:`~.core.histogram`.
+
+.. important::
+
+   This matters when computing statistics (see below), for which the accessor
+   must know whether the histogram is normalized.
+
+Normalization is allowed when flow bins are present in the output. The values
+in flow bins are left unchanged and are not counted in the normalization.
+
 Statistics
 ----------

 A number of statistics can be extracted from the histogram. The following
 functions are wrappers around methods of :class:`scipy.stats.rv_histogram`.
+These functions only work on 1D histograms, so for N-dimensional histograms a
+variable must be specified. Flow bins are not supported: they are removed
+along the core dimension (the specified variable).

 .. note::

-   The histogram cannot be chunked in any bins dimensions.
+   The histogram cannot be chunked in the core dimension.

 .. autosummary::

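The normalization and statistics rules in this hunk can be sketched on a 1D histogram with NumPy and ``scipy.stats.rv_histogram``. This is an illustration under assumptions (made-up counts, one underflow and one overflow bin, equal bin widths), not the accessor's code: the regular bins are normalized so that they integrate to 1, the flow bins are left untouched, and the statistics are computed after dropping the flow bins along the core dimension.

```python
import numpy as np
from scipy import stats

# Underflow bin, three regular bins, overflow bin (made-up counts).
counts = np.array([5.0, 1.0, 2.0, 1.0, 7.0])
edges = np.array([0.0, 1.0, 2.0, 3.0])  # N+1 edges of the regular bins
core = slice(1, -1)                      # positions of the regular bins
widths = np.diff(edges)

# Normalize as a PDF over the regular bins only; flow bins are not modified.
pdf = counts.copy()
pdf[core] = counts[core] / np.sum(counts[core] * widths)

# Statistics: drop the flow bins, then wrap the histogram in rv_histogram.
dist = stats.rv_histogram((counts[core], edges))
print(dist.median(), dist.mean())  # both 1.5: the histogram is symmetric
```

With equal bin widths, raw counts and densities describe the same distribution. With unequal widths they do not (recent SciPy versions expose a ``density`` argument on ``rv_histogram`` for that case), which illustrates why knowing whether a histogram is normalized matters.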
@@ -86,32 +155,3 @@ functions are wrappers around methods of :class:`scipy.stats.rv_histogram`.
    ~accessor.HistDataArrayAccessor.var


-.. _accessor-conditions:
-
-Conditions of accessibility
-===========================
-
-Once registered, an accessor is a cached property that can be accessed on any
-DataArray. They are some conditions for the *hist* accessor to be created
-successfully:
-
-* The coordinates of the bins must be named ``<variable>_bins``.
-* Each bins coordinates must contain an attribute named ``right_edge``,
-  corresponding to the right edge of the last bin.
-* The array must be named as ``<variable(s)_name>_<histogram or pdf>``.
-  *histogram* if it is not normalized, and *pdf* if it is normalized as a
-  probability density function. If the histogram is multi-dimensional, the
-  variables names must be separated by underscores. For instance:
-  ``Temp_Sal_histogram``.
-
-Those conventions are coherent with the output of
-``xarray_histogram.histogram*``, so if you use this packages functions you
-should not have to worry. The names of the array and coordinates is also
-consistent with that of :external+xhistogram:doc:`xhistogram <index>`. Only the
-right edge attribute will be missing.
-
-.. admonition:: Right edge inference
-
-   If the right edge attribute is missing in a bins coordinates, the accessor
-   will try to infer it. It will make the hypothesis that bins are regularly
-   spaced. If this is not the case, an exception will be raised.
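The right-edge inference described in the removed admonition (assume regular spacing, raise an exception otherwise) can be sketched as follows. ``infer_right_edge`` is a hypothetical helper, and it assumes the bins coordinate holds the left edges of the regular bins without any flow bins; it mirrors the described behavior rather than the package's actual code.

```python
import numpy as np


def infer_right_edge(left_edges, rtol=1e-6):
    """Hypothetical helper: infer the right edge of the last of regular bins.

    Assumes ``left_edges`` holds the left edges of the regular bins only
    (no flow bins). Raises ValueError if the spacing is not uniform.
    """
    left_edges = np.asarray(left_edges, dtype=float)
    if left_edges.size < 2:
        raise ValueError("at least two bins are needed to infer the spacing")
    steps = np.diff(left_edges)
    if not np.allclose(steps, steps[0], rtol=rtol, atol=0.0):
        raise ValueError("bins are not regularly spaced, cannot infer right edge")
    return left_edges[-1] + steps[0]


print(infer_right_edge([0.0, 0.5, 1.0, 1.5]))  # 2.0

try:
    infer_right_edge([0.0, 1.0, 3.0])
except ValueError as err:
    print(err)  # bins are not regularly spaced, cannot infer right edge
```

A relative tolerance is used because bin edges produced by floating-point arithmetic are rarely exactly equidistant.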

docs/source/usage.rst

Lines changed: 3 additions & 3 deletions
@@ -69,13 +69,13 @@ Over/underflow
 By default, Boost axes are configured to keep count of the data points that
 fall outside their range. Pass ``underflow=False`` and/or ``overflow=False``
 when creating an axis to disable this.
-Still by default, the flow bins values are not kept in the output array.
-
+However, by default, the flow bin values are not kept in the output array.
 To keep the flow bins, pass ``flow=True`` to the histogram functions. The
 coordinate values for the underflow and overflow bins will be set to:

 - for a float variable: :data:`-np.inf<numpy.inf>` and :data:`np.inf<numpy.inf>`
 - for an integer variable: the minimum and maximum values of the dtype
+- for a string variable: ``_flow_bin``


 Output
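The flow-bin coordinate values listed in this hunk can be summarized by a small lookup on the variable's dtype. ``flow_bin_labels`` is a hypothetical helper written for this sketch; it also assumes, as in Boost.Histogram, that string category axes have an overflow bin but no underflow bin, which is why it returns ``None`` for the string underflow.

```python
import numpy as np


def flow_bin_labels(dtype):
    """Hypothetical helper returning (underflow, overflow) coordinate values.

    ``None`` marks a flow bin that does not exist for that kind of variable.
    """
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.floating):
        return -np.inf, np.inf
    if np.issubdtype(dtype, np.integer):
        info = np.iinfo(dtype)
        return info.min, info.max
    if np.issubdtype(dtype, np.str_):
        # Assumption: string category axes only have an overflow bin.
        return None, "_flow_bin"
    raise TypeError(f"unsupported dtype: {dtype}")


print(flow_bin_labels(np.float32))  # (-inf, inf)
print(flow_bin_labels(np.int16))    # (-32768, 32767)
print(flow_bin_labels("U8"))        # (None, '_flow_bin')
```

Using the extreme values of the dtype keeps the coordinate sorted for numeric variables, since no regular bin edge can lie beyond them.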
@@ -85,7 +85,7 @@ All three functions return a simple :class:`xarray.DataArray`. Its name is
 ``<variable names separated by underscores>_histogram`` (so for instance
 ``x_y_histogram``). The bin edges are contained in coordinates named
 ``<variable>_bins``. The right edge of the last bin is stored in a coordinate
-attribute.
+attribute when applicable.

 The nomenclature is the same as :external+xhistogram:doc:`xhistogram <index>` to
 ensure easy transition between the two packages. It also enables the use of an
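To make this nomenclature concrete, here is a hand-built DataArray that imitates the described output layout for two variables ``x`` and ``y``, using ``numpy.histogram2d``. It assumes the ``<variable>_bins`` coordinates hold the left edges, so that the stored ``right_edge`` attribute completes the N+1 edges. It does not call xarray_histogram, and its attributes are only an approximation of the real output.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
x, y = rng.normal(size=(2, 500))
counts, x_edges, y_edges = np.histogram2d(x, y, bins=(4, 3))

# Hand-built mock-up of the documented layout (not produced by xarray_histogram):
# name '<vars>_histogram', '<variable>_bins' coordinates holding the left edges,
# and the right edge of the last bin stored as a coordinate attribute.
h = xr.DataArray(
    counts,
    dims=["x_bins", "y_bins"],
    coords={
        "x_bins": ("x_bins", x_edges[:-1], {"right_edge": x_edges[-1]}),
        "y_bins": ("y_bins", y_edges[:-1], {"right_edge": y_edges[-1]}),
    },
    name="x_y_histogram",
)

# Rebuild the N+1 edges of 'x' from the coordinate and its attribute.
x_bins = h["x_bins"]
edges_x = np.append(x_bins.values, x_bins.attrs["right_edge"])
print(np.allclose(edges_x, x_edges))  # True
```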
