 An :external+xarray:doc:`accessor <internals/extending-xarray>` is provided to
 ease manipulation and analysis of the histogram outputs. Simply import
 :mod:`xarray_histogram.accessor` to register it. It will then be available for
-all DataArrays that meet some conditions (:ref:`see below
-<accessor-conditions>`), under the ``hist`` attribute. It gives access to a
-number of methods. ::
+all DataArrays that meet some conditions (see below), under the ``hist``
+attribute. It gives access to a number of methods. ::

     import xarray_histogram as xh
     import xarray_histogram.accessor
@@ -18,39 +17,85 @@ number of methods. ::

     h.hist.median()

-Operations are vectorized, so that you can apply them to entire arrays of
-histograms. For instance for data defined along time, latitude and longitude,
-we can compute one histogram per time-step::
+Operations are vectorized [#vector]_, so that you can apply them to entire
+arrays of histograms. For instance for data defined along time, latitude and
+longitude, we can compute one histogram per time-step::

     >>> h = xh.histogram(data, dims=["lon", "lat"])
     >>> h.hist.median()
     will be of dimensions ("time",)

+.. [#vector] Computations are automatically vectorized in Python with
+   :func:`xarray.apply_ufunc`, which is not efficient for a large number of
+   histograms.
+
+
+Conditions of accessibility
+===========================
+
+Once registered, an accessor is a cached property that can be accessed on any
+DataArray. There are some conditions for the *hist* accessor to be created
+successfully:
+
+* The coordinates of the bins must be named ``<variable>_bins``.
+* The array must be named as ``<variable(s)_name>_<histogram or pdf>``.
+  *histogram* if it is not normalized, and *pdf* if it is normalized as a
+  probability density function. If the histogram is multi-dimensional, the
+  variable names must be separated by underscores. For instance:
+  ``Temp_Sal_histogram``.
+
+Each bins coordinate may contain attributes:
+
+* ``bin_type``: the class name of the Boost axis type that was used. If not
+  present, the accessor will assume the bins are regularly spaced and will try
+  to infer the rightmost edge.
+* ``right_edge``: the rightmost edge position, only necessary for Regular and
+  Variable bins.
+* ``underflow`` and ``overflow``: booleans that indicate if the corresponding
+  flow bins are present. If not present, the accessor will assume no flow bins.
+
+Those conventions are coherent with the output of
+``xarray_histogram.histogram*``, so if you use this package's functions you
+should not have to worry. The names of the array and coordinates are also
+consistent with those of :external+xhistogram:doc:`xhistogram <index>`
+(although coordinate attributes will be missing).
+
 Computations
 ============

 Bins
 ----

-The accessor provides the bins edges as a DataArray of size N+1 (it includes the
-last bins right edge) for a given variable:
-:meth:`~.HistDataArrayAccessor.edges`. Similarly, it provides the bins
-:meth:`~.HistDataArrayAccessor.centers`, :meth:`~.HistDataArrayAccessor.widths`,
-and :meth:`~.HistDataArrayAccessor.areas`.
+The accessor provides a number of methods that return bins-related values for a
+given variable. If the histogram is uni-dimensional (*i.e.* for a single
+variable) the variable name can be omitted. By default flow bins are kept but
+they can be excluded by passing ``flow=False``.

-Normalization
--------------
+* :meth:`~.HistDataArrayAccessor.bins` returns the corresponding coordinate,
+  this is essentially ``h.hist.coords["var_bins"]``.

-.. important::
+* :meth:`~.HistDataArrayAccessor.edges` returns the N+1 edges (including the
+  rightmost edge). Edges are not available for the discrete bins "IntCategory"
+  and "StrCategory".
+
+* :meth:`~.HistDataArrayAccessor.widths` returns the widths of the bins.
+  The widths of flow bins and StrCategory are always 1.

-   The accessor considers the histogram normalized or not given the name of its
-   DataArray: normalized if named ``<variables>_pdf`` and non-normalized
-   if ``<variables>_histogram``. This is consistent with the output of
-   :func:`~.core.histogram`.
+* :meth:`~.HistDataArrayAccessor.centers` returns the center position of the
+  bins. The centers of overflow bins are the same as their position (``np.inf``
+  for instance).

-The histogram can be normalized if not already, using
-:meth:`~.HistDataArrayAccessor.normalize`. Note that for a N-dimensional
-histogram, this function can normalize only some variables.
+* :meth:`~.HistDataArrayAccessor.areas` returns the areas of multidimensional
+  bins. This is the product of the widths of all bins. Only some variables can
+  be specified. The area of points that correspond to a flow bin in at least one
+  dimension is equal to one. For instance for a 2D-histogram with underflow and
+  overflow bins, all the borders of the 2D array for areas will be equal to 1.
+
+To remove flow bins, :meth:`~.HistDataArrayAccessor.remove_flow` returns a
+new histogram DataArray without the flow bins of the given variables (by default
+all of them). This simply does a ``.isel`` operation based on the ``underflow``
+and ``overflow`` attributes of specified coordinates. It also sets those
+attributes to False in the output.

 Bins transform
 --------------
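
As a rough illustration of the bin quantities documented in the hunk above, the following NumPy-only sketch computes widths, centers and areas from arrays of edges. It does not call the accessor: ``widths_with_flow`` is a hypothetical helper, the edge values are made up, and it assumes that both underflow and overflow bins are present. It only mirrors the documented conventions that flow bins have a width of 1 and that any 2D cell touching a flow bin has an area of 1.

```python
import numpy as np

# Hypothetical edges of the regular bins for two variables (N+1 values each).
edges_x = np.array([0.0, 1.0, 2.0, 4.0])
edges_y = np.array([10.0, 20.0, 40.0])

# Widths and centers of the regular bins of the first variable.
widths_x = np.diff(edges_x)
centers_x = (edges_x[:-1] + edges_x[1:]) / 2


def widths_with_flow(edges):
    """Widths with one underflow and one overflow bin, each of width 1."""
    return np.concatenate(([1.0], np.diff(edges), [1.0]))


wx = widths_with_flow(edges_x)
wy = widths_with_flow(edges_y)

# Areas are the product of the widths along each variable...
areas = np.outer(wx, wy)

# ...except for cells that are a flow bin in at least one dimension,
# whose area is set to 1.
is_flow_x = np.zeros(wx.size, dtype=bool)
is_flow_x[[0, -1]] = True
is_flow_y = np.zeros(wy.size, dtype=bool)
is_flow_y[[0, -1]] = True
areas[is_flow_x[:, None] | is_flow_y[None, :]] = 1.0
```

With these inputs the interior of ``areas`` is the outer product of the regular widths, and every border row and column is filled with ones.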
@@ -64,15 +109,39 @@ the *right_edge* attribute.
 For instance, :meth:`~.HistDataArrayAccessor.scale` scales bins by a given
 factor. It essentially does ``hist.apply_func(lambda edges: edges * factor)``

+
+Normalization
+-------------
+
+The histogram can be normalized to a probability density function if not
+already, using :meth:`~.HistDataArrayAccessor.normalize`. Note that for a
+N-dimensional histogram, this function can normalize only along some variables.
+
+The accessor considers the histogram normalized or not given the name of its
+DataArray: normalized if named ``<variables>_pdf`` and non-normalized
+if ``<variables>_histogram``. This is consistent with the output of
+:func:`~.core.histogram`.
+
+.. important::
+
+   This is important when computing statistics (see below) where the accessor
+   must know if the histogram is normalized or not.
+
+Normalizing when flow bins are present in the output is allowed. The values in
+flow bins are not changed and not counted in the normalization.
+
 Statistics
 ----------

 A number of statistics can be extracted from the histogram. The following
 functions are wrappers around methods of :class:`scipy.stats.rv_histogram`.
+These functions work only on 1D histograms, thus for ND-histograms a variable
+must be specified. Flow bins are not supported: they are removed along the
+core dimension (the specified variable).

 .. note::

-   The histogram cannot be chunked in any bins dimensions.
+   The histogram cannot be chunked in the core dimension.

 .. autosummary::

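
The normalization rule added in this hunk (flow bins are neither modified nor counted) can be sketched in plain NumPy. This is only an illustration under stated assumptions, not the accessor's implementation: ``normalize_1d`` is a hypothetical helper, the counts are invented, and ``edges`` is assumed to describe the regular bins only.

```python
import numpy as np


def normalize_1d(counts, edges, underflow=False, overflow=False):
    """Normalize a 1D histogram to a PDF, leaving flow bins untouched.

    ``edges`` holds the N+1 edges of the N regular bins; flow bins, if
    present, sit at the first and last positions of ``counts``.
    """
    counts = np.asarray(counts, dtype=float)
    widths = np.diff(np.asarray(edges, dtype=float))
    start = 1 if underflow else 0
    stop = counts.size - 1 if overflow else counts.size
    inner = slice(start, stop)

    out = counts.copy()
    # Only regular bins enter the total, so the PDF integrates to 1 over them.
    out[inner] = counts[inner] / (counts[inner].sum() * widths)
    return out


# Underflow bin, three regular bins, overflow bin.
counts = [5, 1, 3, 4, 2]
edges = [0.0, 1.0, 2.0, 4.0]
pdf = normalize_1d(counts, edges, underflow=True, overflow=True)
```

The two flow values pass through unchanged, while the regular bins are divided by their total count and by their widths.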
@@ -86,32 +155,3 @@ functions are wrappers around methods of :class:`scipy.stats.rv_histogram`.
    ~accessor.HistDataArrayAccessor.var


-.. _accessor-conditions:
-
-Conditions of accessibility
-===========================
-
-Once registered, an accessor is a cached property that can be accessed on any
-DataArray. They are some conditions for the *hist* accessor to be created
-successfully:
-
-* The coordinates of the bins must be named ``<variable>_bins``.
-* Each bins coordinates must contain an attribute named ``right_edge``,
-  corresponding to the right edge of the last bin.
-* The array must be named as ``<variable(s)_name>_<histogram or pdf>``.
-  *histogram* if it is not normalized, and *pdf* if it is normalized as a
-  probability density function. If the histogram is multi-dimensional, the
-  variables names must be separated by underscores. For instance:
-  ``Temp_Sal_histogram``.
-
-Those conventions are coherent with the output of
-``xarray_histogram.histogram*``, so if you use this packages functions you
-should not have to worry. The names of the array and coordinates is also
-consistent with that of :external+xhistogram:doc:`xhistogram <index>`. Only the
-right edge attribute will be missing.
-
-.. admonition:: Right edge inference
-
-   If the right edge attribute is missing in a bins coordinates, the accessor
-   will try to infer it. It will make the hypothesis that bins are regularly
-   spaced. If this is not the case, an exception will be raised.
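
The naming convention in the conditions above (``<variable>_bins`` coordinates and a ``<variables>_<histogram or pdf>`` array name) can be checked with a few lines of standard-library Python. This is a minimal sketch, not the package's own validation: ``parse_histogram_name`` is a hypothetical helper, and it assumes the bins coordinates are listed in the same order as the variables appear in the array name.

```python
def parse_histogram_name(name, coords):
    """Return (variables, is_normalized) for a histogram array name.

    ``coords`` is an iterable of coordinate names; only those ending in
    ``_bins`` are treated as bins coordinates.
    """
    base, sep, kind = name.rpartition("_")
    if not sep or kind not in ("histogram", "pdf"):
        raise ValueError(f"{name!r} does not end with '_histogram' or '_pdf'")

    suffix = "_bins"
    variables = [c[: -len(suffix)] for c in coords if c.endswith(suffix)]

    # The variable names joined by underscores must give the name prefix.
    if "_".join(variables) != base:
        raise ValueError(f"{name!r} does not match bins coordinates {variables}")
    return variables, kind == "pdf"


variables, is_normalized = parse_histogram_name(
    "Temp_Sal_histogram", ["Temp_bins", "Sal_bins", "time"]
)
```

Matching against the bins coordinates, rather than splitting the name on underscores, avoids ambiguity when a variable name itself contains an underscore.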