Description
When evaluating large ensembles (thousands) of hydrologic time series, relying solely on point-wise error metrics such as RMSE, NSE, and KGE can miss important information about how well simulated flows reproduce the statistical characteristics of the reference data.
I propose adding distribution-based and quantile-based evaluation metrics to TEEHR to provide deeper insight into model performance, particularly for extremes, variability, and distributional similarity. For large-scale benchmarking and model intercomparison, distribution-aware metrics are especially valuable.
Proposed Metrics
1. Quantile-Based Comparisons
1.1. Quantile–Quantile (Q–Q) analysis
To compare the distributions of two datasets by plotting their quantiles against each other.
If the points fall approximately along a 45° straight line, it suggests the two distributions are similar.
Deviations from the line indicate differences in distribution shape, spread, or skewness.
Useful for checking extremes (e.g., floods) as well as the central part of the distribution (normal flows).
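As a rough sketch (not an existing TEEHR API), the matched quantiles for such a Q–Q comparison could be computed with NumPy; the function name `qq_quantiles`, the quantile grid, and the synthetic lognormal flows below are illustrative assumptions:

```python
import numpy as np

def qq_quantiles(observed, simulated, n_quantiles=99):
    """Return matched quantiles of two flow series for a Q-Q comparison.

    Points close to the 1:1 line indicate similar distributions; systematic
    departures in the upper tail would flag poor reproduction of flood peaks.
    """
    probs = np.linspace(0.01, 0.99, n_quantiles)
    return probs, np.quantile(observed, probs), np.quantile(simulated, probs)

# Synthetic flows standing in for observed (primary) and simulated (secondary) values.
rng = np.random.default_rng(42)
obs = rng.lognormal(mean=2.0, sigma=0.8, size=3650)
sim = rng.lognormal(mean=2.1, sigma=0.7, size=3650)
probs, obs_q, sim_q = qq_quantiles(obs, sim)  # plot sim_q vs. obs_q against the 1:1 line
```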
1.2. Quantile Error/Loss Metrics (Pinball Loss)
To compute errors at selected quantiles (e.g., Q10, Q50, Q95).
Used in quantile regression.
For quantile level τ (e.g., 0.1, 0.5, 0.9):
$$
QL_\tau(y, \hat{y}) =
\begin{cases}
\tau\,(y - \hat{y}) & \text{if } y \ge \hat{y} \\
(1 - \tau)\,(\hat{y} - y) & \text{if } y < \hat{y}
\end{cases}
$$

y : the observed (true) value.
ŷ : the predicted quantile value from the model (e.g., the 10th, 50th, or 90th percentile prediction).
τ : the quantile level being evaluated, where 0 < τ < 1.
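A minimal NumPy sketch of the pinball loss, assuming the τ-quantile prediction at each time step comes from an ensemble of simulations (the variable names and synthetic data are illustrative, not part of TEEHR):

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (quantile) loss at quantile level tau (0 < tau < 1)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = y_true - y_pred
    # tau * (y - y_hat) when y >= y_hat, (1 - tau) * (y_hat - y) otherwise
    return float(np.mean(np.where(diff >= 0, tau * diff, (tau - 1) * diff)))

# Illustrative use: an ensemble of simulated flows (members x time steps).
rng = np.random.default_rng(0)
observed = rng.lognormal(2.0, 0.8, size=365)
ensemble = rng.lognormal(2.0, 0.8, size=(50, 365))

for tau in (0.1, 0.5, 0.9):
    # The tau-quantile across ensemble members serves as the quantile prediction.
    quantile_forecast = np.quantile(ensemble, tau, axis=0)
    print(tau, pinball_loss(observed, quantile_forecast, tau))
```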
2. Divergence / Distance Between Distributions
These treat the time series as samples from a probability distribution:
2.1. Kullback–Leibler (KL) Divergence:
Measures how one probability distribution P diverges from another reference distribution Q.
For discrete distributions:
Here we use a sum because probabilities are defined over a finite or countable set (like dice rolls, categories, etc.):

$$
KL(P \,\|\, Q) = \sum_i P(i)\, \log \frac{P(i)}{Q(i)}
$$
For continuous distributions:
Here we use an integral because probabilities are defined over a continuous space (like real numbers, a normal distribution, etc.):

$$
KL(P \,\|\, Q) = \int P(x)\, \log \frac{P(x)}{Q(x)}\, dx
$$
P = the true (or target) probability distribution.
Q = the model probability distribution.
KL(P∥Q) = the “distance” (not symmetric!) from Q to P.
Interpretation
KL(P∥Q): How much extra information is needed if we use Q to approximate the true distribution P.
KL(Q∥P): How much extra information is needed if we use P to approximate Q.
So, unlike physical distance, the cost of using the wrong distribution depends on which one is assumed to be the “truth” — hence the asymmetry.
That’s why people often use Jensen–Shannon Divergence (JSD), which makes it symmetric by averaging both directions.
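One possible histogram-based implementation for two flow samples is sketched below; the shared bin edges, bin count, and small epsilon (to keep the estimate finite when a bin is empty) are assumptions of this sketch rather than a TEEHR convention:

```python
import numpy as np
from scipy.special import rel_entr

def kl_divergence(p_sample, q_sample, bins=50, eps=1e-12):
    """Approximate KL(P || Q) from two samples via a shared histogram."""
    edges = np.histogram_bin_edges(np.concatenate([p_sample, q_sample]), bins=bins)
    p_counts, _ = np.histogram(p_sample, bins=edges)
    q_counts, _ = np.histogram(q_sample, bins=edges)
    # Epsilon avoids log(0) and division by zero when a bin is empty.
    p = (p_counts + eps) / (p_counts + eps).sum()
    q = (q_counts + eps) / (q_counts + eps).sum()
    return float(np.sum(rel_entr(p, q)))  # sum_i p_i * log(p_i / q_i)

# Synthetic observed vs. simulated flows (placeholders, not TEEHR data).
rng = np.random.default_rng(7)
obs = rng.lognormal(2.0, 0.8, size=3650)
sim = rng.lognormal(2.1, 0.7, size=3650)
kl_obs_sim = kl_divergence(obs, sim)  # KL(obs || sim)
kl_sim_obs = kl_divergence(sim, obs)  # generally different: KL is asymmetric
```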
2.2. Jensen–Shannon Divergence (JS Divergence):
To measure the similarity (or dissimilarity) between two probability distributions.
It is a symmetrized and smoothed version of the Kullback–Leibler Divergence (KLD).
Unlike KLD, JSD is always finite and symmetric.
$$
JSD(P \,\|\, Q) = \tfrac{1}{2}\, KL(P \,\|\, M) + \tfrac{1}{2}\, KL(Q \,\|\, M), \qquad M = \tfrac{1}{2}(P + Q)
$$
KL(P||M) : Kullback–Leibler divergence of P from the average distribution M.
KL(Q||M) : Kullback–Leibler divergence of Q from the average distribution M.
Key Properties
Symmetric: JSD(P∥Q) = JSD(Q∥P).
Bounded: Value is between 0 and 1 (if log base 2 is used).
JSD = 0 → Distributions are identical.
JSD = 1 → Distributions are maximally different.
Smooth & finite even if P or Q has zero probabilities.
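SciPy already exposes this quantity; a sketch of how it might be wired up for two flow series follows (the binning choices and synthetic data are assumptions of this example):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Synthetic observed vs. simulated flows (placeholders, not TEEHR data).
rng = np.random.default_rng(1)
obs = rng.lognormal(2.0, 0.8, size=3650)
sim = rng.lognormal(2.1, 0.7, size=3650)

# Bin both series on shared edges to obtain discrete probability vectors.
edges = np.histogram_bin_edges(np.concatenate([obs, sim]), bins=50)
p, _ = np.histogram(obs, bins=edges)
q, _ = np.histogram(sim, bins=edges)

# SciPy returns the JS *distance* (the square root of the divergence);
# squaring it gives the JSD, bounded in [0, 1] when base-2 logs are used.
jsd = jensenshannon(p, q, base=2) ** 2
```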
The choice of method depends on the evaluation goal: distributional divergence measures (e.g., JS Divergence) are appropriate when the focus is on overall distributional similarity, while quantile-based methods work better when the interest is in extremes or specific quantiles.