Open
Labels
- Effort > Brief 🐇: Small tasks expected to take a few hours up to a couple of days.
- Great First Contribution! 🌱: Beginner friendly tickets with narrow scope and huge impact. Perfect to join our community!
- Impact > Minor 🔷: Small, backward compatible change. Treat like a patch release (e.g., 0.5.8 → 0.5.9).
Description
Are you on the latest chainladder version?
- Yes, this bug occurs on the latest version.
Describe the bug in words
Description
Currently, when using the Development estimator, the n_periods parameter is applied before any exclusions from drop, drop_valuation, etc., are considered. This leads to unexpected results when attempting to exclude specific origin-development combinations from LDF calculations.
Current Behavior
In the example below, when n_periods=1 is specified along with drop=[("2009", 12)], the system:
- Identifies the most recent period (2009)
- Applies the drop, resulting in no valid periods for the calculation
- Returns NaN or uses an incorrect period
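The ordering described above can be illustrated with plain NumPy. This is only a sketch of the reported order of operations, not chainladder's actual implementation; the array of 12-month link ratios is taken from the reproduction example, and the variable names are hypothetical.

```python
import numpy as np

# 12-24 month link ratios by origin year, from the reproduction example.
# 2010 has no 12-24 ratio yet, so it is NaN.
origins = ["2007", "2008", "2009", "2010"]
ratios = np.array([200 / 100, 300 / 150, 250 / 200, np.nan])

# Step 1 (reported current order): n_periods=1 keeps only the most
# recent observed ratio, which is the one from origin 2009.
observed = ~np.isnan(ratios)
rank_from_latest = np.cumsum(observed[::-1])[::-1]
window = observed & (rank_from_latest <= 1)

# Step 2: the drop for ("2009", 12) is applied afterwards.
dropped = np.array([origin == "2009" for origin in origins])
weights = window & ~dropped

# Nothing survives both steps, so there is no ratio left to average.
remaining = ratios[weights]
print(window.tolist(), remaining.size)
```

Because the window is fixed before the drop is applied, the only selected ratio is the one that is then excluded, which matches the NaN result reported here.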
Expected Behavior
When n_periods=1 is specified along with drop=[("2009", 12)], the system should:
- Identify which periods are valid (non-dropped)
- Select the most recent valid period (skipping 2009)
- Use that period (e.g., 2008) for the LDF calculation
How can the bug be reproduced?
import chainladder as cl
import pandas as pd
# create example triangle
data = {
    'origin': ["2007-01-01", "2007-01-01", "2007-01-01", "2007-01-01", "2008-01-01", "2008-01-01", "2008-01-01", "2009-01-01", "2009-01-01", "2010-01-01"],
    'development': ["2007-01-01", "2008-01-01", "2009-01-01", "2010-01-01", "2008-01-01", "2009-01-01", "2010-01-01", "2009-01-01", "2010-01-01", "2010-01-01"],
    'loss': [100, 200, 300, 400, 150, 300, 450, 200, 250, 50]
}
df = pd.DataFrame(data)
tri = cl.Triangle(
    df,
    origin='origin',
    development='development',
    columns='loss',
    cumulative=True
)
# calculate ldf with n_periods=1 and the most recent period (2009) dropped
dev = cl.Development(n_periods=1, drop=[('2009', 12)]).fit(tri)
dev.ldf_
# current behaviour: uses only the excluded ratio from 2009 in dev month 12, resulting in NaN
# desired behaviour: use valid ratios (that haven't been dropped), resulting in the ratio from 2008 in dev month 12, so the expected result is 300 / 150 = 2
What is the expected behavior?
The n_periods parameter should be applied after all drop logic. It should select the n most recent link ratios from the set of all valid, non-dropped, non-NaN data points. This ensures that the n_periods argument is always honored when enough valid data is available.
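A minimal sketch of the proposed ordering, using plain NumPy rather than chainladder internals. The helper `select_recent_valid` and its `(row, column)` form for drops are hypothetical, introduced only for illustration, and a simple average is assumed for the final LDF.

```python
import numpy as np

def select_recent_valid(link_ratios, n_periods, dropped=()):
    """Boolean mask of the n most recent valid ratios per development column.

    Hypothetical helper: `dropped` is a list of (row, column) index pairs.
    """
    ratios = np.asarray(link_ratios, dtype=float)
    # A ratio is valid if it is observed (non-NaN) and not dropped.
    valid = ~np.isnan(ratios)
    for row, col in dropped:
        valid[row, col] = False
    # Rank valid entries from the most recent origin (bottom row) upward.
    rank_from_latest = np.cumsum(valid[::-1], axis=0)[::-1]
    return valid & (rank_from_latest <= n_periods)

# Link ratios from the reproduction example: rows are origins 2007-2010,
# columns are the 12-24, 24-36 and 36-48 month development intervals.
link_ratios = np.array([
    [200 / 100, 300 / 200, 400 / 300],
    [300 / 150, 450 / 300, np.nan],
    [250 / 200, np.nan, np.nan],
    [np.nan, np.nan, np.nan],
])

# Drop origin 2009 (row 2) at development 12 (column 0), then apply n_periods=1.
mask = select_recent_valid(link_ratios, n_periods=1, dropped=[(2, 0)])
ldf = np.nanmean(np.where(mask, link_ratios, np.nan), axis=0)
print(ldf[0])
```

With the drop applied first, the 12-month window falls back to origin 2008, giving 300 / 150 = 2 as expected in the reproduction.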