Merged
24 changes: 23 additions & 1 deletion README.md
@@ -1 +1,23 @@
# rtichoke_python
# rtichoke

`rtichoke` is a Python library for visualizing the performance of predictive models. It provides a flexible and intuitive way to create a variety of common evaluation plots, including:

* **ROC Curves**
* **Precision-Recall Curves**
* **Gains and Lift Charts**
* **Decision Curves**

The library is designed to be easy to use, while still offering a high degree of control over the final plots.

## Key Features

* **Simple API**: Create complex visualizations with just a few lines of code.
* **Time-to-Event Analysis**: Native support for models with time-dependent outcomes, including censoring and competing risks.
* **Interactive Plots**: Built on Plotly for interactive, publication-quality figures.
* **Flexible Data Handling**: Works seamlessly with NumPy and Polars.

## Documentation

For a complete guide to the library, including a "Getting Started" tutorial and a full API reference, please see the **[official documentation](https://your-documentation-url.com)**.

*(Note: The documentation URL will need to be updated once the website is deployed.)*
28 changes: 14 additions & 14 deletions docs/_quarto.yml
@@ -1,30 +1,30 @@
project:
type: website

metadata-files:
- _sidebar.yml

website:
title: "rtichoke"
navbar:
left:
- href: reference/
text: Reference
sidebar:
- id: user-guide
title: "User Guide"
style: "docked"
contents:
- text: "Getting Started"
href: tutorials/getting_started.qmd
- id: api-reference
title: "API Reference"
style: "docked"
contents:
- href: reference/index.qmd
text: "Reference"

quartodoc:
# the name used to import the package you want to create reference docs for
package: rtichoke
sidebar: "_sidebar.yml"
sections:
- title: Performance Data
desc: Functions for creating performance data.
contents:
- prepare_performance_data
- prepare_performance_data_times
# - title: Calibration
# desc: Functions for Calibration.
# contents:
# - create_calibration_curve
- title: Discrimination
desc: Functions for Discrimination.
contents:
@@ -40,4 +40,4 @@ quartodoc:
desc: Functions for Utility.
contents:
- create_decision_curve
- plot_decision_curve
- plot_decision_curve
13 changes: 13 additions & 0 deletions docs/index.qmd
@@ -0,0 +1,13 @@
---
title: "rtichoke Documentation"
---

Welcome to the official documentation for `rtichoke`, a Python library for visualizing the performance of predictive models.

## Getting Started

If you're new to `rtichoke`, the best place to start is the **[Getting Started Tutorial](./tutorials/getting_started.qmd)**. It will walk you through the basics of installing the library, preparing your data, and creating your first plot.

## API Reference

For detailed information on the functions and classes provided by `rtichoke`, please refer to the **[API Reference](./reference/index.qmd)**.
62 changes: 62 additions & 0 deletions docs/tutorials/getting_started.qmd
@@ -0,0 +1,62 @@
---
title: "Getting Started with rtichoke"
---

This tutorial provides a basic introduction to the `rtichoke` library. We'll walk through the process of preparing data, creating a decision curve, and visualizing the results.

## 1. Import Libraries

First, let's import the necessary libraries. We'll need `numpy` for data manipulation and `rtichoke` for the core functionality.

```python
import numpy as np
import rtichoke as rk
```

## 2. Prepare Your Data

`rtichoke` expects data in a specific format. You'll need two main components:

* **Probabilities (`probs`)**: A dictionary where keys are model names and values are NumPy arrays of predicted probabilities.
* **Real Outcomes (`reals`)**: A NumPy array containing the true binary outcomes (0 or 1).

Let's create some sample data for two different models:

```python
# Sample data from the dcurves_example.py script
probs_dict = {
    "Marker": np.array([
        0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.1, 0.2, 0.3, 0.4, 0.5,
        0.6, 0.7, 0.8, 0.9, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
    ]),
    "Marker2": np.array([
        0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.1, 0.2, 0.3, 0.4, 0.5,
        0.6, 0.7, 0.8, 0.9, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
    ])
}
reals = np.array([
    1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1
])
```
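
Before passing data to the library, it can help to confirm that the inputs match the format described above. The following is a minimal sketch under stated assumptions: `check_inputs` is a hypothetical helper introduced here for illustration and is not part of `rtichoke`, and the small arrays are made up for the example.

```python
import numpy as np


def check_inputs(probs: dict, reals: np.ndarray) -> None:
    """Hypothetical helper (not part of rtichoke): sanity-check input format."""
    reals = np.asarray(reals)
    # Outcomes must be binary (0 or 1).
    if not np.isin(reals, [0, 1]).all():
        raise ValueError("reals must contain only 0 and 1")
    for name, p in probs.items():
        p = np.asarray(p, dtype=float)
        # Each model needs exactly one prediction per observation.
        if p.shape != reals.shape:
            raise ValueError(f"{name}: expected shape {reals.shape}, got {p.shape}")
        # Predicted probabilities must lie in [0, 1].
        if ((p < 0) | (p > 1)).any():
            raise ValueError(f"{name}: probabilities must be between 0 and 1")


# Illustrative data only, separate from the tutorial arrays above.
example_probs = {"Model A": np.array([0.1, 0.8, 0.4, 0.9])}
example_reals = np.array([0, 1, 0, 1])
check_inputs(example_probs, example_reals)  # raises ValueError on malformed input
```

Running the same check on `probs_dict` and `reals` from this tutorial should pass, since both models have 27 predictions and `reals` has 27 binary outcomes.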

## 3. Create a Decision Curve

Now that we have our data, we can create a decision curve. This is a simple one-liner with `rtichoke`:

```python
fig = rk.create_decision_curve(
    probs=probs_dict,
    reals=reals,
)
```
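
For intuition about what a decision curve plots: the standard net benefit at a probability threshold $p_t$ is $\mathrm{TP}/n - (\mathrm{FP}/n)\cdot p_t/(1-p_t)$. The sketch below computes that quantity directly with NumPy. It illustrates the textbook definition only and is not `rtichoke`'s internal implementation; `net_benefit` is a hypothetical name, and treating `probs >= threshold` as a positive prediction is an assumption about the convention.

```python
import numpy as np


def net_benefit(probs: np.ndarray, reals: np.ndarray, threshold: float) -> float:
    """Standard decision-curve net benefit at one threshold (illustrative only)."""
    # Assumption: an observation is flagged positive when prob >= threshold.
    predicted_positive = probs >= threshold
    n = len(reals)
    true_positives = np.sum(predicted_positive & (reals == 1))
    false_positives = np.sum(predicted_positive & (reals == 0))
    # False positives are penalized by the odds of the threshold.
    return true_positives / n - false_positives / n * threshold / (1 - threshold)


# Made-up toy data for illustration.
toy_probs = np.array([0.1, 0.4, 0.6, 0.9])
toy_reals = np.array([0, 1, 0, 1])
nb = net_benefit(toy_probs, toy_reals, threshold=0.5)
```

A decision curve repeats this calculation over a grid of thresholds and plots net benefit against the threshold for each model.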

## 4. Show the Plot

Finally, let's display the plot. Since `rtichoke` uses Plotly under the hood, you can show the figure just like any other Plotly object.

```python
# To display the plot in an interactive environment (like a Jupyter notebook)
fig.show()
```

And that's it! You've created your first decision curve with `rtichoke`. From here, you can explore the other curve types and options that the library has to offer.
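
As one example of the other curve types, a Gains curve shows the share of positive outcomes captured when targeting a given share of the population, ranked by predicted probability. The following is a rank-based sketch of that idea in plain NumPy. It is a simplification for intuition, not how `rtichoke` computes its curves (the library's functions work over probability thresholds), and `gains_points` is a hypothetical name.

```python
import numpy as np


def gains_points(probs: np.ndarray, reals: np.ndarray):
    """Cumulative gains: share of positives captured vs. share of population targeted."""
    # Rank observations from highest to lowest predicted probability.
    order = np.argsort(-probs, kind="stable")
    sorted_reals = reals[order]
    n = len(reals)
    share_targeted = np.arange(1, n + 1) / n
    share_captured = np.cumsum(sorted_reals) / sorted_reals.sum()
    return share_targeted, share_captured


# Made-up toy data: the two highest-ranked observations are both positives.
toy_probs = np.array([0.9, 0.2, 0.7, 0.4])
toy_reals = np.array([1, 0, 1, 0])
targeted, captured = gains_points(toy_probs, toy_reals)
```

In this toy case, targeting the top half of the population already captures every positive outcome, which is what a well-ranked model looks like on a Gains curve.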
91 changes: 56 additions & 35 deletions src/rtichoke/discrimination/gains.py
@@ -42,39 +42,33 @@ def create_gains_curve(
"#585123",
],
) -> Figure:
"""Create Gains Curve.
"""Creates a Gains curve.

A Gains curve is a marketing and business analytics tool that evaluates
the performance of a predictive model. It shows the percentage of
positive outcomes (the "gain") that can be captured by targeting a
certain percentage of the population, sorted by predicted probability.

Parameters
----------
probs : Dict[str, np.ndarray]
Dictionary mapping a label or group name to an array of predicted
probabilities for the positive class.
A dictionary mapping model or dataset names to 1-D numpy arrays of
predicted probabilities.
reals : Union[np.ndarray, Dict[str, np.ndarray]]
Ground-truth binary labels (0/1) as a single array, or a dictionary
mapping the same label/group keys used in ``probs`` to arrays of
ground-truth labels.
The true binary labels (0 or 1).
by : float, optional
Resolution for probability thresholds when computing the curve
(step size). Default is 0.01.
The step size for the probability thresholds. Defaults to 0.01.
stratified_by : Sequence[str], optional
Sequence of column names to stratify the performance data by.
Default is ["probability_threshold"].
Variables for stratification. Defaults to ``["probability_threshold"]``.
size : int, optional
Plot size in pixels (width and height). Default is 600.
The width and height of the plot in pixels. Defaults to 600.
color_values : List[str], optional
List of color hex strings to use for the plotted lines. If not
provided, a default palette is used.
A list of hex color strings for the plot lines.

Returns
-------
Figure
A Plotly ``Figure`` containing the Gains curve(s).

Notes
-----
The function delegates computation and plotting to
``_create_rtichoke_plotly_curve_binary`` and returns the resulting
Plotly figure.
A Plotly ``Figure`` object representing the Gains curve.
"""
fig = _create_rtichoke_plotly_curve_binary(
probs,
@@ -93,30 +87,27 @@ def plot_gains_curve(
stratified_by: Sequence[str] = ["probability_threshold"],
size: int = 600,
) -> Figure:
"""Plot Gains curve from performance data.
"""Plots a Gains curve from pre-computed performance data.

This function is useful for plotting a Gains curve directly from a
DataFrame that already contains the necessary performance metrics.

Parameters
----------
performance_data : pl.DataFrame
A Polars DataFrame containing performance metrics for the Gains curve.
Expected columns include (but may not be limited to)
``probability_threshold`` and gains-related metrics, plus any
stratification columns.
A Polars DataFrame with performance metrics. It must include columns
for the percentage of the population targeted and the corresponding
gain, along with any stratification variables.
stratified_by : Sequence[str], optional
Sequence of column names used for stratification in the
``performance_data``. Default is ["probability_threshold"].
The columns in `performance_data` used for stratification. Defaults to
``["probability_threshold"]``.
size : int, optional
Plot size in pixels (width and height). Default is 600.
The width and height of the plot in pixels. Defaults to 600.

Returns
-------
Figure
A Plotly ``Figure`` containing the Gains plot.

Notes
-----
This function wraps ``_plot_rtichoke_curve_binary`` to produce a
ready-to-render Plotly figure from precomputed performance data.
A Plotly ``Figure`` object representing the Gains curve.
"""
fig = _plot_rtichoke_curve_binary(
performance_data,
@@ -163,7 +154,37 @@ def create_gains_curve_times(
"#585123",
],
) -> Figure:
"""Create time-dependent Lift Curve."""
"""Creates a time-dependent Gains curve.

Generates a Gains curve for time-to-event models, which is evaluated at
specified time horizons and handles censored data and competing risks.

Parameters
----------
probs : Dict[str, np.ndarray]
A dictionary of predicted probabilities.
reals : Union[np.ndarray, Dict[str, np.ndarray]]
The true event statuses.
times : Union[np.ndarray, Dict[str, np.ndarray]]
The event or censoring times.
fixed_time_horizons : list[float]
A list of time points for performance evaluation.
heuristics_sets : list[Dict], optional
Specifies how to handle censored data and competing events.
by : float, optional
The step size for probability thresholds. Defaults to 0.01.
stratified_by : Sequence[str], optional
Variables for stratification. Defaults to ``["probability_threshold"]``.
size : int, optional
The width and height of the plot in pixels. Defaults to 600.
color_values : List[str], optional
A list of hex color strings for the plot lines.

Returns
-------
Figure
A Plotly ``Figure`` object for the time-dependent Gains curve.
"""

fig = _create_rtichoke_plotly_curve_times(
probs,