From 80e1c0eb4211733899bfe76045745120e3754e0b Mon Sep 17 00:00:00 2001
From: "google-labs-jules[bot]" <161369871+google-labs-jules[bot]@users.noreply.github.com>
Date: Wed, 24 Dec 2025 17:08:39 +0000
Subject: [PATCH 1/5] docs: Update README and "Getting Started" tutorial

Updates the project's documentation to be more user-friendly for new users.

- The main `README.md` has been updated with installation instructions, a
  clearer "Getting Started" section, and links to the blog and official
  documentation. The code example has been corrected to use the proper
  dictionary format for the `reals` parameter.
- The "Getting Started" tutorial (`docs/tutorials/getting_started.qmd`) has
  been restructured to clearly explain and provide examples for the three
  main use cases: single model evaluation, model comparison, and population
  comparison. This new structure is inspired by the documentation for the R
  version of `rtichoke`.
---
 README.md                          |  40 +++++++++--
 docs/tutorials/getting_started.qmd | 106 +++++++++++++++++++++--------
 2 files changed, 112 insertions(+), 34 deletions(-)

diff --git a/README.md b/README.md
index 32d2fd1..18e1643 100644
--- a/README.md
+++ b/README.md
@@ -7,7 +7,41 @@
 * **Gains and Lift Charts**
 * **Decision Curves**
 
-The library is designed to be easy to use, while still offering a high degree of control over the final plots.
+The library is designed to be easy to use, while still offering a high degree of control over the final plots. For some reproducible examples, please visit the [rtichoke blog](https://uriahf.github.io/rtichoke-py/blog.html)!
+
+## Installation
+
+You can install `rtichoke` from PyPI:
+
+```bash
+pip install rtichoke
+```
+
+## Getting Started
+
+To use `rtichoke`, you'll need two main inputs:
+
+* `probs`: A dictionary containing your model's predicted probabilities.
+* `reals`: A dictionary of the true binary outcomes.
+
+Here's a quick example of how to create a ROC curve for a single model:
+
+```python
+import numpy as np
+import rtichoke as rk
+
+# Sample data
+probs = {'My Model': np.random.rand(100)}
+reals = {'My Population': np.random.randint(0, 2, 100)}
+
+# Create the ROC curve
+fig = rk.create_roc_curve(
+    probs=probs,
+    reals=reals
+)
+
+fig.show()
+```
 
 ## Key Features
 
@@ -18,6 +52,4 @@ The library is designed to be easy to use, while still offering a high degree of
 
 ## Documentation
 
-For a complete guide to the library, including a "Getting Started" tutorial and a full API reference, please see the **[official documentation](https://your-documentation-url.com)**.
-
-*(Note: The documentation URL will need to be updated once the website is deployed.)*
+For a complete guide to the library, including a "Getting Started" tutorial and a full API reference, please see the **[official documentation](https://uriahf.github.io/rtichoke-py/)**.

diff --git a/docs/tutorials/getting_started.qmd b/docs/tutorials/getting_started.qmd
index 86e9d51..5e64806 100644
--- a/docs/tutorials/getting_started.qmd
+++ b/docs/tutorials/getting_started.qmd
@@ -1,8 +1,8 @@
 ---
-title: "Getting Started with Rtichoke"
+title: "Getting Started with rtichoke"
 ---
 
-This tutorial provides a basic introduction to the `rtichoke` library. We'll walk through the process of preparing data, creating a decision curve, and visualizing the results.
+This tutorial provides an introduction to the `rtichoke` library, showing how to visualize model performance for different scenarios.
 
 ## 1. Import Libraries
 
First, let's import the necessary libraries. We'll need `numpy` for data manipul
 ```python
 import numpy as np
 import rtichoke as rk
+
+# For reproducibility
+np.random.seed(42)
 ```
 
-## 2. Prepare Your Data
+
+## 2. Understanding the Inputs
+
+`rtichoke` expects two main inputs for creating performance curves:
+
+* **`probs` (Probabilities)**: A dictionary where keys are model or population names and values are lists or NumPy arrays of predicted probabilities.
+* **`reals` (Outcomes)**: A dictionary where keys are population names and values are lists or NumPy arrays of the true binary outcomes (0 or 1).
 
-`rtichoke` expects data in a specific format. You'll need two main components:
+Let's look at the three main use cases.
 
-* **Probabilities (`probs`)**: A dictionary where keys are model names and values are NumPy arrays of predicted probabilities.
-* **Real Outcomes (`reals`)**: A NumPy array containing the true binary outcomes (0 or 1).
+### Use Case 1: Single Model
 
-Let's create some sample data for two different models:
+This is the simplest case, where you want to evaluate the performance of a single predictive model.
+
+For this, you provide `probs` with a single entry for your model and `reals` with a single entry for the corresponding outcomes.
 
 ```python
-# Sample data from the dcurves_example.py script
-probs_dict = {
-    "Marker": np.array([
-        0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.1, 0.2, 0.3, 0.4, 0.5,
-        0.6, 0.7, 0.8, 0.9, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
-    ]),
-    "Marker2": np.array([
-        0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.1, 0.2, 0.3, 0.4, 0.5,
-        0.6, 0.7, 0.8, 0.9, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
-    ])
-}
-reals = np.array([
-    1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1
-])
+# Generate sample data for one model
+probs_single = {"Good Model": np.random.rand(100)}
+reals_single = {"Population": np.random.randint(0, 2, 100)}
+
+# Create a ROC curve
+fig = rk.create_roc_curve(
+    probs=probs_single,
+    reals=reals_single,
+)
+
+# In an interactive environment (like a Jupyter notebook),
+# this will display the plot.
+fig.show()
 ```
 
-## 3. Create a Decision Curve
+
+### Use Case 2: Model Comparison
+
+Often, you want to compare the performance of several different models on the *same* population.
 
-Now that we have our data, we can create a decision curve. This is a simple one-liner with `rtichoke`:
+For this, you provide `probs` with an entry for each model you want to compare. `reals` will still have a single entry, since the outcome data is the same for all models.
 
 ```python
-fig = rk.create_decision_curve(
-    probs=probs_dict,
-    reals=reals,
+# Generate sample data for three models
+probs_comparison = {
+    "Good Model": np.random.rand(100) + 0.1,  # Slightly better
+    "Bad Model": np.random.rand(100),
+    "Random Guess": np.linspace(0, 1, 100)
+}
+reals_comparison = {"Population": np.random.randint(0, 2, 100)}
+
+
+# Create a precision-recall curve to compare the models
+fig = rk.create_precision_recall_curve(
+    probs=probs_comparison,
+    reals=reals_comparison,
 )
+
+fig.show()
 ```
 
-## 4. Show the Plot
+
+### Use Case 3: Several Populations
 
-Finally, let's display the plot. Since `rtichoke` uses Plotly under the hood, you can show the figure just like any other Plotly object.
+This is useful when you want to evaluate a single model's performance across different populations. A common example is comparing performance on a training set versus a testing set to check for overfitting.
+
+For this, you provide `probs` with an entry for each population and `reals` with a corresponding entry for each population's outcomes.
 
 ```python
-# To display the plot in an interactive environment (like a Jupyter notebook)
+# Generate sample data for train and test sets
+probs_train = np.random.rand(100)
+reals_train = (probs_train > 0.5).astype(int)
+
+probs_test = np.random.rand(80)
+reals_test = (probs_test > 0.4).astype(int)  # A slightly different relationship
+
+probs_populations = {
+    "Train": probs_train,
+    "Test": probs_test
+}
+reals_populations = {
+    "Train": reals_train,
+    "Test": reals_test
+}
+
+# Create a calibration curve to compare the model's performance
+# on the two populations.
+fig = rk.create_calibration_curve(
+    probs=probs_populations,
+    reals=reals_populations,
+)
+
 fig.show()
 ```
 
-And that's it! You've created your first decision curve with `rtichoke`. From here, you can explore the other curve types and options that the library has to offer.
+And that's it! You've now seen how to create three of the most common evaluation plots with `rtichoke`. From here, you can explore the other curve types and options that the library has to offer in the [API Reference](../reference/index.qmd).

From 5818f61634f970764454d635b43accdf09540b78 Mon Sep 17 00:00:00 2001
From: "google-labs-jules[bot]" <161369871+google-labs-jules[bot]@users.noreply.github.com>
Date: Wed, 24 Dec 2025 17:22:24 +0000
Subject: [PATCH 2/5] docs: Update README and "Getting Started" tutorial

Updates the project's documentation to be more user-friendly for new users.

- The main `README.md` has been updated with installation instructions, a
  clearer "Getting Started" section, and links to the blog and official
  documentation.
- The "Getting Started" tutorial (`docs/tutorials/getting_started.qmd`) has
  been restructured to clearly explain and provide examples for the three
  main use cases: single model evaluation, model comparison, and population
  comparison.
- All code examples in both the README and the tutorial now use more
  realistic and intuitive sample data where model predictions are clearly
  correlated with outcomes, making the visualizations more meaningful.
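For reference, the data-generation pattern used throughout the updated
examples looks roughly like this (a sketch mirroring the committed
snippets below; the dictionary keys are just the labels used in the docs):

    import numpy as np

    np.random.seed(42)  # make the sampled data reproducible

    # Positive cases get systematically higher scores than negative ones,
    # so the example model is clearly better than chance.
    probs_positive_class = np.random.rand(50) * 0.5 + 0.5  # 0.5 to 1.0
    probs_negative_class = np.random.rand(50) * 0.5        # 0.0 to 0.5

    probs_combined = np.concatenate([probs_positive_class, probs_negative_class])
    reals_combined = np.concatenate([np.ones(50), np.zeros(50)])

    # Shuffle so that ordering carries no information.
    shuffle_index = np.random.permutation(100)
    probs = {'My Model': probs_combined[shuffle_index]}
    reals = {'My Population': reals_combined[shuffle_index]}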
---
 README.md                          | 19 +++++--
 docs/tutorials/getting_started.qmd | 65 +++++++++++++++++++++---------
 2 files changed, 63 insertions(+), 21 deletions(-)

diff --git a/README.md b/README.md
index 18e1643..efb8d93 100644
--- a/README.md
+++ b/README.md
@@ -30,9 +30,22 @@ Here's a quick example of how to create a ROC curve for a single model:
 import numpy as np
 import rtichoke as rk
 
-# Sample data
-probs = {'My Model': np.random.rand(100)}
-reals = {'My Population': np.random.randint(0, 2, 100)}
+# For reproducibility
+np.random.seed(42)
+
+# Generate more realistic sample data for a "good" model
+# Probabilities for the positive class are generally higher
+probs_positive_class = np.random.rand(50) * 0.5 + 0.5  # High probabilities (0.5 to 1.0)
+probs_negative_class = np.random.rand(50) * 0.5  # Low probabilities (0.0 to 0.5)
+
+# Combine and shuffle the data
+probs_combined = np.concatenate([probs_positive_class, probs_negative_class])
+reals_combined = np.concatenate([np.ones(50), np.zeros(50)])
+
+shuffle_index = np.random.permutation(100)
+probs = {'My Model': probs_combined[shuffle_index]}
+reals = {'My Population': reals_combined[shuffle_index]}
+
 
 # Create the ROC curve

diff --git a/docs/tutorials/getting_started.qmd b/docs/tutorials/getting_started.qmd
index 5e64806..9e7cefe 100644
--- a/docs/tutorials/getting_started.qmd
+++ b/docs/tutorials/getting_started.qmd
@@ -32,9 +32,15 @@ This is the simplest case, where you want to evaluate the performance of a singl
 For this, you provide `probs` with a single entry for your model and `reals` with a single entry for the corresponding outcomes.
 
 ```python
-# Generate sample data for one model
-probs_single = {"Good Model": np.random.rand(100)}
-reals_single = {"Population": np.random.randint(0, 2, 100)}
+# Generate realistic sample data for a "good" model
+probs_positive_class = np.random.rand(50) * 0.5 + 0.5
+probs_negative_class = np.random.rand(50) * 0.5
+probs_combined = np.concatenate([probs_positive_class, probs_negative_class])
+reals_combined = np.concatenate([np.ones(50), np.zeros(50)])
+shuffle_index = np.random.permutation(100)
+
+probs_single = {"Good Model": probs_combined[shuffle_index]}
+reals_single = {"Population": reals_combined[shuffle_index]}
 
 # Create a ROC curve
 fig = rk.create_roc_curve(
@@ -54,13 +60,26 @@ Often, you want to compare the performance of several different models on the *s
 For this, you provide `probs` with an entry for each model you want to compare. `reals` will still have a single entry, since the outcome data is the same for all models.
 
 ```python
-# Generate sample data for three models
+# Generate data for a "Good Model", a "Bad Model", and a "Random Guess"
+# The "Good Model" has a clearer separation of probabilities.
+good_probs_pos = np.random.rand(50) * 0.4 + 0.6  # 0.6 to 1.0
+good_probs_neg = np.random.rand(50) * 0.4  # 0.0 to 0.4
+good_probs = np.concatenate([good_probs_pos, good_probs_neg])
+
+# The "Bad Model" has more overlap.
+bad_probs_pos = np.random.rand(50) * 0.5 + 0.4  # 0.4 to 0.9
+bad_probs_neg = np.random.rand(50) * 0.5 + 0.1  # 0.1 to 0.6
+bad_probs = np.concatenate([bad_probs_pos, bad_probs_neg])
+
+reals_comparison_data = np.concatenate([np.ones(50), np.zeros(50)])
+shuffle_index_comp = np.random.permutation(100)
+
 probs_comparison = {
-    "Good Model": np.random.rand(100) + 0.1,  # Slightly better
-    "Bad Model": np.random.rand(100),
-    "Random Guess": np.linspace(0, 1, 100)
+    "Good Model": good_probs[shuffle_index_comp],
+    "Bad Model": bad_probs[shuffle_index_comp],
+    "Random Guess": np.random.rand(100)
 }
-reals_comparison = {"Population": np.random.randint(0, 2, 100)}
+reals_comparison = {"Population": reals_comparison_data[shuffle_index_comp]}
 
 
 # Create a precision-recall curve to compare the models
@@ -79,20 +98,30 @@ This is useful when you want to evaluate a single model's performance across dif
 For this, you provide `probs` with an entry for each population and `reals` with a corresponding entry for each population's outcomes.
 
 ```python
-# Generate sample data for train and test sets
-probs_train = np.random.rand(100)
-reals_train = (probs_train > 0.5).astype(int)
-
-probs_test = np.random.rand(80)
-reals_test = (probs_test > 0.4).astype(int)  # A slightly different relationship
+# Generate sample data for a train and test set.
+# Let's assume the model is slightly overfit, performing better on the train set.
+
+# Train set: clear separation
+train_probs_pos = np.random.rand(50) * 0.4 + 0.6
+train_probs_neg = np.random.rand(50) * 0.4
+train_probs = np.concatenate([train_probs_pos, train_probs_neg])
+train_reals = np.concatenate([np.ones(50), np.zeros(50)])
+train_shuffle = np.random.permutation(100)
+
+# Test set: more overlap
+test_probs_pos = np.random.rand(40) * 0.5 + 0.4
+test_probs_neg = np.random.rand(40) * 0.5 + 0.1
+test_probs = np.concatenate([test_probs_pos, test_probs_neg])
+test_reals = np.concatenate([np.ones(40), np.zeros(40)])
+test_shuffle = np.random.permutation(80)
 
 probs_populations = {
-    "Train": probs_train,
-    "Test": probs_test
+    "Train": train_probs[train_shuffle],
+    "Test": test_probs[test_shuffle]
 }
 reals_populations = {
-    "Train": reals_train,
-    "Test": reals_test
+    "Train": train_reals[train_shuffle],
+    "Test": test_reals[test_shuffle]
 }
 
 # Create a calibration curve to compare the model's performance

From 6131ae10b8478c02cf1ed63f8cfa01d1c5da642c Mon Sep 17 00:00:00 2001
From: "google-labs-jules[bot]" <161369871+google-labs-jules[bot]@users.noreply.github.com>
Date: Wed, 24 Dec 2025 17:24:28 +0000
Subject: [PATCH 3/5] docs: Improve realism of code examples

Refines the code examples in the README and "Getting Started" tutorial to
use more realistic and intuitive sample data. This addresses feedback that
the previous "good model" was indistinguishable from a random one. The new
examples now clearly demonstrate a model with predictive power.

From 89135860ccbfcef33fdfb65e61ba3bccc429004a Mon Sep 17 00:00:00 2001
From: "google-labs-jules[bot]" <161369871+google-labs-jules[bot]@users.noreply.github.com>
Date: Wed, 24 Dec 2025 17:33:30 +0000
Subject: [PATCH 4/5] docs: Remove jargon from code examples

Removes subjective jargon like "good" and "bad" from the model names in the
documentation. The examples now use neutral, descriptive names like
"Model A" and "Model B" for clarity and professionalism.
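For example, the model-comparison inputs in the tutorial now read like this
(excerpted from the diff below, with the neutral names in place):

    probs_comparison = {
        "Model A": model_a_probs[shuffle_index_comp],
        "Model B": model_b_probs[shuffle_index_comp],
        "Random Guess": np.random.rand(100)
    }
    reals_comparison = {"Population": reals_comparison_data[shuffle_index_comp]}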
---
 README.md                          |  6 +++---
 docs/tutorials/getting_started.qmd | 26 +++++++++++++-------------
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/README.md b/README.md
index efb8d93..8c3798b 100644
--- a/README.md
+++ b/README.md
@@ -33,7 +33,7 @@ import rtichoke as rk
 # For reproducibility
 np.random.seed(42)
 
-# Generate more realistic sample data for a "good" model
+# Generate more realistic sample data for a model
 # Probabilities for the positive class are generally higher
 probs_positive_class = np.random.rand(50) * 0.5 + 0.5  # High probabilities (0.5 to 1.0)
 probs_negative_class = np.random.rand(50) * 0.5  # Low probabilities (0.0 to 0.5)
@@ -43,8 +43,8 @@ probs_combined = np.concatenate([probs_positive_class, probs_negative_class])
 reals_combined = np.concatenate([np.ones(50), np.zeros(50)])
 shuffle_index = np.random.permutation(100)
 
-probs = {'My Model': probs_combined[shuffle_index]}
-reals = {'My Population': reals_combined[shuffle_index]}
+probs = {'Model A': probs_combined[shuffle_index]}
+reals = {'Population': reals_combined[shuffle_index]}
 
 
 # Create the ROC curve

diff --git a/docs/tutorials/getting_started.qmd b/docs/tutorials/getting_started.qmd
index 9e7cefe..9027dc6 100644
--- a/docs/tutorials/getting_started.qmd
+++ b/docs/tutorials/getting_started.qmd
@@ -32,14 +32,14 @@ This is the simplest case, where you want to evaluate the performance of a singl
 For this, you provide `probs` with a single entry for your model and `reals` with a single entry for the corresponding outcomes.
 
 ```python
-# Generate realistic sample data for a "good" model
+# Generate realistic sample data for a model
 probs_positive_class = np.random.rand(50) * 0.5 + 0.5
 probs_negative_class = np.random.rand(50) * 0.5
 probs_combined = np.concatenate([probs_positive_class, probs_negative_class])
 reals_combined = np.concatenate([np.ones(50), np.zeros(50)])
 shuffle_index = np.random.permutation(100)
 
-probs_single = {"Good Model": probs_combined[shuffle_index]}
+probs_single = {"Model A": probs_combined[shuffle_index]}
 reals_single = {"Population": reals_combined[shuffle_index]}
 
 # Create a ROC curve
@@ -60,23 +60,23 @@ Often, you want to compare the performance of several different models on the *s
 For this, you provide `probs` with an entry for each model you want to compare. `reals` will still have a single entry, since the outcome data is the same for all models.
 
 ```python
-# Generate data for a "Good Model", a "Bad Model", and a "Random Guess"
-# The "Good Model" has a clearer separation of probabilities.
-good_probs_pos = np.random.rand(50) * 0.4 + 0.6  # 0.6 to 1.0
-good_probs_neg = np.random.rand(50) * 0.4  # 0.0 to 0.4
-good_probs = np.concatenate([good_probs_pos, good_probs_neg])
+# Generate data for two different models to compare.
+# Model A has a clearer separation of probabilities.
+model_a_probs_pos = np.random.rand(50) * 0.4 + 0.6  # 0.6 to 1.0
+model_a_probs_neg = np.random.rand(50) * 0.4  # 0.0 to 0.4
+model_a_probs = np.concatenate([model_a_probs_pos, model_a_probs_neg])
 
-# The "Bad Model" has more overlap.
-bad_probs_pos = np.random.rand(50) * 0.5 + 0.4  # 0.4 to 0.9
-bad_probs_neg = np.random.rand(50) * 0.5 + 0.1  # 0.1 to 0.6
-bad_probs = np.concatenate([bad_probs_pos, bad_probs_neg])
+# Model B has more overlap.
+model_b_probs_pos = np.random.rand(50) * 0.5 + 0.4  # 0.4 to 0.9
+model_b_probs_neg = np.random.rand(50) * 0.5 + 0.1  # 0.1 to 0.6
+model_b_probs = np.concatenate([model_b_probs_pos, model_b_probs_neg])
 
 reals_comparison_data = np.concatenate([np.ones(50), np.zeros(50)])
 shuffle_index_comp = np.random.permutation(100)
 
 probs_comparison = {
-    "Good Model": good_probs[shuffle_index_comp],
-    "Bad Model": bad_probs[shuffle_index_comp],
+    "Model A": model_a_probs[shuffle_index_comp],
+    "Model B": model_b_probs[shuffle_index_comp],
     "Random Guess": np.random.rand(100)
 }
 reals_comparison = {"Population": reals_comparison_data[shuffle_index_comp]}

From 476b565e5676d2e42697fa1a59d70497cf223cd9 Mon Sep 17 00:00:00 2001
From: "google-labs-jules[bot]" <161369871+google-labs-jules[bot]@users.noreply.github.com>
Date: Wed, 24 Dec 2025 17:39:27 +0000
Subject: [PATCH 5/5] docs: Use deterministic data in examples

Replaces all calls to `np.random.rand` in the documentation with small,
hardcoded datasets. This makes all code examples fully reproducible,
deterministic, and easier for new users to understand at a glance.
---
 README.md                          | 19 ++-------
 docs/tutorials/getting_started.qmd | 66 +++++++-----------------
 2 files changed, 19 insertions(+), 66 deletions(-)

diff --git a/README.md b/README.md
index 8c3798b..8582ab8 100644
--- a/README.md
+++ b/README.md
@@ -30,21 +30,10 @@ Here's a quick example of how to create a ROC curve for a single model:
 import numpy as np
 import rtichoke as rk
 
-# For reproducibility
-np.random.seed(42)
-
-# Generate more realistic sample data for a model
-# Probabilities for the positive class are generally higher
-probs_positive_class = np.random.rand(50) * 0.5 + 0.5  # High probabilities (0.5 to 1.0)
-probs_negative_class = np.random.rand(50) * 0.5  # Low probabilities (0.0 to 0.5)
-
-# Combine and shuffle the data
-probs_combined = np.concatenate([probs_positive_class, probs_negative_class])
-reals_combined = np.concatenate([np.ones(50), np.zeros(50)])
-
-shuffle_index = np.random.permutation(100)
-probs = {'Model A': probs_combined[shuffle_index]}
-reals = {'Population': reals_combined[shuffle_index]}
+# Sample data for a model. Note that the probabilities for the
+# positive class (1) are generally higher than for the negative class (0).
+probs = {'Model A': np.array([0.1, 0.9, 0.4, 0.8, 0.3, 0.7, 0.2, 0.6])}
+reals = {'Population': np.array([0, 1, 0, 1, 0, 1, 0, 1])}
 
 
 # Create the ROC curve

diff --git a/docs/tutorials/getting_started.qmd b/docs/tutorials/getting_started.qmd
index 9027dc6..fe26b0d 100644
--- a/docs/tutorials/getting_started.qmd
+++ b/docs/tutorials/getting_started.qmd
@@ -11,9 +11,6 @@ First, let's import the necessary libraries. We'll need `numpy` for data manipul
 ```python
 import numpy as np
 import rtichoke as rk
-
-# For reproducibility
-np.random.seed(42)
 ```
 
 ## 2. Understanding the Inputs
@@ -32,15 +29,10 @@ This is the simplest case, where you want to evaluate the performance of a singl
 For this, you provide `probs` with a single entry for your model and `reals` with a single entry for the corresponding outcomes.
 
 ```python
-# Generate realistic sample data for a model
-probs_positive_class = np.random.rand(50) * 0.5 + 0.5
-probs_negative_class = np.random.rand(50) * 0.5
-probs_combined = np.concatenate([probs_positive_class, probs_negative_class])
-reals_combined = np.concatenate([np.ones(50), np.zeros(50)])
-shuffle_index = np.random.permutation(100)
-
-probs_single = {"Model A": probs_combined[shuffle_index]}
-reals_single = {"Population": reals_combined[shuffle_index]}
+# Sample data for a model. Note that the probabilities for the
+# positive class (1) are generally higher than for the negative class (0).
+probs_single = {"Model A": np.array([0.1, 0.9, 0.4, 0.8, 0.3, 0.7, 0.2, 0.6])}
+reals_single = {"Population": np.array([0, 1, 0, 1, 0, 1, 0, 1])}
 
 # Create a ROC curve
 fig = rk.create_roc_curve(
@@ -52,26 +44,13 @@ Often, you want to compare the performance of several different models on the *s
 For this, you provide `probs` with an entry for each model you want to compare. `reals` will still have a single entry, since the outcome data is the same for all models.
 
 ```python
-# Generate data for two different models to compare.
-# Model A has a clearer separation of probabilities.
-model_a_probs_pos = np.random.rand(50) * 0.4 + 0.6  # 0.6 to 1.0
-model_a_probs_neg = np.random.rand(50) * 0.4  # 0.0 to 0.4
-model_a_probs = np.concatenate([model_a_probs_pos, model_a_probs_neg])
-
-# Model B has more overlap.
-model_b_probs_pos = np.random.rand(50) * 0.5 + 0.4  # 0.4 to 0.9
-model_b_probs_neg = np.random.rand(50) * 0.5 + 0.1  # 0.1 to 0.6
-model_b_probs = np.concatenate([model_b_probs_pos, model_b_probs_neg])
-
-reals_comparison_data = np.concatenate([np.ones(50), np.zeros(50)])
-shuffle_index_comp = np.random.permutation(100)
-
+# Sample data for two models. Model A is better at separating the classes.
 probs_comparison = {
-    "Model A": model_a_probs[shuffle_index_comp],
-    "Model B": model_b_probs[shuffle_index_comp],
-    "Random Guess": np.random.rand(100)
+    "Model A": np.array([0.1, 0.9, 0.2, 0.8, 0.3, 0.7]),
+    "Model B": np.array([0.2, 0.8, 0.3, 0.7, 0.4, 0.6]),
+    "Random Guess": np.array([0.5, 0.5, 0.5, 0.5, 0.5, 0.5])
 }
-reals_comparison = {"Population": reals_comparison_data[shuffle_index_comp]}
+reals_comparison = {"Population": np.array([0, 1, 0, 1, 0, 1])}
 
 
 # Create a precision-recall curve to compare the models
@@ -98,30 +77,15 @@ This is useful when you want to evaluate a single model's performance across dif
 For this, you provide `probs` with an entry for each population and `reals` with a corresponding entry for each population's outcomes.
 
 ```python
-# Generate sample data for a train and test set.
-# Let's assume the model is slightly overfit, performing better on the train set.
-
-# Train set: clear separation
-train_probs_pos = np.random.rand(50) * 0.4 + 0.6
-train_probs_neg = np.random.rand(50) * 0.4
-train_probs = np.concatenate([train_probs_pos, train_probs_neg])
-train_reals = np.concatenate([np.ones(50), np.zeros(50)])
-train_shuffle = np.random.permutation(100)
-
-# Test set: more overlap
-test_probs_pos = np.random.rand(40) * 0.5 + 0.4
-test_probs_neg = np.random.rand(40) * 0.5 + 0.1
-test_probs = np.concatenate([test_probs_pos, test_probs_neg])
-test_reals = np.concatenate([np.ones(40), np.zeros(40)])
-test_shuffle = np.random.permutation(80)
-
+# Sample data for a train and test set.
+# The model performs slightly better on the train set.
 probs_populations = {
-    "Train": train_probs[train_shuffle],
-    "Test": test_probs[test_shuffle]
+    "Train": np.array([0.1, 0.9, 0.2, 0.8, 0.3, 0.7]),
+    "Test": np.array([0.2, 0.8, 0.3, 0.7, 0.4, 0.6])
 }
 reals_populations = {
-    "Train": train_reals[train_shuffle],
-    "Test": test_reals[test_shuffle]
+    "Train": np.array([0, 1, 0, 1, 0, 1]),
+    "Test": np.array([0, 1, 0, 1, 0, 0])  # Note one outcome is different
 }
 
 # Create a calibration curve to compare the model's performance