TpT #80

MathiasValla · 2025-12-16T16:25:43Z

Hi Simon,

I’m happy to share that the TpT implementation is now finalized code-wise on my side. This PR introduces a working TpTDecisionTreeClassifier with a depth-first builder, compatible with scikit-lexicographical-trees / scikit-longitudinal.

References to cite

[1] Valla, M. Time-penalised trees (TpT): introducing a new tree-based data mining algorithm for time-varying covariates. Ann Math Artif Intell 92, 1609–1661 (2024). https://doi.org/10.1007/s10472-024-09950-w

[2] Valla, M., Milhaud, X. Time-penalized trees: consistency results and simulations. HAL hal-05022929 (2025). https://cnrs.hal.science/hal-05022929

Scope of this PR (minimal, functional)

Estimator: TpTDecisionTreeClassifier (classification only in this PR’s target scope).
Builder: DepthFirstTreeBuilder (no Best-First in this scope).
Criterion: Gini (others left out intentionally for the minimal version).
I/O and structure: Cython implementation aligned with the existing tree stack.
(No plot_tree adaptation included here; regular sklearn.tree.plot_tree is usable for quick inspection.)

Note: I also have a working regression path and more features locally, but to keep this PR focused and easy to land, I’m proposing only the minimal, agreed-upon surface.
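For reviewers less familiar with the criterion, here is a minimal plain-Python sketch of Gini impurity and the weighted impurity decrease that a depth-first builder maximises at each node. This is standard CART, not the TpT Cython code; how TpT applies its time penalty (gamma) on top of this gain is defined in [1] and is deliberately not reproduced here. The function names are illustrative only.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gain(parent, left, right):
    """Weighted impurity decrease when splitting `parent` into `left` and `right`.

    Plain CART gain; any time penalty (as in TpT) would be applied on top.
    """
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


y = [0, 0, 1, 1]
print(gini(y))                        # 0.5
print(split_gain(y, [0, 0], [1, 1]))  # 0.5 (a perfect split removes all impurity)
```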

Where the code lives

TpT estimator code: scikit_longitudinal/estimators/trees/TpT/
(primary class: TpTDecisionTreeClassifier, splitter, builder, structs)
Long→Wide converter (temporary): scikit_longitudinal/estimators/trees/TpT/_preprocessing.py
This should be moved under or near LongitudinalDataset per your design. I’d be grateful if you could drop it into the right place in a temporary branch; I’ll review once it is moved. It currently handles long → wide only (not the reverse), without TIDAL/polars. It’s a first step toward #64 (“💡 Feature Request - From Wide to Long and vice-versa Longitudinal data formatting, inspired from TIDAL”), but it does not fully solve that issue.
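To make the long → wide step concrete, here is a minimal pandas sketch assuming one row per (id, time_point) pair. The helper long_to_wide, its "<feature>_t<time>" column naming, and the choice of taking each subject's last target value are all hypothetical and are not the API of the PR's _preprocessing.py. It also treats every column other than id, time and target (including a duration column, if present) as a time-varying feature, which the real converter may handle differently.

```python
import pandas as pd


def long_to_wide(df, id_col="id", time_col="time_point", target_col="target"):
    """Pivot a long-format frame (one row per subject and time point) to wide format.

    Hypothetical helper: feature columns become "<feature>_t<time>", missing
    time points become NaN, and the target is taken from each subject's last row.
    """
    feature_cols = [c for c in df.columns if c not in (id_col, time_col, target_col)]
    wide = df.pivot(index=id_col, columns=time_col, values=feature_cols)
    # Flatten the (feature, time) MultiIndex into single string column names.
    wide.columns = [f"{feat}_t{t}" for feat, t in wide.columns]
    # Assumption: the label of interest is the one observed at the latest time point.
    last_target = df.sort_values(time_col).groupby(id_col)[target_col].last()
    return wide.join(last_target).reset_index()


long_df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "time_point": [0, 1, 0, 1],
    "age": [60, 61, 70, 71],
    "target": [0, 1, 0, 0],
})
print(long_to_wide(long_df).columns.tolist())  # ['id', 'age_t0', 'age_t1', 'target']
```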

Quick example (concise)

import matplotlib.pyplot as plt
import pandas as pd
from sklearn import tree
from scikit_longitudinal.estimators.trees import TpTDecisionTreeClassifier

# Example dataset from tests (long format):
# columns include: ["id", "time_point", "duration", "target", ... features ...]
df = pd.read_csv("path/to/stroke.csv", sep=";")

X = df.drop(columns=["target", "cholesterol"])  # minimal cleanup for this file
y = df["target"].astype(int)

clf = TpTDecisionTreeClassifier(
    gamma=0.1,              # time penalty (λ)
    criterion="gini",
    id_col="id",
    time_col="time_point",
    duration_col="duration",
    assume_long_format=True,
    time_step=1,
    max_horizon=1000,
    min_samples_split=2,
    max_depth=1000,
    random_state=42,
).fit(X, y)

# Optional: quick textual sanity checks
print("max_depth:", clf.tree_.max_depth)
print("node_count:", clf.tree_.node_count)
print("split_time_index[:10]:", clf.tree_.split_time_index[:10])

# Optional: quick visual check (works for fast inspection)
feature_names = getattr(clf, "_wide_feature_names_", None)
_ = tree.plot_tree(
    clf, filled=True, fontsize=10,
    feature_names=feature_names,
    # tpt_time_scale is not a stock scikit-learn argument (see commit dea07b6);
    # remove it if your plot_tree does not accept it.
    tpt_time_scale=0.5,
)
plt.show()

That’s intentionally minimal (no non-essential utilities, no external metrics). It just loads data, fits TpT, prints a few structural fields, and plots.
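Since the example relies on the long-format layout being well formed, a small pre-fit check can catch common data problems early. This is a hypothetical helper, not part of the PR: it only mirrors the id_col / time_col / time_step arguments used above, and it assumes integer time points on a grid starting at 0.

```python
import pandas as pd


def check_long_format(df, id_col="id", time_col="time_point", time_step=1):
    """Return a list of problems found in a long-format frame (empty if none).

    Hypothetical sanity check; assumes integer time points on a grid starting at 0.
    """
    problems = []
    for col in (id_col, time_col):
        if col not in df.columns:
            problems.append(f"missing column: {col}")
    if problems:
        return problems
    # Each subject should appear at most once per time point.
    if df.duplicated(subset=[id_col, time_col]).any():
        problems.append("duplicate (id, time) rows")
    # Every time point should sit on the regular grid implied by time_step.
    off_grid = (df[time_col] % time_step) != 0
    if off_grid.any():
        problems.append(f"{int(off_grid.sum())} time point(s) not multiples of time_step")
    return problems


print(check_long_format(pd.DataFrame({"id": [1, 1, 2], "time_point": [0, 1, 0]})))  # []
```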

Dependencies

I temporarily removed ray from the dependencies due to local issues.
Please feel free to restore it where appropriate (e.g., as an optional dependency / extra for parallelism or tests).
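One way to restore ray as an optional dependency is to guard the import so the core estimator works without it. A minimal sketch, assuming ray is only used to parallelise independent calls; the parallel_map function and its use_ray flag are hypothetical, since this PR does not show how sklong actually uses ray.

```python
import importlib.util

# True when the optional "ray" package is importable in this environment.
HAS_RAY = importlib.util.find_spec("ray") is not None


def parallel_map(func, items, use_ray=HAS_RAY):
    """Map func over items with Ray when requested and available, else a plain loop."""
    if not use_ray:
        return [func(x) for x in items]
    import ray  # imported lazily so the module loads without ray installed

    ray.init(ignore_reinit_error=True)
    remote_func = ray.remote(func)
    return ray.get([remote_func.remote(x) for x in items])


def square(x):
    return x * x


print(parallel_map(square, [1, 2, 3], use_ray=False))  # [1, 4, 9]
```

Keeping the import inside the function means users without the extra installed never touch ray at all.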

Ask / next steps

I’ve pushed this as far as I can right now. It would be ideal if you could manage the integration so TpT aligns perfectly with sklong’s patterns (API surface, dataset plumbing via LongitudinalDataset, docs structure, examples, CI, etc.). I’ll follow up with any fixes you need during review.

I’m also working on two additional papers involving TpT and will of course cite scikit-longitudinal as the reference implementation. For any future article specifically about the implementation, I’d be happy to include you as co-author for your guidance and help.

Thanks a lot, and please let me know how you’d like to proceed with the preprocessing relocation!

— Mathias

Introduced time-penalized splitting (TpT) for regression tasks. Added handling of the features_group parameter to align with the classifier implementation. Ensured the regressor uses the new TpTSplitter from scikit-lexicographical-trees. Updated initialization and validation logic for compatibility with the Phase 1b changes (wave index / split_time_index propagation).

Added support for features_group parameter in TpTDecisionTreeClassifier to enable temporal penalization and lexicographic splitting. Updated fit method to enforce presence of features_group and ensure consistent initialization. Integrated calls to the updated TpTSplitter from scikit-lexicographical-trees. Improved docstring with guidance for preparing longitudinal input data.
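Because fit now requires features_group, it helps to see how one might derive it from wide-format column names. A minimal sketch, assuming sklong's convention (as we understand it) of one inner list of column indices per feature, ordered by wave, padded with -1 for missing waves. The "<name>_t<wave>" naming scheme and the build_features_group helper are hypothetical.

```python
import re
from collections import defaultdict


def build_features_group(columns, pattern=r"^(?P<name>.+)_t(?P<wave>\d+)$"):
    """Group wide-format column indices by feature name, ordered by wave.

    Hypothetical helper: assumes names like "age_t0", "age_t1". Missing waves
    are padded with -1; columns that do not match the pattern are returned
    separately as non-longitudinal feature indices.
    """
    waves_by_feature = defaultdict(dict)
    non_longitudinal = []
    for idx, col in enumerate(columns):
        match = re.match(pattern, col)
        if match is None:
            non_longitudinal.append(idx)
        else:
            waves_by_feature[match["name"]][int(match["wave"])] = idx
    all_waves = sorted({w for waves in waves_by_feature.values() for w in waves})
    features_group = [
        [waves.get(w, -1) for w in all_waves] for waves in waves_by_feature.values()
    ]
    return features_group, non_longitudinal


print(build_features_group(["age_t0", "age_t1", "bmi_t1", "sex"]))
# ([[0, 1], [-1, 2]], [3])
```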

- Removed specific version constraints for `scikit-lexicographical-trees` and `ray` in `pyproject.toml`. (Maybe Simon can find a way around this?)
- Added a local path for `scikit-lexicographical-trees` in the `tool.uv.sources` section. (I'm not sure whether we should keep it this way.)
- Updated `uv.lock` with new versions for several packages, including `anyio`, `argon2-cffi`, `arrow`, `astroid`, `typing-extensions`, `tzdata`, `urllib3`, `wcwidth`, `websocket-client`, `widgetsnbextension`, and `zipp`.
- Refactored the `TpT` tree estimators to improve code organization and clarity, including a public namespace for `TpT` estimators and enhancements to the `TpTDecisionTreeClassifier` and `TpTDecisionTreeRegressor` classes for better handling of long-format data.

TODO: Clean everything up (remove all test datasets).

simonprovost · 2026-01-14T10:56:44Z

@MathiasValla Your work is amazing! Thanks so very much! Happy new year as well!

I will be busy in the coming weeks/months, but as always, I'll work on making this PR ready as soon as I can. Some (positive!) news is coming for Sklong within the year, and TpT will definitely be useful :)

Wishing you well,

Cheers

MathiasValla · 2026-01-14T11:35:16Z

@simonprovost Thanks, best wishes to you too!

Great to hear.
I only have one constraint: with your blessing, I'd like to submit the TpT code to a journal in the first half of the year (at the latest!), so I guess it's better to wait for your integration of TpT within the existing framework. If you agree (and if interested), I could prepare a draft in that direction (including you as co-author for all your help on that integration!).

In the meantime, I still need to finalize 2 articles on TpT, where I'll refer to sklong, of course. I'm also working on an animation explaining how TpT works (animated with manim); it could be useful to include it on the scikit-longitudinal website (if you also think it's a good idea).

Cheers,
Mathias

simonprovost · 2026-01-14T11:44:32Z

You are working so efficiently @MathiasValla ! Kind of jealous 🙏

That's fantastic to hear, and thank you for your kind gesture, mate! I am very interested! I would be pleased, as I have great trust in your research. If you want to continue the discussion by email regarding adding me as the (last) co-author of your manuscript, please do so 🫡.

To be transparent, so you're aware: in two weeks I'll begin a longitudinal-ML research visit at the University of Edinburgh, and in the meantime I'm applying here and there for postdocs and trying to finish writing my Ph.D. thesis, so the faster I get through all that, the more time I'll have to work on the TpT integration.

That being said, given your journal plans, I believe I could spend time on TpT in a month or so. Perhaps set yourself a reminder (I wish there were bot agents in GitHub Issues 👀 that could chase me) and chase me back, saying, "Hey Simon, you are late :)"? My head tends to be everywhere at once...

The TpT animation is a fantastic idea. Last year I had in mind to create a visualisation for each of Sklong's primitives at the top of their respective API reference pages so that users could grasp them easily. Definitely include it, or give it to me and I'll include it in the TpT API reference; it's a modern documentation idea that I plan (during my postdoc) to extend to the other primitives, so TpT can be the first one :) PS: I adore Manim! Look here: Automated machine learning for longitudinal classification, AIDA (UKC) @ https://simonprovostdev.vercel.app/talks.

Let's keep each other transparently updated.

Cheers,

MathiasValla added 11 commits August 1, 2025 15:13

- Update __init__.py (c738ba1)
- Add files via upload (6cb2714)
- Update TpT_decision_tree.py (e79e2a4)
- Update TpT_decision_tree_regressor.py (ebb60fa)
- Update __init__.py (8d4a492)
- Merge branch 'simonprovost:main' into TpT (e297d88)
- Added contributors (4597e9b)
- Add tpt_time_scale parameter to tree.plot_tree (dea07b6)

MathiasValla mentioned this pull request on Dec 17, 2025, in TpT simonprovost/scikit-lexicographical-trees#1 (open).
