
Conversation

@naveenkcb
Contributor

Contributor: Naveen Baskaran

Contribution Type: Interpretability method, Tests, Example

Description

This PR implements the SHAP (SHapley Additive exPlanations) interpretability method for PyHealth models, enabling users to understand which features contribute most to model predictions. SHAP is based on coalitional game theory and provides theoretically grounded feature importance scores with desirable properties like local accuracy, missingness, and consistency.
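To make the properties above concrete, here is a toy illustration (not the PR's code) of exact Shapley values computed by enumerating every coalition, with absent features replaced by a baseline. The function name and the linear toy model are purely for demonstration; Kernel SHAP approximates this computation when enumeration is infeasible.

```python
import math
from itertools import combinations

def exact_shapley_values(predict, features, baseline):
    """Exact Shapley values for a small feature set by enumerating
    all coalitions. predict() maps a feature dict to a scalar."""
    n = len(features)
    names = list(features)
    phi = {}
    for i in names:
        others = [f for f in names if f != i]
        total = 0.0
        for k in range(n):
            for coalition in combinations(others, k):
                # Marginal contribution of feature i: model output with
                # i present minus output with i replaced by its baseline.
                with_i = {f: (features[f] if f in coalition or f == i
                              else baseline[f]) for f in names}
                without_i = {f: (features[f] if f in coalition
                                 else baseline[f]) for f in names}
                weight = (math.factorial(k) * math.factorial(n - k - 1)
                          / math.factorial(n))
                total += weight * (predict(with_i) - predict(without_i))
        phi[i] = total
    return phi

# Toy linear model: attributions recover coefficient * (x - baseline),
# and they sum to predict(x) - predict(baseline) (local accuracy).
predict = lambda x: 2.0 * x["age"] + 3.0 * x["bmi"]
phi = exact_shapley_values(predict,
                           {"age": 1.0, "bmi": 2.0},
                           {"age": 0.0, "bmi": 0.0})
print(phi)  # {'age': 2.0, 'bmi': 6.0}
```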

Files to Review

pyhealth/interpret/methods/__init__.py
pyhealth/interpret/methods/shap.py - Core SHAP method implementation. Supports embedding-based attribution and continuous features.
pyhealth/processors/tensor_processor.py - Minor fix to resolve a warning message.
examples/shap_stagenet_mimic4.py - Example script showing usage of the SHAP method.
tests/core/test_shap.py - Comprehensive test cases covering the main class, utility methods, and attribution methods.

Results on mimic4-demo dataset

[screenshot: SHAP results on the mimic4-demo dataset]

# Initialize SHAP explainer with custom parameters
shap_explainer = ShapExplainer(
    model,
    use_embeddings=True,  # Use embeddings for discrete features
)
Collaborator

I think SHAP should be compatible with discrete tokens like ICD codes here? Correct me if I'm wrong. Will look deeper into understanding the full implementation of SHAP here later when I'm more cognitively sound.

Contributor Author
@naveenkcb Nov 17, 2025

I updated the ShapExplainer class instance creation to pass just the model and default all other values, including "use_embeddings", inside the init method. Yes, SHAP works for ICD codes, but it will use the embeddings from the input model.

Collaborator
@jhnwu3 left a comment

Some other nice-to-haves:

  1. Can we add an entry for docs/api/interpret/shap.rst, and add it to the interpretability index interpret.rst?
  2. Can we check that this is compatible when the device is on GPU? Maybe through a Colab notebook? (There's a way to install the branch/repo into the Colab environment.)
  3. I might be able to share some compute resources soon once NCSA gets back to me.

if coalition_size == 0 or coalition_size == n_features:
    return torch.tensor(1000.0)  # Large weight for edge cases

comb_val = math.comb(n_features - 1, coalition_size - 1)
Collaborator

Wait, isn't it binom(M, |z|) here? Why do we take n_features - 1 and coalition_size - 1 instead of n_features and coalition_size?

Contributor Author

I updated the code to match the equation used in the original SHAP paper in the method _compute_kernel_weight:

weight = (M - 1) / (binom(M, |z|) * |z| * (M - |z|))

  1. I also added the .rst file as requested.
  2. Added "examples/shap_stagenet_mimic4.ipynb" using Colab with GPU.
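The paper's kernel weight can be sketched as follows. This is a minimal illustration of the formula quoted above; the function name kernel_shap_weight and the large-constant stand-in for the infinite edge-case weights are illustrative, not necessarily the PR's exact code.

```python
import math

def kernel_shap_weight(n_features: int, coalition_size: int) -> float:
    """Kernel SHAP weight from Lundberg & Lee (2017):
    pi(z) = (M - 1) / (C(M, |z|) * |z| * (M - |z|)).
    The empty and full coalitions have infinite weight in theory;
    a large constant is a common practical stand-in."""
    M, s = n_features, coalition_size
    if s == 0 or s == M:
        return 1e6  # enforce the exact-match constraints at the extremes
    return (M - 1) / (math.comb(M, s) * s * (M - s))

print(kernel_shap_weight(5, 2))  # 4 / (10 * 2 * 3) ≈ 0.0667
```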

coalition_vectors = []
coalition_weights = []
coalition_preds = []

Collaborator

Can you check whether we need to add the edge-case coalitions (all features and no features) explicitly to the prediction set used to train the kernel/linear model that estimates the Shapley values here?

I've linked some captum code examples here:

https://github.com/meta-pytorch/captum/blob/master/captum/attr/_core/kernel_shap.py
https://github.com/meta-pytorch/captum/blob/master/captum/attr/_core/lime.py

Contributor Author

Handled the edge cases and updated the code accordingly in the method _compute_kernel_shap.

)

# Sample remaining coalitions randomly (excluding edge cases already added)
n_random_coalitions = max(0, n_coalitions - 2)
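The deterministic-edge-cases-plus-random-sampling scheme shown above can be sketched as follows. The function name and the uniform size-sampling scheme are illustrative, not the PR's exact code.

```python
import torch

def sample_coalitions(n_features: int, n_coalitions: int) -> torch.Tensor:
    """Binary coalition matrix with the two edge cases (no features,
    all features) always included first, plus random coalitions."""
    rows = [torch.zeros(n_features), torch.ones(n_features)]
    n_random = max(0, n_coalitions - 2)  # edge cases already added
    for _ in range(n_random):
        # Pick a nonempty, non-full coalition size, then choose
        # which features are present.
        size = int(torch.randint(1, n_features, (1,)))
        idx = torch.randperm(n_features)[:size]
        z = torch.zeros(n_features)
        z[idx] = 1.0
        rows.append(z)
    return torch.stack(rows)

Z = sample_coalitions(6, 10)
print(Z.shape)  # torch.Size([10, 6])
```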
Collaborator

Hey, one last request: I've been digging deeper into the official shap implementation of kernel regression. They do it in NumPy, of course, mostly so people can interpret random forests and XGBoost.

But I think we can adopt some of their nice coalition-sampling tricks here:

https://github.com/shap/shap/blob/master/shap/explainers/_kernel.py
Specifically, it seems they do some type of complement sampling to maximize how much coverage we get from the samples we draw at a time.

https://github.com/shap/shap/blob/ace49bf463a802f18725a869a969c060a192e3f8/shap/explainers/_kernel.py#L480

Let me know if you need any help with this. It does look a little complicated, but everything else looks good to me.
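The complement-sampling trick referenced above can be sketched roughly as follows: every sampled coalition z is paired with its complement 1 - z, as shap's KernelExplainer does, which balances coverage of coalition sizes per independent draw. This is a hypothetical sketch, not the linked implementation.

```python
import torch

def sample_paired_coalitions(n_features: int, n_pairs: int) -> torch.Tensor:
    """Antithetic (complement) coalition sampling: each random
    coalition z contributes two rows, z and 1 - z."""
    rows = []
    for _ in range(n_pairs):
        size = int(torch.randint(1, n_features, (1,)))
        z = torch.zeros(n_features)
        z[torch.randperm(n_features)[:size]] = 1.0
        rows.append(z)
        rows.append(1.0 - z)  # the complementary coalition
    return torch.stack(rows)

Z = sample_paired_coalitions(5, 4)
print(Z.shape)  # torch.Size([8, 5])
```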

Contributor Author

@jhnwu3 based on our Discord convo, I am keeping the torch implementation as is to support StageNet. I did update the dataset-creation portions in the unit test and example script. Hope this helps; please advise if further changes are required.

Collaborator
@jhnwu3 left a comment

Just one last doc change, and then LGTM! Will test more robustly and decide if we need revisions as we interpret the model and look into it deeper (i.e., we might need to compare our implementation against another, given some workarounds).

@naveenkcb
Contributor Author

@jhnwu3 I added the comment to interpret.rst file and pushed my changes.
