Add BulkRNABert model for cancer prognosis #713
Summary
This PR adds the BulkRNABert model for cancer prognosis tasks based on the paper by Gélard et al. (2025).
Key additions:
- BulkRNABertLayer: Transformer encoder layer for gene expression data
- BulkRNABert: Main model for cancer type classification (33 TCGA cancer types)
- BulkRNABertForSurvival: Model for survival prediction using Cox proportional hazards loss
- compute_c_index: Utility function for computing concordance index

Paper reference:
Gélard et al. (2025) "BulkRNABert: Cancer prognosis from bulk RNA-seq based language models"
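
For reviewers unfamiliar with the survival components listed above, here is a minimal pure-Python sketch of a Cox negative log partial likelihood and a Harrell-style concordance index. It is illustrative only: the function names `cox_neg_log_partial_likelihood` and `concordance_index`, their signatures, the averaging over observed events, and the tie handling are assumptions made for this example. They are not taken from the PR's `BulkRNABertForSurvival` or `compute_c_index` code, which may differ.

```python
import math


def cox_neg_log_partial_likelihood(risk_scores, times, events):
    """Negative Cox log partial likelihood, averaged over observed events.

    risk_scores: predicted log-risk per sample (higher = higher hazard)
    times:       observed follow-up time per sample
    events:      1 if the event was observed, 0 if censored
    Hypothetical helper; the risk set of sample i is every sample j
    with times[j] >= times[i].
    """
    n = len(risk_scores)
    total = 0.0
    n_events = 0
    for i in range(n):
        if not events[i]:
            continue  # censored samples contribute only through risk sets
        risk_set = [risk_scores[j] for j in range(n) if times[j] >= times[i]]
        # Numerically stable log-sum-exp over the risk set.
        m = max(risk_set)
        log_sum = m + math.log(sum(math.exp(r - m) for r in risk_set))
        total += log_sum - risk_scores[i]
        n_events += 1
    return total / n_events if n_events else 0.0


def concordance_index(risk_scores, times, events):
    """Harrell-style C-index.

    A pair (i, j) is comparable when sample i had an observed event and
    times[i] < times[j]. It is concordant when the earlier event has the
    higher risk score; tied risk scores count as half.
    """
    n = len(risk_scores)
    concordant = 0.0
    comparable = 0
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


if __name__ == "__main__":
    # Toy data: the third patient is censored.
    times = [1.0, 2.0, 3.0, 4.0]
    events = [1, 1, 0, 1]
    risks = [4.0, 3.0, 2.0, 1.0]
    print(concordance_index(risks, times, events))
    print(cox_neg_log_partial_likelihood(risks, times, events))
```

A C-index of 0.5 corresponds to random ranking, and lower loss values indicate that samples with earlier events receive higher risk scores. The quadratic loops are chosen for readability; a tensor implementation would typically sort by time and use a cumulative log-sum-exp instead.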
Implementation Details
The implementation follows PyHealth's BaseModel pattern and provides:
Test Plan
Context
This contribution was developed as part of a CS 598 DLH (Deep Learning for Healthcare) course project reproducing the BulkRNABert paper results.