gwicho38 commented Dec 8, 2025

Summary

This PR adds the BulkRNABert model for cancer prognosis tasks based on the paper by Gélard et al. (2025).

Key additions:

  • BulkRNABertLayer: Transformer encoder layer for gene expression data
  • BulkRNABert: Main model for cancer type classification (33 TCGA cancer types)
  • BulkRNABertForSurvival: Model for survival prediction using Cox partial likelihood loss
  • compute_c_index: Utility function for computing concordance index
  • Unit tests for the encoder layer, both models, and compute_c_index
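
For reviewers unfamiliar with the metric, here is a minimal sketch of Harrell's concordance index for right-censored data. It is not the code in this PR: the function name `c_index_sketch`, its argument order, and the toy data are assumptions made for illustration, and the actual signature of compute_c_index may differ. A pair (i, j) is treated as comparable when subject i had an observed event strictly before subject j's recorded time, and tied risk scores count as half-concordant.

```python
from typing import Sequence


def c_index_sketch(
    risk: Sequence[float],
    time: Sequence[float],
    event: Sequence[int],
) -> float:
    """Harrell's C-index, O(n^2) reference version (hypothetical helper)."""
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0  # higher risk failed earlier
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up): higher risk scores go with shorter survival times.
risk = [2.0, 1.0, 0.5, 0.1]
time = [1.0, 2.0, 3.0, 4.0]
event = [1, 1, 0, 1]  # third subject is censored

perfect = c_index_sketch(risk, time, event)
inverted = c_index_sketch(risk[::-1], time, event)
```

A perfectly ranked cohort scores 1.0, a fully inverted one scores 0.0, and random scores land near 0.5.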

Paper reference:
Gélard et al. (2025) "BulkRNABert: Cancer prognosis from bulk RNA-seq based language models"

Implementation Details

The implementation follows PyHealth's BaseModel pattern and provides:

  • Multiclass classification for 33 TCGA cancer types
  • Binary classification support
  • Survival prediction with Cox partial likelihood loss
  • C-index metric computation for survival evaluation
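
The Cox partial likelihood loss mentioned above can be sketched as follows. This is an illustrative plain-Python version, not the PR's implementation: the function name `cox_loss_sketch` is hypothetical, ties in event times are handled naively (every subject with time >= t_i is in the risk set), the loss is averaged over observed events, and it is not differentiable as written. For each observed event i, the contribution is the predicted log-risk h_i minus the log-sum-exp of log-risks over subjects still at risk at time t_i.

```python
import math
from typing import Sequence


def cox_loss_sketch(
    log_risk: Sequence[float],
    time: Sequence[float],
    event: Sequence[int],
) -> float:
    """Negative Cox log partial likelihood, averaged over observed events."""
    total = 0.0
    n_events = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # censored subjects only appear inside risk sets
        # Risk set: everyone whose recorded time is at least t_i.
        at_risk = [log_risk[j] for j in range(n) if time[j] >= time[i]]
        # Numerically stable log-sum-exp over the risk set.
        m = max(at_risk)
        log_sum = m + math.log(sum(math.exp(r - m) for r in at_risk))
        total += log_risk[i] - log_sum
        n_events += 1
    if n_events == 0:
        return 0.0  # assumption: a batch without events contributes no loss
    return -total / n_events


# Toy data (made up): three subjects, all with observed events.
times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
loss_uniform = cox_loss_sketch([0.0, 0.0, 0.0], times, events)
loss_good = cox_loss_sketch([2.0, 1.0, 0.0], times, events)
loss_bad = cox_loss_sketch([0.0, 1.0, 2.0], times, events)
```

Scores that rank early failures as higher risk give a lower loss than scores that rank them in reverse, which is the behavior a survival head trained with this objective relies on.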

Test Plan

  • Unit tests for BulkRNABertLayer encoder
  • Unit tests for BulkRNABert classification model
  • Unit tests for BulkRNABertForSurvival model
  • Unit tests for compute_c_index function
  • All tests pass locally

Context

This contribution was developed as part of a CS 598 DLH (Deep Learning for Healthcare) course project reproducing the BulkRNABert paper results.

- Add BulkRNABertLayer encoder for gene expression data
- Add BulkRNABert model for cancer type classification
- Add BulkRNABertForSurvival for survival prediction with Cox loss
- Add compute_c_index utility for survival evaluation
- Add comprehensive unit tests

Based on: Gélard et al. (2025) "BulkRNABert: Cancer prognosis from
bulk RNA-seq based language models"

Paper: https://www.biorxiv.org/content/10.1101/2024.06.13.598798
Model: https://huggingface.co/InstaDeepAI/BulkRNABert