This repository contains all the files for my final project in CS598 Deep Learning for Healthcare.
The goal of the project was to reproduce key parts of the BulkRNA-BERT paper using real TCGA gene expression data and clinical survival information. Because the original codebase could not be installed (outdated dependencies and Python/CUDA compatibility issues), I recreated a lightweight version of the model and ran experiments on four TCGA GBM/LGG patients.
final_project.py – main script that:
- Loads the four TCGA TSV files
- Extracts survival info from the XML files
- Builds the patient-by-gene matrix
- Runs PCA and a simplified BulkRNA-BERT encoder
- Computes C-index values and generates plots

Also included in the repository:
- All four expression TSV files
- All four clinical XML files
- My final report PDF
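
To illustrate two of the steps that final_project.py performs (building the patient-by-gene matrix and computing a C-index), here is a minimal sketch that uses only the Python standard library. The function names `build_matrix` and `concordance_index`, the patient IDs, and all toy values are assumptions made for illustration; they are not taken from final_project.py, which may implement these steps differently (for example with pandas or a survival-analysis library).

```python
from itertools import combinations


def build_matrix(profiles):
    """Stack per-patient {gene: value} dicts into a patient-by-gene matrix.

    Only genes present in every profile are kept, so all rows share the
    same columns. (Hypothetical helper, not the project's actual code.)
    """
    patients = sorted(profiles)
    genes = sorted(set.intersection(*(set(p) for p in profiles.values())))
    matrix = [[profiles[pid][g] for g in genes] for pid in patients]
    return patients, genes, matrix


def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair is comparable when the patient with the shorter survival time
    had an observed event (events[i] == 1). A higher risk score should go
    with a shorter survival time. Ties in risk count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified sketch
        short, long_ = (i, j) if times[i] < times[j] else (j, i)
        if not events[short]:
            continue  # earlier patient was censored, so the order is unknown
        comparable += 1
        if risks[short] > risks[long_]:
            concordant += 1.0
        elif risks[short] == risks[long_]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Toy values for illustration only (not real TCGA data or project results).
toy_profiles = {
    "P1": {"TP53": 5.0, "EGFR": 9.1},
    "P2": {"TP53": 4.2, "EGFR": 2.0, "IDH1": 1.0},
}
patients, genes, matrix = build_matrix(toy_profiles)
print(patients, genes, matrix)

toy_times = [100, 200, 300, 400]   # survival or follow-up time in days
toy_events = [1, 1, 0, 1]          # 1 = death observed, 0 = censored
toy_risks = [0.9, 0.7, 0.8, 0.1]   # higher = predicted shorter survival
print(concordance_index(toy_times, toy_events, toy_risks))  # 0.8
```

With only four patients there are at most six patient pairs, and censoring can remove some of them, so a C-index computed on this cohort can take only a few distinct values and should be read as a sanity check on the pipeline.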