Turned complex data into preclinically validated strategies through rigorous biological & mathematical reasoning | Multi-omics & AI/ML x Health | Computational biologist
I am a passionate, impact-driven computational biologist who transforms complex data into preclinically validated cancer prevention and treatment strategies. I drive high-confidence, efficient data-to-decision outcomes, from hypothesis generation through actionable insights to preclinical validation.
My (co-)first-author publications demonstrate this impact: Oncogene (2022 and 2025), EMBO Reports (2023), and Nature Communications (2025), with two more manuscripts in preparation.
My computational approach to delivering preclinically validated strategies:
(a) seamlessly integrates expertise in computational analysis and bench science;
(b) applies robust mathematical and biological reasoning;
(c) leverages single-cell and spatial genomics to discover and prioritize cellular and molecular targets.
I represented my university in the National Mathematical Modeling Contest and am highly proficient in developing and customizing computational infrastructure, including:
(a) building R packages from scratch;
(b) debugging and extending open-source tools in R and Python;
(c) developing and optimizing pipelines for translational R&D.
I have built end-to-end spatial genomics pipelines covering sample preparation, sequencing library construction, bioinformatics analysis, publication-ready presentation, and preclinical validation.
-
💬 My completed computational biology projects are listed on my website as well as below:
- SeqWins: An R package for flexible base trimming and complete FASTQ analysis on Windows, as fast and memory-efficient as on Linux
- Modeling the deposition of infectious agents suggests that aerosolized drugs offer enhanced specificity
- Effective RNA Velocity Analysis: Math, Implementation, Benchmark, and Discovery of Meaningful Tumor Trajectories from Real-World Single-Cell Genomics
- Single-Cell Genomic and Clinical Data Analyses Pinpointed Targets and Drove the Preclinically Validated Novel Strategy to Prevent/Treat Kidney Cancer
- Quantifying Un-/Under-Annotated Features (hERVs, Transgenes) from (Single-Cell) Transcriptomics: Math, Algorithms, Implementation, and Turnkey Workflows
- Integrated Analysis of Spatial Genomics and scRNAseq Identifies Occult Cervical Cancer Subtypes: Why Current Billion Dollar Screening Programs Fail the Most At-Risk Patients
- Improved UMI Normalization Algorithm That Determines the Downstream Interpretation of Single-Cell Transcriptomes
- Infection-Driven Uterine Tumorigenesis: Integrated Spatial and Single-Cell Profiling Validated in Patients and Preclinical Models
-
🔭 I’m currently working on AI x Health, AI agents, and Transformer engineering, including building nano versions of ChatGPT and DeepSeek from scratch and LoRA fine-tuning from scratch:
- I engineered a high-performance CNN model for histological analysis based on YOLOv8, optimizing model accuracy through data augmentation and implementing Active Learning to combat overfitting
- I am a certified AI agents developer
- My hackathon project: a multimodal agentic EHR co-pilot
-
👯 Tools I have collaboratively contributed to:
- The rseqR package: performs differential analysis of RNA-seq data (my PR was approved and merged along with other improvements)
- cellcuratoR: interactive single-cell gene expression visualizations from Seurat (my PR was merged)
- My edition of stereopy: a fundamental and comprehensive tool for mining and visualization based on spatial transcriptomics data (I fixed bugs in h5ad2rds.R and enabled successful conversion of spatial datasets from Python h5ad files to R Seurat objects)
- My edition of sctransform: an R package for normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression (I added new functions that allow theta_given to take effect together with vst.flavor = "v2", so that the more stable glmGamPoi, implemented in sctransform v2, can be used to compute the analytic Pearson residuals. Check my blog to learn the logic flow of the original sctransform() and the improved UMI normalization model.)
- My edition of msigdbr: MSigDB gene sets typically used with Gene Set Enrichment Analysis (GSEA) (I updated the source gene sets and enabled using mouse gene sets directly from MSigDB rather than converting them from human gene sets based on orthologs. My PR was not approved at the time, but a newer version actually adopted my idea.)
-
📖 I love sharing my computational biology blog, where I write to fill gaps I couldn’t find clearly addressed elsewhere, offering original insights and mathematical reasoning behind bioinformatics tools, large language models, and more.
