
Flomics/fl-cfRNAmeta


fl-cfRNAmeta

This repository hosts the code used in the analysis performed as part of the manuscript "Systematic cross-study assessment of RNA-Seq experimental workflows for plasma cell-free transcriptome profiling" by Tuñí et al.

Repository Structure

fl-cfRNAmeta/
├── README.md
├── nextflow/
├── sra_metadata/
├── src/
└── tables/

Contents

1. src/ scripts and notebooks

  • preprocess_metadata_functions.py
    Main script for preprocessing and harmonizing metadata from multiple cfRNA-seq studies.

    • Loads and standardizes sample-level metadata from various studies.
    • Merges with dataset-level metadata.
    • Applies dataset-specific cleaning, exclusion, and annotation logic.
    • Outputs harmonized per-sample and per-batch metadata tables.
  • sra_columns_mapping.py
    Helper functions for renaming and standardizing column names and values across metadata tables.

  • dataset_mappings.json
    JSON file with mappings of dataset names, colours, plotting orders, and other values shared across scripts.

  • boxplots_fig2.R
    R script that generates boxplots of the main QC metrics, including: percentage of spliced reads, percentage of exonic reads, percentage of fragments mapping in the correct gene orientation, NG80, number of fragments, percentage of reads mapping to the human reference genome, percentage of reads mapping to ERCC spike-ins, and effective fragment length distribution. It also creates the NG80 vs. spliced reads scatterplot and the percentage of human reads vs. percentage of microbial reads scatterplot.

  • fig_1a_stacked_barplot.R
    R script for creating the donor phenotype stacked bar plot.

  • fig_1b_heatmap.R
    R script for creating the pre-analytical variables heatmap.

  • gene_coverage_profile_fig2_tmpH.ipynb
    Jupyter notebook for plotting gene coverage profiles.

  • join_count_matrix_and_qc_table.ipynb
    Jupyter notebook for merging sliced count matrices and QC tables into a single count matrix file and a single QC table file.

  • merge_fastqs_array_isolate.sh
    Shell script for merging FASTQ files by array or isolate.

  • ng80.R
    R script to compute the NG80 metric reported in the manuscript. Requires the count matrix as input.

  • spliced_reads.sh
    Shell script to obtain the number and percentage of spliced reads. Requires the deduplicated BAM file as input.
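The harmonization steps of preprocess_metadata_functions.py and sra_columns_mapping.py are: rename columns, standardize values, attach dataset-level metadata, then apply exclusions. Below is a minimal Python sketch of that flow. It is not the repository's code. All column names, mappings, accessions and the exclusion rule are hypothetical placeholders.

```python
"""Sketch of per-sample metadata harmonization (all names are hypothetical)."""

# HYPOTHETICAL column-name mapping, standing in for sra_columns_mapping.py.
COLUMN_MAP = {"Run": "sample_id", "isolation_source": "biofluid", "AGE": "age"}

# HYPOTHETICAL value mapping to standardize free-text entries across studies.
VALUE_MAP = {"biofluid": {"Plasma": "plasma", "blood plasma": "plasma"}}


def harmonize_sample(row, dataset_meta):
    """Rename columns, standardize values and attach dataset-level fields."""
    out = {}
    for col, value in row.items():
        new_col = COLUMN_MAP.get(col, col)
        # Use the standardized value if one is defined, else keep the original.
        out[new_col] = VALUE_MAP.get(new_col, {}).get(value, value)
    # Dataset-level metadata is copied onto every sample; sample-level values
    # take precedence if both define the same field.
    return {**dataset_meta, **out}


def harmonize_dataset(rows, dataset_meta, exclude=lambda r: False):
    """Harmonize all samples of one dataset, dropping rows flagged by `exclude`."""
    harmonized = [harmonize_sample(r, dataset_meta) for r in rows]
    return [r for r in harmonized if not exclude(r)]


if __name__ == "__main__":
    rows = [  # placeholder accessions, not real SRA runs
        {"Run": "SRR0000001", "isolation_source": "blood plasma", "AGE": "54"},
        {"Run": "SRR0000002", "isolation_source": "serum", "AGE": "61"},
    ]
    meta = {"dataset": "example_study", "library_kit": "example_kit"}
    print(harmonize_dataset(rows, meta, exclude=lambda r: r["biofluid"] != "plasma"))
```

Keeping the mappings as plain dictionaries mirrors the split between the mapping helpers and the preprocessing logic. Dataset-specific rules can then be passed in as `exclude` callables.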
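ng80.R computes NG80 from the count matrix. The sketch below is in Python rather than R and assumes one specific definition: NG80 is the smallest number of genes whose counts add up to at least 80% of a sample's total counts, by analogy with N50-style metrics. That definition is an assumption, so check it against the manuscript before relying on this code.

```python
def ng_x(counts, fraction=0.8):
    """Smallest number of genes whose counts sum to at least `fraction` of the total.

    ASSUMPTION: NG80 is taken to be ng_x(counts, 0.8).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no counts")
    target = fraction * total
    running = 0
    # Walk genes from most to least abundant until the target is reached.
    for n_genes, count in enumerate(sorted(counts, reverse=True), start=1):
        running += count
        if running >= target:
            return n_genes
    return len(counts)  # fallback, only reachable through float rounding


def ng80_per_sample(matrix):
    """Compute NG80 for every sample of a {gene: {sample: count}} matrix."""
    samples = sorted({s for per_gene in matrix.values() for s in per_gene})
    return {
        s: ng_x([per_gene.get(s, 0) for per_gene in matrix.values()])
        for s in samples
    }


if __name__ == "__main__":
    toy = {  # hypothetical toy counts
        "GENE_A": {"S1": 50, "S2": 10},
        "GENE_B": {"S1": 30, "S2": 10},
        "GENE_C": {"S1": 20, "S2": 80},
    }
    print(ng80_per_sample(toy))
```

For the toy matrix this prints {'S1': 2, 'S2': 1}. A lower NG80 means that fewer genes dominate the library, i.e. lower transcriptome complexity.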
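spliced_reads.sh works on the deduplicated BAM. A common way to flag spliced alignments is to look for an `N` operation (skipped reference region) in the CIGAR string. Below is a Python sketch of that rule. It is not the shell script's exact logic. It counts alignment records, so a real analysis would also need to decide how to treat secondary and supplementary alignments and paired mates.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_spliced(cigar):
    """True if the alignment contains an 'N' operation, i.e. it spans an intron."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))


def spliced_stats(cigars):
    """Return (number of spliced alignments, percentage of spliced alignments)."""
    mapped = [c for c in cigars if c != "*"]  # '*' means no CIGAR (e.g. unmapped)
    n_spliced = sum(is_spliced(c) for c in mapped)
    pct = 100 * n_spliced / len(mapped) if mapped else 0.0
    return n_spliced, pct


if __name__ == "__main__":
    toy_cigars = ["50M", "20M300N30M", "10S40M", "25M1000N25M", "*"]
    print(spliced_stats(toy_cigars))
```

On real data, the CIGAR strings are the sixth column of `samtools view` output, e.g. `samtools view sample.bam | cut -f6`. They can be fed to `spliced_stats` one per line.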
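join_count_matrix_and_qc_table.ipynb merges sliced count matrices. The sketch below assumes each slice is a gene × sample TSV holding a different subset of samples for the same genes, so joining is column-wise. That slicing scheme is an assumption. Missing entries are written as `NA` rather than silently set to zero.

```python
import csv
import io


def read_counts(handle):
    """Read a gene x sample TSV (first column = gene id) into {gene: {sample: value}}."""
    reader = csv.reader(handle, delimiter="\t")
    samples = next(reader)[1:]
    return {row[0]: dict(zip(samples, row[1:])) for row in reader}


def join_slices(slices):
    """Column-wise join of slices that cover different samples of the same genes."""
    merged = {}
    for sl in slices:
        for gene, per_sample in sl.items():
            merged.setdefault(gene, {}).update(per_sample)
    return merged


def write_counts(matrix, handle):
    """Write the merged matrix as TSV with sorted genes and samples."""
    samples = sorted({s for per_gene in matrix.values() for s in per_gene})
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene_id", *samples])
    for gene in sorted(matrix):
        writer.writerow([gene, *(matrix[gene].get(s, "NA") for s in samples)])


if __name__ == "__main__":
    a = read_counts(io.StringIO("gene_id\tS1\nG1\t5\nG2\t0\n"))  # toy slice 1
    b = read_counts(io.StringIO("gene_id\tS2\nG1\t3\nG2\t7\n"))  # toy slice 2
    out = io.StringIO()
    write_counts(join_slices([a, b]), out)
    print(out.getvalue(), end="")
```

Values are kept as strings so that integer and fractional counts pass through unchanged. The `gene_id` header name is a hypothetical choice.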


2. nextflow/

  • Purpose:
    Contains configuration files and parameter sets for running the nf-core/rnaseq Nextflow pipeline on each dataset.
  • Files:
    • base.config, base_params.yml: Base Nextflow configuration and parameters.
    • smarter.config, smarter_v2_params.yml, smarter_v3_params.yml: Configs for SMARTer protocols.
    • non_smarter.config, non_smarter_reverse_params.yml, non_smarter_unstranded_params.yml: Configs for non-SMARTer protocols.
    • hg38_gencodev39_params.yml: Parameters for hg38/Gencode v39 reference.
    • two-color-illumina.config: Config for two-color Illumina sequencing.
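The parameter files are layered: base parameters, then protocol-specific ones, then the reference. One way to combine such layers is to merge them left to right into a single file for Nextflow's `-params-file` option, which accepts JSON or YAML. This is a Python sketch under that assumption, not necessarily how the repository combines them. The file contents shown are hypothetical placeholders. Only `outdir`, `aligner`, `fasta` and `gtf` are real nf-core/rnaseq parameter names.

```python
import json


def merge_params(*layers):
    """Shallow-merge parameter dicts left to right; later layers override earlier keys."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


if __name__ == "__main__":
    # HYPOTHETICAL stand-ins for base_params.yml, a protocol file such as
    # smarter_v2_params.yml, and hg38_gencodev39_params.yml; values are placeholders.
    base = {"outdir": "results", "aligner": "star_salmon"}
    protocol = {"example_protocol_option": "value_for_this_kit"}
    reference = {"fasta": "/path/to/GRCh38.fa", "gtf": "/path/to/gencode.v39.gtf"}
    print(json.dumps(merge_params(base, protocol, reference), indent=2))
```

The merged JSON could then be passed to `nextflow run nf-core/rnaseq` with `-params-file`, together with the relevant `.config` files via `-c`.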

3. sra_metadata/

  • Purpose:
    Stores raw and preprocessed metadata files for each study, as well as supplementary tables and GEO series matrix files.
  • Files:
    • <dataset>_metadata.csv / <dataset>_metadata_preprocessed.csv: Raw and processed sample metadata.
    • <dataset>_supp_table_*.xlsx / .tsv: Supplementary tables with additional sample/batch info.
    • <dataset>_GSE*_series_matrix.txt: GEO series matrix files for extracting sample annotations.
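In a GEO series matrix file, sample annotations are stored as tab-separated header rows starting with `!Sample_`, with one quoted value per sample. `!Sample_characteristics_ch1` is repeated once per characteristic. Below is a minimal Python parser sketch for extracting those annotations. The function names are hypothetical and this is not the repository's parser.

```python
import csv


def parse_series_matrix_samples(lines):
    """Collect '!Sample_*' header rows into one dict per sample (values kept as lists)."""
    samples = None
    for row in csv.reader(lines, delimiter="\t"):  # csv strips the double quotes
        if not row or not row[0].startswith("!Sample_"):
            continue
        key, values = row[0][len("!Sample_"):], row[1:]
        if samples is None:
            samples = [{} for _ in values]
        for sample, value in zip(samples, values):
            # Repeated keys (e.g. characteristics_ch1) accumulate in a list.
            sample.setdefault(key, []).append(value)
    return samples or []


def characteristics(sample):
    """Split 'key: value' entries of characteristics_ch1 into a dict."""
    out = {}
    for entry in sample.get("characteristics_ch1", []):
        key, sep, value = entry.partition(": ")
        if sep:
            out[key.strip()] = value.strip()
    return out


if __name__ == "__main__":
    toy = [  # hypothetical lines in series-matrix layout
        '!Sample_title\t"A"\t"B"',
        '!Sample_characteristics_ch1\t"age: 54"\t"age: 61"',
        '!Sample_characteristics_ch1\t"disease: healthy"\t"disease: cancer"',
    ]
    for s in parse_series_matrix_samples(toy):
        print(s["title"][0], characteristics(s))
```

For a real file, pass an open handle, e.g. `open(path, newline="")`, instead of the toy list.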

4. tables/

Contains output and intermediate tables generated by the preprocessing scripts and downstream analyses:

  • cfRNA-meta_per_sample_metadata.tsv
    Harmonized per-sample metadata table for all included cfRNA-seq datasets.

  • cfRNA-meta_per_batch_metadata.tsv
    Harmonized per-batch metadata table summarizing batch-level information.

  • sampleinfo_all-batches.tsv
    Sample information table including all batches.

  • taxa_simple_df_w_batch.tsv
    Simplified taxa table for downstream taxonomic analyses.
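The per-batch table summarizes information that is shared by the samples of each batch. The Python sketch below derives such a summary from per-sample rows, e.g. those read with `csv.DictReader(handle, delimiter="\t")` from cfRNA-meta_per_sample_metadata.tsv. The `batch` and `dataset` column names are assumptions. A field is reported as batch-level only when all samples in the batch agree; otherwise it is flagged as `mixed`.

```python
from collections import defaultdict


def summarize_batches(rows, batch_col="batch", fields=("dataset",)):
    """Group per-sample rows by batch; report sample counts and shared field values."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[batch_col]].append(row)
    summary = []
    for batch, members in sorted(groups.items()):
        record = {batch_col: batch, "n_samples": len(members)}
        for field in fields:
            values = {m[field] for m in members}
            # Only fields that are constant within the batch are batch-level.
            record[field] = values.pop() if len(values) == 1 else "mixed"
        summary.append(record)
    return summary


if __name__ == "__main__":
    toy_rows = [  # hypothetical per-sample rows
        {"batch": "b1", "dataset": "d1"},
        {"batch": "b1", "dataset": "d1"},
        {"batch": "b2", "dataset": "d2"},
    ]
    print(summarize_batches(toy_rows))
```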


License

See the LICENSE file for details.


Contact

For questions or contributions, please open an issue.

About

Comparative cfRNA-Seq meta-analysis
