fl-cfRNAmeta

This repository hosts the code used in the analysis performed as part of the manuscript "Systematic cross-study assessment of RNA-Seq experimental workflows for plasma cell-free transcriptome profiling" by Tuñí et al.

Repository Structure

fl-cfRNAmeta/
├── README.md
├── nextflow/
├── sra_metadata/
├── src/
└── tables/

1. `src/` scripts and notebooks

preprocess_metadata_functions.py
Main script for preprocessing and harmonizing metadata from multiple cfRNA-seq studies.
- Loads and standardizes sample-level metadata from various studies.
- Merges with dataset-level metadata.
- Applies dataset-specific cleaning, exclusion, and annotation logic.
- Outputs harmonized per-sample and per-batch metadata tables.
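The harmonization steps listed above can be sketched with the standard library. This is not the code in `preprocess_metadata_functions.py`. It is a minimal illustration that assumes each study table already has hypothetical `sample_id` and `dataset` columns, that dataset-level metadata is a dict keyed by dataset, and that exclusions are a set of sample IDs.

```python
import csv
import io
from collections import Counter


def load_samples(text, delimiter=","):
    """Read one study's sample-level metadata table into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def harmonize(samples, dataset_meta, excluded):
    """Merge dataset-level fields into each sample and drop excluded samples.

    Sample-level values take precedence when both levels define a column.
    """
    merged = []
    for sample in samples:
        if sample["sample_id"] in excluded:
            continue  # dataset-specific exclusion
        merged.append({**dataset_meta.get(sample["dataset"], {}), **sample})
    return merged


def samples_per_group(samples, key):
    """Count samples per value of `key` (e.g. a dataset or batch column)."""
    return dict(Counter(sample[key] for sample in samples))


# Hypothetical example inputs (not real study data).
study_a = "sample_id,dataset,sex\nS1,study_A,female\nS2,study_A,male\nS3,study_A,female\n"
dataset_meta = {"study_A": {"library_kit": "kit_X", "biofluid": "plasma"}}

samples = load_samples(study_a)
clean = harmonize(samples, dataset_meta, excluded={"S2"})
print(samples_per_group(clean, "dataset"))  # {'study_A': 2}
```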
sra_columns_mapping.py
Helper functions for renaming and standardizing column names and values across metadata tables.
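Column and value standardization of this kind usually comes down to two lookup tables. The sketch below is an illustration only. The alias tables `COLUMN_MAP` and `VALUE_MAP` and the function `standardize_row` are hypothetical, and the real mappings in `sra_columns_mapping.py` may be organized differently.

```python
# Hypothetical alias tables; the real ones are defined in sra_columns_mapping.py.
COLUMN_MAP = {"Run": "sample_id", "sex": "donor_sex", "Sex": "donor_sex"}
VALUE_MAP = {"donor_sex": {"F": "female", "female": "female", "M": "male", "Male": "male"}}


def standardize_row(row, column_map=COLUMN_MAP, value_map=VALUE_MAP):
    """Rename columns, then normalise values of columns that have a value map."""
    out = {}
    for col, val in row.items():
        new_col = column_map.get(col, col)
        # Values with no mapping are kept unchanged so they can be reviewed manually.
        out[new_col] = value_map.get(new_col, {}).get(val, val)
    return out


print(standardize_row({"Run": "S1", "Sex": "F", "age": "54"}))
# {'sample_id': 'S1', 'donor_sex': 'female', 'age': '54'}
```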
dataset_mappings.json
JSON file with mappings for dataset names, colours, orders, or values used across scripts.
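A shared JSON mapping file is typically loaded once and used to give every plot the same labels, colours and dataset order. The structure below (`names`, `colours`, `order`) is an assumption for illustration, not the actual schema of `dataset_mappings.json`. The same is true of the helper `plotting_labels`.

```python
import json

# Hypothetical structure; the actual keys in dataset_mappings.json may differ.
# With the real file: with open("src/dataset_mappings.json") as fh: mappings = json.load(fh)
example = """{
  "names":   {"study_A": "Study A", "study_B": "Study B"},
  "colours": {"study_A": "#1b9e77", "study_B": "#d95f02"},
  "order":   ["study_B", "study_A"]
}"""
mappings = json.loads(example)


def plotting_labels(datasets, mappings, default_colour="#999999"):
    """Return (display name, colour) pairs sorted by the configured order."""
    rank = {name: i for i, name in enumerate(mappings["order"])}
    # Datasets missing from the order list are placed at the end.
    ordered = sorted(datasets, key=lambda d: rank.get(d, len(rank)))
    return [(mappings["names"].get(d, d), mappings["colours"].get(d, default_colour))
            for d in ordered]


print(plotting_labels(["study_A", "study_B"], mappings))
# [('Study B', '#d95f02'), ('Study A', '#1b9e77')]
```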
boxplots_fig2.R
R script that generates boxplots of the percentage of spliced reads, percentage of exonic reads, percentage of fragments mapping in the correct gene orientation, NG80, number of fragments, percentage of reads mapping to the human reference genome, percentage of reads mapping to ERCC spike-ins, effective fragment length distribution, and other QC metrics. It also generates the NG80 vs. spliced reads scatterplot and the scatterplot of percentage of human reads vs. percentage of microbial reads.
fig_1a_stacked_barplot.R
R script for creating donor phenotype stacked bar plot.
fig_1b_heatmap.R
R script for creating the pre-analytical variables heatmap.
gene_coverage_profile_fig2_tmpH.ipynb
Jupyter notebook for plotting gene coverage profiles.
join_count_matrix_and_qc_table.ipynb
Jupyter notebook for merging sliced count matrices and QC tables into a single matrix or QC table file.
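Merging count matrices that were split by sample amounts to a column-bind on the gene identifier. The sketch below assumes each slice is a TSV with a `gene_id` first column followed by one column per sample. It fills genes missing from a slice with 0, which is a choice made for this sketch, not necessarily what the notebook does. Row-binding the QC tables is not shown.

```python
import csv
import io


def read_counts(text):
    """Parse a TSV count matrix slice into (sample names, {gene_id: {sample: count}})."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    samples = next(reader)[1:]
    matrix = {row[0]: {s: int(c) for s, c in zip(samples, row[1:])} for row in reader}
    return samples, matrix


def join_slices(texts):
    """Column-bind count matrix slices on gene_id and return one TSV string."""
    all_samples, joined = [], {}
    for text in texts:
        samples, matrix = read_counts(text)
        all_samples.extend(samples)
        for gene, counts in matrix.items():
            joined.setdefault(gene, {}).update(counts)
    lines = ["\t".join(["gene_id"] + all_samples)]
    for gene in sorted(joined):
        # Assumption: a gene absent from a slice has 0 counts in that slice's samples.
        lines.append("\t".join([gene] + [str(joined[gene].get(s, 0)) for s in all_samples]))
    return "\n".join(lines) + "\n"


# Hypothetical toy slices.
slice1 = "gene_id\tS1\tS2\ngeneA\t5\t0\ngeneB\t3\t7\n"
slice2 = "gene_id\tS3\ngeneA\t2\ngeneB\t9\n"
merged = join_slices([slice1, slice2])
print(merged)
```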
merge_fastqs_array_isolate.sh
Shell script for merging FASTQ files by array or isolate.
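Assuming the script concatenates the FASTQ files of several sequencing runs that belong to the same sample (for example, the same isolate), the idea can be sketched in Python. The real script is a shell script and probably uses `cat`. Byte-level concatenation also works for `.fastq.gz` inputs, because concatenated gzip members form a valid gzip stream. The function names and the `isolate1_R1` grouping key are hypothetical. Mates (R1 and R2) must be grouped separately and kept in the same run order.

```python
import shutil
import tempfile
from collections import defaultdict
from pathlib import Path


def group_fastqs(fastq_to_sample):
    """Group FASTQ paths by target sample key, in sorted path order."""
    groups = defaultdict(list)
    for fastq, sample in sorted(fastq_to_sample.items()):
        groups[sample].append(Path(fastq))
    return dict(groups)


def concatenate(groups, outdir, suffix=".fastq.gz"):
    """Write one file per sample by byte-concatenating its run files."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    written = {}
    for sample, paths in groups.items():
        target = outdir / f"{sample}{suffix}"
        with open(target, "wb") as out:
            for path in paths:
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, out)
        written[sample] = target
    return written


# Hypothetical example: two runs of the same isolate, read 1 only.
with tempfile.TemporaryDirectory() as tmpdir:
    tmp = Path(tmpdir)
    run_a, run_b = tmp / "runA_1.fastq", tmp / "runB_1.fastq"
    run_a.write_text("@r1\nACGT\n+\nIIII\n")
    run_b.write_text("@r2\nTTGA\n+\nIIII\n")
    groups = group_fastqs({str(run_a): "isolate1_R1", str(run_b): "isolate1_R1"})
    outputs = concatenate(groups, tmp / "merged", suffix=".fastq")
    merged_text = outputs["isolate1_R1"].read_text()
```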
ng80.R
R script to compute the NG80 metric reported in the manuscript. Takes the count matrix as input.
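The sketch below assumes that NG80 is defined, by analogy with the N50 assembly statistic, as the smallest number of genes whose counts together account for 80% of a sample's total counts. The exact definition used in `ng80.R` may differ, for example in which genes are included or whether counts are normalized first. The example matrix is hypothetical.

```python
def ng80(counts, fraction=0.8):
    """Smallest number of genes whose counts sum to at least `fraction` of the total."""
    ranked = sorted((c for c in counts if c > 0), reverse=True)  # highest counts first
    total = sum(ranked)
    if total == 0:
        return 0
    target = fraction * total
    running = 0
    for n, count in enumerate(ranked, start=1):
        running += count
        if running >= target:
            return n
    return len(ranked)


def ng80_per_sample(matrix, fraction=0.8):
    """Apply ng80 to each sample column of a {sample: [gene counts]} matrix."""
    return {sample: ng80(counts, fraction) for sample, counts in matrix.items()}


# Hypothetical toy matrix: sample -> counts for five genes.
matrix = {"S1": [60, 25, 10, 5, 0], "S2": [30, 20, 20, 20, 10]}
print(ng80_per_sample(matrix))  # {'S1': 2, 'S2': 4}
```

A low NG80 means a few genes dominate the library, while a high NG80 means counts are spread across many genes.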
spliced_reads.sh
Shell script to compute the number and percentage of spliced reads. Takes the deduplicated BAM file as input.
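Spliced alignments can be recognized by an `N` (skipped region) operation in the CIGAR string. The sketch below reads SAM text, such as the output of `samtools view` on the deduplicated BAM. It counts only primary mapped alignments. That filter is an assumption of this sketch, and the shell script may apply different filters. Each mate of a paired-end read is counted separately.

```python
import re

CIGAR_SKIP = re.compile(r"\d+N")  # an intron-sized gap in the alignment


def spliced_stats(sam_lines):
    """Return (spliced, total, percent) over primary mapped alignments."""
    spliced = total = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue  # unmapped, secondary or supplementary
        total += 1
        if CIGAR_SKIP.search(cigar):
            spliced += 1
    percent = 100.0 * spliced / total if total else 0.0
    return spliced, total, percent


# Hypothetical SAM records: one spliced, one unspliced, one unmapped, one secondary.
sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t50M200N50M\t*\t0\t0\t*\t*",
    "r2\t16\tchr1\t500\t60\t100M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r4\t256\tchr1\t900\t0\t40M10N60M\t*\t0\t0\t*\t*",
]
print(spliced_stats(sam))  # (1, 2, 50.0)
```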

2. `nextflow/`

Purpose:
Contains configuration files and parameter sets for running the nf-core/rnaseq Nextflow pipeline on each dataset.
Files:
- base.config, base_params.yml: Base Nextflow configuration and parameters.
- smarter.config, smarter_v2_params.yml, smarter_v3_params.yml: Configs for SMARTer protocols.
- non_smarter.config, non_smarter_reverse_params.yml, non_smarter_unstranded_params.yml: Configs for non-SMARTer protocols.
- hg38_gencodev39_params.yml: Parameters for hg38/Gencode v39 reference.
- two-color-illumina.config: Config for two-color Illumina sequencing.
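The README does not say how these files are combined for a given dataset. The sketch below assumes that every run uses the base and hg38/Gencode v39 files, then adds one protocol-specific config and parameter file, plus the two-color config when needed. Nextflow's `-c` option can be repeated to stack config files, but the sketch passes a single `-params-file`. The listed YAML parameter files are therefore assumed to be merged beforehand into one file. `merged_params.yml` and both functions are hypothetical, and pipeline options such as `-r` or `-profile` are omitted.

```python
def select_files(smarter, smarter_version=None, strandedness=None, two_color=False):
    """Pick config and params files from nextflow/ for one dataset (assumed logic)."""
    configs = ["base.config"]
    params = ["base_params.yml", "hg38_gencodev39_params.yml"]
    if smarter:
        if smarter_version not in (2, 3):
            raise ValueError("smarter_version must be 2 or 3")
        configs.append("smarter.config")
        params.append(f"smarter_v{smarter_version}_params.yml")
    else:
        if strandedness not in ("reverse", "unstranded"):
            raise ValueError("strandedness must be 'reverse' or 'unstranded'")
        configs.append("non_smarter.config")
        params.append(f"non_smarter_{strandedness}_params.yml")
    if two_color:
        configs.append("two-color-illumina.config")
    prefix = "nextflow/"
    return [prefix + c for c in configs], [prefix + p for p in params]


def build_command(configs, merged_params_file):
    """Build a `nextflow run nf-core/rnaseq` argument list."""
    cmd = ["nextflow", "run", "nf-core/rnaseq"]
    for config in configs:
        cmd += ["-c", config]  # each -c adds a file to the configuration set
    cmd += ["-params-file", merged_params_file]
    return cmd


configs, params = select_files(smarter=True, smarter_version=3, two_color=True)
cmd = build_command(configs, "merged_params.yml")
print(" ".join(cmd))
```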

3. `sra_metadata/`

Purpose:
Stores raw and preprocessed metadata files for each study, as well as supplementary tables and GEO series matrix files.
Files:
- <dataset>_metadata.csv / <dataset>_metadata_preprocessed.csv: Raw and processed sample metadata.
- <dataset>_supp_table_*.xlsx / .tsv: Supplementary tables with additional sample/batch info.
- <dataset>_GSE*_series_matrix.txt: GEO series matrix files for extracting sample annotations.
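In GEO series matrix files, sample annotations are stored in rows that start with `!Sample_`. Each row has one tab-separated, quoted value per sample, and `!Sample_characteristics_ch1` rows hold `name: value` pairs. The following is a minimal parsing sketch. It is not necessarily how the repository extracts these annotations, and the example accessions are hypothetical.

```python
import csv
import io


def parse_series_matrix(text):
    """Collect '!Sample_*' rows of a GEO series matrix into one dict per sample."""
    samples = None
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields or not fields[0].startswith("!Sample_"):
            continue  # series-level rows and the expression table are ignored
        key, values = fields[0][len("!Sample_"):], fields[1:]
        if samples is None:
            samples = [{} for _ in values]
        for sample, value in zip(samples, values):
            if key.startswith("characteristics") and ": " in value:
                name, _, val = value.partition(": ")  # e.g. "sex: female"
                sample[name] = val
            else:
                sample[key] = value
    return samples or []


example = (
    '!Series_title\t"x"\n'
    '!Sample_title\t"plasma 1"\t"plasma 2"\n'
    '!Sample_geo_accession\t"GSM1"\t"GSM2"\n'
    '!Sample_characteristics_ch1\t"sex: female"\t"sex: male"\n'
)
print(parse_series_matrix(example))
# [{'title': 'plasma 1', 'geo_accession': 'GSM1', 'sex': 'female'},
#  {'title': 'plasma 2', 'geo_accession': 'GSM2', 'sex': 'male'}]
```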

4. `tables/`

Contains output and intermediate tables generated by the preprocessing scripts and downstream analyses:

cfRNA-meta_per_sample_metadata.tsv
Harmonized per-sample metadata table for all included cfRNA-seq datasets.
cfRNA-meta_per_batch_metadata.tsv
Harmonized per-batch metadata table summarizing batch-level information.
sampleinfo_all-batches.tsv
Sample information table including all batches.
taxa_simple_df_w_batch.tsv
Simplified taxa table for downstream taxonomic analyses.
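To use the per-sample and per-batch tables together, the batch-level columns can be attached to each sample through a shared batch identifier. The sketch assumes a hypothetical `batch` column in both tables, and the `extraction_kit` column in the toy data is also invented. Check the actual headers of `cfRNA-meta_per_sample_metadata.tsv` and `cfRNA-meta_per_batch_metadata.tsv` before adapting it.

```python
import csv
import io


def read_tsv(handle):
    """Read a tab-separated table with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def attach_batch_info(samples, batches, key="batch"):
    """Copy per-batch columns onto each sample that shares the same `key` value."""
    by_batch = {row[key]: row for row in batches}
    # Sample-level columns win if both tables define the same column name.
    return [{**by_batch.get(sample[key], {}), **sample} for sample in samples]


# With the real files (the 'batch' column name is an assumption):
# with open("tables/cfRNA-meta_per_sample_metadata.tsv", newline="") as fh:
#     samples = read_tsv(fh)
samples = read_tsv(io.StringIO("sample_id\tbatch\nS1\tb1\nS2\tb2\n"))
batches = read_tsv(io.StringIO("batch\textraction_kit\nb1\tkit_X\nb2\tkit_Y\n"))
combined = attach_batch_info(samples, batches)
print(combined[0])  # {'batch': 'b1', 'extraction_kit': 'kit_X', 'sample_id': 'S1'}
```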

License

See LICENSE file for details.

Contact

For questions or contributions, please open an issue in the Flomics/fl-cfRNAmeta GitHub repository.
