Repository for supplemental material of the pathwayPCA MultiOmics manuscript
Gabriel Odom and Lily Wang
2021-08-27
The first file we include is the final variant of the C2 CP collection we used for the multi-omics analysis: c2_cp_entrez_trimmed_20210709.gmt. It includes only gene sets with between 5 and 200 genes.
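The size filter described above can be sketched in a few lines. This is a minimal illustration, not the code used to build the file. It assumes the standard tab-separated GMT layout (set name, description, then one gene ID per field) and that the bounds of 5 and 200 are inclusive. The function names `parse_gmt` and `trim_by_size` and the toy records are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_gmt(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse GMT records: set name, description, then tab-separated gene IDs."""
    gene_sets: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and records without any genes
        name, _description, *genes = fields
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets


def trim_by_size(
    gene_sets: Dict[str, List[str]], min_size: int = 5, max_size: int = 200
) -> Dict[str, List[str]]:
    """Keep gene sets with min_size <= size <= max_size (inclusive bounds assumed)."""
    return {
        name: genes
        for name, genes in gene_sets.items()
        if min_size <= len(genes) <= max_size
    }


# Toy records (hypothetical, not taken from the repository file)
toy_records = [
    "TOY_SMALL\tna\t1\t2\t3\n",
    "TOY_OK\tna\t1\t2\t3\t4\t5\n",
    "TOY_BIG\tna\t" + "\t".join(str(i) for i in range(201)) + "\n",
]
kept = trim_by_size(parse_gmt(toy_records))
print(sorted(kept))
```

To apply the same check to the repository file, pass an open handle on c2_cp_entrez_trimmed_20210709.gmt to `parse_gmt` instead of the toy records. Because that file is already trimmed, filtering it again should leave every gene set in place.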
The analysis_DNAm/ folder contains:
- input_data/:
  - "meta_analysis_no_crossHyb_smoking_ov_comb_p_singleCpG_20210803.csv": statistical modelling results for 742 co-methylated genomic regions
  - "meta_analysis_single_cpg_sig_no_crossHyb_smoking_with_state_greatAnnot_df.csv": statistical modelling results for 3751 individual methylation probes
- "DNAm_gene_set_enrichment_20210729.R": analysis script to perform gene set methylation analysis via the missMethyl:: package
- "c2cp_trimmed_missMethyl_pValues_20210803.csv": pathway analysis results
- "single_gene_spline_pValues_20210821.csv": single gene results used for mitch:: input
The analysis_RNAseq/ folder contains:
- "rnaSeq_single_gene_pVals_20210810.csv": results from single-gene testing
- "rnaSeq_single_gene_testing_20210810.R": script to wrangle the raw data, perform the single gene testing, and then perform the subsequent fgsea:: analysis
- "c2cp_trimmed_fgsea_pValues_20210810.csv": pathway-level results returned by the fgsea:: package
The analysis_SNP/ folder contains:
- example_input_data/: a folder with 10 Excel files. Each file contains annotated SNP activity within a single pathway. The full set of data files comprises 2,833 such tables (one for each pathway in the C2 CP collection). We do not include all of these files due to repository size restrictions. The full set of 2,833 Excel tables would be fed into SAS via the analysis scripts in src/sas/.
- "single_gene_SNP_min_pValue_spline_20210823.csv": single gene results used for mitch:: input
- src/: script files.
  - "MOD_7-10-2021-clean.sas": script for pathway analysis of genetic variants
  - "p-values-estimation_use-empirical-null-clean.R": script used to apply the Bacon significance correction
  - "wrangle_SAS_output_20210720.R": script to clean up the results
- "all_results_bacon_correction_7-21-2021.csv": the SNP pathway analysis results after correcting the p-values
- "all_bacon_results_wrangled_20210809.csv": SNP pathway analysis results from the SAS macro
The multiomics/ folder contains the scripts and results for the MiniMax statistic (MiniMax/) and for the mitch:: package (mitch/).
The file "data/pathwayMultiomics_test_size_20211201.RDS" is a collection of the pathway p-values for each data platform (using the pathwayPCA method) and the resulting MiniMax statistic under the null hypothesis (no treatment added to the data). The test size of the MiniMax statistic depends on the parameters of the Beta distribution selected for these data; the paper uses the asymptotic values (alpha = 2, beta = 2). MiniMax p-values can therefore be calculated by evaluating the cumulative distribution function of the Beta(2, 2) distribution at the MiniMax statistic values.
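The p-value calculation described above amounts to evaluating the lower-tail probability of the Beta distribution at each MiniMax statistic value. The sketch below is an illustration rather than the manuscript's code. It assumes integer shape parameters, for which the Beta lower tail equals a binomial upper-tail sum, so no external library is needed. The function names and the toy statistic values are hypothetical.

```python
from math import comb
from typing import Iterable, List


def beta_cdf_int(x: float, alpha: int = 2, beta: int = 2) -> float:
    """P(X <= x) for X ~ Beta(alpha, beta), valid for integer shapes only.

    For integer shapes the lower tail equals P(Binomial(alpha + beta - 1, x) >= alpha).
    With alpha = beta = 2 this reduces to 3 * x**2 - 2 * x**3.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    n = alpha + beta - 1
    return sum(
        comb(n, j) * x**j * (1.0 - x) ** (n - j) for j in range(alpha, n + 1)
    )


def minimax_p_values(
    statistics: Iterable[float], alpha: int = 2, beta: int = 2
) -> List[float]:
    """Map MiniMax statistic values to p-values under Beta(alpha, beta)."""
    return [beta_cdf_int(s, alpha, beta) for s in statistics]


# Toy MiniMax statistic values (hypothetical, not read from the .RDS file)
toy_statistics = [0.01, 0.1, 0.5]
for stat, p_value in zip(toy_statistics, minimax_p_values(toy_statistics)):
    print(f"MiniMax = {stat:.2f} -> p = {p_value:.6f}")
```

In R, the same lower-tail probability is returned by `pbeta(x, 2, 2)`, which also accepts non-integer shape parameters if values other than the asymptotic ones are estimated from the data.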