RNA-mediated interactions: An eRNAi target search algorithm for studying the impact of the metagenome

The phenomenon of environmental RNA interference (eRNAi) is based on the transfer of small RNA molecules (sRNA) between organisms to suppress the expression of target genes. Investigating the processes of environmental RNA interference provides novel insights into the dynamics of interactions between living beings and raises the question of whether RNA could be transferred between organisms.

Aim

The main goal of the project is the development of an algorithm to identify potential targets in an organism's genome that can be specifically regulated by environmental double-stranded RNAs.

Pipeline

Data analysis and selection of suitable target datasets
Data pre-processing
Canonical Correspondence Analysis (CCA)
Identification of meta-genome and transcriptome overlap regions
Statistical validation of results
Pipeline validation and testing

Dataset

The data were obtained from public sources.

Gallus gallus, BioProject PRJNA503784 (Tejas M. Shah et al. 2019)
Homo Sapiens, NASA GeneLab - OSD-574 (Park J et al. 2024)

Methods

The following tools are used in this project:

Core Frameworks

Snakemake
Mölder et al., 2021

Metagenomics

Kraken 2 | Lu et al., 2022
Bracken | Lu et al., 2017

Sequence Analysis

DSK | Rizk et al., 2013
BLAST+ | Camacho et al., 2009

Transcriptomics

Kallisto | Bray et al., 2016

Statistical Computing

R | R Core Team, 2024

System requirements

The minimum requirements are as follows:

Device requirements

Operation memory - at least 4 GB
Processor - x86-64 architecture, 4+ cores recommended

Software requirements

Operation system
- Linux: Ubuntu 20.04+
- Windows: 10/11 (64-bit)
- macOS: 11.0+ (Big Sur)

Recomended

SSD-disk
4+ core processor

Dependencies

Full dependency list avialable in environmental.yaml General pipeline depends on:

python > 3.11
SnakeMake
R > 4.2.0
analysis tools:
- fastp
- Kraken 2
- Bracken
- Kallisto
- DSK
- BlastN

Installation

Install with:

git clone git@github.com:Valeriisht/eRNAi_project
conda env create -f enviromental.yaml
conda activate eRNAi

Then you can use your data, such as metagenomic and transcriptomic data. In order to use your own data, you need to specify the parameters (data references and SRA ID) in the configuration file (config/config.yaml).

Usage

The scripts for primary data processing can be found in the following folder rules

To run the process, select the desired rule all in SnakeFile to process the data.

Example:

snakemake --cores 8

Development

Testing

We implement an integrated testing pipeline for better development

├── Unit Tests:         ✔ Implemented [###>  ~70%]
├── Integration Tests:  ⌛ In progress
└── Generative Tests:   ⏳ Planned

Run with pytest

Contributing

See contribution guide.

Results

Canonical Correspondence Analysis (CCA)
- The relationship between microorganisms and the transcriptome

Instructions can be found in script_CCA

DSK+BLAST revealed a significant number of common sites between microbial sequences and the host genome

params:

e-value: 0.1
word_size 12
task blastn-short

Statistics

Check whether the probability of finding intersections is statistically significant and associated with CCA.

Fisher tests:
- Probabilities for DSK-targets inside and outside the transcriptome in CCA - intersection with CCA transcripts
- Probabilities for DSK-targets with co-directional, oppositely directed and other transcripts within CCA transcripts
Binomial test:
- Probability of co-directional/opposite-directional/otherwise
T-test
- Differences in probabilities of direction by group between species - revealed no statistically significant difference
Transcript overlapping between different organisms:

Scripts & documentation can be found in the custom R pipeline

Output:

The assay results do not contradict hypothesis concerning that non-coding regions of prokaryotic mRNAs may act as regulatory elements, exerting influence on the expression of target genes in eukaryotic cells.
The developed algorithm - is potentially able to detect the effectors of inter-organismal eRNAi interactions.

References

Park, J., Overbey, E. G., Narayanan, S. A., Kim, J., Tierney, B. T., Damle, N., Najjar, D., Ryon, K. A., Proszynski, J., Kleinman, A., Hirschberg, J. W., MacKay, M., Afshin, E. E., Granstein, R., Gurvitch, J., Hudson, B. M., Rininger, A., Mullane, S., Church, S. E., … Mason, C. E. (2024). Spatial multi-omics of human skin reveals KRAS and inflammatory responses to spaceflight. Nature Communications, 15(1), 4773. https://doi.org/10.1038/s41467-024-48625-2
Shah, T. M., Patel, J. G., Gohil, T. P., Blake, D. P., & Joshi, C. G. (2019). Host transcriptome and microbiome interaction modulates physiology of full-sibs broilers with divergent feed conversion ratio. Npj Biofilms and Microbiomes, 5(1), 24. https://doi.org/10.1038/s41522-019-0096-3
Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins-Tinch, C. H., Sochat, V., Forster, J., Lee, S., Twardziok, S. O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., Nahnsen, S., & Köster, J. (2021). Sustainable data analysis with Snakemake. F1000Research, 10, 33. https://doi.org/10.12688/f1000research.29032.1
Lu, J., Rincon, N., Wood, D. E., Breitwieser, F. P., Pockrandt, C., Langmead, B., Salzberg, S. L., & Steinegger, M. (2022). Metagenome analysis using the Kraken software suite. Nature Protocols, 17(12), 2815–2839. https://doi.org/10.1038/s41596-022-00738-y
Lu, J., Breitwieser, F. P., Thielen, P., & Salzberg, S. L. (2017). Bracken: estimating species abundance in metagenomics data. PeerJ Computer Science, 3, e104. https://doi.org/10.7717/peerj-cs.104
Bray, N. L., Pimentel, H., Melsted, P., & Pachter, L. (2016). Near-optimal probabilistic RNA-seq quantification. Nature Biotechnology, 34(5), 525–527. https://doi.org/10.1038/nbt.3519
Rizk, G., Lavenier, D., & Chikhi, R. (2013). DSK: k -mer counting with very low memory usage. Bioinformatics, 29(5), 652–653. https://doi.org/10.1093/bioinformatics/btt020
Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., & Madden, T. L. (2009). BLAST+: architecture and applications. BMC Bioinformatics, 10(1), 421. https://doi.org/10.1186/1471-2105-10-421
R Core Team (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.

Authors

V. Ishtuganova¹ ² (valeriishtuganova@gmail.com), M. Kravchenko¹ ² (https://github.com/MariiaKaar), D.Smutin ³ (dvsmutin@gmail.com)

Bioinformatics Institute, Kantemirovskaya st. 2A, 197342, St. Petersburg, Russia
Saint-Petersburg State University, Universitetskaya emb. 7/9, 199034, St. Petersburg, Russia
Information Technologies, Mechanics and Optics University, Kronverksky Pr. 49, bldg. A, St. Petersburg, Russia

Name		Name	Last commit message	Last commit date
Latest commit History 567 Commits
.github/workflows		.github/workflows
CCA_analysis		CCA_analysis
config		config
data/data_ggallus		data/data_ggallus
docs		docs
img		img
requirements		requirements
results/kraken_res		results/kraken_res
scripts		scripts
tests		tests
workflow/rules		workflow/rules
.gitignore		.gitignore
README.md		README.md
Snakefile		Snakefile
enviromental.yaml		enviromental.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RNA-mediated interactions: An eRNAi target search algorithm for studying the impact of the metagenome

Aim

Contents

Pipeline

Dataset

Methods

Core Frameworks

Metagenomics

Sequence Analysis

Transcriptomics

Statistical Computing

System requirements

Device requirements

Software requirements

Recomended

Dependencies

Installation

Usage

Development

Testing

Contributing

Results

References

Authors

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 3

Uh oh!

Languages

Valeriisht/eRNAi_project

Folders and files

Latest commit

History

Repository files navigation

RNA-mediated interactions: An eRNAi target search algorithm for studying the impact of the metagenome

Aim

Contents

Pipeline

Dataset

Methods

Core Frameworks

Metagenomics

Sequence Analysis

Transcriptomics

Statistical Computing

System requirements

Device requirements

Software requirements

Recomended

Dependencies

Installation

Usage

Development

Testing

Contributing

Results

References

Authors

About

Resources

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Packages