RNA-mediated interactions: An eRNAi target search algorithm for studying the impact of the metagenome
Environmental RNA interference (eRNAi) is based on the transfer of small RNA (sRNA) molecules between organisms to suppress the expression of target genes. Investigating eRNAi provides novel insights into the dynamics of interactions between organisms and raises the question of whether RNA can be transferred between them.
- The main goal of the project is the development of an algorithm to identify potential targets in an organism's genome that can be specifically regulated by environmental double-stranded RNAs.
- Pipeline
- Dataset
- Methods
- System requirements
- Dependencies
- Installation
- Usage
- Development
- Results
- References
- Authors
- Data analysis and selection of suitable target datasets
- Data pre-processing
- Canonical Correspondence Analysis (CCA)
- Identification of metagenome and transcriptome overlap regions
- Statistical validation of results
- Pipeline validation and testing
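The overlap-identification step in the list above can be illustrated with a minimal in-memory sketch. It is not the pipeline's implementation: the project uses DSK for k-mer counting and BLAST+ for matching, whereas this toy version only shows the idea of finding k-mers shared between a microbial sequence and a host sequence on either strand. The function names and the example sequences are hypothetical.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in a nucleotide sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = str.maketrans("ACGT", "TGCA")
    return sequence.upper().translate(complement)[::-1]


def shared_kmers(microbial: str, host: str, k: int) -> dict:
    """Find microbial k-mers that also occur in the host sequence.

    "forward" lists k-mers found on the host strand as given,
    "reverse" lists k-mers found on its reverse complement.
    """
    microbial_kmers = set(count_kmers(microbial, k))
    forward = microbial_kmers & set(count_kmers(host, k))
    reverse = microbial_kmers & set(count_kmers(reverse_complement(host), k))
    return {"forward": sorted(forward), "reverse": sorted(reverse)}


# Toy sequences, made up for illustration only.
result = shared_kmers("ACGTTGCA", "TTACGTTGG", k=5)
print(result)
```

Real metagenomic data would not fit this in-memory approach, which is why a disk-based counter such as DSK is used in practice.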
The data were obtained from public sources.
- Gallus gallus, BioProject PRJNA503784 (Tejas M. Shah et al. 2019)
- Homo sapiens, NASA GeneLab - OSD-574 (Park J et al. 2024)
The following tools are used in this project:
- Snakemake | Mölder et al., 2021
- Kraken 2 | Lu et al., 2022
- Bracken | Lu et al., 2017
- DSK | Rizk et al., 2013
- BLAST+ | Camacho et al., 2009
- Kallisto | Bray et al., 2016
- R | R Core Team, 2024
The minimum requirements are as follows:
- RAM: at least 4 GB
- Processor: x86-64 architecture, 4+ cores recommended
- Operating system:
  - Linux: Ubuntu 20.04+
  - Windows: 10/11 (64-bit)
  - macOS: 11.0+ (Big Sur)
- Storage: SSD
The full dependency list is available in environmental.yaml.
The general pipeline depends on:
- python > 3.11
- SnakeMake
- R > 4.2.0
- analysis tools:
  - fastp
  - Kraken 2
  - Bracken
  - Kallisto
  - DSK
  - BlastN
Install with:
git clone git@github.com:Valeriisht/eRNAi_project
conda env create -f enviromental.yaml
conda activate eRNAi
You can then run the pipeline on your own data, such as metagenomic and transcriptomic data.
In order to use your own data, you need to specify the parameters (data references and SRA ID) in the configuration file (config/config.yaml).
The scripts for primary data processing can be found in the rules folder.
To run the pipeline, select the desired targets in the rule all of the Snakefile.
Example:
snakemake --cores 8
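Before a full run it can be useful to preview which jobs Snakemake would schedule. The sketch below builds such a command from Python. It assumes the configuration lives at config/config.yaml as described above. `--cores`, `--configfile` and `--dry-run` are standard Snakemake options, while the helper name `build_snakemake_command` is hypothetical.

```python
import shlex


def build_snakemake_command(cores: int = 8, dry_run: bool = True,
                            configfile: str = "config/config.yaml") -> list[str]:
    """Assemble a Snakemake invocation as an argument list."""
    command = ["snakemake", "--cores", str(cores), "--configfile", configfile]
    if dry_run:
        # --dry-run only lists the planned jobs without executing them.
        command.append("--dry-run")
    return command


command = build_snakemake_command()
print(shlex.join(command))
# To execute it from the repository root:
#   import subprocess
#   subprocess.run(command, check=True)
```

Once the dry run lists the expected jobs, call the helper with `dry_run=False` to start the actual processing.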
We use an integrated testing pipeline to support development.
├── Unit Tests: ✔ Implemented [###> ~70%]
├── Integration Tests: ⌛ In progress
└── Generative Tests: ⏳ Planned
Run with pytest
See contribution guide.
- Canonical Correspondence Analysis (CCA)
  - The relationship between microorganisms and the transcriptome
Instructions can be found in script_CCA
- DSK+BLAST revealed a significant number of common sites between microbial sequences and the host genome
BLAST params:
- e-value: 0.1
- word_size: 12
- task: blastn-short
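The parameters above map directly onto `blastn` command-line options. The following is a minimal sketch of how such a search could be assembled and how one line of its tabular output (`-outfmt 6`) could be read. It is not taken from the project's rules. The file names, the helper names and the example hit are hypothetical. The strand rule relies on the BLAST convention that subject coordinates are reported in descending order for minus-strand hits.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_TABULAR_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def build_blastn_command(query_fasta: str, database: str, output_tsv: str) -> list[str]:
    """Assemble a blastn call with the parameters listed above."""
    return [
        "blastn",
        "-task", "blastn-short",
        "-word_size", "12",
        "-evalue", "0.1",
        "-query", query_fasta,
        "-db", database,
        "-outfmt", "6",
        "-out", output_tsv,
    ]


def parse_hit(line: str) -> dict:
    """Parse one tabular BLAST line and annotate the subject strand."""
    fields = dict(zip(BLAST_TABULAR_COLUMNS, line.rstrip("\n").split("\t")))
    for name in ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"):
        fields[name] = int(fields[name])
    for name in ("pident", "evalue", "bitscore"):
        fields[name] = float(fields[name])
    # Minus-strand hits have subject start greater than subject end.
    fields["strand"] = "minus" if fields["sstart"] > fields["send"] else "plus"
    return fields


# Hypothetical file names and a made-up hit, for illustration only.
command = build_blastn_command("kmers.fasta", "host_transcripts", "hits.tsv")
hit = parse_hit("kmer_1\ttranscript_A\t100.000\t21\t0\t0\t1\t21\t350\t330\t0.004\t42.1")
print(" ".join(command))
print(hit["sseqid"], hit["strand"])
```

The strand annotation is one possible way to separate co-directional from oppositely directed matches. The project's own scripts may classify direction differently.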
- Statistics
Check whether the probability of finding intersections is statistically significant and associated with CCA.
- Fisher tests:
  - Probabilities for DSK-targets inside and outside the transcriptome in CCA (intersection with CCA transcripts)
  - Probabilities for DSK-targets with co-directional, oppositely directed, and other transcripts within CCA transcripts
- Binomial test:
  - Probability of co-directional, oppositely directed, or other orientation
- T-test:
  - Differences in the probabilities of direction by group between species (revealed no statistically significant difference)
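The Fisher and binomial tests listed above can be sketched in pure Python to show what is being computed. This is an illustration under stated assumptions, not the project's code: the analysis itself is done in R, where `fisher.test` and `binom.test` are the standard functions. The counts below are made up, and the two-sided p-value is defined here as the sum of the probabilities of all outcomes no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def probability(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = probability(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    tail = sum(
        p for x in range(lowest, highest + 1)
        if (p := probability(x)) <= observed * (1 + 1e-9)
    )
    return min(1.0, tail)


def binomial_test_two_sided(successes: int, trials: int, p: float = 0.5) -> float:
    """Two-sided exact binomial test against success probability p."""

    def pmf(k: int) -> float:
        return comb(trials, k) * p ** k * (1 - p) ** (trials - k)

    observed = pmf(successes)
    tail = sum(
        q for k in range(trials + 1)
        if (q := pmf(k)) <= observed * (1 + 1e-9)
    )
    return min(1.0, tail)


# Made-up counts: rows = DSK-target / non-target, columns = inside / outside CCA transcripts.
fisher_p = fisher_exact_two_sided(3, 1, 1, 3)
# Made-up counts: 8 co-directional matches out of 10, tested against p = 0.5.
binomial_p = binomial_test_two_sided(8, 10, 0.5)
print(round(fisher_p, 4), round(binomial_p, 4))
```

For the large counts typical of real data, the R functions or an equivalent statistics library are preferable, since this sketch can underflow for very large numbers of trials.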
Transcript overlap between different organisms:
Scripts & documentation can be found in the custom R pipeline
Output:
- The results do not contradict the hypothesis that non-coding regions of prokaryotic mRNAs may act as regulatory elements that influence the expression of target genes in eukaryotic cells.
- The developed algorithm can potentially detect the effectors of inter-organismal eRNAi interactions.
- Park, J., Overbey, E. G., Narayanan, S. A., Kim, J., Tierney, B. T., Damle, N., Najjar, D., Ryon, K. A., Proszynski, J., Kleinman, A., Hirschberg, J. W., MacKay, M., Afshin, E. E., Granstein, R., Gurvitch, J., Hudson, B. M., Rininger, A., Mullane, S., Church, S. E., … Mason, C. E. (2024). Spatial multi-omics of human skin reveals KRAS and inflammatory responses to spaceflight. Nature Communications, 15(1), 4773. https://doi.org/10.1038/s41467-024-48625-2
- Shah, T. M., Patel, J. G., Gohil, T. P., Blake, D. P., & Joshi, C. G. (2019). Host transcriptome and microbiome interaction modulates physiology of full-sibs broilers with divergent feed conversion ratio. Npj Biofilms and Microbiomes, 5(1), 24. https://doi.org/10.1038/s41522-019-0096-3
- Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins-Tinch, C. H., Sochat, V., Forster, J., Lee, S., Twardziok, S. O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., Nahnsen, S., & Köster, J. (2021). Sustainable data analysis with Snakemake. F1000Research, 10, 33. https://doi.org/10.12688/f1000research.29032.1
- Lu, J., Rincon, N., Wood, D. E., Breitwieser, F. P., Pockrandt, C., Langmead, B., Salzberg, S. L., & Steinegger, M. (2022). Metagenome analysis using the Kraken software suite. Nature Protocols, 17(12), 2815–2839. https://doi.org/10.1038/s41596-022-00738-y
- Lu, J., Breitwieser, F. P., Thielen, P., & Salzberg, S. L. (2017). Bracken: estimating species abundance in metagenomics data. PeerJ Computer Science, 3, e104. https://doi.org/10.7717/peerj-cs.104
- Bray, N. L., Pimentel, H., Melsted, P., & Pachter, L. (2016). Near-optimal probabilistic RNA-seq quantification. Nature Biotechnology, 34(5), 525–527. https://doi.org/10.1038/nbt.3519
- Rizk, G., Lavenier, D., & Chikhi, R. (2013). DSK: k-mer counting with very low memory usage. Bioinformatics, 29(5), 652–653. https://doi.org/10.1093/bioinformatics/btt020
- Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., & Madden, T. L. (2009). BLAST+: architecture and applications. BMC Bioinformatics, 10(1), 421. https://doi.org/10.1186/1471-2105-10-421
- R Core Team (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
V. Ishtuganova¹ ² (valeriishtuganova@gmail.com), M. Kravchenko¹ ² (https://github.com/MariiaKaar), D. Smutin³ (dvsmutin@gmail.com)
- ¹ Bioinformatics Institute, Kantemirovskaya st. 2A, 197342, St. Petersburg, Russia
- ² Saint-Petersburg State University, Universitetskaya emb. 7/9, 199034, St. Petersburg, Russia
- ³ Information Technologies, Mechanics and Optics University, Kronverksky Pr. 49, bldg. A, St. Petersburg, Russia




