Documentation |
|---|
RpGene : A soft-tool for automated gene extraction, gene sequencing analysis and dataset
Gene sequence information is available in a variety of online databases, such as NCBI, DEG, OGEE and many more. Extracting gene information from these databases, however, is a challenging task. In machine learning, one of the most fundamental requirements is that the data be well organized and in a usable format. Converting raw gene sequences into datasets of sequence-derived features in a proper format is difficult for researchers. In this study, we have created a Python-based soft tool called RpGene that automates the extraction of gene sequence data from the NCBI database, analyzes the data using e-Path, and presents the user with an optimized dataset that can be used for machine learning and other statistical studies. The tool greatly reduces the time and effort required to generate a dataset from gene sequence information and automates the entire process. Finally, it calculates sequence features through CodonW integration and outputs a ready-to-go dataset for further studies.
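To make "features derived from sequences" concrete, here is a minimal sketch of two simple features computed from a coding sequence. It is an illustration only and is not RpGene's or CodonW's implementation. The function names are hypothetical, and the `gc3` value here is a plain third-position GC fraction, not CodonW's GC3s statistic.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def codons(seq: str) -> list:
    """Split a coding sequence into complete codons, dropping a trailing partial codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def gc3(seq: str) -> float:
    """Fraction of codons whose third base is G or C (simplified, not CodonW's GC3s)."""
    cds = codons(seq)
    if not cds:
        return 0.0
    return sum(codon[2] in "GC" for codon in cds) / len(cds)


def sequence_features(seq: str) -> dict:
    """Collect a few illustrative features into one dataset row."""
    return {
        "length": len(seq),
        "gc": gc_content(seq),
        "gc3": gc3(seq),
        "codon_counts": Counter(codons(seq)),
    }


if __name__ == "__main__":
    # Toy sequence for illustration, not taken from any database.
    print(sequence_features("ATGGCCAAGTGA"))
```

Each call to `sequence_features` yields one row of a feature table. A real dataset would add many more columns, such as the codon usage indices that CodonW reports.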
git clone https://github.com/KrisshRp/RpGene.git
cd ./RpGene
sudo python3 install.py
sudo python3 main.py
APP_PORT = 3000
INFO: Started server process [4889]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:3000 (Press CTRL+C to quit)
INFO: 127.0.0.1:48506 - GET / HTTP/1.1 200 OK
INFO: 127.0.0.1:48506 - GET /stylesheets/main.css HTTP/1.1 200 OK
INFO: 127.0.0.1:48528 - GET /javascripts/main.js HTTP/1.1 200 OK
INFO: 127.0.0.1:48522 - GET /images/dna.png HTTP/1.1 200 OK
INFO: 127.0.0.1:33926 - GET /images/DNA-helix.mp4 HTTP/1.1 200 OK
INFO: 127.0.0.1:40292 - GET / HTTP/1.1 200 OK
Open the portal in a browser at http://127.0.0.1:3000 or http://localhost:3000
Fill in all the inputs: "organism name", "NCBI accession id" and "ePath gene locus tags"
Then click the "Submit" button
After that, click "Run" to start the script
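Before submitting the form, it can help to sanity-check the accession id. The sketch below is not part of RpGene: it checks an accession against a deliberately loose pattern, which is an assumption and not NCBI's full accession grammar, and builds the nuccore page URL in the form shown in the console output. `nuccore_url` and `ACCESSION_RE` are hypothetical names.

```python
import re

# Base URL in the same form as the one printed in the console output.
NUCCORE_BASE = "https://www.ncbi.nlm.nih.gov/nuccore/"

# Assumed, loose pattern: 1-2 letters plus digits (e.g. CP005082), or a
# RefSeq-style prefix such as NC_ plus digits, with an optional ".version".
ACCESSION_RE = re.compile(r"^(?:[A-Z]{1,2}\d{5,8}|[A-Z]{2}_\d{6,9})(?:\.\d+)?$")


def nuccore_url(accession: str) -> str:
    """Return the nuccore page URL for an accession, or raise ValueError."""
    acc = accession.strip().upper()
    if not ACCESSION_RE.match(acc):
        raise ValueError(f"unexpected accession format: {accession!r}")
    return NUCCORE_BASE + acc


if __name__ == "__main__":
    print(nuccore_url("CP005082.1"))
```

A rejected value only means the id does not match this simplified pattern; the NCBI site itself is the authority on whether an accession exists.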
If the Chrome driver is already downloaded, the console shows:
Chrome Driver is up-to-dated
Otherwise, the driver is downloaded first:
Downloading Chrome Driver [108.0.5359.71]
100% [...................................] 6904173 / 6904173
In the console, the output looks like:
https://www.ncbi.nlm.nih.gov/nuccore/CP005082.1
running script
--------------[Mycobacterium tuberculosis Beijing/NITR203]--------------
> [CP005082.1] :: sending request to server
> [CP005082.1] :: sending request to servere -> 3.24MB
> [CP005082.1] :: sending request to server
> [CP005082.1] :: sending request to servere -> 13.24MB
> [CP005082.1] :: 4155 -- J112_21090 -- [4410381, 4410525] [603]
> [CP005082.1] :: J112_12470 added 1057
In the portal, the output looks like:
Lastly, click the "download" button to download the sequence data as a zip archive
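Once the archive is downloaded, its contents can be inspected without unpacking it. The following is a minimal sketch using Python's standard `zipfile` module. The helper name `list_archive` is hypothetical, and the layout of files inside RpGene's archive is not documented here, so the code makes no assumption about it.

```python
import zipfile


def list_archive(path) -> list:
    """Return (name, size_in_bytes) for every file in a zip archive.

    `path` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(path) as zf:
        return [
            (info.filename, info.file_size)
            for info in zf.infolist()
            if not info.is_dir()  # skip directory entries
        ]
```

For example, `list_archive("downloaded.zip")` returns the file names and sizes in the archive, where `downloaded.zip` is a placeholder for whatever name the browser saved.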




