DCGP is a sequence-based machine-learning framework for predicting genes under convergent selection during domestication across 43 plant species. Built upon the standardized comparative-genomic resource PhytoPop20K, DCGP enables cross-species inference of domestication convergence using sequence-encoded features alone, without requiring population-genetic statistics or allele-frequency contrasts. The model is trained using labeled convergence-selection genes curated in PhytoPop20K and sequence embeddings generated by Evo 2, which summarize nucleotide-level information into quantitative features suitable for machine learning. Once trained, DCGP can be applied to new genomes by simply passing gene sequences through the model, bypassing the need for population-level data, demographic modeling, or selection scans.
- DNNGP – Deep neural network for genomic prediction.
- EXGEP – A framework for predicting genotype-by-environment interactions using ensembles of explainable machine-learning models.
- GxEtoolkit – An automated and explainable machine learning framework for Genome Prediction.
- BDP-identifier – Genomic Language Model-Based Prediction of Bidirectional Promoter Activity.
- KANMB – A machine learning training and prediction tool based on KAN (Kolmogorov-Arnold Network) for identifying optimal metabolites from metabolite expression data.
- python 3.11
- conda/pip
Install packages:
- Create a python environment.
conda create -n dcgp python=3.11
conda activate dcgp
- Clone this repository and cd into it.
git clone https://github.com/AIBreeding/DCGP.git
cd ./DCGP
conda env create -f environment.yml -n dcgp
- If the installation above is unsuccessful, refer to the Evo2 Official Installation Guide (https://github.com/arcinstitute/evo2) to resolve any dependency or installation issues.
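The embedding command that follows takes `--window 1024` and `--step 250`, which suggests that each sequence is split into overlapping windows before being embedded. The sketch below illustrates that kind of tiling in plain Python. The function name `sliding_windows` is hypothetical, and the handling of short sequences and of the final partial window is an assumption; `embedding.py` may treat these cases differently.

```python
from typing import Iterator


def sliding_windows(seq: str, window: int = 1024, step: int = 250) -> Iterator[tuple[int, str]]:
    """Yield (start, subsequence) pairs that tile `seq` with overlapping windows.

    Illustrative only: this is not taken from embedding.py.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")

    # Assumption: a sequence no longer than one window is returned whole.
    if len(seq) <= window:
        yield 0, seq
        return

    last_start = len(seq) - window
    for start in range(0, last_start + 1, step):
        yield start, seq[start:start + window]

    # Assumption: add one final window flush with the end of the sequence
    # when the regular steps do not land exactly on it.
    if last_start % step != 0:
        yield last_start, seq[last_start:]


if __name__ == "__main__":
    toy_gene = "ACGT" * 500  # 2000 nt toy sequence
    starts = [start for start, _ in sliding_windows(toy_gene)]
    print(starts)  # [0, 250, 500, 750, 976]
```

With these defaults, consecutive windows overlap by 774 nucleotides, so most positions fall inside several windows. A smaller `--step` increases that overlap at the cost of more windows to embed.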
python ./embedding.py \
--fasta ./sequence/sample.fa \
--step 250 \
--window 1024 \
--layer "blocks.24" \
--save_txt ./sequence/embedding.txt \
--out ./sequence/sa.pt

python ./predict.py \
--data ./sequence \
--model ./models/ \
--out ./results/pred \
--batch_size 256

This project is free to use for non-commercial purposes - see the LICENSE file for details.
For more information, please contact Huihui Li (lihuihui@caas.cn).