Detect candidate genes for adaptation to high temperature in African cowpea using an environmental genome-wide association study (envGWAS).
- The current manuscript is available as:
Akakpo R, Lee EJ, Pacheco JB, Rios EF, Kantar MB, Ousmane B, Volz KM, Akinmade H, Getino L, Boote KJ, et al. 2025. Identification of genetic variation associated with high-temperature tolerance in cowpea. bioRxiv. doi:10.1101/2025.04.09.647789
Using SNP data available for the IITA Core accessions, we will analyze population structure in ~600 accessions of African cowpea. We will also perform various population genetic analyses, including a population-based estimate of recombination rate (Ped/pop method). Using the same data, we will carry out an environmental genome-wide association study (envGWAS).
- The genotyping dataset uses the UCR Minicore cluster file generated by Muñoz-Amatriaín M: SNP genotyping matrix
- SNP positions from the iSelect design file are listed on GitHub: SNP_Utils
- Full passport data, including market collections and breeding material, were provided by IITA (~1400 accessions): Full passport data
- Among them, some samples have localities but no latitude and longitude. We curate the data to produce a cleaned passport data file that excludes market collections and breeding material. We will check that country, latitude/longitude, and locality information agree. It may be possible to supplement missing coordinates with inferred latitude and longitude.
- Cleaned passport data will be uploaded to GitHub: Cleaned passport data
- WorldClim provides environmental data worldwide. We will focus on mean precipitation and temperature data, and also on data for the growing months: WorldClim
- Perform envGWAS for the bioclimatic variables BIO1 to BIO19
- Perform envGWAS for monthly rainfall, maximum temperature, and minimum temperature
- Run population structure analysis using STRUCTURE. Try the correlated and uncorrelated allele frequency models, with and without admixture. Because STRUCTURE is time consuming on large datasets, use LD pruning in PLINK to obtain a subset of the total SNPs for the STRUCTURE run. Set the number of iterations in the parameter file, which is important for running STRUCTURE.
- Run a Procrustes analysis of the PCA (Procrustes PCA; PLoS Genet. 8: e1002886)
- FST population differentiation test: estimate FST for each SNP between subgroups (based on STRUCTURE subgroups and/or geographic groups) and identify outlier SNPs that are candidates for adaptive selection.
- Identify the genes underlying candidate SNPs using bedtools intersect (Bedtools), the Snp2Gene Python program, and appropriate GFF annotation files
- Perform Gene Ontology tests to identify gene functions using the standard Fisher's exact test implemented in the R package topGO
- LD between iSelect "hits" and resequencing variants can be examined using the 36 cowpea SNP calls
- Collection locality and temperature maps
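The per-SNP FST outlier scan listed above can be sketched as follows. This is a minimal illustration, not the project's pipeline: it assumes biallelic SNPs, two equally weighted subgroups, and Wright's FST computed from allele frequencies, which may differ from the estimator actually used (for example Weir and Cockerham). The function names, SNP identifiers, and the `top_fraction` cutoff are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def wright_fst(p1: float, p2: float) -> Optional[float]:
    """Wright's FST for one biallelic SNP, given the frequency of the same
    allele in two subgroups that are weighted equally (an assumption)."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return None  # SNP is monomorphic across both groups; FST undefined
    # Mean expected heterozygosity within the two subgroups
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def fst_outliers(freqs: Dict[str, Tuple[float, float]],
                 top_fraction: float = 0.01) -> List[str]:
    """Return the SNP ids in the upper tail of the empirical FST distribution.
    The cutoff is an arbitrary illustrative choice."""
    scored = []
    for snp_id, (p1, p2) in freqs.items():
        fst = wright_fst(p1, p2)
        if fst is not None:
            scored.append((snp_id, fst))
    scored.sort(key=lambda item: item[1], reverse=True)
    n_keep = max(1, int(len(scored) * top_fraction))
    return [snp_id for snp_id, _ in scored[:n_keep]]


# Hypothetical allele frequencies in two subgroups
example_freqs = {
    "snp_a": (0.5, 0.5),
    "snp_b": (0.2, 0.8),
    "snp_c": (0.9, 0.1),
}
example_hits = fst_outliers(example_freqs, top_fraction=0.34)
```

An empirical-percentile cutoff only ranks SNPs; it does not by itself show that the top SNPs are under selection, so outliers are best treated as candidates for follow-up.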
This directory contains scripts for formatting data.
- Extract_Monthly_WorlClim_data.R extracts mean monthly data from WorldClim raster data;
- Extract_Growing_Period_Clim_data.R formats genotype and phenotype text files into PLINK format (.tped, .tfam);
- ped2binary.sh converts PLINK text files into binary format and produces the set of PLINK files for envGWAS
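To illustrate the PLINK transposed text format (.tped, .tfam) mentioned above, here is a minimal Python sketch of how one SNP row and one sample row are laid out. It is not a port of the repository's R scripts. The helper names, the accession id, and the SNP id are hypothetical, and using the sample id as both family and individual id is an assumption.

```python
from typing import List, Optional


def tfam_line(sample_id: str, phenotype: Optional[float] = None) -> str:
    """One .tfam row: family id, individual id, father, mother, sex, phenotype.
    Parents and sex are set to 0 (unknown); -9 marks a missing phenotype."""
    pheno = "-9" if phenotype is None else str(phenotype)
    return " ".join([sample_id, sample_id, "0", "0", "0", pheno])


def tped_line(chrom: int, snp_id: str, bp: int,
              genotypes: List[Optional[str]]) -> str:
    """One .tped row: chromosome, SNP id, genetic distance (0 = unknown),
    base-pair position, then two alleles per sample in .tfam order.
    Each genotype is a two-character string such as 'AG', or None if missing."""
    calls: List[str] = []
    for genotype in genotypes:
        if genotype is None:
            calls.extend(["0", "0"])  # PLINK's default missing-allele code
        else:
            calls.extend([genotype[0], genotype[1]])
    return " ".join([str(chrom), snp_id, "0", str(bp)] + calls)


# Hypothetical accession with an environmental value as the phenotype
fam_row = tfam_line("acc_001", 24.5)
# Hypothetical SNP typed in three samples, the third one missing
ped_row = tped_line(1, "snp_0001", 12345, ["AA", "AG", None])
```

For envGWAS the phenotype column holds the environmental variable at each accession's collection site, so one .tfam file per variable (or a separate phenotype file) would be needed.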
This directory contains scripts to run envGWAS.
- rel_mtx.sh calculates centered relatedness matrices for envGWAS
- mlm.sh fits mixed linear models with GEMMA
- prep_gemma_output_for_manathan_plot.R is an R script that transforms GEMMA output files into files suitable for Manhattan plots
- manhattan_plot.R is an R script that produces Manhattan plots
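The transformation from association output to Manhattan plot input can be sketched as follows. This is a hedged Python illustration, not a description of what the R scripts in this directory do. It assumes a tab-separated table with columns named `chr`, `rs`, `ps`, and `p_wald` and with numeric chromosome codes; adjust the names if the actual output differs. The function name and the example rows are hypothetical.

```python
import csv
import io
import math
from typing import Dict, List


def prepare_manhattan(assoc_text: str) -> List[Dict[str, object]]:
    """Convert association results into plotting rows with a cumulative
    x coordinate across chromosomes and -log10(p) on the y axis."""
    rows = list(csv.DictReader(io.StringIO(assoc_text), delimiter="\t"))
    rows.sort(key=lambda r: (int(r["chr"]), int(r["ps"])))

    plot_rows: List[Dict[str, object]] = []
    offset = 0          # sum of the last positions of all previous chromosomes
    current_chr = None
    last_pos = 0
    for r in rows:
        chrom, pos = int(r["chr"]), int(r["ps"])
        if chrom != current_chr:
            offset += last_pos  # shift the new chromosome past the previous one
            current_chr = chrom
        last_pos = pos
        plot_rows.append({
            "snp": r["rs"],
            "chr": chrom,
            "x": offset + pos,
            "neg_log10_p": -math.log10(float(r["p_wald"])),
        })
    return plot_rows


# Hypothetical three-SNP result table
example_text = (
    "chr\trs\tps\tp_wald\n"
    "2\ts3\t50\t0.001\n"
    "1\ts1\t100\t0.01\n"
    "1\ts2\t500\t0.5\n"
)
example_rows = prepare_manhattan(example_text)
```

The cumulative offset here uses the last observed SNP position per chromosome rather than true chromosome lengths, which is usually sufficient for plotting.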