This repository contains code for a simulation study examining the impact of gene conversion on the "softness" of selective sweeps, used for the analyses presented in a forthcoming preprint.
This repository contains four Python scripts that generate, modify, and run SLiM scripts (for use with SLiM version 4.0.1) simulating a two-locus hitchhiking model in which gene conversion events are allowed at the selected locus but not at the linked locus. The pipeline requires stdpopsim version 0.2.0 and all of its dependencies to be installed (this is pretty easy using conda, following the instructions at https://popsim-consortium.github.io/stdpopsim-docs/stable/installation.html). The scripts write all of their output to directories that they create within the current working directory, so if you want to write output somewhere else you will have to modify them slightly.
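Before running the pipeline, it can help to confirm that the environment matches the versions named above. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `environment_report` are hypothetical, and it assumes the SLiM executable is available on your `PATH` under the name `slim`. It only reports what it finds and does not check the SLiM version itself.

```python
import shutil
from importlib import metadata

# stdpopsim version named in this README; change it if you deliberately use another.
EXPECTED_STDPOPSIM = "0.2.0"


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report(package="stdpopsim", expected=EXPECTED_STDPOPSIM, slim_exe="slim"):
    """Summarize whether the package version matches and where the SLiM executable is.

    `slim_exe` is an assumption: the executable name may differ on your system.
    """
    found = installed_version(package)
    return {
        "package_version": found,
        "version_matches": found == expected,
        "slim_path": shutil.which(slim_exe),  # None if not on PATH
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If `version_matches` is `False` or `slim_path` is `None`, revisit the installation instructions linked above before generating any SLiM scripts.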
- buildAllSlimScripts.py: This script uses `stdpopsim` to generate all of the SLiM scripts, which we will then modify to run our sweep simulations under the desired demographic models (using some Arabidopsis, Drosophila, and human models from the `stdpopsim` catalog). Note that this builds a SLiM script (which will be modified in the next step) for EVERY parameter combination for each demographic model. This is admittedly a very lazy design that results in many more simulation scripts than necessary, but it gets the job done.
- injectAllSlimScripts.py: This code modifies the SLiM scripts generated by `stdpopsim` via the above script. These SLiM scripts are modified to contain selective sweeps with gene conversion occurring only at the selected locus. Note that this step could be simplified substantially by taking advantage of `stdpopsim`'s ability to condition on sweeps occurring at a specified time (by adding code for this to `buildAllSlimScripts.py`), but the development of this pipeline began prior to the incorporation of that functionality into `stdpopsim`.
- runSlimulations.py: This code actually launches the simulation jobs on a high-performance computing cluster. The code is written assuming that the cluster uses the `SLURM` scheduler and that the desired partition name is `general`, so the code may have to be modified to run on your computing resources. It can be run for each species as follows: `python runSlimulations.py HomSap`, `python runSlimulations.py AraTha`, and `python runSlimulations.py DroMel`.
- parseOutputForSpecies.py: This code parses the output from each SLiM simulation and writes information about the simulation's outcomes into a tabular format that can be read by the analysis notebook described below. This can be run for each species as follows: `python parseOutputForSpecies.py HomSap`, `python parseOutputForSpecies.py DroMel`, and `python parseOutputForSpecies.py AraTha`.
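If your cluster uses a different partition name, the part of runSlimulations.py that needs changing is the job submission. As a rough illustration of what a SLURM submission of this kind can look like, here is a sketch that is not taken from the repository. The function `build_sbatch_command`, the file names, and the job name are hypothetical, and it assumes that SLiM is invoked as `slim <script>`. It only assembles the command and does not submit anything.

```python
import shlex


def build_sbatch_command(slim_script, log_file, partition="general", job_name="sweep"):
    """Return an sbatch argument list that would run one SLiM script as a SLURM job.

    The partition defaults to "general", the name this README says the pipeline assumes.
    """
    return [
        "sbatch",
        f"--partition={partition}",
        f"--job-name={job_name}",
        f"--output={log_file}",
        # --wrap lets sbatch run a single shell command without a separate batch file.
        f"--wrap=slim {shlex.quote(slim_script)}",
    ]


# Hypothetical file names, for illustration only.
cmd = build_sbatch_command("example.slim", "example.log", partition="general")
print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` on a login node would submit the job. Changing the `partition` argument is the kind of adjustment needed when your cluster has no `general` partition.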
The softness_analysis.ipynb notebook contains the code that reads in the simulation summaries generated by step four of the above pipeline and generates the figures found in the paper. The notebook also contains tables with fairly detailed summaries of simulation outcomes for each demographic model and parameter combination examined in the paper.
The hapFreqSims directory contains the code for performing and analyzing the simulations that record the frequencies of distinct haplotypes created by gene conversion or recurrent mutation during a sweep.