samCRISPR is a command-line tool for estimating knockout efficiency from long-read whole-genome sequencing data. It can process multiple BAM files and multiple sgRNAs to detect CRISPR events around the Cas9 cut site in the target region, and it calculates Wilson score confidence intervals for the knockout efficiency.
To run samCRISPR, the following software components and packages are required:
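For reference, a Wilson score interval for a knockout efficiency can be reproduced by hand from two read counts. The block below is a minimal sketch and not samCRISPR's own code: the function name `wilson_interval`, the use of awk, and the default `z = 1.96` (a 95% interval) are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of samCRISPR): Wilson score interval for a proportion.
# Usage: wilson_interval <edited_reads> <total_reads> [z]
wilson_interval() {
  local edited=$1 total=$2 z=${3:-1.96}   # z = 1.96 is an assumed default (95% interval)
  LC_ALL=C awk -v k="$edited" -v n="$total" -v z="$z" 'BEGIN {
    if (n <= 0) { print "total reads must be > 0" > "/dev/stderr"; exit 1 }
    p      = k / n                                   # observed knockout fraction
    denom  = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom           # shrunken point estimate
    half   = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    printf "%.4f %.4f\n", center - half, center + half
  }'
}
```

For example, `wilson_interval 50 100` prints `0.4038 0.5962`, the lower and upper bounds for 50 edited reads out of 100.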
- SAMtools >=1.21
- R computing environment >=4.4.2
- Linux environment
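
The requirements above can be checked before running the tool. This is a sketch under stated assumptions: the helper names `version_ge` and `check_requirements` are hypothetical, version comparison relies on GNU `sort -V`, and the version is parsed from the first line of `samtools --version` and `R --version`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of samCRISPR).

# Succeeds if version $1 is greater than or equal to version $2 (uses GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Reports missing or outdated tools; returns non-zero if any check fails.
check_requirements() {
  local status=0 found
  if command -v samtools >/dev/null 2>&1; then
    found=$(samtools --version | head -n 1 | awk '{print $2}')
    version_ge "$found" 1.21 || { echo "SAMtools $found is older than 1.21"; status=1; }
  else
    echo "SAMtools not found on PATH"; status=1
  fi
  if command -v R >/dev/null 2>&1; then
    found=$(R --version | head -n 1 | awk '{print $3}')
    version_ge "$found" 4.4.2 || { echo "R $found is older than 4.4.2"; status=1; }
  else
    echo "R not found on PATH"; status=1
  fi
  return "$status"
}
```

Calling `check_requirements` prints one line per problem found and returns a non-zero status if anything is missing or too old.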
The samCRISPR script can be directly downloaded from this repository:
wget https://raw.githubusercontent.com/kaanokay/samCRISPR/main/script/samCRISPR.sh
chmod u+x samCRISPR.sh
Then you can run the script with the --help argument for a complete list of options:
bash ./samCRISPR.sh --help
To download the entire repository:
git clone https://github.com/kaanokay/samCRISPR.git
Sample data and output examples can be found at data and examples, respectively.
# --sgRNA: path to a BED file containing the coordinates of each sgRNA in the corresponding genome
# --reference: path to the corresponding reference genome FASTA file (uncompressed)
# --bam: path to a text file in which each row is the path to one BAM file (each BAM should be indexed with SAMtools)
# --quantification-window: interval searched for CRISPR events, in base pairs upstream and downstream of the cut site (default: 1)
bash ./samCRISPR.sh \
  --sgRNA path/to/sgRNAs.bed \
  --reference path/to/genome.fa \
  --bam path/to/bam.files.txt \
  --quantification-window 1
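
The text file passed to --bam can be generated rather than written by hand. The following is a sketch only: the helper names `make_bam_list` and `index_bams` and the directory layout are hypothetical, and it assumes that `samtools index` writes its default `.bai` index next to each BAM file.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of samCRISPR).

# Write the path of every BAM file in directory $1 to list file $2, one per row.
make_bam_list() {
  local dir=$1 out=$2
  find "$dir" -maxdepth 1 -type f -name '*.bam' | sort > "$out"
}

# Index every BAM in list file $1 that does not yet have a .bai index.
index_bams() {
  local list=$1 bam
  while IFS= read -r bam; do
    [ -f "$bam.bai" ] || samtools index "$bam"
  done < "$list"
}
```

A typical sequence would be `make_bam_list path/to/bams path/to/bam.files.txt` followed by `index_bams path/to/bam.files.txt`.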
- The 4th column of the sgRNA BED file should contain a string specifying which DNA strand (forward or reverse) the sgRNA of interest binds to.
- It is highly recommended to consider only CRISPR events that occur within a few bp of the cut site. The tool currently cannot identify CRISPR events that lie farther away. This also means that the sgRNA BED file should contain only the exact coordinates of each sgRNA.
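
A quick structural check of the sgRNA BED file can catch a missing strand column before a run. This sketch makes several assumptions: the function name `check_sgrna_bed` is hypothetical, columns are tab-separated, and only the presence of a non-empty 4th column is checked because the exact strand labels samCRISPR accepts are not stated here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of samCRISPR): basic sanity check of an sgRNA BED file.
check_sgrna_bed() {
  local bed=$1
  awk -F'\t' '
    /^#/ { next }                                   # skip comment lines
    NF < 4 {
      printf "line %d: expected at least 4 tab-separated columns, found %d\n", NR, NF
      bad = 1; next
    }
    $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
      printf "line %d: start and end must be integers with start < end\n", NR
      bad = 1; next
    }
    $4 == "" {
      printf "line %d: 4th column (strand) is empty\n", NR
      bad = 1
    }
    END { exit bad + 0 }                            # non-zero if any line failed
  ' "$bed"
}
```

Running `check_sgrna_bed path/to/sgRNAs.bed` prints one message per problematic line and returns a non-zero status if any line fails.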