conda config --append channels bioconda
conda config --append channels conda-forge
conda config --append channels anaconda
conda create -n scape_env python=3.11
conda activate scape_env
conda install anaconda::numpy
conda install anaconda::scipy
conda install anaconda::pandas
conda install anaconda::matplotlib
conda install anaconda::click
conda install anaconda::tomli-w
conda install anaconda::requests
conda install conda-forge::psutil
conda install conda-forge::tomli-w
conda install bioconda::bedtools
conda install bioconda::pybedtools
conda install bioconda::pysam
conda install bioconda::gffutils
pip install taichi
pip install scape-apa
# Mac
conda env create -f mac_environment.yml
conda activate scape_env
# Linux
conda env create -f linux_environment.yml
conda activate scape_env
pip install -r linux_requirements.txt
# install locally
git clone https://github.com/chengl7-lab/scape.git
cd scape
pip install .
| Command | Description |
|---|---|
| scape gen_utr_annotation | Generate UTR annotation. |
| scape prepare_input | Prepare data per UTR. |
| scape infer_pa | Parameter inference. |
| scape merge_pa | Merge PA within junction per gene or UTR. |
| scape cal_exp_pa_len | Calculate the expected length of PA. |
| scape ex_pa_cnt_mat | Extract read count matrix. |
Get help for scape or for any of its commands:
scape --help
scape gen_utr_annotation --help
| Input Argument | Type | Required | Default | Description |
|---|---|---|---|---|
| --gff_file | TEXT | Yes | NA | The gff3 or gff3.gz file including annotation of gene body. |
| --output_dir | TEXT | Yes | NA | Directory to save dataframe of selected UTR. |
| --res_file_name | TEXT | Yes | NA | File name of the dataframe holding the UTR annotation. The .csv suffix is appended automatically. |
| --gff_merge_strategy | TEXT | No | merge | Method for processing overlapping regions. It follows merge_strategy in package gffutils. |
OUTPUT: A csv file with information on the annotated 3' UTRs, stored at {output_dir}/{res_file_name}.csv.
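
The options in the table above map directly onto a command line. The following is a minimal sketch of assembling that command from Python; `build_scape_command` is a hypothetical helper (not part of scape), and the paths are placeholders chosen for illustration.

```python
from __future__ import annotations

import shlex
from pathlib import Path


def build_scape_command(subcommand: str, **options: object) -> list[str]:
    """Assemble an argv list: ['scape', <subcommand>, '--name', 'value', ...]."""
    argv = ["scape", subcommand]
    for name, value in options.items():
        if value is None:  # leave the option out so scape uses its default
            continue
        argv += [f"--{name}", str(value)]
    return argv


# Hypothetical paths, used only for illustration.
output_dir = Path("results")
res_file_name = "utr_annotation"

cmd = build_scape_command(
    "gen_utr_annotation",
    gff_file="annotation.gff3.gz",
    output_dir=output_dir,
    res_file_name=res_file_name,
)
print(shlex.join(cmd))

# Per the OUTPUT note, the annotation is expected at this location.
expected_csv = output_dir / f"{res_file_name}.csv"
print(expected_csv)
```

With scape installed, the list can be passed to `subprocess.run(cmd, check=True)`. Building an argv list rather than a single string avoids shell-quoting problems with paths that contain spaces.
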
| Input Argument | Type | Required | Default | Description |
|---|---|---|---|---|
| --utr_file | TEXT | Yes | NA | UTR annotation file (dataframe, resulted from gen_utr_annotation). |
| --cb_file | TEXT | Yes | NA | A tsv.gz file listing all validated cell barcodes (from CellRanger). It has a single column of cell barcodes, which must match the values of the CB tag in bam_file. |
| --bam_file | TEXT | Yes | NA | Bam file that is used for searching reads over annotated UTR. |
| --output_dir | TEXT | Yes | NA | Output directory to save pickle files of selected reads over annotated UTR. |
| --chunksize | INTEGER | No | 1000 | Number of UTR regions included in each small pickle file, which contains the preprocessed input for APA analysis. |
OUTPUT: Pickle files that include tuples (gene info, dataframe of parameter).
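
Because `--cb_file` must be a gzipped file with exactly one barcode column, it can help to check the file before a long run. This is a small sketch using only the standard library; `read_barcodes` is a hypothetical helper, and the barcodes in the demo are made up.

```python
from __future__ import annotations

import gzip
import tempfile
from pathlib import Path


def read_barcodes(cb_file: str | Path) -> list[str]:
    """Read a gzipped one-column barcode file, rejecting multi-column lines."""
    barcodes = []
    with gzip.open(cb_file, "rt") as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line:  # ignore blank lines
                continue
            if "\t" in line:
                raise ValueError(f"line {line_no}: expected one column, got {line!r}")
            barcodes.append(line)
    return barcodes


# Demo with a temporary file and made-up barcodes.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "barcodes.tsv.gz"
    with gzip.open(demo_file, "wt") as handle:
        handle.write("AAACCTGAGAAACCAT-1\nAAACCTGAGAAACCGC-1\n")
    barcodes = read_barcodes(demo_file)

print(len(barcodes), "barcodes read")
```

The returned list can then be compared against the CB tag values in the BAM file, for example to confirm that any suffix such as `-1` is present in both.
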
| Input Argument | Type | Required | Default | Description |
|---|---|---|---|---|
| --input_pickle_file | TEXT | Yes | NA | Input pickle file (result of prepare_input) |
| --output_dir | TEXT | Yes | NA | Directory to save output pickle files including PAS information over annotated UTR. |
| --toml_para_file | TEXT | No | None | A TOML file (example) that specifies user-defined parameters. |
| --pre_para_pkl_file | TEXT | No | None | A pickle file with pre-specified pA sites and utr length, result file of scape analysis. |
OUTPUT: A pickle file containing the inferred parameters for each UTR region.
| Input Argument | Type | Required | Default | Description |
|---|---|---|---|---|
| --output_dir | TEXT | Yes | NA | Directory which was used in previous steps to save output by prepare_input and infer_pa. |
| --utr_merge | BOOLEAN | No | True | If True, PA sites from the same gene are merged. If False, PA sites from the same UTR are merged. |
OUTPUT: A single pickle file containing all UTRs of all genes is stored in output_dir/. Its name is res.gene.pkl if utr_merge=True, otherwise, its name is res.utr.pkl.
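
Downstream scripts need to know which file `merge_pa` produced. The naming rule stated above can be captured in a few lines; `merged_result_path` is a hypothetical helper, and the directory name is a placeholder.

```python
from __future__ import annotations

from pathlib import Path


def merged_result_path(output_dir: str | Path, utr_merge: bool = True) -> Path:
    """Return the merged pickle path implied by the value of utr_merge."""
    name = "res.gene.pkl" if utr_merge else "res.utr.pkl"
    return Path(output_dir) / name


gene_level = merged_result_path("results")                  # utr_merge=True (default)
utr_level = merged_result_path("results", utr_merge=False)
print(gene_level)
print(utr_level)
```
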
| Input Argument | Type | Required | Default | Description |
|---|---|---|---|---|
| --output_dir | TEXT | Yes | NA | Directory which was used in previous steps to save output by prepare_input and infer_pa. |
| --cell_cluster_file | TEXT | No | - | A csv file containing two columns, in order: cell barcode (CB) and respective group (cell_cluster_file). Its name will be included in the file name of the final result. |
| --res_pkl_file | TEXT | No | - | Name of res pickle file that contains PASs for calculating expected PA length. Its name will be included in the file name of final result. |
OUTPUT: exp_pa_len.csv. It is a dataframe with 2 columns.
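
The `--cell_cluster_file` layout (barcode first, group second) is easy to produce with the standard `csv` module. The sketch below is illustrative only: `write_cell_cluster_file` is a hypothetical helper, the barcodes and cluster labels are made up, and whether scape expects a header row is not stated here, so the header is left optional.

```python
from __future__ import annotations

import csv
import tempfile
from pathlib import Path
from typing import Iterable, Optional, Sequence


def write_cell_cluster_file(
    path: str | Path,
    assignments: Iterable[tuple[str, str]],
    header: Optional[Sequence[str]] = None,
) -> None:
    """Write (cell barcode, group) pairs as a two-column csv, barcode first."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        if header is not None:  # assumption: a header may or may not be required
            writer.writerow(header)
        for barcode, group in assignments:
            writer.writerow([barcode, group])


# Made-up assignments, for illustration only.
assignments = [
    ("AAACCTGAGAAACCAT-1", "cluster_0"),
    ("AAACCTGAGAAACCGC-1", "cluster_1"),
]

with tempfile.TemporaryDirectory() as tmp:
    cluster_file = Path(tmp) / "cell_cluster.csv"
    write_cell_cluster_file(cluster_file, assignments)
    with open(cluster_file, newline="") as handle:
        rows = list(csv.reader(handle))

print(rows)
```
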
| Input Argument | Type | Required | Default | Description |
|---|---|---|---|---|
| --output_dir | TEXT | Yes | NA | Directory which was used in previous steps to save output by prepare_input and infer_pa. |
| --res_pkl_file | TEXT | No | - | Name of res pickle file that contains PASs for calculating expected PA length. Its name will be included in the file name of final result. |
OUTPUT: A tsv.gz file named {res_pkl_file}.cnt.tsv.gz, stored in output_dir/.
The data used can be downloaded from examples.
Cheng G, Le T, Zhou R and Cheng L. SCAPE-APA: a package for estimating alternative polyadenylation events from scRNA-seq data [version 1; peer review: awaiting peer review]. Open Res Europe 2024, 4:220 (https://doi.org/10.12688/openreseurope.18673.1)