Tools for pooled single cell sequencing experiments: demultiplex cells, infer doublet rate, assign treatments/sgRNAs, infer ambient RNA from allele matching

nkschaefer/cellbouncer

CellBouncer

Tools for checking cell identities and keeping the riffraff out of pooled single cell sequencing data sets.

I want to... | I have... | Tool to use
Demultiplex cells by species | Raw reads, plus a transcriptome (FASTA) or annotation (GTF) and genome (FASTA) per species, OR a BAM file of reads mapped to a composite reference genome | demux_species
Demultiplex cells by individual of origin, and I hope individuals are unrelated enough to have different mitochondrial haplotypes | A BAM file of aligned scATAC-seq or whole cell scRNA-seq data | demux_mt
Demultiplex cells by individual of origin | VCF of known variants, plus a BAM file of aligned single cell sequencing data | demux_vcf
Demultiplex individuals by custom label or treatment | FASTQs containing MULTIseq/HTO/CITE-seq data, or a table of pre-computed counts, optionally in MEX format | demux_tags
Assign sgRNAs to cells | FASTQs containing sgRNA capture data, or a table of pre-computed counts, optionally in MEX format | demux_tags
Quantify ambient RNA per cell, infer its origins, and optionally adjust gene counts | Output from demux_vcf (plus optional single-cell expression data to adjust, in MEX format) | quant_contam
Infer global doublet rate and proportions of individuals | Output from one or more CellBouncer programs run on the same cells | doublet_dragon
Determine proportion of individuals in a pool | A VCF of known variants, plus a BAM of aligned sequence data (can be bulk) | bulkprops

  

Visualizing and comparing results

I want to... | I have... | Tool to use
Visualize a set of labels and the pool compositions they produce at different confidence cutoffs | An .assignments file from a CellBouncer program | plot/assignment_llr.R
Compare two sets of labels on the same cells | Two .assignments files from CellBouncer programs run on the same data | plot/compare_assignments.R
Merge two sets of labels on the same cells into one set of labels | Two .assignments files from CellBouncer programs run on the same data | utils/merge_assignments.R
Compare two sets of pool proportions and assess significance if possible | Two files describing pool composition (i.e. from bulkprops or contamination profile from quant_contam), or one file describing pool composition and an .assignments file describing cell labels | utils/compare_props.R
Refine genotype calls to better match cell-individual labels | A preexisting set of genotypes in VCF format, a BAM file of aligned single-cell data, and an .assignments file mapping cells to individuals of origin | utils/refine_vcf
Plot species proportions | Output from demux_species | plot/species.R
Plot mitochondrial haplotypes | Output from demux_mt | plot/demux_mt_clust.R, plot/demux_mt_unclust.R
Plot ambient RNA profile | Output from quant_contam | plot/contam.R
Plot counts of cell identifications, according to different data types and to the consensus among them | Output from doublet_dragon | plot/doublet_dragon.R

Manipulating input files

I want to... | I have... | Tool to use
Split a BAM file into one file per cell identity | A BAM file of aligned single-cell sequencing data and a CellBouncer-format .assignments file | utils/bam_split_bcs
Tag reads in a BAM file to mark individual of origin | A BAM file of aligned single-cell sequencing data and a CellBouncer-format .assignments file | utils/bam_indiv_rg
Convert 10X or Scanpy (AnnData) data from .h5 to MEX format | A CellRanger-format .h5 or Scanpy-format .h5ad file | utils/h5tomex.py
Split MEX-format data into one data set per library/run | Single-cell expression data in MEX format | utils/split_mex_libs.py
Subset MEX-format data to specific cell barcodes | Single-cell expression data in MEX format | utils/subs_mex_bc.py
Subset MEX-format data to a specific feature type | Single-cell expression data in MEX format | utils/subs_mex_featuretype.py

Installation

Using Docker

The included Dockerfile can be used to set up and compile everything required by CellBouncer.

Not using Docker

To install (see below):

  • Clone the repository (and its submodules)
  • Choose a conda environment file to install
    • All necessary dependencies: cellbouncer_minimum.yml
    • All necessary dependencies plus extra helper programs mentioned in documentation:
      • Mac OS X: cellbouncer_extra_osx.yml
      • Linux: cellbouncer_extra.yml
  • Run make

For more information about installing or updating CellBouncer, see here.

Get the repository

git clone --recurse-submodules git@github.com:nkschaefer/cellbouncer.git
cd cellbouncer

Create conda environment

Linux

conda env create --file=cellbouncer_minimum.yml
conda activate cellbouncer
conda env config vars set LD_LIBRARY_PATH="${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH}" -n cellbouncer

Mac OS X (M1)

CONDA_SUBDIR=osx-arm64 conda env create --file=cellbouncer_minimum.yml
conda activate cellbouncer
conda env config vars set DYLD_LIBRARY_PATH="${CONDA_PREFIX}/lib:${DYLD_LIBRARY_PATH}" -n cellbouncer

Mac OS X (Intel)

conda env create --file=cellbouncer_minimum.yml
conda activate cellbouncer
conda env config vars set DYLD_LIBRARY_PATH="${CONDA_PREFIX}/lib:${DYLD_LIBRARY_PATH}" -n cellbouncer

Compile

make

You've now got all the programs compiled, and you can run them as long as you remember to conda activate cellbouncer first.

Test data set

You can get a test data set from this link. It contains an example .bam file, a VCF file, and a cell hashing .counts file. The README in the linked directory explains everything; together, these files let you test-run demux_species, demux_mt, demux_tags, demux_vcf, quant_contam, and doublet_dragon.

Overview

The programs in CellBouncer are standalone command line tools. If you run one of them with no arguments or with -h, it will give you detailed information about how to run it. Each program takes an --output_prefix/-o argument, a base name that is used for all of its output files.

Output files

Demultiplexing tools all write a file called [output_prefix].assignments, which describes each cell's identity. These files have four tab-separated columns:

  • cell barcode (optionally with unique ID appended; see below)
  • most likely identity (doublets are two names in alphabetical order separated by +)
  • droplet type: S (for singlet), D (for doublet), or in some cases M (for multiplet, 3+ individuals, so far only considered by demux_tags)
  • log likelihood ratio of the best to the second-best assignment (a measure of confidence in the assignment)
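
As a minimal sketch of reading this format, the Python below parses a four-column, tab-separated .assignments file with the standard library and tallies singlet identities above a confidence cutoff. The names Assignment, read_assignments, and composition, along with the example barcodes, identities, and values, are illustrative assumptions and not part of CellBouncer.

```python
import csv
import io
from collections import Counter
from typing import NamedTuple


class Assignment(NamedTuple):
    barcode: str
    identity: str       # doublets look like "nameA+nameB"
    droplet_type: str   # "S", "D", or "M"
    llr: float          # confidence in the assignment


def read_assignments(handle):
    """Parse tab-separated rows: barcode, identity, droplet type, LLR."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        barcode, identity, droplet_type, llr = fields
        rows.append(Assignment(barcode, identity, droplet_type, float(llr)))
    return rows


def composition(rows, min_llr=0.0):
    """Count singlets per identity among cells with LLR >= min_llr."""
    return Counter(
        r.identity for r in rows
        if r.droplet_type == "S" and r.llr >= min_llr
    )


# Made-up example content; real files come from a CellBouncer run.
example = io.StringIO(
    "AAACCTGAGAAACCAT\tdonorA\tS\t25.3\n"
    "AAACCTGAGAAACCGC\tdonorA+donorB\tD\t4.1\n"
    "AAACCTGAGAAACCTA\tdonorB\tS\t2.0\n"
)
rows = read_assignments(example)
print(composition(rows, min_llr=10.0))
```

To read a real file, pass an open file handle instead of the in-memory example, for instance `open("myprefix.assignments", newline="")`, where myprefix stands for whatever --output_prefix you used.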

Cell barcode format and merging with other data

To load data from CellBouncer into a single cell analysis tool like Seurat or scanpy, you will need to load CellBouncer's (text format) output files and merge with your single cell data set. This requires ensuring that cell barcodes are formatted the same way by CellBouncer as in your data set. Read more here.
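
One way to line up barcodes before merging is to reduce both sides to the bare nucleotide sequence. The sketch below assumes that a barcode is a run of at least 12 A/C/G/T/N characters, possibly wrapped in a prefix or a suffix such as the "-1" that CellRanger appends. The function names bare_barcode and merge_labels are hypothetical helpers, not CellBouncer or Seurat/scanpy APIs, and the sketch only applies to data from a single library.

```python
import re

# Assumption: the barcode is the first run of 12+ nucleotide characters.
BARCODE_RE = re.compile(r"[ACGTN]{12,}")


def bare_barcode(name):
    """Return the nucleotide sequence inside a decorated barcode string."""
    match = BARCODE_RE.search(name)
    if match is None:
        raise ValueError(f"no barcode sequence found in {name!r}")
    return match.group(0)


def merge_labels(dataset_barcodes, labels):
    """Map each data set barcode to its label, or None if unlabeled.

    labels maps barcodes as written in an .assignments file to identities.
    """
    by_sequence = {bare_barcode(bc): label for bc, label in labels.items()}
    return {bc: by_sequence.get(bare_barcode(bc)) for bc in dataset_barcodes}


# Made-up example: the data set uses a "-1" suffix, the labels do not.
dataset = ["AAACCTGAGAAACCAT-1", "AAACCTGAGAAACCGC-1", "TTTGTCATCTTTACGT-1"]
labels = {
    "AAACCTGAGAAACCAT": "donorA",
    "AAACCTGAGAAACCGC": "donorA+donorB",
}
merged = merge_labels(dataset, labels)
for barcode, label in merged.items():
    print(barcode, label)
```

The resulting dictionary is keyed by the data set's own barcode strings, so it can be attached as a metadata column in whichever analysis tool you use.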

Note about multi-library data sets

CellBouncer programs take input from single libraries. If you have concatenated multiple single cell sequencing data sets, CellBouncer will interpret all cells with the same barcode sequence as the same cell, ignoring any unique IDs you have added to barcodes. If you need to load data in MEX format (i.e. for demux_tags or quant_contam) and that data comes from multiple libraries that were concatenated together, you can separate the data by library using the program utils/split_mex_libs.py.
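
To check whether a concatenated data set is affected, you can look for barcode sequences that occur under more than one library ID. The sketch below assumes library IDs were appended after a "-" separator (for example "-1" and "-2"). The function names are hypothetical, and this check does not replace utils/split_mex_libs.py; it only reports which sequences would be treated as the same cell.

```python
from collections import defaultdict


def libraries_per_sequence(barcodes, sep="-"):
    """Group the library IDs seen for each bare barcode sequence."""
    seen = defaultdict(set)
    for barcode in barcodes:
        sequence, _, library = barcode.partition(sep)
        seen[sequence].add(library)
    return seen


def colliding_sequences(barcodes, sep="-"):
    """Return sequences that appear in more than one library, sorted."""
    seen = libraries_per_sequence(barcodes, sep)
    return sorted(seq for seq, libs in seen.items() if len(libs) > 1)


# Made-up example: the first sequence appears in libraries 1 and 2.
barcodes = [
    "AAACCTGAGAAACCAT-1",
    "AAACCTGAGAAACCAT-2",
    "AAACCTGAGAAACCGC-1",
]
print(colliding_sequences(barcodes))
```

If the returned list is non-empty, those cells cannot be told apart by sequence alone, which is the situation where separating the data by library first is needed.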

Acknowledgments

We thank Helena Pinheiro for creating the icons for demux_species, demux_mt, demux_vcf, demux_tags, quant_contam, and bulkprops, and the cell drawing used throughout.

Citing us

CellBouncer is now described in a preprint available on bioRxiv. If you use CellBouncer in your work, please cite us. Thanks!
