Tools for checking cell identities and keeping the riffraff out of pooled single cell sequencing data sets.

| I want to... | I have... | Tool to use |
|---|---|---|
| Demultiplex cells by species | Raw reads, plus a transcriptome (FASTA) or annotation (GTF) and genome (FASTA) per species OR a BAM file of reads mapped to a composite reference genome | demux_species |
| Demultiplex cells by individual of origin, and I hope individuals are unrelated enough to have different mitochondrial haplotypes | A BAM file of aligned scATAC-seq or whole cell scRNA-seq data | demux_mt |
| Demultiplex cells by individual of origin | VCF of known variants, plus a BAM file of aligned single cell sequencing data | demux_vcf |
| Demultiplex individuals by custom label or treatment | FASTQs containing MULTIseq/HTO/CITE-seq data, or a table of pre-computed counts, optionally in MEX format | demux_tags |
| Assign sgRNAs to cells | FASTQs containing sgRNA capture data, or a table of pre-computed counts, optionally in MEX format | demux_tags |
| Quantify ambient RNA per cell, infer its origins, and optionally adjust gene counts | Output from demux_vcf (plus optional single-cell expression data to adjust, in MEX format) | quant_contam |
| Infer global doublet rate and proportions of individuals | Output from one or more CellBouncer programs run on the same cells | doublet_dragon |
| Determine proportion of individuals in a pool | A VCF of known variants, plus a BAM of aligned sequence data (can be bulk) | bulkprops |


| I want to... | I have... | Tool to use |
|---|---|---|
| Visualize a set of labels and the pool compositions they produce at different confidence cutoffs | An .assignments file from a CellBouncer program | plot/assignment_llr.R |
| Compare two sets of labels on the same cells | Two .assignments files from CellBouncer programs run on the same data | plot/compare_assignments.R |
| Merge two sets of labels on the same cells into one set of labels | Two .assignments files from CellBouncer programs run on the same data | utils/merge_assignments.R |
| Compare two sets of pool proportions and assess significance if possible | Two files describing pool composition (e.g. from bulkprops or a contamination profile from quant_contam), or one file describing pool composition and an .assignments file describing cell labels | utils/compare_props.R |
| Refine genotype calls to better match cell-individual labels | A preexisting set of genotypes in VCF format, a BAM file of aligned single-cell data, and an .assignments file mapping cells to individuals of origin | utils/refine_vcf |
| Plot species proportions | Output from demux_species | plot/species.R |
| Plot mitochondrial haplotypes | Output from demux_mt | plot/demux_mt_clust.R, plot/demux_mt_unclust.R |
| Plot ambient RNA profile | Output from quant_contam | plot/contam.R |
| Plot counts of cell identifications, according to different data types and to the consensus among them | Output from doublet_dragon | plot/doublet_dragon.R |


| I want to... | I have... | Tool to use |
|---|---|---|
| Split a BAM file into one file per cell identity | A BAM file of aligned single-cell sequencing data and a CellBouncer-format .assignments file | utils/bam_split_bcs |
| Tag reads in a BAM file to mark individual of origin | A BAM file of aligned single-cell sequencing data and a CellBouncer-format .assignments file | utils/bam_indiv_rg |
| Convert 10X or Scanpy (AnnData) data from .h5 to MEX format | A CellRanger-format .h5 or Scanpy-format .h5ad file | utils/h5tomex.py |
| Split MEX-format data into one data set per library/run | Single-cell expression data in MEX format | utils/split_mex_libs.py |
| Subset MEX-format data to specific cell barcodes | Single-cell expression data in MEX format | utils/subs_mex_bc.py |
| Subset MEX-format data to a specific feature type | Single-cell expression data in MEX format | utils/subs_mex_featuretype.py |

The included Dockerfile can be used to set up and compile everything required by CellBouncer.
To install (see below):
- Clone the repository (and its submodules)
- Choose a conda environment file to install
  - All necessary dependencies: cellbouncer_minimum.yml
  - All necessary dependencies plus extra helper programs mentioned in documentation:
    - Mac OS X: cellbouncer_extra_osx.yml
    - Linux: cellbouncer_extra.yml
- Run make

For more information about installing or updating CellBouncer, see here.

Clone the repository and its submodules:

    git clone --recurse-submodules git@github.com:nkschaefer/cellbouncer.git
    cd cellbouncer

Create and activate the conda environment. On Linux:

    conda env create --file=cellbouncer_minimum.yml
    conda activate cellbouncer
    conda env config vars set LD_LIBRARY_PATH="${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH}" -n cellbouncer

On Mac OS X, targeting osx-arm64:

    CONDA_SUBDIR=osx-arm64 conda env create --file=cellbouncer_minimum.yml
    conda activate cellbouncer
    conda env config vars set DYLD_LIBRARY_PATH="${CONDA_PREFIX}/lib:${DYLD_LIBRARY_PATH}" -n cellbouncer

On Mac OS X, without setting CONDA_SUBDIR:

    conda env create --file=cellbouncer_minimum.yml
    conda activate cellbouncer
    conda env config vars set DYLD_LIBRARY_PATH="${CONDA_PREFIX}/lib:${DYLD_LIBRARY_PATH}" -n cellbouncer

Compile:

    make

You've now got all the programs compiled, and you can run them as long as you remember to conda activate cellbouncer first.
You can get a test data set from this link. It contains an example .bam file, VCF file, and cell hashing .counts file. The README in the linked directory explains everything; together, these files let you test-run demux_species, demux_mt, demux_tags, demux_vcf, quant_contam, and doublet_dragon.
The programs in CellBouncer are standalone command line tools. If you run one of them with no arguments or with -h, it will print detailed information about how to run it. Each program takes an --output_prefix/-o argument, which is a base name used for all of its output files.
Demultiplexing tools all write a file called [output_prefix].assignments, which describes each cell's identity. These files have 4 tab-separated columns:

- cell barcode (optionally with unique ID appended; see below)
- most likely identity (doublets are two names in alphabetical order separated by +)
- droplet type: S (for singlet), D (for doublet), or in some cases M (for multiplet, 3+ individuals, so far only considered by demux_tags)
- log likelihood ratio of the best to the second best assignment (a measure of confidence in the assignment)

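
As a rough illustration of the four-column format described above, the sketch below parses .assignments text with the Python standard library and tallies identities that pass a confidence cutoff. The function names, the sample rows, and the cutoff of 10 are hypothetical and not part of CellBouncer; the sketch also assumes the file has no header line.

```python
import csv
import io
from collections import Counter


def read_assignments(handle):
    """Parse 4-column, tab-separated .assignments rows into dicts."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        barcode, identity, droplet_type, llr = fields
        rows.append(
            {"barcode": barcode, "identity": identity,
             "type": droplet_type, "llr": float(llr)}
        )
    return rows


def composition(rows, min_llr=0.0):
    """Count cells per identity, keeping rows whose confidence is >= min_llr."""
    return Counter(r["identity"] for r in rows if r["llr"] >= min_llr)


# Hypothetical rows (made-up barcodes, names, and values), for illustration only.
example = (
    "AAACCCAAGAAACACT\tindivA\tS\t52.1\n"
    "AAACCCAAGAAACCAT\tindivA+indivB\tD\t12.7\n"
    "AAACCCAAGAAACCCA\tindivB\tS\t3.4\n"
)
rows = read_assignments(io.StringIO(example))
print(composition(rows, min_llr=10.0))
```

For a real file, you would pass an open file handle (for example, `open("myprefix.assignments")`, where `myprefix` stands in for your own output prefix) instead of the in-memory example.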
To load data from CellBouncer into a single cell analysis tool like Seurat or scanpy, you will need to load CellBouncer's (text format) output files and merge with your single cell data set. This requires ensuring that cell barcodes are formatted the same way by CellBouncer as in your data set. Read more here.
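
One way to line up barcodes before merging is to reduce both sides to the bare barcode sequence. The sketch below assumes, hypothetically, that any unique ID is appended after a `-` (as in `ACGT-1`); your data set may follow a different convention, and the function names are illustrative only. It is only meaningful for cells from a single library.

```python
def bare_barcode(barcode):
    """Return the barcode sequence without anything after the first '-'."""
    return barcode.split("-", 1)[0]


def attach_labels(dataset_barcodes, assignments):
    """Map each data set barcode to its assigned identity, or None if absent."""
    by_sequence = {bare_barcode(bc): ident for bc, ident in assignments.items()}
    return {bc: by_sequence.get(bare_barcode(bc)) for bc in dataset_barcodes}


# Hypothetical barcodes and identities, for illustration only.
dataset_barcodes = [
    "AAACCCAAGAAACACT-1",
    "AAACCCAAGAAACCAT-1",
    "AAACCCAAGAAACTGC-1",
]
assignments = {
    "AAACCCAAGAAACACT": "indivA",
    "AAACCCAAGAAACCAT": "indivA+indivB",
}
labels = attach_labels(dataset_barcodes, assignments)
print(labels)
```

The resulting mapping could then be added as a per-cell metadata column in the analysis tool of your choice; cells mapped to `None` had no entry in the assignments.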
CellBouncer programs take input from single libraries. If you have concatenated multiple single cell sequencing data sets, CellBouncer will interpret all cells with the same barcode sequence as the same cell, ignoring any unique IDs you have added to barcodes. If you need to load data in MEX format (e.g. for demux_tags or quant_contam) and that data comes from multiple libraries that were concatenated together, you can separate the data by library using the program utils/split_mex_libs.py.
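
To check whether a concatenated data set would run into this problem, the sketch below groups barcodes by their bare sequence and reports sequences that appear under more than one unique ID. It assumes, as a hypothetical convention, that the unique ID follows a `-` at the end of each barcode; the function name and barcodes are made up for illustration.

```python
from collections import defaultdict


def colliding_sequences(barcodes):
    """Return {sequence: sorted unique IDs} for sequences seen under 2+ IDs."""
    ids_by_sequence = defaultdict(set)
    for bc in barcodes:
        seq, _, unique_id = bc.partition("-")
        ids_by_sequence[seq].add(unique_id)
    return {
        seq: sorted(ids) for seq, ids in ids_by_sequence.items() if len(ids) > 1
    }


# Hypothetical barcodes from two concatenated libraries (IDs 1 and 2).
barcodes = [
    "AAACCCAAGAAACACT-1",
    "AAACCCAAGAAACACT-2",
    "AAACCCAAGAAACCAT-1",
]
collisions = colliding_sequences(barcodes)
print(collisions)
```

A non-empty result suggests the data should be separated by library before running CellBouncer.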
We thank Helena Pinheiro for creating the icons for demux_species, demux_mt, demux_vcf, demux_tags, quant_contam, and bulkprops, and the cell drawing used throughout.
CellBouncer is now described in a preprint available on bioRxiv. If you use CellBouncer in your work, please cite us. Thanks!









