CellBouncer

Tools for checking cell identities and keeping the riffraff out of pooled single cell sequencing data sets.

I want to...	I have...	Tool to use
Demultiplex cells by species	Raw reads, plus a transcriptome (FASTA) or annotation (GTF) and genome (FASTA) per species OR A BAM file of reads mapped to a composite reference genome	`demux_species`
Demultiplex cells by individual of origin, and I hope individuals are unrelated enough to have different mitochondrial haplotypes	A BAM file of aligned scATAC-seq or whole cell scRNA-seq data	`demux_mt`
Demultiplex cells by individual of origin	VCF of known variants, plus a BAM file of aligned single cell sequencing data	`demux_vcf`
Demultiplex individuals by custom label or treatment	FASTQs containing MULTIseq/HTO/CITE-seq data, or a table of pre-computed counts, optionally in MEX format	`demux_tags`
Assign sgRNAs to cells	FASTQs containing sgRNA capture data, or a table of pre-computed counts, optionally in MEX format	`demux_tags`
Quantify ambient RNA per cell, infer its origins, and optionally adjust gene counts	Output from `demux_vcf` (plus optional single-cell expression data to adjust, in MEX format)	`quant_contam`
Infer global doublet rate and proportions of individuals	Output from one or more `CellBouncer` programs run on the same cells	`doublet_dragon`
Determine proportion of individuals in a pool	A VCF of known variants, plus a BAM of aligned sequence data (can be bulk)	`bulkprops`

Visualizing and comparing results

I want to...	I have...	Tool to use
Visualize a set of labels and the pool compositions they produce at different confidence cutoffs	An `.assignments` file from a CellBouncer program	`plot/assignment_llr.R`
Compare two sets of labels on the same cells	Two `.assignments` files from CellBouncer programs run on the same data	`plot/compare_assignments.R`
Merge two sets of labels on the same cells into one set of labels	Two `.assignments` files from CellBouncer programs run on the same data	`utils/merge_assignments.R`
Compare two sets of pool proportions and assess significance if possible	Two files describing pool composition (i.e. from `bulkprops` or contamination profile from `quant_contam`), or one file describing pool composition and an `.assignments` file describing cell labels	`utils/compare_props.R`
Refine genotype calls to better match cell-individual labels	A preexisting set of genotypes in VCF format, a BAM file of aligned single-cell data, and an `.assignments` file mapping cells to individuals of origin	`utils/refine_vcf`
Plot species proportions	Output from `demux_species`	`plot/species.R`
Plot mitochondrial haplotypes	Output from `demux_mt`	`plot/demux_mt_clust.R` `plot/demux_mt_unclust.R`
Plot ambient RNA profile	Output from `quant_contam`	`plot/contam.R`
Plot counts of cell identifications, according to different data types and to the consensus among them	Output from `doublet_dragon`	`plot/doublet_dragon.R`

Manipulating input files

I want to...	I have...	Tool to use
Split a BAM file into one file per cell identity	A BAM file of aligned single-cell sequencing data and a CellBouncer-format `.assignments` file	`utils/bam_split_bcs`
Tag reads in a BAM file to mark individual of origin	A BAM file of aligned single-cell sequencing data and a CellBouncer-format `.assignments` file	`utils/bam_indiv_rg`
Convert 10X or Scanpy (AnnData) data from `.h5` to MEX format	A CellRanger-format `.h5` or Scanpy-format `.h5ad` file	`utils/h5tomex.py`
Split MEX-format data into one data set per library/run	Single-cell expression data in MEX format	`utils/split_mex_libs.py`
Subset MEX-format data to specific cell barcodes	Single-cell expression data in MEX format	`utils/subs_mex_bc.py`
Subset MEX-format data to a specific feature type	Single-cell expression data in MEX format	`utils/subs_mex_featuretype.py`

Installation

Using Docker

The included Dockerfile can be used to set up and compile everything required by CellBouncer.

Not using Docker

To install (each step is detailed in the sections below):

1. Clone the repository (and its submodules)
2. Choose a conda environment file to install
   - All necessary dependencies: cellbouncer_minimum.yml
   - All necessary dependencies plus extra helper programs mentioned in documentation:
     - Mac OS X: cellbouncer_extra_osx.yml
     - Linux: cellbouncer_extra.yml
3. Run make

For more information about installing or updating CellBouncer, see here.

Get the repository

git clone --recurse-submodules git@github.com:nkschaefer/cellbouncer.git
cd cellbouncer

Create conda environment

Linux

conda env create --file=cellbouncer_minimum.yml
conda activate cellbouncer
conda env config vars set LD_LIBRARY_PATH="${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH}" -n cellbouncer

Mac OS X (M1)

CONDA_SUBDIR=osx-arm64 conda env create --file=cellbouncer_minimum.yml
conda activate cellbouncer
conda env config vars set DYLD_LIBRARY_PATH="${CONDA_PREFIX}/lib:${DYLD_LIBRARY_PATH}" -n cellbouncer

Mac OS X (Intel)

conda env create --file=cellbouncer_minimum.yml
conda activate cellbouncer
conda env config vars set DYLD_LIBRARY_PATH="${CONDA_PREFIX}/lib:${DYLD_LIBRARY_PATH}" -n cellbouncer

Compile

make

You've now got all the programs compiled. You can run them as long as you remember to run `conda activate cellbouncer` first.

Test data set

You can get a test data set from this link. It contains an example BAM file, a VCF file, and a cell hashing .counts file. The README in the linked directory explains everything; together, these files let you test-run demux_species, demux_mt, demux_tags, demux_vcf, quant_contam, and doublet_dragon.

Overview

The programs in CellBouncer are standalone command line tools. Running one of them with no arguments or with -h prints detailed information about how to run it. Each program takes an --output_prefix/-o argument, a base name that is used for all of its output files.

Output files

Demultiplexing tools all write a file called [output_prefix].assignments, which describes each cell's identity. These files have four tab-separated columns:

1. cell barcode (optionally with unique ID appended; see below)
2. most likely identity (doublets are two names in alphabetical order separated by +)
3. droplet type: S (for singlet), D (for doublet), or in some cases M (for multiplet, 3+ individuals, so far only considered by demux_tags)
4. log likelihood ratio of the best to the second-best assignment (a measure of confidence in the assignment)
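
As a minimal sketch of reading this format, the Python below parses the four columns described above into records and keeps only confident singlets. The `Assignment` type, the function names, the example barcodes and individual names, and the cutoff of 10 are all illustrative assumptions rather than part of CellBouncer. It also assumes the file has no header line.

```python
import csv
import io
from typing import NamedTuple, Tuple


class Assignment(NamedTuple):
    """One row of an .assignments file (hypothetical helper type)."""
    barcode: str
    individuals: Tuple[str, ...]  # one name for singlets, two or more otherwise
    droplet_type: str             # "S", "D", or "M"
    llr: float                    # confidence in the assignment


def read_assignments(handle):
    """Parse tab-separated rows; assumes four columns and no header line."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        barcode, identity, droplet_type, llr = fields
        # Doublets and multiplets are written as names joined by "+"
        rows.append(
            Assignment(barcode, tuple(identity.split("+")), droplet_type, float(llr))
        )
    return rows


def confident_singlets(rows, min_llr):
    """Keep singlets whose confidence is at least min_llr."""
    return [r for r in rows if r.droplet_type == "S" and r.llr >= min_llr]


# Made-up example data standing in for a real [output_prefix].assignments file
example = (
    "AAACCCAAGAAACACT\tdonorA\tS\t51.2\n"
    "AAACCCAAGAAACCAT\tdonorA+donorB\tD\t12.7\n"
    "AAACCCAAGAAACCCA\tdonorB\tS\t3.1\n"
)
rows = read_assignments(io.StringIO(example))
kept = confident_singlets(rows, min_llr=10.0)
```

With a real file, `io.StringIO(example)` would be replaced by an open file handle. The right cutoff depends on the data set, so the value here is only a placeholder.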

Cell barcode format and merging with other data

To load data from CellBouncer into a single cell analysis tool like Seurat or Scanpy, you will need to load CellBouncer's (text format) output files and merge them with your single cell data set. This requires ensuring that CellBouncer formats cell barcodes the same way your data set does. Read more here.
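
One way to picture the matching problem is sketched below in Python: both sets of barcodes are reduced to their bare nucleotide sequence before labels are looked up. This is not how CellBouncer itself handles barcodes; the function names, the regular expression, and the example barcode decorations (a `-1` suffix and a `lib1_` prefix) are assumptions made for illustration.

```python
import re

# Assumption: the barcode sequence is the first run of 8 or more
# nucleotide characters, with any prefix or suffix treated as decoration.
_SEQ_RE = re.compile(r"[ACGTN]{8,}")


def core_barcode(name):
    """Return the bare nucleotide sequence inside a decorated barcode."""
    match = _SEQ_RE.search(name)
    if match is None:
        raise ValueError(f"no barcode sequence found in {name!r}")
    return match.group(0)


def merge_labels(dataset_barcodes, assignments):
    """Map each data set barcode to a label, or None if it has no label.

    assignments: dict of barcode (as written in an .assignments file) -> label
    """
    by_core = {core_barcode(bc): label for bc, label in assignments.items()}
    return {bc: by_core.get(core_barcode(bc)) for bc in dataset_barcodes}


# Made-up example: the data set decorates barcodes, the labels do not
labels = merge_labels(
    ["AAACCCAAGAAACACT-1", "lib1_TTTGGGTTCTTTGTGA"],
    {"AAACCCAAGAAACACT": "donorA"},
)
```

Matching on the bare sequence is only safe within a single library, because the same sequence can occur in more than one library.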

Note about multi-library data sets

CellBouncer programs take input from single libraries. If you have concatenated multiple single cell sequencing data sets, CellBouncer will interpret all cells with the same barcode sequence as the same cell, ignoring any unique IDs you have added to barcodes. If you need to load data in MEX format (i.e. for demux_tags or quant_contam) and that data comes from multiple libraries that were concatenated together, you can separate the data by library using the program utils/split_mex_libs.py.
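
To check whether this affects a concatenated data set, a short Python sketch like the following can list barcode sequences that occur under more than one library ID. It assumes barcodes are written as `sequence-ID`, which may not match your data; the function names and example barcodes are hypothetical.

```python
from collections import defaultdict


def split_barcode(barcode):
    """Split 'SEQUENCE-ID' into (sequence, ID); ID is '' if there is no suffix."""
    seq, _, lib_id = barcode.partition("-")
    return seq, lib_id


def colliding_sequences(barcodes):
    """Return sequences seen with more than one library ID.

    These are the cells that would be treated as one cell if the
    library IDs were ignored.
    """
    libs_by_seq = defaultdict(set)
    for barcode in barcodes:
        seq, lib_id = split_barcode(barcode)
        libs_by_seq[seq].add(lib_id)
    return {
        seq: sorted(libs) for seq, libs in libs_by_seq.items() if len(libs) > 1
    }


# Made-up example: the first sequence appears in libraries 1 and 2
collisions = colliding_sequences(
    ["AAACCCAAGAAACACT-1", "AAACCCAAGAAACACT-2", "TTTGGGTTCTTTGTGA-1"]
)
```

An empty result means no sequence is shared between libraries; anything else is a reason to split the data by library before running CellBouncer.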

Acknowledgments

We thank Helena Pinheiro for creating the icons for demux_species, demux_mt, demux_vcf, demux_tags, quant_contam, and bulkprops, and the cell drawing used throughout.

Citing us

CellBouncer is now described in a preprint available on bioRxiv. If you use CellBouncer in your work, please cite us. Thanks!
