A comprehensive toolkit for single-cell methylation sequencing data analysis
Single-cell sequencing technologies have revolutionized biomedical research by enabling deconvolution of cell type-specific properties in highly heterogeneous tissue. While robust tools have been developed to handle the bioinformatic challenges posed by single-cell RNA and ATAC data, options for emergent modalities such as methylation are much more limited, impeding the utility of results. Here we present Amethyst, the first comprehensive R package for atlas-scale single-cell methylation sequencing data analysis. Amethyst takes base-level methylation calls and facilitates batch integration, doublet detection, dimensionality reduction, clustering, cell type annotation, differentially methylated region calling, and interpretation of results, all in one streamlined platform. See our manuscript to learn more!
To become familiar with the Amethyst workflow, we recommend beginning with the PBMC vignette, which is focused on CG methylation analysis and applicable to any tissue.
Certain tissues - such as the brain and stem cells - also contain high levels of non-CG methylation and necessitate a very different analysis approach. After completing the PBMC vignette, we recommend going over the brain vignette for mCH-specific analysis.
In addition to these general workflow examples, we have specific vignettes for:
- Combining datasets with overlapping barcodes
- Batch integration
- Alternative clustering approaches
- Doublet detection
- Additional utilities: subsetting, merging, imputation
You will likely need to install one or more dependencies first:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
library("BiocManager")
# grDevices, methods, stats and utils ship with base R, so they do not need to be installed
BiocManager::install(c("caret", "cowplot", "devtools", "data.table", "dplyr", "FNN", "furrr", "future", "future.apply",
"GenomicRanges", "ggplot2", "gridExtra", "igraph", "IRanges", "irlba", "janitor", "jsonlite", "Matrix", "mgcv", "pheatmap",
"plotly", "plyr", "purrr", "randomForest", "RANN", "rhdf5", "rtracklayer", "rlang", "Rtsne", "scales", "stringr",
"tibble", "tidyr", "umap"))
devtools::install_github("JinmiaoChenLab/Rphenograph")
devtools::install_github("KrishnaswamyLab/MAGIC/Rmagic")
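If some of these packages are already installed, a quick check can show which ones are still missing before you run the full install. The sketch below uses a hypothetical helper, `find_missing`, which is not part of Amethyst. It relies only on base R's `requireNamespace`, and the `deps` vector is a shortened example you would extend with the full list above.

```r
# Hypothetical helper (not part of Amethyst): return the packages in `pkgs`
# whose namespaces cannot currently be loaded.
find_missing <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Shortened example list; extend it with the full dependency list above.
deps <- c("data.table", "dplyr", "ggplot2", "rhdf5", "Matrix", "irlba")
missing_deps <- find_missing(deps)
if (length(missing_deps) > 0) {
  message("Still missing: ", paste(missing_deps, collapse = ", "))
  # Install only what is missing, e.g.: BiocManager::install(missing_deps)
}
```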
Installation of Amethyst can then be done using devtools or remotes:
library("devtools")
devtools::install_github("lrylaarsdam/amethyst")
library("remotes")
remotes::install_github("lrylaarsdam/amethyst")
Amethyst begins with base-level methylation calls per cell wrapped into h5 files. The structure of the h5 file is illustrated in the diagram below. If desired, aggregate methylation levels over features can be calculated with Facet and stored in the h5 file as well.
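Conceptually, aggregating base-level calls over features means pooling the reads that fall in each feature. The base R sketch below illustrates that idea for fixed-width windows. It is not Facet's implementation, and the input columns (`chr`, `pos`, `c` for methylated reads, `t` for unmethylated reads), the 1 kb window width, and the pooled-count definition of percent methylation are all illustrative assumptions. Facet or `makeWindows` may define window values differently.

```r
# Toy base-level calls for a single cell (column names are assumptions).
calls <- data.frame(
  chr = c("chr1", "chr1", "chr1", "chr2"),
  pos = c(1200L, 1800L, 2500L, 400L),
  c = c(1L, 0L, 1L, 1L),  # methylated read counts
  t = c(0L, 1L, 0L, 0L)   # unmethylated read counts
)

# Assign each site to a fixed-width window, then pool counts per window
# and report percent methylation as 100 * methylated / total reads.
aggregate_windows <- function(calls, width = 1000L) {
  calls$start <- (calls$pos %/% width) * width
  agg <- aggregate(cbind(c, t) ~ chr + start, data = calls, FUN = sum)
  agg$pct <- 100 * agg$c / (agg$c + agg$t)
  agg <- agg[order(agg$chr, agg$start), ]
  rownames(agg) <- NULL
  agg
}

print(aggregate_windows(calls))
```

Pooling counts weights each site by its coverage. Averaging per-site percentages instead would weight every site equally. Which definition fits depends on the downstream step.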
Scripts for initial processing of sequencing data to produce this input format are available at the Adey Lab Premethyst repo. Please see vignettes for example Premethyst outputs and subsequent analysis steps.
We recently made many improvements to Amethyst that required upgrades to the input data structures. If you have h5 files in the old v0.0.0.9000 format, Facet has a helper function to convert the files so base-level observations are stored under context/barcode/1.
facet convert new_format.h5 old_format.h5
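Before converting, it can help to check which layout a file uses. The sketch below uses rhdf5 (already among the dependencies above). `uses_new_layout` is a hypothetical helper, not part of Amethyst or Facet. It assumes that old-format files stored each cell's calls as a dataset directly at context/barcode, so confirm this against your own files. New-format files may also hold Facet aggregates under context/barcode/name, so the check only requires that at least one context/barcode/1 dataset exists and that no dataset sits directly at context/barcode.

```r
library(rhdf5)

# Hypothetical helper: TRUE if base-level calls live under context/barcode/1.
uses_new_layout <- function(h5_path) {
  contents <- h5ls(h5_path, recursive = TRUE)
  # Rebuild each object's full path, e.g. "/CG/<barcode>/1".
  full <- ifelse(contents$group == "/",
                 paste0("/", contents$name),
                 paste0(contents$group, "/", contents$name))
  # Depth 1 = context, 2 = context/barcode, 3 = context/barcode/<name>.
  depth <- lengths(strsplit(sub("^/", "", full), "/"))
  is_dataset <- contents$otype == "H5I_DATASET"
  any(is_dataset & depth == 3 & contents$name == "1") &&
    !any(is_dataset & depth == 2)
}

# Toy files with a hypothetical barcode, one per layout.
new_file <- tempfile(fileext = ".h5")
h5createFile(new_file)
h5createGroup(new_file, "CG")
h5createGroup(new_file, "CG/AAACGTTG")
h5write(matrix(1:4, nrow = 2), new_file, "CG/AAACGTTG/1")

old_file <- tempfile(fileext = ".h5")
h5createFile(old_file)
h5createGroup(old_file, "CG")
h5write(matrix(1:4, nrow = 2), old_file, "CG/AAACGTTG")

print(c(new = uses_new_layout(new_file), old = uses_new_layout(old_file)))
```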
If you are using the Scale Biosciences pipeline, we have written a helper function to load the output into an Amethyst object. createScaleObject automatically populates the metadata and path slots for you. In its most basic form, all that is needed is the directory path:
obj <- createScaleObject(directory = "~/Downloads/ScaleMethyl.out/samples")
You may also wish to load any pre-generated matrices, which lets you skip the makeWindows step in the vignette. Below is an example of how to load the "CG.score" and "CH" matrices. First, double-check that your computational resources can handle the full size of each matrix.
obj <- createScaleObject(directory = "~/Downloads/ScaleMethyl.out/samples", genomeMatrices = list("CG.score", "CH"))
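A back-of-the-envelope estimate can help with that check. The sketch below uses a hypothetical helper, `estimate_dense_gib`, which is not part of Amethyst. It assumes a dense matrix of doubles (8 bytes per value). The window count in the example is illustrative: roughly 3.1 Gb of genome divided into 100 kb windows. Sparse storage, other value types, or additional copies made during analysis will change the real footprint.

```r
# Hypothetical helper: approximate size in GiB of a dense windows x cells
# matrix, assuming 8-byte doubles unless told otherwise.
estimate_dense_gib <- function(n_windows, n_cells, bytes_per_value = 8) {
  n_windows * n_cells * bytes_per_value / 1024^3
}

# Illustrative example: ~31,000 windows of 100 kb across 10,000 cells.
print(estimate_dense_gib(31000, 10000))
```

For this example the estimate is about 2.3 GiB per matrix. Memory needs grow linearly with both the number of windows and the number of cells.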
If using neither Premethyst nor ScaleMethyl, any pre-processing platform can be used. The important thing is that the data is in the expected structure. The diagram below illustrates each slot and its respective data contents.
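As an illustration, the sketch below writes output from an arbitrary pipeline into the context/barcode/1 layout with rhdf5. The long-format input table, the barcodes, the column set (`chr`, `pos`, `pct`, `c`, `t`), the 0-100 scale for `pct`, and the helper `write_context_h5` are all assumptions for illustration. Check the diagram and the vignettes' example files for the exact columns Amethyst expects.

```r
library(rhdf5)

# Hypothetical long-format output: one row per cell x CpG site, with
# methylated (c) and unmethylated (t) read counts.
calls <- data.frame(
  barcode = c("AAACGTTG", "AAACGTTG", "TTGCAACG"),
  chr = c("chr1", "chr1", "chr1"),
  pos = c(10468L, 10470L, 10468L),
  c = c(1L, 0L, 2L),
  t = c(0L, 1L, 1L)
)
# Percent methylation per site (assumed 0-100 scale).
calls$pct <- 100 * calls$c / (calls$c + calls$t)

# Hypothetical helper: write one dataset per cell at <context>/<barcode>/1.
write_context_h5 <- function(calls, h5_path, context = "CG") {
  h5createFile(h5_path)
  h5createGroup(h5_path, context)
  for (bc in unique(calls$barcode)) {
    cell <- calls[calls$barcode == bc, c("chr", "pos", "pct", "c", "t")]
    rownames(cell) <- NULL
    h5createGroup(h5_path, paste(context, bc, sep = "/"))
    h5write(cell, h5_path, paste(context, bc, "1", sep = "/"))
  }
  invisible(h5_path)
}

out <- write_context_h5(calls, tempfile(fileext = ".h5"))
print(h5ls(out))
```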
You might notice in the diagram above that we implemented some minor changes to the object structure in v1.0. To make this transition as smooth as possible, we have provided a helper function convertObject to convert format v0.0.0.9000 to v1.0.
new_obj <- convertObject(obj = old_obj)
Amethyst is still a work in progress. Please let us know if any issues come up.
Our manuscript is now officially published in Communications Biology! Please cite if you use Amethyst for your analysis.
We have recently implemented widespread improvements since v0.0.0.9000. Changes include:
- Base-resolution methylation information is now expected to be stored in the h5 file under context/barcode/1 (see Getting started)
- Aggregated values can also be calculated and stored in the h5 file under context/barcode/name using the efficient helper package Facet
- New slots added to store methylation tracks and results (see Getting started)
- Multiple clustering and dimensionality reductions can be stored within the same object
- Chromosome whitelists are supported
- Experiments with overlapping barcode sets can be combined (see vignette)
Please see our news for a more comprehensive change log and version update information. Detailed object structure explanations and conversion instructions are included above.
Amethyst is distributed under the MIT License. Please see LICENSE.txt for further information.


