Inconsistency between CHOIR clusters with Harmony Integration #26

@alifarhat40

Hello,

I am trying to apply CHOIR to the RNA assay of a multiomic dataset, and I did this in two ways. First, I applied Harmony integration using the harmony package, followed by Seurat clustering with a resolution of 2 to be conservative. Second, I ran CHOIR with the following command:

cellranger <- readRDS("integrated_cellranger_peaks_RNA.rds")

cellranger

DefaultAssay(cellranger) <- "RNA"
cellranger

cellranger <- CHOIR(cellranger,
                    batch_correction_method = "Harmony",
                    batch_labels = "sample",
                    use_assay = "RNA",
                    n_cores = 50)

However, the CHOIR clusters do not look good compared with Seurat's clustering. See the comparison below:

Image

Image
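
For context, the first approach described above (Harmony on the PCA embedding, then Seurat graph-based clustering on the Harmony reduction) can be sketched roughly as follows. This is a minimal sketch, not the exact commands that were run: the helper name `harmony_then_seurat`, the choice of 30 dimensions, and the default normalization, feature selection, and scaling steps are all assumptions for illustration. Only the "sample" batch column and the resolution of 2 come from the description above.

```r
# Minimal sketch of a Harmony-then-Seurat workflow (assumed, not the exact commands run).
library(Seurat)
library(harmony)

# Hypothetical helper: every default except batch = "sample" and
# resolution = 2 is an assumption made for illustration.
harmony_then_seurat <- function(obj, batch = "sample", dims = 1:30, resolution = 2) {
  DefaultAssay(obj) <- "RNA"
  obj <- NormalizeData(obj)
  obj <- FindVariableFeatures(obj)
  obj <- ScaleData(obj)
  obj <- RunPCA(obj, npcs = max(dims))
  # Harmony corrects the PCA embedding and stores it as the "harmony" reduction;
  # the counts and data matrices are left unchanged.
  obj <- RunHarmony(obj, group.by.vars = batch)
  # The neighbor graph and clusters are built on the Harmony embedding, not on PCA.
  obj <- FindNeighbors(obj, reduction = "harmony", dims = dims)
  obj <- FindClusters(obj, resolution = resolution)
  obj
}
```

It would be called as `cellranger <- harmony_then_seurat(cellranger)`. The point of the sketch is that, in this workflow, the batch-corrected embedding is the only input to the neighbor graph, which is the behavior the questions below ask about for CHOIR.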

  1. How is PCA used when CHOIR reports that it is computing PCA? Does it make sense to start the tree with PCA? Shouldn't CHOIR instead use the dimensionality reduction obtained from Harmony to build the tree and the graphs?

  2. Should I set use_slot = "counts" or use_slot = "data"?

  3. I would like to better understand how dimensionality reduction works when setting subtree_reductions = TRUE. I get the following warning:

Warning message in .validInput(subtree_reductions, "subtree_reductions", list(reduction, :
“Supplied dimensionality reduction matrix for parameter 'subtree_reductions' will only be used for the root tree. Thereafter, the dimensionality reductions for each subtree will be calculated according to the specified 'reduction_method'. To use only the supplied dimensionality reduction matrix, set parameter 'subtree_reductions' to FALSE.”

How are the subsequent dimensionality reductions computed? My understanding is that Harmony only provides a reduced-dimensional embedding, and that's it. It does not provide a corrected counts matrix or a corrected data matrix. So do you compute PCA on the counts matrix each time for the subsequent subtrees? If so, where does Harmony integration come into play? I suspect the discrepancy comes from the buildTree() function. Since these integration methods do not provide corrected counts or corrected scaled data matrices, how do integration and batch correction enter at that stage?

Similarly, I get errors and issues with the ATAC Harmony and multiomic Harmony integrations. Also, should I use counts or data for the use_slot parameter below?

cellranger <- CHOIR(cellranger,
                    use_assay = c("peaks", "RNA"),
                    use_slot = c("data", "data"),
                    atac = c(TRUE, FALSE),
                    batch_correction_method = c("Harmony", "Harmony"),
                    batch_labels = c("sample", "sample"),
                    distance_approx = FALSE,
                    n_cores = 50)

Note that I have 8 different batches.
