Inconsistency between CHOIR clusters with Harmony Integration

Hello,

I am trying to apply CHOIR to the RNA assay of a multiomic dataset, and I did this in two ways. First, I applied Harmony integration using the harmony package, followed by Seurat clustering with a resolution of 2 to be conservative. Second, I ran CHOIR with the following command:

```
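library(Seurat)
library(CHOIR)

cellranger <- readRDS("integrated_cellranger_peaks_RNA.rds")
cellranger  # inspect the object

DefaultAssay(cellranger) <- "RNA"
cellranger  # confirm that RNA is now the default assay

# Run CHOIR on the RNA assay with Harmony batch correction across samples
cellranger <- CHOIR(cellranger,
                    batch_correction_method = "Harmony",
                    batch_labels = "sample",
                    use_assay = "RNA",
                    n_cores = 50)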
```

However, the CHOIR clusters do not look good when compared to Seurat's clustering. See the comparison below:

![Image](https://github.com/user-attachments/assets/3b0b0d82-74fd-406f-83d0-de792d7dfeda)

![Image](https://github.com/user-attachments/assets/ae8e019d-6461-4974-8238-6add542bcc9e)
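For readers comparing the two approaches, the first workflow (Harmony followed by Seurat clustering at resolution 2) would typically look something like the sketch below. This is only an assumed reconstruction: the helper name `run_harmony_seurat`, the preprocessing steps, and the choice of 30 PCs are illustrative and are not taken from the original run. Only the batch variable `sample` and the resolution of 2 come from the post.

```r
# Hypothetical helper: standard Seurat preprocessing, Harmony on the PCA
# embedding, then graph-based clustering on the Harmony embedding.
# Requires the Seurat and harmony packages when it is actually called.
run_harmony_seurat <- function(obj, batch_var = "sample", n_pcs = 30, resolution = 2) {
  obj <- Seurat::NormalizeData(obj, verbose = FALSE)
  obj <- Seurat::FindVariableFeatures(obj, verbose = FALSE)
  obj <- Seurat::ScaleData(obj, verbose = FALSE)
  obj <- Seurat::RunPCA(obj, npcs = n_pcs, verbose = FALSE)
  # Harmony adjusts the PCA embedding and stores it as the "harmony" reduction
  obj <- harmony::RunHarmony(obj, group.by.vars = batch_var, verbose = FALSE)
  # Neighbors and clusters are computed on the corrected embedding, not on PCA
  obj <- Seurat::FindNeighbors(obj, reduction = "harmony",
                               dims = seq_len(n_pcs), verbose = FALSE)
  obj <- Seurat::FindClusters(obj, resolution = resolution, verbose = FALSE)
  obj
}
```

In this kind of workflow the whole dataset is embedded and corrected once, and every cluster is derived from that single corrected embedding.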

1) How is PCA used when CHOIR reports that it is computing PCA? Does it make sense to start the tree with PCA? Shouldn't CHOIR instead use the dimensionality reduction obtained from Harmony to build the tree and the graphs?

2) Should I set `use_slot = "counts"` or `use_slot = "data"`?


3) I would like to understand better how dimensionality reduction works when setting `subtree_reductions = TRUE`. I get the following warning:
```
Warning message in .validInput(subtree_reductions, "subtree_reductions", list(reduction, :
“Supplied dimensionality reduction matrix for parameter 'subtree_reductions' will only be used for the root tree. Thereafter, the dimensionality reductions for each subtree will be calculated according to the specified 'reduction_method'. To use only the supplied dimensionality reduction matrix, set parameter 'subtree_reductions' to FALSE.”
```
How are subsequent dimensionality reductions computed? It is my understanding that Harmony only provides a corrected low-dimensional embedding, and that's it; it does not provide corrected counts or a corrected data matrix. So do you compute PCA on the counts matrix again for each subsequent subtree? If so, where does Harmony integration come into play? I suspect the discrepancy is coming from the `buildTree()` function. Since these integration methods do not provide corrected counts or corrected scaled data matrices, how are integration and batch correction applied at each level of the tree?
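To make question 3 concrete, here is a minimal base-R sketch of what "recomputing the reduction for each subtree" could mean. It is not CHOIR's internal code; the simulated data and the helper `reduce_subset` are hypothetical. The point it illustrates is that a per-subtree PCA is computed from the uncorrected expression values of that subset, so any embedding-level correction (such as Harmony) would have to be re-applied to each new embedding.

```r
set.seed(1)

n_cells <- 200
n_genes <- 50
batch <- rep(c("s1", "s2"), each = n_cells / 2)

# Simulated normalized expression (cells x genes) with an additive batch shift
expr <- matrix(rnorm(n_cells * n_genes), nrow = n_cells)
expr[batch == "s2", ] <- expr[batch == "s2", ] + 0.5

# Hypothetical helper: PCA restricted to a subset of cells
reduce_subset <- function(expr, cells, n_pcs = 10) {
  sub <- expr[cells, , drop = FALSE]
  pcs <- prcomp(sub, center = TRUE, scale. = FALSE, rank. = n_pcs)$x
  # An embedding-level batch correction (e.g. Harmony) would be applied to
  # `pcs` here; the expression matrix itself is never corrected.
  pcs
}

# Root of the tree: reduction over all cells
root_pcs <- reduce_subset(expr, seq_len(n_cells))

# A stand-in "subtree": the cells on one side of PC1
subtree_cells <- which(root_pcs[, 1] > median(root_pcs[, 1]))

# Subtree reduction: PCA recomputed from the same uncorrected expression values
sub_pcs <- reduce_subset(expr, subtree_cells)

dim(root_pcs)
dim(sub_pcs)
```

The subtree embedding has its own axes and is not a row subset of the root embedding, which is why a correction applied only at the root would not carry over automatically.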

Similarly, I get errors and other issues with the ATAC Harmony and multiomic Harmony integrations. Also, should I be using `"counts"` or `"data"` for the `use_slot` parameter below?

```
cellranger <- CHOIR(cellranger,
                    use_assay = c("peaks", "RNA"),
                    use_slot = c("data", "data"),
                    atac = c(TRUE, FALSE),
                    batch_correction_method = c("Harmony", "Harmony"),
                    batch_labels = c("sample","sample"),
                    distance_approx = FALSE,
                    n_cores= 50)
```
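As background for the `counts` versus `data` question, the sketch below shows how the two slots relate for an RNA assay under Seurat's default `NormalizeData()` settings (LogNormalize with a scale factor of 10,000). The toy matrix and the helper `log_normalize` are illustrative only. This does not say which slot CHOIR expects, and ATAC assays are usually normalized differently (for example with TF-IDF), so it applies to the RNA assay only.

```r
# Toy raw counts (genes x cells), standing in for the "counts" slot
counts <- matrix(c(0, 2, 8,
                   5, 0, 5,
                   1, 1, 8),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3)))

# Hypothetical helper mirroring LogNormalize:
# log1p(count / total counts per cell * scale factor)
log_normalize <- function(counts, scale_factor = 1e4) {
  per_cell_total <- colSums(counts)
  log1p(t(t(counts) / per_cell_total) * scale_factor)
}

# Stand-in for the "data" slot
data_slot <- log_normalize(counts)
data_slot
```

Zeros stay zero after normalization, while nonzero counts are rescaled by each cell's sequencing depth and log-transformed, so a PCA computed from `counts` and one computed from `data` can differ substantially.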

Note that I have 8 different batches. 

Inconsistency between CHOIR clusters with Harmony Integration #26
