Unexpectedly large number of clusters #7

Hi Cathrine,

First of all, thanks for developing this very interesting method! I tried out CHOIR on one of my datasets to see how the results would compare with my previous clustering analysis. I understand that one of the main advantages of CHOIR is that, in principle, it should identify the "correct" number of clusters without over- or under-clustering.

I have a snRNA-seq dataset of postmortem human cortical tissue from 8 individuals, totaling ~66k nuclei. Based on my previous analysis and on other studies, we expect this dataset to contain clusters corresponding to seven major cell lineages (oligodendrocytes, OPCs, microglia, astrocytes, vascular cells, excitatory neurons, and inhibitory neurons), and we generally expect each of these major cell types to have some subclusters as well. I was surprised that after running CHOIR I ended up with 132 clusters, many of which contained very few nuclei. I ran CHOIR on the `harmony` dimensionality reduction that I had already computed. I am not sure what is going on here. Do you have any advice for this scenario? Below I have included the code I ran, as well as a UMAP plot comparing my previous clustering to the CHOIR clustering.

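```r
library(Seurat)  # provides Embeddings() and VariableFeatures()
library(CHOIR)

# Cluster using the precomputed Harmony embeddings and the existing variable features
seurat_obj <- CHOIR(
    seurat_obj,
    reduction = Embeddings(seurat_obj, reduction = "harmony"),
    var_features = VariableFeatures(seurat_obj)
)
```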

<img width="899" alt="CHOIR_clusters" src="https://github.com/corceslab/CHOIR/assets/20936230/7a932717-f30e-4e9e-ac09-2f16f5192f99">
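To quantify the "many clusters with very few nuclei" observation, a quick base R sketch like the one below tabulates each CHOIR cluster's size and which previous label most of its nuclei fall under. This is an illustrative helper, not part of CHOIR: the function name `summarize_clusters` and the `min_size` cutoff of 50 nuclei are hypothetical choices, and the two inputs are assumed to be per-nucleus label vectors in the same cell order.

```r
# Hypothetical helper (not part of CHOIR): compare a new clustering with a
# previous annotation. Both inputs are assumed to be per-nucleus label
# vectors in the same cell order; min_size is an arbitrary cutoff.
summarize_clusters <- function(choir_labels, previous_labels, min_size = 50) {
  # Convert to character so unused factor levels do not create empty rows
  choir_labels <- as.character(choir_labels)
  previous_labels <- as.character(previous_labels)

  # Number of nuclei in each new cluster
  sizes <- table(choir_labels)
  # Rows: new clusters; columns: previous labels
  xtab <- table(new = choir_labels, previous = previous_labels)

  # Previous label holding the most nuclei of each new cluster
  majority <- colnames(xtab)[apply(xtab, 1, which.max)]
  # Fraction of each new cluster's nuclei that carry that majority label
  purity <- apply(xtab, 1, max) / rowSums(xtab)

  n <- as.integer(sizes[rownames(xtab)])
  data.frame(
    cluster = rownames(xtab),
    n_nuclei = n,
    majority_label = majority,
    purity = as.numeric(purity),
    small = n < min_size,
    stringsAsFactors = FALSE
  )
}

# Toy example with made-up labels (not real data)
res <- summarize_clusters(
  c("1", "1", "1", "2", "2", "3"),
  c("Astro", "Astro", "Astro", "Oligo", "Oligo", "Oligo"),
  min_size = 2
)
print(res)
# Number of new clusters whose majority falls under each previous label
print(table(res$majority_label))
```

On the real object, the inputs would be the two metadata columns holding the CHOIR clusters and the previous labels (column names depend on your object). `table(res$majority_label)` then shows how many CHOIR clusters fall under each major lineage, and low `purity` values flag clusters that mix nuclei from several previous labels.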
