CHOIR yielded ~50 T-cell clusters—expected or over-splitting?

Hi CHOIR team,

I’m trying to use CHOIR to sub-cluster a T-cell subset, but I ended up with ~50 clusters, which seems higher than expected. The full T-cell object has 34,581 genes × 228,417 cells. Here is my workflow:

```{R}
# 1. Extract T cells
Idents(all) <- "celltype_broad"
t_cell <- subset(all, subset = celltype_broad %in% c("CD4 T", "CD8 T", "MAIT"))

# 2. Clear previous embeddings
t_cell@reductions <- list()

# 3. Standard Seurat preprocessing
t_cell <- NormalizeData(t_cell)
t_cell <- FindVariableFeatures(t_cell, nfeatures = 3000)
t_cell <- ScaleData(t_cell)
t_cell <- RunPCA(t_cell, verbose = FALSE)

# 4. CHOIR clustering & visualization
library(CHOIR)
t_cell <- CHOIR(t_cell, n_cores = 4)
t_cell <- runCHOIRumap(t_cell, reduction = "P0_reduction")
plotCHOIR(t_cell, accuracy_scores = TRUE, plot_nearest = FALSE)

```

Questions
1. Cluster count: Is obtaining ~50 clusters for a CD4/CD8/MAIT subset typical with CHOIR, or does it indicate over-splitting?
2. Workflow validation: Is my preprocessing + `CHOIR()` + `runCHOIRumap()` + `plotCHOIR()` pipeline correct?
3. Resetting reductions: Could `t_cell@reductions <- list()` remove critical metadata or embeddings?
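
To make question 1 more concrete, here is a minimal base-R sketch of how cluster sizes and composition could be checked against the broad labels. `summarize_clusters` is a hypothetical helper written for this post, not part of CHOIR or Seurat, and the toy data frame stands in for `t_cell@meta.data`. Many very small clusters, or clusters that mix broad labels, might point to over-splitting, although that interpretation is only a heuristic.

```r
# Hypothetical helper (not a CHOIR or Seurat function): summarize cluster
# sizes and how well each cluster agrees with a broad cell-type label.
summarize_clusters <- function(meta, cluster_col, label_col) {
  stopifnot(cluster_col %in% colnames(meta), label_col %in% colnames(meta))
  # Number of cells per cluster, largest first
  sizes <- sort(table(meta[[cluster_col]]), decreasing = TRUE)
  # Cross-tabulation: clusters in rows, broad labels in columns
  composition <- table(meta[[cluster_col]], meta[[label_col]])
  # Fraction of each cluster made up of its most common broad label
  purity <- apply(composition, 1, max) / rowSums(composition)
  list(sizes = sizes, composition = composition, purity = purity)
}

# Toy metadata standing in for t_cell@meta.data (illustrative values only)
toy_meta <- data.frame(
  cluster = c("1", "1", "1", "2", "2", "3"),
  celltype_broad = c("CD4 T", "CD4 T", "CD8 T", "CD8 T", "CD8 T", "MAIT")
)

res <- summarize_clusters(toy_meta, "cluster", "celltype_broad")
print(res$sizes)
print(round(res$purity, 2))
```

On the real object, the call would look like `summarize_clusters(t_cell@meta.data, "CHOIR_clusters_0.05", "celltype_broad")`. The column name `CHOIR_clusters_0.05` is my assumption about where CHOIR stores its labels at the default significance level; `colnames(t_cell@meta.data)` shows the actual name.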

The resulting UMAP looks like this:

<img width="930" height="771" alt="Image" src="https://github.com/user-attachments/assets/995f7b0a-5c01-4334-afb5-56e62d31acb3" />

Thank you for your guidance!
