CHOIR yielded ~50 T-cell clusters—expected or over-splitting? #42

@laleoarrow

Hi CHOIR team,

I’m trying to use CHOIR to sub-cluster a T-cell subset, but I ended up with ~50 clusters, which seems higher than expected. The full T-cell object has 34,581 genes × 228,417 cells. Here is my workflow:

# 1. Extract T cells
Idents(all) <- "celltype_broad"
t_cell <- subset(all, subset = celltype_broad %in% c("CD4 T", "CD8 T", "MAIT"))

# 2. Clear previous embeddings
t_cell@reductions <- list()

# 3. Standard Seurat preprocessing
t_cell <- NormalizeData(t_cell)
t_cell <- FindVariableFeatures(t_cell, nfeatures = 3000)
t_cell <- ScaleData(t_cell)
t_cell <- RunPCA(t_cell, verbose = FALSE)

# 4. CHOIR clustering & visualization
library(CHOIR)
t_cell <- CHOIR(t_cell, n_cores = 4)
t_cell <- runCHOIRumap(t_cell, reduction = "P0_reduction")
plotCHOIR(t_cell, accuracy_scores = TRUE, plot_nearest = FALSE)

Questions
1. Cluster count: Is obtaining ~50 clusters for a CD4/CD8/MAIT subset typical with CHOIR, or does it indicate over-splitting?
2. Workflow validation: Is my preprocessing + CHOIR() + runCHOIRumap() + plotCHOIR() pipeline correct?
3. Resetting reductions: Could t_cell@reductions <- list() remove critical metadata or embeddings?
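
To help judge question 1, the cluster sizes could be summarized along the lines below. This is only a rough sketch, not part of the workflow above: the labels are simulated stand-ins, `summarize_cluster_sizes` and its `min_fraction` threshold are hypothetical helpers, and the metadata column name `CHOIR_clusters_0.05` is my assumption about where CHOIR stores its final labels.

```r
# Simulated cluster labels standing in for real CHOIR output (hypothetical data).
set.seed(1)
labels <- sample(paste0("c", 1:50), size = 5000, replace = TRUE,
                 prob = c(rep(0.03, 30), rep(0.005, 20)))
# With a real object the labels would come from the metadata instead, e.g.
# labels <- t_cell$CHOIR_clusters_0.05   # column name is an assumption

# Tabulate cells per cluster and flag clusters below a chosen fraction of all cells.
summarize_cluster_sizes <- function(labels, min_fraction = 0.01) {
  sizes <- sort(table(labels), decreasing = TRUE)
  fractions <- as.numeric(sizes) / length(labels)
  data.frame(
    cluster = names(sizes),
    n_cells = as.integer(sizes),
    fraction = fractions,
    small = fractions < min_fraction,
    stringsAsFactors = FALSE
  )
}

size_summary <- summarize_cluster_sizes(labels)
print(head(size_summary))
cat("Clusters below 1% of cells:", sum(size_summary$small),
    "of", nrow(size_summary), "\n")
```

Many very small clusters would not prove over-splitting on their own, but they would show which clusters to inspect first for distinguishing marker genes.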

The UMAP looks like this:

[Image: UMAP plot]

Thank you for your guidance!
