
Metrics to decide on the number of clusters #22

@aakrosh


SUMO generates plots of the cophenetic correlation coefficient and the proportion of ambiguously clustered pairs to help determine the optimal number of clusters. The following additional metrics can be helpful in certain scenarios and should also be generated:

  1. Jaccard index: In some cases, going from k to k+1 clusters assigns only a tiny number of samples to the new cluster. In such a scenario, the k+1 solution offers little additional information for classification compared to the k solution. Let a be the number of sample pairs that are in the same subgroup for both k and k+1 clusters, and let b be the number of pairs that are in the same subgroup for one of the two solutions but in different subgroups for the other. The index is then a / (a + b); a value close to 1 indicates that the two solutions are nearly identical.

  2. Silhouette score: can be computed from the H matrix obtained in each run of the solver, with the final score derived from the per-run scores (for example, by averaging them).

  3. Agreement score: for each run of the solver, how many pairs of samples are assigned labels that agree with the consensus labels.
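
The pair-counting metrics above (items 1 and 3) could be sketched roughly as follows. This is only an illustration, not SUMO code: the function names `co_clustered_pairs`, `pairwise_jaccard` and `pair_agreement` are hypothetical. The agreement score is interpreted here as the fraction of sample pairs whose "same cluster / different cluster" status matches the consensus, which is one possible reading of item 3.

```python
from itertools import combinations


def co_clustered_pairs(labels):
    """Return the set of index pairs (i, j), i < j, that share a cluster label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_jaccard(labels_a, labels_b):
    """Jaccard index a / (a + b) between two labelings of the same samples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same samples")
    pairs_a = co_clustered_pairs(labels_a)
    pairs_b = co_clustered_pairs(labels_b)
    a = len(pairs_a & pairs_b)  # pairs grouped together in both labelings
    b = len(pairs_a ^ pairs_b)  # pairs grouped together in exactly one labeling
    if a + b == 0:
        # Assumed convention: no co-clustered pairs in either labeling
        # means the two labelings are treated as identical.
        return 1.0
    return a / (a + b)


def pair_agreement(run_labels, consensus_labels):
    """Fraction of sample pairs whose same/different status matches the consensus."""
    n = len(consensus_labels)
    if len(run_labels) != n:
        raise ValueError("both labelings must cover the same samples")
    total = n * (n - 1) // 2
    if total == 0:
        return 1.0  # assumed convention for fewer than two samples
    agree = sum(
        (run_labels[i] == run_labels[j])
        == (consensus_labels[i] == consensus_labels[j])
        for i, j in combinations(range(n), 2)
    )
    return agree / total


# Toy example: moving to k+1 clusters splits one sample off the second cluster.
labels_k = [0, 0, 0, 1, 1, 1]
labels_k1 = [0, 0, 0, 1, 1, 2]
jaccard = pairwise_jaccard(labels_k, labels_k1)   # a = 4, b = 2, so 4 / 6
agreement = pair_agreement(labels_k1, labels_k)   # 13 of 15 pairs agree
```

Both functions compare pair memberships rather than raw label values, so they are unaffected by how clusters happen to be numbered in different runs.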


Labels: enhancement (New feature or request), sumo run (issue concerns "run" mode)
