See https://github.com/SqueezeAILab/SqueezeLLM/issues/67, which claims to offer a KMeans library that can speed up clustering a model from 2 hours to 6 minutes.