Native Gang Scheduling (Proof of Concept) #440
Draft
This is a proof of concept for gang scheduling in Cortex using the native Kubernetes v1.35 Workload API.
I know it is still undecided whether Cortex should implement its own gang scheduler or rely on Kubernetes' native support.
If we decide on a Cortex-specific gang scheduling mechanism, we might still be able to reuse parts of this PR. And even if we don't use the native gang scheduling, a comparison of both implementations might be useful at some point.
If this cannot be used in any way, it was still quite interesting for me to test out the native gang scheduling.
Usage:
- tilt up in existing cluster, e.g. minikube
- kubectl config set-context --current --namespace=cortex-system
- kubectl describe pod gang-pod
Interesting links:
Ref Issue: #393