Announce Rails in ResourceSlice #173
Description
For optimal performance of AI/ML workloads there are several factors that impact the network; see this presentation for more detail: https://docs.google.com/presentation/d/1VbxvXc1aqIjdpin7-MxF_ECaApsgAoWPySIZ3tq1ZoE/edit?slide=id.p#slide=id.p
Intra-node topology and GPU/NIC alignment can be achieved via MatchAttributes. However, these workloads also require inter-node alignment, because VMs/machines are typically cabled in a specific way to optimize the network.
As a consequence, in a cluster of VMs we cannot simply require matching ANY GPU and NIC that are in the same pciRoot; we MUST match a GPU and NIC that are close in the machine topology AND that are on the same rail across machines.
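For the intra-node half, here is a minimal sketch of a ResourceClaim using a matchAttributes constraint so that the allocated GPU and NIC share the same PCIe root. This assumes the resource.k8s.io/v1beta1 API (field names differ across DRA API versions); the driver names gpu.example.com/nic.example.com and the example.com/pcieRoot attribute are hypothetical placeholders:

```yaml
# Sketch only: DeviceClass names and the pcieRoot attribute are assumptions.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: gpu-nic-aligned
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.example.com   # hypothetical DeviceClass
    - name: nic
      deviceClassName: nic.example.com   # hypothetical DeviceClass
    constraints:
    # Both allocated devices must report the same value for this
    # attribute, which gives intra-node GPU/NIC alignment.
    - requests: ["gpu", "nic"]
      matchAttributes: ["example.com/pcieRoot"]
```

Nothing in this claim, however, prevents the scheduler from picking devices on different rails; that is the missing inter-node constraint.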
Per the conversation in the Kubernetes Slack (https://kubernetes.slack.com/archives/C0409NGC1TK/p1753615387658689), this can be achieved using DRA:
- Using a node selector in the ResourceSlice instead of a node name makes the devices available for use on all nodes matching the selector.
- The scheduler picks a device as usual and sets the ResourceClaim status so that it is marked as usable on the same nodes as the device. If there are multiple devices, the claim node selector covers the intersection of all device node selectors.
- Depending on that outcome, the ResourceClaim might be usable by multiple different pods on different nodes.
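Following that mechanism, a hedged sketch of how a driver could announce rail membership: publish one ResourceSlice per rail and use nodeSelector instead of nodeName, so the slice's devices are available on every node cabled to that rail. The example.com/rail node label, pool name, driver name, and rail attribute are illustrative assumptions, again against the resource.k8s.io/v1beta1 API:

```yaml
# Sketch only: label, driver, and attribute names are assumptions.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceSlice
metadata:
  name: gpu.example.com-rail-0
spec:
  driver: gpu.example.com
  # nodeSelector (instead of nodeName) makes these devices usable on
  # every node carrying the rail label, so the claim node selector the
  # scheduler computes ends up covering exactly the nodes of one rail.
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: example.com/rail
        operator: In
        values: ["rail-0"]
  pool:
    name: rail-0
    generation: 1
    resourceSliceCount: 1
  devices:
  - name: gpu-0
    basic:
      attributes:
        rail:
          string: "rail-0"
```

Because the claim node selector covers the intersection of all device node selectors, allocating devices from such slices restricts the ResourceClaim to nodes of one rail, which is exactly the inter-node alignment described above.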
NVIDIA already implements something similar with IMEX channels:
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/dra-cds.html#a-deeper-dive-related-resources
https://docs.google.com/presentation/d/1Xupr8IZVAjs5bNFKJnYaK0LE7QWETnJjkz6KOfLu87E/edit?pli=1&slide=id.g28ac369118f_0_1647#slide=id.g[…]118f_0_1647