Pods that have a GPU cores map specified in the Triad config receive these mappings as a list of annotations. The annotations can be used to expose individual NVIDIA devices to the Kubernetes pod as hostPath volumes, instead of exposing the whole /dev directory from the host. An example annotation is: sigproc.viasat.io/nhd_gpu_devices.nvidia0: 3 ... where the device name as seen in the pod is the last segment of the annotation key (here "nvidia0") and the physical GPU core is the annotation value (here "3").
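A minimal sketch of how a consumer might turn these annotations into hostPath volumes. The annotation prefix comes from the example above; the host path /dev/nvidia<core>, the volume naming scheme "gpu-<device>" and the use of plain pod-spec dicts (rather than kubernetes.client model objects) are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Annotation prefix taken from the example annotation above.
GPU_ANNOTATION_PREFIX = "sigproc.viasat.io/nhd_gpu_devices."


def gpu_device_map(annotations: Dict[str, str]) -> Dict[str, str]:
    """Return {device name in pod: physical GPU core} from pod annotations."""
    devices = {}
    for key, value in annotations.items():
        if key.startswith(GPU_ANNOTATION_PREFIX):
            pod_dev = key[len(GPU_ANNOTATION_PREFIX):]  # e.g. "nvidia0"
            devices[pod_dev] = value.strip()            # e.g. "3"
    return devices


def gpu_host_path_mounts(annotations: Dict[str, str]) -> Tuple[List[dict], List[dict]]:
    """Build hostPath volumes and matching volumeMounts as pod-spec dicts.

    Assumption: the host device node is /dev/nvidia<core> and it is mounted
    inside the pod at /dev/<device name from the annotation key>.
    """
    volumes, mounts = [], []
    for pod_dev, core in sorted(gpu_device_map(annotations).items()):
        vol_name = f"gpu-{pod_dev}"  # hypothetical volume naming scheme
        volumes.append({
            "name": vol_name,
            "hostPath": {"path": f"/dev/nvidia{core}", "type": "CharDevice"},
        })
        mounts.append({"name": vol_name, "mountPath": f"/dev/{pod_dev}"})
    return volumes, mounts
```

With the example annotation, the pod would see host device /dev/nvidia3 as /dev/nvidia0; annotations without the prefix are ignored.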
Later versions of the kubernetes Python client change the contents of the module and break NHD with: AttributeError: module 'kubernetes.client' has no attribute 'V1Event'
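As an illustration only, a small lookup shim is one way to tolerate both behaviours alongside a version pin. It assumes that newer client releases expose the core Event model as CoreV1Event instead of V1Event; that naming should be checked against the installed client version before relying on it.

```python
from typing import Any


def resolve_event_class(client_module: Any) -> type:
    """Return the core Event model class from a kubernetes.client-like module.

    Tries the older name V1Event first, then CoreV1Event (assumed to be the
    name used by newer client releases).
    """
    for name in ("V1Event", "CoreV1Event"):
        cls = getattr(client_module, name, None)
        if cls is not None:
            return cls
    raise AttributeError("kubernetes.client has neither V1Event nor CoreV1Event")
```

Usage would look like `from kubernetes import client` followed by `Event = resolve_event_class(client)`.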
When dual_port=true is specified in the topology config and the workload is scheduled on a dual-port GPU network card, two network attachment definitions are created for the pod, one on each port of the same network card.
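The selection rule described above can be sketched as follows. NicPort, its fields and select_attachments are hypothetical names; the card identifier is assumed to be something shared by both ports of one card (for example a PCI address without the function number).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NicPort:
    card: str      # hypothetical identifier shared by both ports of one card
    port: int      # port index on the card
    net_name: str  # hypothetical network attachment definition name


def select_attachments(ports: List[NicPort], primary: NicPort, dual_port: bool) -> List[str]:
    """Return the network attachment names to give the pod.

    With dual_port enabled and another port on the same card available, the
    pod gets two attachments; otherwise it gets only the primary one.
    """
    selected = [primary.net_name]
    if dual_port:
        for p in ports:
            if p.card == primary.card and p.port != primary.port:
                selected.append(p.net_name)
                break
    return selected
```

A single-port card simply yields one attachment even when dual_port is set.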
This patch adds support for parsing an extra NIC label segment, which NFD is expected to add as the last segment of the label name. If this last segment is "bkp", the NIC is marked as a spare/failover NIC, to be allocated only when the dual_port option is set. NICs with this label are excluded from the general NHD matchmaking process.
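A minimal sketch of the label split described above. The label prefix used here is hypothetical, since the actual NFD label layout is not given; only the rule "last segment equals bkp marks a backup NIC" comes from the patch description.

```python
from typing import Dict, List, Tuple

# Hypothetical label prefix; the real NFD label layout used by NHD may differ.
NIC_LABEL_PREFIX = "feature.node.kubernetes.io/nhd.nic."
BACKUP_SEGMENT = "bkp"


def split_nic_labels(labels: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split NIC label names into (primary, backup) lists.

    A label whose last dot-separated segment is "bkp" marks a spare/failover
    NIC: it stays out of the general matchmaking pool and is only handed out
    when dual_port is requested.
    """
    primary, backup = [], []
    for name in sorted(labels):
        if not name.startswith(NIC_LABEL_PREFIX):
            continue  # not a NIC label
        if name.rsplit(".", 1)[-1] == BACKUP_SEGMENT:
            backup.append(name)
        else:
            primary.append(name)
    return primary, backup
```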
When the dual_port option is present in the topology config and the workload is scheduled on an SR-IOV NIC, two VFs are exposed to the pod instead of one. TBD: the current implementation naively assumes that VFs 0-3 on the card are mapped to the first port of the PF, while VFs 4-7 are mapped to the second port. This will be improved to support NICs with varying numbers of ports.
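Under the naive mapping described above, choosing a backup VF amounts to picking a free VF on the other port. The function names and the "lowest free VF" policy are assumptions for illustration; vfs_per_port is kept as a parameter because a per-card value is what future VFs-per-PF detection would have to supply.

```python
from typing import Iterable, Optional

# Naive assumption from the patch: VFs 0-3 -> first port (index 0),
# VFs 4-7 -> second port (index 1).
VFS_PER_PORT = 4


def port_of_vf(vf: int, vfs_per_port: int = VFS_PER_PORT) -> int:
    """Map a VF index to the PF port it is assumed to belong to."""
    return vf // vfs_per_port


def pick_backup_vf(primary_vf: int, free_vfs: Iterable[int],
                   vfs_per_port: int = VFS_PER_PORT) -> Optional[int]:
    """Return the lowest free VF on a different port than primary_vf, or None."""
    primary_port = port_of_vf(primary_vf, vfs_per_port)
    candidates = sorted(v for v in free_vfs
                        if port_of_vf(v, vfs_per_port) != primary_port)
    return candidates[0] if candidates else None
```

For example, with VF 1 allocated as primary and VFs 0, 2, 5 and 6 free, the backup would be VF 5.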
urbanatb (Contributor) reviewed on Feb 1, 2022:
Sorry for taking so long to get to this. In general the approach looks good, but it seems we could infer the bkp label without having to go out and change the node labels.
This PR introduces basic awareness of dual-port NICs in NHD for failover purposes.
How it works:
NHD relies on node label names to detect the NIC topology inside the node. This PR adds an extra segment to the NIC label, "bkp". An interface marked this way is treated as a failover interface and mounted as a secondary interface inside the pod. It is up to the pod to detect that the link on a port is down and to perform the failover.
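Since failover is left to the pod, one possible pod-side check is to read the kernel's operstate for each interface from sysfs. This is a sketch under assumptions: the interface names are hypothetical, and it only works for interfaces visible to the kernel (a VF bound to a userspace driver such as DPDK has no /sys/class/net entry).

```python
import os
from typing import Optional


def link_is_up(iface: str, sysfs_net: str = "/sys/class/net") -> bool:
    """Return True if the kernel reports the interface's operstate as "up"."""
    try:
        with open(os.path.join(sysfs_net, iface, "operstate")) as f:
            return f.read().strip() == "up"
    except OSError:
        return False  # missing interface or unreadable state counts as down


def choose_active(primary: str, backup: str,
                  sysfs_net: str = "/sys/class/net") -> Optional[str]:
    """Prefer the primary interface; fall back to the backup if its link is down."""
    if link_is_up(primary, sysfs_net):
        return primary
    if link_is_up(backup, sysfs_net):
        return backup
    return None  # neither link is up; the caller decides how to retry
```

A pod would typically poll choose_active periodically (or react to netlink events) and move traffic when the result changes.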
This implementation works both for dual-port SmartNICs with a single VF per port and for SR-IOV NICs.
To enable dual-port support, set dual_port=true in the topology config. If the option is set to false or not set at all, NHD does not expose the extra NIC to the pod.
Future improvements:
For now the assignment of the backup SR-IOV NIC is fairly naive and assumes that VFs 0-3 on the card are mapped to the first port of the PF, while VFs 4-7 are mapped to the second port.
In the future we will add detection of the number of VFs per PF. This will require the node NICs to be labeled with the required information.