Dual port support for SR-IOV and smart NICs #29

Open

psiwczak wants to merge 8 commits into master from dualport2

Conversation

psiwczak (Contributor) commented Jan 27, 2022

This PR introduces basic awareness of dual-port NICs in NHD for failover purposes.

How it works:
NHD relies on node label names to detect the NIC topology inside the node. This PR adds an extra segment to the NIC label: "bkp". If an interface is marked this way, it is treated as a failover interface and mounted as a secondary interface inside the pod. It is up to the pod to detect when the link is down on a port and perform the failover.
This implementation works for dual-port smart NICs with just one VF per port, and also with SR-IOV NICs.
The topology config option that enables dual-port mode is dual_port=true. If the option is set to false or not set at all, NHD will not expose the extra NIC to the pod. A sketch of the label check is shown below.
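
A minimal sketch of that label check in Python. The dot-separated label format and the helper names are assumptions; only the "last segment equals bkp" rule comes from this PR:

```python
def is_backup_nic(label_name: str) -> bool:
    """True if the NIC label's last segment marks it as a failover NIC."""
    # Assumed dot-separated label format; only the "bkp" rule is from the PR.
    return label_name.split(".")[-1] == "bkp"

def split_nics(nic_labels, dual_port: bool):
    """Split a node's NIC labels into primary and (optional) failover NICs."""
    primary = [l for l in nic_labels if not is_backup_nic(l)]
    # The extra NIC is only exposed when dual_port=true in the topology config.
    backup = [l for l in nic_labels if is_backup_nic(l)] if dual_port else []
    return primary, backup
```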

Future improvements:
For now the assignment of the backup SR-IOV NIC is pretty naive and assumes the following (see the sketch after this list):

  • each PF has 8 VFs
  • VFs 0-3 are mapped to the 1st PF on the network card
  • VFs 4-7 are mapped to the 2nd PF on the network card

In the future we will provide detection of vfs-per-pf. This will require proper labeling of the node NICs to carry the required information.
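
A sketch of that naive mapping; the helper names are hypothetical, and the VF ranges come straight from the list above:

```python
VFS_PER_PORT = 4  # VFs 0-3 -> 1st port, VFs 4-7 -> 2nd port

def port_for_vf(vf_index: int) -> int:
    """Which port of the card a VF index belongs to under the naive scheme."""
    return vf_index // VFS_PER_PORT

def backup_vf_for(vf_index: int) -> int:
    """Pick the VF at the same offset on the other port as the failover VF."""
    offset = vf_index % VFS_PER_PORT
    other_port = 1 - port_for_vf(vf_index)
    return other_port * VFS_PER_PORT + offset
```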

Pods which have a GPU cores map specified in the Triad config receive
these mappings as a list of annotations. The annotations can be used
to expose individual nvidia devices as hostPath volume objects to the
k8s pod (instead of just exposing the whole /dev contents from the host).
The example form of the annotation is:

sigproc.viasat.io/nhd_gpu_devices.nvidia0: 3

... where the device name as seen in the pod is the last segment of the
annotation key (here "nvidia0") and the physical GPU core is the value of
the annotation (in this case "3").
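
A hedged sketch of turning such an annotation into a hostPath volume with the kubernetes Python client; the helper and the exact /dev path layout are assumptions, not NHD's actual code:

```python
from kubernetes import client

PREFIX = "sigproc.viasat.io/nhd_gpu_devices."

def gpu_volumes_from_annotations(annotations: dict):
    """Build (volume, mount) pairs for each nhd_gpu_devices annotation."""
    pairs = []
    for key, core in annotations.items():
        if not key.startswith(PREFIX):
            continue
        dev = key[len(PREFIX):]  # device name as seen in the pod, e.g. "nvidia0"
        vol = client.V1Volume(
            name=dev,
            # Physical GPU core on the host, e.g. /dev/nvidia3 for value "3"
            host_path=client.V1HostPathVolumeSource(path=f"/dev/nvidia{core}"),
        )
        mount = client.V1VolumeMount(name=dev, mount_path=f"/dev/{dev}")
        pairs.append((vol, mount))
    return pairs
```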
Later versions of the kubernetes Python client introduce changes in module
behavior and cause problems for NHD:

AttributeError: module 'kubernetes.client' has no attribute 'V1Event'
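
One way to stay compatible across client versions (newer releases of the kubernetes Python client renamed V1Event to CoreV1Event, which triggers the error above); a sketch, not necessarily the fix applied in this PR:

```python
from kubernetes import client

try:
    V1Event = client.CoreV1Event  # newer kubernetes client versions
except AttributeError:
    V1Event = client.V1Event      # older kubernetes client versions
```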
Whenever dual_port=true is specified in the topology config and the workload
is scheduled on a dual-port GPU network card, 2 network attachment definitions
are created for the pod, one on each port of the same network card.
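
For illustration, assuming Multus-style CNI annotations on the pod (the attachment names here are hypothetical), the two attachments would be referenced roughly like this:

```python
# One network attachment per port of the same dual-port card
pod_annotations = {
    "k8s.v1.cni.cncf.io/networks": "sriov-port0, sriov-port1",
}
```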
This patch adds support for parsing an extra NIC label segment which is
supposed to be added as the last segment of the label name by NFD.

If this last segment of the label is set to "bkp", then the
NIC is treated as a spare/failover NIC to be allocated whenever
the dual_port option is set. NICs with this label are excluded
from the general NHD matchmaking process.
Whenever the dual_port option is present in the topology config and a workload
is scheduled to an SR-IOV NIC, 2 VFs are exposed to the pod instead
of just one.
TBD:
Right now the implementation naively assumes that VFs 0-3 on the card
are mapped to the 1st port while VFs 4-7 are mapped to the
2nd port. This will be improved to support NICs with a
varying number of ports.
psiwczak changed the title from "Dualport2" to "Dual port support for SR-IOV and smart NICs" on Jan 27, 2022
urbanatb (Contributor) left a comment

Sorry for taking so long to get to this. In general, the approach looks pretty good, but it seems we can infer the bkp label without actually having to go out and change the labels.
