Dual port support for SR-IOV and smart NICs #29

Open

psiwczak wants to merge 8 commits into master from dualport2

Conversation

psiwczak (Contributor) commented Jan 27, 2022

This PR introduces basic awareness of dual-port NICs in NHD for failover purposes.

How it works:
NHD relies on node label names to detect the NIC topology inside the node. This PR adds an extra segment to the NIC label: "bkp". If an interface is marked this way, it is treated as a failover interface and mounted as a secondary interface inside the pod. It is up to the pod to detect when the link is down on a port and perform the failover.
This implementation works for dual-port smart NICs with just one VF per port, and also with SR-IOV NICs.
The topology config option that enables dual-port mode is dual_port=true. If the option is set to false or not set at all, NHD will not expose the extra NIC to the pod. A sketch of the label check is shown below.
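
A minimal sketch of that label check in Python. The dot-separated label format and the helper names are assumptions; only the "last segment equals bkp" rule comes from this PR:

```python
def is_backup_nic(label_name: str) -> bool:
    """True if the NIC label's last segment marks it as a failover NIC."""
    # Assumed dot-separated label format; only the "bkp" rule is from the PR.
    return label_name.split(".")[-1] == "bkp"

def split_nics(nic_labels, dual_port: bool):
    """Split a node's NIC labels into primary and (optional) failover NICs."""
    primary = [l for l in nic_labels if not is_backup_nic(l)]
    # The extra NIC is only exposed when dual_port=true in the topology config.
    backup = [l for l in nic_labels if is_backup_nic(l)] if dual_port else []
    return primary, backup
```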

Future improvements:
For now the assignment of the backup SR-IOV NIC is pretty naive and assumes the following (see the sketch after this list):

  • each PF has 8 VFs
  • VFs 0-3 are mapped to the 1st PF on the network card
  • VFs 4-7 are mapped to the 2nd PF on the network card

In the future we will provide detection of vfs-per-pf. This will require proper labeling of the node NICs to carry the required information.
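
A sketch of that naive mapping; the helper names are hypothetical, and the VF ranges come straight from the list above:

```python
VFS_PER_PORT = 4  # VFs 0-3 -> 1st port, VFs 4-7 -> 2nd port

def port_for_vf(vf_index: int) -> int:
    """Which port of the card a VF index belongs to under the naive scheme."""
    return vf_index // VFS_PER_PORT

def backup_vf_for(vf_index: int) -> int:
    """Pick the VF at the same offset on the other port as the failover VF."""
    offset = vf_index % VFS_PER_PORT
    other_port = 1 - port_for_vf(vf_index)
    return other_port * VFS_PER_PORT + offset
```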

Pods which have a GPU cores map specified in the Triad config receive
these mappings as a list of annotations. The annotations can be used
to expose individual nvidia devices as hostPath volume objects to the
k8s pod (instead of just exposing the whole /dev contents from the host).
The example form of the annotation is:

sigproc.viasat.io/nhd_gpu_devices.nvidia0: 3

... where the device name as seen in the pod is the last segment of the
annotation key (here "nvidia0") and the physical GPU core is the value of
the annotation (in this case "3").
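
A hedged sketch of turning such an annotation into a hostPath volume with the kubernetes Python client; the helper and the exact /dev path layout are assumptions, not NHD's actual code:

```python
from kubernetes import client

PREFIX = "sigproc.viasat.io/nhd_gpu_devices."

def gpu_volumes_from_annotations(annotations: dict):
    """Build (volume, mount) pairs for each nhd_gpu_devices annotation."""
    pairs = []
    for key, core in annotations.items():
        if not key.startswith(PREFIX):
            continue
        dev = key[len(PREFIX):]  # device name as seen in the pod, e.g. "nvidia0"
        vol = client.V1Volume(
            name=dev,
            # Physical GPU core on the host, e.g. /dev/nvidia3 for value "3"
            host_path=client.V1HostPathVolumeSource(path=f"/dev/nvidia{core}"),
        )
        mount = client.V1VolumeMount(name=dev, mount_path=f"/dev/{dev}")
        pairs.append((vol, mount))
    return pairs
```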
Later versions of the kubernetes Python client introduce changes in module
behavior and cause problems for NHD:

AttributeError: module 'kubernetes.client' has no attribute 'V1Event'
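
One way to stay compatible across client versions (newer releases of the kubernetes Python client renamed V1Event to CoreV1Event, which triggers the error above); a sketch, not necessarily the fix applied in this PR:

```python
from kubernetes import client

try:
    V1Event = client.CoreV1Event  # newer kubernetes client versions
except AttributeError:
    V1Event = client.V1Event      # older kubernetes client versions
```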
Whenever dual_port=true is specified in the topology config and the workload
is scheduled on a dual-port GPU network card, 2 network attachment definitions
are created for the pod, one on each port of the same network card.
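
For illustration, assuming Multus-style CNI annotations on the pod (the attachment names here are hypothetical), the two attachments would be referenced roughly like this:

```python
# One network attachment per port of the same dual-port card
pod_annotations = {
    "k8s.v1.cni.cncf.io/networks": "sriov-port0, sriov-port1",
}
```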
This patch adds support for parsing an extra NIC label segment which is
supposed to be added as the last segment of the label name by NFD.

If this last segment of the label is set to "bkp", then the
NIC is treated as a spare/failover NIC to be allocated whenever
the dual_port option is set. NICs with this label are excluded
from the general NHD matchmaking process.
Whenever the dual_port option is present in the topology config and a workload
is scheduled to an SR-IOV NIC, 2 VFs are exposed to the pod instead
of just one.
TBD:
Right now the implementation naively assumes that VFs 0-3 on the card
are mapped to the 1st port while VFs 4-7 are mapped to the
2nd port. This will be improved to support NICs with a
varying number of ports.
psiwczak changed the title from "Dualport2" to "Dual port support for SR-IOV and smart NICs" on Jan 27, 2022
urbanatb (Contributor) left a comment

Sorry for taking so long to get to this. In general, the approach looks pretty good, but it seems we can infer the bkp label without actually having to go out and change the labels.
