-
Notifications
You must be signed in to change notification settings - Fork 6
fix: Fix rdma remove doca #71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Reviewer's guide (collapsed on small PRs)Reviewer's GuideThis PR removes all DOCA/OFED-related configuration from the HPC role so RDMA support relies solely on the RHEL inbox InfiniBand stack while preserving the Azure persistent RDMA naming feature. Flow diagram for RDMA configuration without DOCA/OFEDflowchart TD
A[Start RDMA role tasks] --> B[Check if Azure persistent RDMA naming is enabled]
B -->|Enabled| C[Configure Azure persistent RDMA naming]
B -->|Disabled| D[Skip Azure RDMA naming configuration]
C --> C1[Deploy rdma_rename helper path variable]
C1 --> C2[Install and configure systemd units for RDMA rename]
C2 --> C3[Install udev rules for RDMA device naming]
C3 --> C4[Trigger handlers to reload udev and RDMA rules]
D --> E[Proceed with remaining HPC RDMA-related tasks]
C4 --> E
E --> F[Use RHEL inbox InfiniBand stack for RDMA devices]
F --> G[End RDMA role tasks]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
6d0afa1 to
60d3610
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey - I've left some high level feedback:
- Since the DOCA installation block and variables have been removed, double-check and remove any remaining references to
__hpc_doca_*variables or DOCA-specific logic elsewhere in the role to avoid dead configuration paths. - If the
Clean dnf metadatahandler was only used by the DOCA host RPM installation, consider removing that handler (and any now-unused related tasks) to keep the handlers list minimal and accurate.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Since the DOCA installation block and variables have been removed, double-check and remove any remaining references to `__hpc_doca_*` variables or DOCA-specific logic elsewhere in the role to avoid dead configuration paths.
- If the `Clean dnf metadata` handler was only used by the DOCA host RPM installation, consider removing that handler (and any now-unused related tasks) to keep the handlers list minimal and accurate.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
There is a spurious white space change commit in this PR that shouldn't be there. Also, I can't tell if the entirity of the original commit was reverted because the revert is a manual change. When reverting a commit, the change should be created using the command |
0bde090 to
bdbe795
Compare
…ck (gcc-inbox). DOCA-OFED pulls proprietary kernel modules and is not supported by RHEL engineering. Signed-off-by: Gaurav Goklani <ggoklani@ggoklani-thinkpadt14gen4.punetw6.csb>
bdbe795 to
0bb222c
Compare
|
@dgchinner Changes mode, testing works well.. please review |
Enhancement: Revert DOCA-OFED installation; product uses RHEL inbox IB stack (gcc-inbox). DOCA-OFED pulls proprietary kernel modules and is not supported by RHEL engineering.
Reason: We have explicitly not used DOCA-OFED for the IB stack that we are supporting - we use the RHEL OS IB stack ("gcc-inbox" in IB stack terms). The two stacks are not directly compatible, and the DOCA-OFED stack is not supported by RHEL engineering because it requires proprietary, non-GPL kernel modules.
As a result, the initial product we shipped uses the RHEL OS IB stack and not the DOCA stack. The RHEL OS IB stack is supposed to support everything the DOCA stack supports, so AFAICT there is no reason to be installing the DOCA-OFED stack here. OFED should not be necessary to ensure persistent naming of the IB devices, given that the IB devices are currently running RHEL OS provided drivers.
Result:
[azureuser@gaurav-hpcrdmatest1 hpc]$ ibv_devices
device node GUID
------ ----------------
mlx5_0 00155dfffe340069
Issue Tracker Tickets (Jira or BZ if any):
Summary by Sourcery
Remove DOCA-related RDMA configuration and package installation in favor of using the RHEL-provided InfiniBand stack.
Enhancements: