-
Notifications
You must be signed in to change notification settings - Fork 6
feat: refine hpc_tuning and add additional tunings #70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Refine existing hpc_tuning and add additional system-level tunings - Remove azsec-monitor to prevent CPU utilization by azsec-monitor - Add DefaultLimitMEMLOCK configuration for systemd - Install sunrpc module via systemd service sunrpc_tcp_settings JIRA: RHELHPC-109 Signed-off-by: Xuemin Li <xuli@redhat.com>
Reviewer's GuideRefines the HPC tuning Ansible role by tightening resource limits, managing sunrpc settings via a dedicated systemd service, and adding safeguards like removing azsec-monitor and re-execing/reloading systemd when configuration changes occur. Sequence diagram for sunrpc_tcp_settings systemd-based tuningsequenceDiagram
participant Ansible
participant Systemd
participant SunrpcService as sunrpc_tcp_settings.service
participant Script as sunrpc_tcp_settings.sh
participant Kernel as sunrpc_module
Ansible->>Systemd: Install sunrpc_tcp_settings.service
Ansible->>Systemd: daemon-reload
Ansible->>Systemd: Enable sunrpc_tcp_settings
Ansible->>Systemd: Start sunrpc_tcp_settings
Systemd->>SunrpcService: Activate service
SunrpcService->>Script: ExecStart /usr/sbin/sunrpc_tcp_settings.sh
Script->>Kernel: modprobe sunrpc
Script->>Systemd: sysctl -p
SunrpcService-->>Systemd: Exit status 0
Ansible->>Systemd: systemctl is-active sunrpc_tcp_settings
Systemd-->>Ansible: active
Sequence diagram for updating systemd MEMLOCK limitssequenceDiagram
participant Ansible
participant Systemd
participant SystemConf as system.conf
participant UserConf as user.conf
Ansible->>SystemConf: Ensure DefaultLimitMEMLOCK=infinity
Ansible->>UserConf: Ensure DefaultLimitMEMLOCK=infinity
Ansible->>Systemd: daemon-reexec
Systemd-->>Ansible: New limits applied for services
Flow diagram for HPC tuning Ansible task blockflowchart TD
A["Start hpc_tuning block"] --> B["Remove azsec-monitor package if present"]
B --> C["Write 90-hpc-limits.conf with updated nofile limits"]
C --> D["Set DefaultLimitMEMLOCK=infinity in system.conf"]
D --> E["Set DefaultLimitMEMLOCK=infinity in user.conf"]
E --> F["Write 90-hpc-sysctl.conf and reload sysctl"]
F --> G["Deploy sunrpc_tcp_settings.sh to /usr/sbin"]
G --> H["Deploy sunrpc_tcp_settings.service to /etc/systemd/system"]
H --> I["systemctl daemon-reload"]
I --> J["Enable and start sunrpc_tcp_settings service"]
J --> K["Verify sunrpc_tcp_settings is active"]
K --> L["End hpc_tuning block"]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey - I've found 1 issue, and left some high level feedback:
- The
lineinfiletasks forDefaultLimitMEMLOCK=infinitywill not update an existingDefaultLimitMEMLOCKentry with a different value; consider using aregexpto replace any existing line rather than just appending a new one. - The
sunrpc_tcp_settings.shscript runssysctl -p, which reloads all sysctl settings and might have unintended side effects; it would be safer to scope this to the specific sysctl config file(s) you manage (e.g.,sysctl -p /etc/sysctl.d/90-hpc-sysctl.conf) or usesysctl -wfor the needed keys.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `lineinfile` tasks for `DefaultLimitMEMLOCK=infinity` will not update an existing `DefaultLimitMEMLOCK` entry with a different value; consider using a `regexp` to replace any existing line rather than just appending a new one.
- The `sunrpc_tcp_settings.sh` script runs `sysctl -p`, which reloads all sysctl settings and might have unintended side effects; it would be safer to scope this to the specific sysctl config file(s) you manage (e.g., `sysctl -p /etc/sysctl.d/90-hpc-sysctl.conf`) or use `sysctl -w` for the needed keys.
## Individual Comments
### Comment 1
<location> `templates/sunrpc_tcp_settings.sh:6` </location>
<code_context>
+{{ "system_role:hpc" | comment(prefix="", postfix="") }}
+
+modprobe sunrpc
+sysctl -p
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Using `sysctl -p` without a specific file may reload and depend on global sysctl configuration
This reloads `/etc/sysctl.conf` and all of `/etc/sysctl.d`, which can introduce unrelated changes and fail due to invalid, non-HPC sysctl entries. To limit scope and improve robustness, invoke `sysctl -p` with the specific HPC (or sunrpc-specific) configuration file instead.
Suggested implementation:
```
sysctl -p {{ sunrpc_sysctl_config | default('/etc/sysctl.d/99-hpc-sunrpc-tcp.conf') }}
```
1. Ensure a matching sysctl configuration file (e.g. `/etc/sysctl.d/99-hpc-sunrpc-tcp.conf`) is created/managed by this role, containing only the sunrpc/HPC-specific sysctl settings.
2. Optionally define the `sunrpc_sysctl_config` variable in the role defaults or inventory if a different path is desired.
3. Confirm that the managed sysctl file is written before this script is executed so that `sysctl -p` can successfully load it.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| {{ "system_role:hpc" | comment(prefix="", postfix="") }} | ||
|
|
||
| modprobe sunrpc | ||
| sysctl -p |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (bug_risk): Using sysctl -p without a specific file may reload and depend on global sysctl configuration
This reloads /etc/sysctl.conf and all of /etc/sysctl.d, which can introduce unrelated changes and fail due to invalid, non-HPC sysctl entries. To limit scope and improve robustness, invoke sysctl -p with the specific HPC (or sunrpc-specific) configuration file instead.
Suggested implementation:
sysctl -p {{ sunrpc_sysctl_config | default('/etc/sysctl.d/99-hpc-sunrpc-tcp.conf') }}
- Ensure a matching sysctl configuration file (e.g.
/etc/sysctl.d/99-hpc-sunrpc-tcp.conf) is created/managed by this role, containing only the sunrpc/HPC-specific sysctl settings. - Optionally define the
sunrpc_sysctl_configvariable in the role defaults or inventory if a different path is desired. - Confirm that the managed sysctl file is written before this script is executed so that
sysctl -pcan successfully load it.
|
It looks like the sunrpc service does two things:
Note that you do not need a service to do this - this can be done automatically by the os.
Is there some reason you need a custom systemd service to do this? |
| #!/bin/bash | ||
| {{ ansible_managed | comment }} | ||
| {{ "system_role:hpc" | comment(prefix="", postfix="") }} | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two things that will fail linter checks here. You need to use "#!/bin/bash -eu" to pass one of the linters, and then the shellcheck linter is going to fail on the ansible variables in this file. You need to do this:
#!/bin/bash -eu
# This is a template, not an actual shell script, so tell shellcheck to
# ignore the problematic templated parts
# shellcheck disable=all
{{ ansible_managed | comment }}
{{ "system_role:hpc" | comment(prefix="", postfix="") }}
# shellcheck enable=all
| - name: Create sunrpc_tcp_settings.sh script | ||
| template: | ||
| src: sunrpc_tcp_settings.sh | ||
| dest: /usr/sbin/sunrpc_tcp_settings.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically the name should be "Install the sunrpc_tcp_settings.sh script" - we are not creating it...
Secondly, custom scripts like this go in /opt/hpc/azure/bin, aka "{{ __hpc_azure_resource_dir }}/bin", not in system binary directories.
| {{ ansible_managed | comment }} | ||
| {{ "system_role:hpc" | comment(prefix="", postfix="") }} | ||
|
|
||
| modprobe sunrpc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this should fail if the kernel module loading fails. That way the failure gets propagated to teh service that invoked the script. However, I think the '-eu' addition to the shebang line I mentioned above will trigger script failure correctly here.
| notify: Restart systemd-modules-load | ||
| notify: Reload systemd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this removes the only user of the systemd-modules-load handler. If it is now unused, please remove it from the handlers/main.yml file.
Enhancement:
Refine existing hpc_tuning and add additional system-level tunings
Reason:
Add more system-level tunings and refine current hpc_tuning
Result:
cat /etc/systemd/system.conf | grep -i DefaultLimitMEMLOCK=infinity
DefaultLimitMEMLOCK=infinity
cat /etc/systemd/user.conf | grep -i DefaultLimitMEMLOCK=infinity
DefaultLimitMEMLOCK=infinity
systemctl status sunrpc_tcp_settings
● sunrpc_tcp_settings.service - Set sunrpc tcp settings
Loaded: loaded (/etc/systemd/system/sunrpc_tcp_settings.service; enabled; preset: disabled)
Active: active (exited) since Mon 2026-02-09 07:39:18 UTC; 38min ago
Main PID: 19764 (code=exited, status=0/SUCCESS)
CPU: 5ms
systemd[1]: Starting Set sunrpc tcp settings...
systemd[1]: Finished Set sunrpc tcp settings.
systemctl restart sunrpc_tcp_settings
systemctl status sunrpc_tcp_settings
● sunrpc_tcp_settings.service - Set sunrpc tcp settings
Loaded: loaded (/etc/systemd/system/sunrpc_tcp_settings.service; enabled; preset: disabled)
Active: active (exited) since Mon 2026-02-09 08:18:16 UTC; 3s ago
Process: 35183 ExecStart=/usr/sbin/sunrpc_tcp_settings.sh (code=exited, status=0/SUCCESS)
Main PID: 35183 (code=exited, status=0/SUCCESS)
CPU: 5ms
systemd[1]: Starting Set sunrpc tcp settings...
systemd[1]: Finished Set sunrpc tcp settings.
Issue Tracker Tickets (Jira or BZ if any):
JIRA: RHELHPC-109
Summary by Sourcery
Refine HPC system tuning by expanding system-level resource limits, managing azsec-monitor, and replacing direct sunrpc module loading with a dedicated systemd service.
New Features:
Enhancements:
Documentation: