fix(helm): use empty defaults for extraConfigMapVolumes/extraVolumeMounts to fix SELinux-enforcing environments (fixes #613) #633
Open
cluster2600 wants to merge 1 commit into NVIDIA:main
…meMounts
Fixes NVIDIA#613
On SELinux-enforcing node operating systems such as EKS Bottlerocket (aws-k8s-1.33-nvidia), the default Helm chart values caused container startup failures. The subPath mount that overlays an existing file inside the container image is blocked by the SELinux policy:
    runc create failed: unable to start container process: error during container init: error mounting ... not a directory
The container image already ships a built-in default-counters.csv, so no volume mount is required for standard deployments. Changing the defaults to empty arrays makes the chart work out-of-the-box across all common Kubernetes environments, including Bottlerocket, RHCOS, and any other SELinux-enforcing OS.
Detailed comments in values.yaml explain how to re-enable the custom counters file when needed, with separate examples for standard and SELinux-enforcing environments.
What
Change the default values of `extraConfigMapVolumes` and `extraVolumeMounts` from a ConfigMap-backed `subPath` mount to empty arrays (`[]`). Add comprehensive comments in `values.yaml` explaining when and how to re-enable the custom counters file, with separate examples for standard and SELinux-enforcing environments.
Changes in `deployment/values.yaml`:
- `extraConfigMapVolumes`: `[{...configmap...}]` → `[]`
- `extraVolumeMounts`: `[{subPath: default-counters.csv}]` → `[]`
Why
Fixes #613
On EKS clusters using Bottlerocket (SELinux enforcing, e.g. `aws-k8s-1.33-nvidia`), the default Helm chart values caused every dcgm-exporter pod to fail at startup with:
    runc create failed: unable to start container process: error during container init: error mounting ... not a directory
Root cause: The SELinux policy on Bottlerocket blocks `subPath` mounts that overlay an existing file inside the container image (`/etc/dcgm-exporter/default-counters.csv` ships in the image). The kernel refuses the bind-mount because the destination already exists as a regular file in the image's overlayfs layer.
The container image already includes a built-in `default-counters.csv`; a ConfigMap volume mount is only needed when operators want to supply a custom metric list. Making the defaults empty means the chart works out-of-the-box on all Kubernetes environments, including Bottlerocket, RHCOS (OpenShift), and any other SELinux-enforcing OS.
Workaround already documented in the issue (verified by the reporter):
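A minimal sketch of that override, assuming it amounts to setting both values to empty arrays as described under What (the file name `values-override.yaml` is hypothetical):

```yaml
# values-override.yaml (hypothetical file name)
# Rely on the default-counters.csv built into the image:
# no ConfigMap volume and no subPath mount are created.
extraConfigMapVolumes: []
extraVolumeMounts: []
```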
This PR makes that the default behaviour.
How
- `extraConfigMapVolumes` and `extraVolumeMounts` set to `[]`.
- Commented examples in `values.yaml` covering `subPath` (standard environments) and directory mount (SELinux-enforcing environments).
Testing
- `python3 -c "import yaml; yaml.safe_load(open('deployment/values.yaml'))"`
- `helm lint`
- Setting both values to `[]` resolves the issue on EKS Bottlerocket
Checklist
- `helm lint` passes (default values produce a renderable chart)
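The two re-enable options mentioned under How could look roughly like the sketch below. This is not copied from the PR's `values.yaml`: the volume and ConfigMap names, the ConfigMap key, the `/etc/dcgm-exporter/custom` directory, and the use of an `arguments` value with the exporter's `-f` option are all assumptions for illustration. Only the `subPath` name and the `/etc/dcgm-exporter/default-counters.csv` path come from the description above.

```yaml
# Option A: standard environments.
# subPath overlays the single file that ships in the image.
# Volume name, ConfigMap name, and key are assumed for illustration.
extraConfigMapVolumes:
  - name: exporter-metrics-volume
    configMap:
      name: exporter-metrics-config-map
      items:
        - key: metrics
          path: default-counters.csv
extraVolumeMounts:
  - name: exporter-metrics-volume
    mountPath: /etc/dcgm-exporter/default-counters.csv
    subPath: default-counters.csv

# Option B: SELinux-enforcing environments.
# Mount the ConfigMap as a whole directory (no subPath) at a path that
# does not already exist as a file in the image, then point the exporter
# at the file inside it. Use this in place of the extraVolumeMounts entry
# above. The directory and the arguments wiring are hypothetical.
# extraVolumeMounts:
#   - name: exporter-metrics-volume
#     mountPath: /etc/dcgm-exporter/custom
# arguments: ["-f", "/etc/dcgm-exporter/custom/default-counters.csv"]
```

Option B avoids the file-over-file bind mount described under Why, at the cost of an extra setting that tells the exporter where the custom counters file lives.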