fix(helm): use empty defaults for extraConfigMapVolumes/extraVolumeMounts to fix SELinux-enforcing environments (fixes #613) #633

Open
cluster2600 wants to merge 1 commit into NVIDIA:main from cluster2600:fix/613-bottlerocket-selinux-helm-defaults

What

Change the default values of extraConfigMapVolumes and extraVolumeMounts from a ConfigMap-backed subPath mount to empty arrays ([]). Add comprehensive comments in values.yaml explaining when and how to re-enable the custom counters file, with separate examples for standard and SELinux-enforcing environments.

Changes in deployment/values.yaml:

  • extraConfigMapVolumes: [{...configmap...}] → []
  • extraVolumeMounts: [{subPath: default-counters.csv}] → []
  • Added inline documentation for Bottlerocket / SELinux-enforcing environments

Why

Fixes #613

On EKS clusters using Bottlerocket (SELinux enforcing, e.g. aws-k8s-1.33-nvidia), the default Helm chart values caused every dcgm-exporter pod to fail at startup with:

runc create failed: unable to start container process: error during container init:
error mounting "...volume-subpaths/exporter-metrics-volume/exporter/1"
to rootfs at "/etc/dcgm-exporter/default-counters.csv":
mount ...: not a directory

Root cause: The SELinux policy on Bottlerocket blocks subPath mounts that overlay an existing file inside the container image (/etc/dcgm-exporter/default-counters.csv ships in the image). The kernel refuses the bind-mount because the destination already exists as a regular file in the image's overlayfs layer.

The container image already includes a built-in default-counters.csv; a ConfigMap volume mount is only needed when operators want to supply a custom metric list. Making the defaults empty means the chart works out-of-the-box on all Kubernetes environments including Bottlerocket, RHCOS (OpenShift), and any other SELinux-enforcing OS.
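
For operators who do need a custom metric list, the ConfigMap could look roughly like the sketch below. The ConfigMap name `exporter-metrics-config-map`, the key `metrics`, and the two counters shown are illustrative assumptions rather than values taken from this PR. The CSV layout (DCGM field, Prometheus metric type, help text) is assumed to match the built-in default-counters.csv.

```yaml
# Hypothetical ConfigMap carrying a custom counters list.
# Name, key, and counter selection are assumptions for illustration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: exporter-metrics-config-map   # assumed name
data:
  metrics: |                          # assumed key
    # Format: DCGM field, Prometheus metric type, help message
    DCGM_FI_DEV_GPU_TEMP, gauge, GPU temperature (in C).
    DCGM_FI_DEV_POWER_USAGE, gauge, Power draw (in W).
```

With the new empty defaults, creating this ConfigMap alone has no effect; it is only consumed once extraConfigMapVolumes and extraVolumeMounts are overridden to reference it.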

Workaround already documented in the issue (verified by the reporter):

extraConfigMapVolumes: []
extraVolumeMounts: []

This PR makes that the default behaviour.

How

  • Set both extraConfigMapVolumes and extraVolumeMounts to [].
  • Added detailed comments above both keys:
    • Explaining why the defaults are now empty (SELinux compatibility).
    • Showing how to create the ConfigMap and re-enable the custom counters file.
    • Providing two mount examples: subPath (standard environments) and directory mount (SELinux-enforcing environments).
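
The two mount styles could be sketched as the following value overrides. These are not copied from the values.yaml comments in this PR: the ConfigMap name, the `metrics` key, the `/etc/dcgm-exporter/custom` directory, and the use of the chart's `arguments` value with the exporter's `-f` flag are assumptions and should be checked against the chart version in use. The volume name and the subPath target are taken from the error message and the change list above.

Standard environments (subPath mount over the file shipped in the image):

```yaml
extraConfigMapVolumes:
  - name: exporter-metrics-volume
    configMap:
      name: exporter-metrics-config-map   # assumed ConfigMap name
      items:
        - key: metrics                     # assumed key
          path: default-counters.csv
extraVolumeMounts:
  - name: exporter-metrics-volume
    mountPath: /etc/dcgm-exporter/default-counters.csv
    subPath: default-counters.csv
    readOnly: true
```

SELinux-enforcing environments (directory mount, no subPath):

```yaml
extraConfigMapVolumes:
  - name: exporter-metrics-volume
    configMap:
      name: exporter-metrics-config-map   # assumed ConfigMap name
      items:
        - key: metrics                     # assumed key
          path: default-counters.csv
extraVolumeMounts:
  - name: exporter-metrics-volume
    # Hypothetical directory that does not exist in the image, so the
    # mount does not overlay the built-in file.
    mountPath: /etc/dcgm-exporter/custom
    readOnly: true
# Assumption: the chart passes `arguments` to the exporter, and -f selects
# the counters file to load.
arguments: ["-f", "/etc/dcgm-exporter/custom/default-counters.csv"]
```

The directory variant avoids overlaying a single file, which is the mount that fails in the error above; the trade-off is that the exporter has to be pointed at the new path explicitly.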

Testing

  • Validated YAML syntax: python3 -c "import yaml; yaml.safe_load(open('deployment/values.yaml'))"
  • The existing e2e / unit tests do not exercise Helm rendering directly; because the change only alters value defaults, it is validated with helm lint
  • The reporter confirmed that setting these fields to [] resolves the issue on EKS Bottlerocket

Checklist

  • YAML is valid
  • helm lint passes (default values produce a renderable chart)
  • Backward compatible: operators who already override these values are unaffected
  • No code changes — values file only
  • Inline documentation added to guide operators who need the custom counters feature

…meMounts

Fixes NVIDIA#613

On SELinux-enforcing node operating systems such as EKS Bottlerocket
(aws-k8s-1.33-nvidia) the default Helm chart values caused container
startup failures.  The subPath mount that overlays an existing file
inside the container image is blocked by the SELinux policy:

  runc create failed: unable to start container process: error during
  container init: error mounting ... not a directory

The container image already ships a built-in default-counters.csv so no
volume mount is required for standard deployments.  Changing the
defaults to empty arrays makes the chart work out-of-the-box across all
common Kubernetes environments including Bottlerocket, RHCOS, and any
other SELinux-enforcing OS.

Detailed comments in values.yaml explain how to re-enable the custom
counters file when needed, with separate examples for standard and
SELinux-enforcing environments.

Development

Successfully merging this pull request may close these issues.

Helm chart fails on EKS with Bottlerocket
