@rajathagasthya rajathagasthya commented Jan 14, 2026

This PR eliminates the need to manually maintain static MIG configuration files by generating them at runtime from hardware. Previously, every new MIG-capable GPU required manual updates to config-default.yaml, which was time-consuming and required the GPUs to be available before their MIG profiles could be read. Now, when the nvidia-mig-manager systemd service starts (on every node boot), it runs nvidia-mig-parted generate-config to query the available MIG profiles via NVML and produce a complete configuration file automatically. This includes per-profile configs (e.g., all-1g.10gb, all-7g.80gb) as well as the all-balanced config, with proper device-filter handling for heterogeneous GPU systems. The k8s-mig-manager pod also generates this config on startup, writes it to a per-node ConfigMap, and uses it throughout its lifetime instead of requiring config-default.yaml to be mounted as a volume.

The implementation adds a new generate-config CLI command and introduces two new packages: pkg/mig/discovery for querying MIG profiles from hardware using go-nvlib, and pkg/mig/builder for constructing the mig-parted config spec. The systemd service has been updated to generate the config on every start, with fallback to a previously generated config if generation fails.

Guide to Reviewers

Overview

This PR eliminates the need to manually maintain static MIG configuration files. Instead of shipping a pre-built config.yaml that must be updated whenever a new GPU is released, we now generate the configuration at runtime by querying the hardware via NVML.

Before: Manual maintenance of config-default.yaml for every new MIG-capable GPU.

After: Config is auto-generated from hardware on every boot/service start.

Architecture

flowchart TB
    subgraph cli [mig-parted CLI]
        GenerateConfig[generate-config command]
    end

    subgraph pkgBuilder [pkg/mig/builder]
        BuildSpec[GenerateConfigSpec]
        BuildYAML[GenerateConfigYAML]
        BuildBalanced[buildAllBalancedConfig]
    end

    subgraph pkgDiscovery [pkg/mig/discovery]
        Discover[DiscoverMIGProfiles]
    end

    subgraph nvml [NVML via go-nvlib]
        VisitDevices[VisitDevices]
        GetMigProfiles[GetMigProfiles]
        GetGpuInstanceProfileInfo[GetGpuInstanceProfileInfo]
    end

    GenerateConfig --> BuildYAML
    BuildYAML --> BuildSpec
    BuildSpec --> Discover
    BuildSpec --> BuildBalanced
    Discover --> VisitDevices
    VisitDevices --> GetMigProfiles
    GetMigProfiles --> GetGpuInstanceProfileInfo

Component Changes

1. mig-parted CLI

New command: nvidia-mig-parted generate-config

# Output to stdout (default)
nvidia-mig-parted generate-config

# Output to file
nvidia-mig-parted generate-config -f /path/to/config.yaml

# Output as JSON
nvidia-mig-parted generate-config -o json
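
The generated file is a standard mig-parted config, so it can be fed into the existing apply workflow. A hedged example (the file path and config name are illustrative, not mandated by this PR):

# Generate a config from the local hardware, then apply one of its entries.
nvidia-mig-parted generate-config -f /etc/nvidia-mig-manager/config.yaml
nvidia-mig-parted apply -f /etc/nvidia-mig-manager/config.yaml -c all-1g.10gb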

Files:

2. Systemd Service

The service now generates a fresh config on every start/boot:

# From deployments/systemd/service.sh
nvidia-mig-parted generate-config -f "${CONFIG_FILE}.tmp"

If generation fails (e.g., no GPUs, driver not loaded), it falls back to a previously generated config if available.
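
The fallback flow is roughly the following. This is a simplified sketch, not the verbatim service.sh; the config path is assumed for illustration:

# Simplified sketch of the generate-with-fallback flow.
CONFIG_FILE="/etc/nvidia-mig-manager/config.yaml"   # assumed location

if nvidia-mig-parted generate-config -f "${CONFIG_FILE}.tmp"; then
    mv "${CONFIG_FILE}.tmp" "${CONFIG_FILE}"
else
    rm -f "${CONFIG_FILE}.tmp"
    if [ -f "${CONFIG_FILE}" ]; then
        echo "Warning: config generation failed, falling back to existing ${CONFIG_FILE}" >&2
    else
        echo "Error: config generation failed and no existing config is available" >&2
        exit 1
    fi
fi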

Files:

3. k8s-mig-manager (Future Work)

The k8s-mig-manager will:

  • Generate a per-node ConfigMap with default MIG configs
  • Allow users to bring their own ConfigMap (existing behavior)

NVML Discovery Logic

The discovery layer uses go-nvlib to query MIG profiles from hardware.

Flow in pkg/mig/discovery/discovery.go:

1. Get GPU device IDs via util.GetGPUDeviceIDs()
2. Initialize NVML and create nvdev.Interface
3. VisitDevices() to iterate all GPUs
4. For each MIG-capable GPU:
   a. GetMigProfiles() - returns all MIG profiles from NVML
   b. Filter out CI profiles (Compute Instance profiles where C > 0 && C < G)
   c. GetGpuInstanceProfileInfo() - get max instance count for each profile
   d. Return ProfileInfo{Name, MaxCount, DeviceID, Profile}

Key Types:

type ProfileInfo struct {
    Name     string           // e.g., "1g.10gb", "2g.20gb+me"
    MaxCount int              // Maximum instances of this profile
    DeviceID types.DeviceID   // PCI device ID
    Profile  nvdev.MigProfile // Underlying NVML profile object
}

type DeviceProfiles map[int][]ProfileInfo  // deviceIndex -> profiles
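
As an illustration, the discovery loop looks roughly like the sketch below. The go-nvlib calls mirror the names in the diagram above, but the exact signatures and error handling in pkg/mig/discovery/discovery.go may differ:

package discovery

import (
    "fmt"

    nvdev "github.com/NVIDIA/go-nvlib/pkg/nvlib/device"
    "github.com/NVIDIA/go-nvml/pkg/nvml"
)

// discoverSketch walks every GPU, keeps only full GPU Instance profiles, and
// records the maximum instance count reported by NVML for each one.
// This is an approximation of DiscoverMIGProfiles, not the actual code.
func discoverSketch(devlib nvdev.Interface) (map[int][]string, error) {
    found := make(map[int][]string)
    err := devlib.VisitDevices(func(i int, d nvdev.Device) error {
        capable, err := d.IsMigCapable()
        if err != nil || !capable {
            return err
        }
        profiles, err := d.GetMigProfiles()
        if err != nil {
            return err
        }
        for _, p := range profiles {
            info := p.GetInfo()
            // Skip Compute Instance profiles (C > 0 && C < G); only full
            // GPU Instance profiles become standalone config entries.
            if info.C > 0 && info.C < info.G {
                continue
            }
            // The max instance count for this GI profile comes from NVML.
            giInfo, ret := d.GetGpuInstanceProfileInfo(info.GIProfileID)
            if ret != nvml.SUCCESS {
                return fmt.Errorf("getting GI profile info for %v: %v", p, ret)
            }
            found[i] = append(found[i], fmt.Sprintf("%s x%d", p, giInfo.InstanceCount))
        }
        return nil
    })
    return found, err
}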

Config Building Logic

The builder layer converts discovered profiles into a mig-parted config spec.

Flow in pkg/mig/builder/builder.go:

1. Group profiles by name across all devices
2. Track unique device IDs (for device-filter logic)
3. Generate base configs:
   - all-disabled: MIG disabled on all GPUs
   - all-enabled: MIG enabled, no partitions
4. Generate all-balanced (if applicable)
5. For each profile:
   a. Group devices by max count (same profile may have different counts on different GPUs)
   b. Create config entry per unique count
   c. Add device-filter if:
      - Heterogeneous GPUs exist (different GPU types), OR
      - Not all devices support this profile (partial support)
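
A condensed sketch of step 5, using simplified stand-in types rather than the real api/spec/v1 structs, to show when a device-filter gets attached:

package builder

// migConfigEntry is a simplified stand-in for the real config entry type,
// kept here only to illustrate the grouping logic.
type migConfigEntry struct {
    Devices      string         `yaml:"devices"`
    MigEnabled   bool           `yaml:"mig-enabled"`
    DeviceFilter []string       `yaml:"device-filter,omitempty"`
    MigDevices   map[string]int `yaml:"mig-devices,omitempty"`
}

// profileEntries builds the entries behind an "all-<profile>" config for one
// profile name. countToIDs maps a max instance count to the PCI device IDs
// that support the profile at that count; allIDs is every unique device ID
// seen during discovery. Names and shapes are illustrative.
func profileEntries(profile string, countToIDs map[int][]string, allIDs []string) []migConfigEntry {
    heterogeneous := len(allIDs) > 1

    // How many device IDs support this profile at all?
    supported := 0
    for _, ids := range countToIDs {
        supported += len(ids)
    }
    partialSupport := supported < len(allIDs)

    var entries []migConfigEntry
    for count, ids := range countToIDs {
        entry := migConfigEntry{
            Devices:    "all",
            MigEnabled: true,
            MigDevices: map[string]int{profile: count},
        }
        // A device-filter is only needed when the entry must be scoped to a
        // subset of GPUs: heterogeneous systems, or partial profile support.
        if heterogeneous || partialSupport {
            entry.DeviceFilter = ids
        }
        entries = append(entries, entry)
    }
    return entries
}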

Example output for A100-80GB:

version: v1
mig-configs:
  all-disabled:
    - devices: all
      mig-enabled: false

  all-enabled:
    - devices: all
      mig-enabled: true
      mig-devices: {}

  all-balanced:
    - devices: all
      mig-enabled: true
      mig-devices:
        1g.10gb: 2
        2g.20gb: 1
        3g.40gb: 1

  all-1g.10gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        1g.10gb: 7

All-Balanced Config

The all-balanced config creates a mix of small, medium, and large MIG instances.

Formula in pkg/mig/builder/balanced.go:

GPU Type                            Formula               Total Slices
7-slot (A100, H100, etc.)           2×1g + 1×2g + 1×3g    7
4-slot (A30, RTX PRO 6000, etc.)    2×1g + 1×2g           4

Profile selection: For each G-value, we select the base profile (no +me, +gfx attributes) with the highest MaxCount. This gives the smallest memory footprint option, which is most flexible.
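
A compact sketch of the formula above, assuming the base profile for each G-value has already been selected (the function shape is illustrative, not the actual balanced.go API):

package builder

// balancedMix returns the all-balanced profile counts. bestByG maps a G-value
// (1, 2, 3, ...) to the base profile (no +me/+gfx attributes) with the highest
// MaxCount for that G-value; totalSlices is 7 for A100/H100-class GPUs and 4
// for A30-class GPUs. Illustrative shape only.
func balancedMix(bestByG map[int]string, totalSlices int) map[string]int {
    mix := map[string]int{}
    switch totalSlices {
    case 7: // 2x1g + 1x2g + 1x3g = 7 slices
        if p, ok := bestByG[1]; ok {
            mix[p] = 2
        }
        if p, ok := bestByG[2]; ok {
            mix[p] = 1
        }
        if p, ok := bestByG[3]; ok {
            mix[p] = 1
        }
    case 4: // 2x1g + 1x2g = 4 slices
        if p, ok := bestByG[1]; ok {
            mix[p] = 2
        }
        if p, ok := bestByG[2]; ok {
            mix[p] = 1
        }
    }
    return mix
}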

Heterogeneous GPU handling: When multiple GPU types exist, each gets its own entry with a device-filter:

all-balanced:
  - devices: all
    mig-enabled: true
    device-filter: ["0x20B510DE"]  # A100-80GB
    mig-devices:
      1g.10gb: 2
      2g.20gb: 1
      3g.40gb: 1
  - devices: all
    mig-enabled: true
    device-filter: ["0x20B710DE"]  # A30-24GB
    mig-devices:
      1g.6gb: 2
      2g.12gb: 1

Testing

  • pkg/mig/builder/builder_test.go: Config generation for all GPU types (A30, A100, H100, H200, B200, RTX PRO), heterogeneous scenarios, device-filter verification
  • pkg/mig/builder/balanced_test.go: All-balanced formula for 7-slot and 4-slot GPUs, edge cases (missing profiles, attributed-only profiles)
  • pkg/mig/discovery/discovery_test.go: NVML discovery using the dgxa100 mock, CI profile filtering

Test data: Profile data in tests matches NVIDIA MIG User Guide.

Files Changed

  • cmd/nvidia-mig-parted/main.go (Modified): Register the generateconfig command
  • cmd/nvidia-mig-parted/generateconfig/generate_config.go (New): CLI command implementation
  • pkg/mig/discovery/discovery.go (New): NVML profile discovery
  • pkg/mig/discovery/discovery_test.go (New): Discovery unit tests
  • pkg/mig/builder/builder.go (New): Config spec builder
  • pkg/mig/builder/builder_test.go (New): Builder unit tests
  • pkg/mig/builder/balanced.go (New): All-balanced config logic
  • pkg/mig/builder/balanced_test.go (New): Balanced config tests
  • deployments/systemd/service.sh (Modified): Generate config on boot

Key Design Decisions

  1. Runtime generation over static files: Eliminates maintenance burden when new GPUs are released.

  2. Fallback behavior: If generation fails, the systemd service uses a previously generated config. This handles edge cases like driver not being loaded.

  3. CI profile filtering: Compute Instance profiles (e.g., 1c.2g.20gb) are filtered out since they represent subdivisions of GPU instances, not standalone configs.

  4. Device-filter for heterogeneous systems: When multiple GPU types exist, configs include device-filter to target specific devices. This is necessary because different GPUs may have different profile names or max counts.

  5. Profile normalization: + in profile names is converted to . for config names (e.g., 1g.10gb+me becomes all-1g.10gb.me) since + is not ideal in YAML keys.
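
For example, the normalization in decision 5 amounts to a simple replacement (function name and placement are illustrative):

package builder

import "strings"

// configNameFor turns a MIG profile name into its per-profile config name,
// e.g. "1g.10gb+me" -> "all-1g.10gb.me".
func configNameFor(profile string) string {
    return "all-" + strings.ReplaceAll(profile, "+", ".")
}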

Some versions of the driver report incorrect MIG profiles for A30 GPUs.
The fix will be made on 580 and 590 driver branches, but in the
meantime, return a known list of MIG configs when profiles are queried
for an A30 GPU.

Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>
Removed a redundant condition in the device-filter logic and added detailed
comments explaining when device-filter is needed. Also simplified
setupMigConfig in mig-manager.

Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>
* Remove verbose diagnostic messages and redundant echo statements
* Use direct exit code check instead of capturing $? separately
* Add warning when falling back to existing config on generation failure
* Keep error output visible for debugging (no stderr suppression)

The config generation behavior remains the same:
* Generate fresh config from hardware on every boot
* Fall back to existing config.yaml if generation fails
* Exit with error only if no config is available at all

Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>
Change app.kubernetes.io/component label value from "mig-manager" to
"nvidia-mig-manager" for consistency with other gpu-operator components.

Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>
Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>
@rajathagasthya rajathagasthya marked this pull request as ready for review January 22, 2026 16:46
Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>
Comment on lines 115 to 118
output, err = json.MarshalIndent(spec, "", " ")
if err != nil {
    return fmt.Errorf("error marshaling MIG config to JSON: %v", err)
}
Contributor

Question -- is there a technical reason for marshaling to json here (at the call site) but not doing the same for the yaml (the marshaling to yaml is not done at the call site)?

Contributor Author

I wanted a top level build.GenerateConfigYAML function that I could reference directly in main.go to write the generated file: https://github.com/NVIDIA/mig-parted/pull/295/changes#diff-85fd584658fe1f46bd4d96385d360fa875f8f9bd58b7b1d9e66a44636adafe64R341. Otherwise, I'd have to marshal there too.

Contributor

Sure, that is fine. Does it make sense to also have a top level equivalent for json, i.e. build.GenerateConfigJSON?

fi
}

function maybe_add_config_symlink() {
Contributor

Question -- I forget the meaning of this symlink. Is it relevant for the default config we generate?

Contributor Author
@rajathagasthya Jan 26, 2026

I wasn't sure why there was a symlink too. But the code now writes config directly to config.yaml, so we don't need the symlink.

Contributor

I think it is fine to remove this code, however we need to update the readme in deployments/systemd, specifically the portion about customizing the default config:

Users should only need to customize the `config.yaml` (to add any user-specific
MIG configurations they would like to apply) and the `hooks.sh` and
`hooks.yaml` files (to add any user specific services that need to be shutdown
and restarted when applying a MIG configuration).

Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>
@rajathagasthya rajathagasthya force-pushed the dynamic-mig-config branch 2 times, most recently from eec822d to e2fa0df on January 29, 2026 21:08
Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>