
MIG config failed #157

@MLintin

Description


I have two A30 GPUs.
If I apply the config below, it succeeds:

version: v1
mig-configs:
  all-disabled:
  - devices: all
    mig-enabled: false
  node1:
  - devices:
    - 0
    mig-enabled: true
    mig-devices:
      1g.6gb: 4

The config below also succeeds:

version: v1
mig-configs:
  all-disabled:
  - devices: all
    mig-enabled: false
  node1:
  - devices:
    - 0
    - 1
    mig-enabled: true
    mig-devices:
      1g.6gb: 4

But the config below fails:

version: v1
mig-configs:
  all-disabled:
  - devices: all
    mig-enabled: false
  node1:
  - devices:
    - 1
    mig-enabled: true
    mig-devices:
      1g.6gb: 4
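To make the comparison between the three `node1` variants explicit, here is a small Python sketch that expands each one into the per-GPU request it makes. It is a hypothetical illustration, not mig-parted's code: it assumes `devices` is either the string `all` or a list of GPU indices, and the helper names `expand_devices` and `per_gpu_settings` are made up for this example.

```python
# Hypothetical illustration only: NOT mig-parted's implementation.
# Assumes `devices` is either the string "all" or a list of GPU indices,
# as in the three configs above.

def expand_devices(devices, num_gpus):
    """Return the sorted GPU indices that a `devices` entry selects."""
    if devices == "all":
        return list(range(num_gpus))
    return sorted(devices)


def per_gpu_settings(config_entries, num_gpus):
    """Map each selected GPU index to its (mig-enabled, mig-devices) request.

    GPUs not selected by any entry are absent from the result,
    i.e. the config leaves them untouched.
    """
    settings = {}
    for entry in config_entries:
        for gpu in expand_devices(entry["devices"], num_gpus):
            settings[gpu] = (entry["mig-enabled"], entry.get("mig-devices", {}))
    return settings


# The three "node1" variants from this report, on a node with two A30 GPUs.
works_gpu0 = [{"devices": [0], "mig-enabled": True, "mig-devices": {"1g.6gb": 4}}]
works_both = [{"devices": [0, 1], "mig-enabled": True, "mig-devices": {"1g.6gb": 4}}]
fails_gpu1 = [{"devices": [1], "mig-enabled": True, "mig-devices": {"1g.6gb": 4}}]

for name, cfg in [("gpu0", works_gpu0), ("both", works_both), ("gpu1", fails_gpu1)]:
    print(name, per_gpu_settings(cfg, num_gpus=2))
```

In the second (working) variant GPU 1 receives exactly the same request as in the failing one, which suggests the `1g.6gb: 4` request itself is valid for GPU 1 and the failure only shows up when GPU 1 is configured without GPU 0.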

Logs:
Applying the MIG mode change from the selected config to the node (and double checking it took effect)
If the -r option was passed, the node will be automatically rebooted if this is not successful
time="2024-12-31T06:44:06Z" level=debug msg="Parsing config file..."
time="2024-12-31T06:44:06Z" level=debug msg="Selecting specific MIG config..."
time="2024-12-31T06:44:06Z" level=debug msg="Running apply-start hook"
time="2024-12-31T06:44:06Z" level=debug msg="Checking current MIG mode..."
time="2024-12-31T06:44:08Z" level=debug msg="Walking MigConfig for (devices=[1])"
time="2024-12-31T06:44:08Z" level=debug msg=" GPU 1: 0x20B710DE"
time="2024-12-31T06:44:08Z" level=debug msg=" Asserting MIG mode: Enabled"
time="2024-12-31T06:44:08Z" level=debug msg=" MIG capable: true\n"
time="2024-12-31T06:44:08Z" level=debug msg=" Current MIG mode: Disabled"
time="2024-12-31T06:44:10Z" level=debug msg="Running pre-apply-mode hook"
time="2024-12-31T06:44:10Z" level=debug msg="Applying MIG mode change..."
time="2024-12-31T06:44:13Z" level=debug msg="Walking MigConfig for (devices=[1])"
time="2024-12-31T06:44:13Z" level=debug msg=" GPU 1: 0x20B710DE"
time="2024-12-31T06:44:13Z" level=debug msg=" MIG capable: true\n"
time="2024-12-31T06:44:13Z" level=debug msg=" Current MIG mode: Disabled"
time="2024-12-31T06:44:13Z" level=debug msg=" Updating MIG mode: Enabled"
time="2024-12-31T06:44:17Z" level=debug msg=" Mode change pending: false"
time="2024-12-31T06:44:19Z" level=debug msg="Running apply-exit hook"
MIG configuration applied successfully
time="2024-12-31T06:44:19Z" level=debug msg="Parsing config file..."
time="2024-12-31T06:44:19Z" level=debug msg="Selecting specific MIG config..."
time="2024-12-31T06:44:19Z" level=debug msg="Asserting MIG mode configuration..."
time="2024-12-31T06:44:22Z" level=debug msg="Walking MigConfig for (devices=[1])"
time="2024-12-31T06:44:22Z" level=debug msg=" GPU 1: 0x20B710DE"
time="2024-12-31T06:44:22Z" level=debug msg=" Asserting MIG mode: Enabled"
time="2024-12-31T06:44:22Z" level=debug msg=" MIG capable: true\n"
time="2024-12-31T06:44:22Z" level=debug msg=" Current MIG mode: Enabled"
Selected MIG mode settings from configuration currently applied
Applying the selected MIG config to the node
time="2024-12-31T06:44:23Z" level=debug msg="Parsing config file..."
time="2024-12-31T06:44:23Z" level=debug msg="Selecting specific MIG config..."
time="2024-12-31T06:44:23Z" level=debug msg="Running apply-start hook"
time="2024-12-31T06:44:23Z" level=debug msg="Checking current MIG mode..."
time="2024-12-31T06:44:26Z" level=debug msg="Walking MigConfig for (devices=[1])"
time="2024-12-31T06:44:26Z" level=debug msg=" GPU 1: 0x20B710DE"
time="2024-12-31T06:44:26Z" level=debug msg=" Asserting MIG mode: Enabled"
time="2024-12-31T06:44:26Z" level=debug msg=" MIG capable: true\n"
time="2024-12-31T06:44:26Z" level=debug msg=" Current MIG mode: Enabled"
time="2024-12-31T06:44:28Z" level=debug msg="Checking current MIG device configuration..."
time="2024-12-31T06:44:30Z" level=debug msg="Walking MigConfig for (devices=[1])"
time="2024-12-31T06:44:30Z" level=debug msg=" GPU 1: 0x20B710DE"
time="2024-12-31T06:44:30Z" level=debug msg=" Asserting MIG config: map[1g.6gb:4]"
time="2024-12-31T06:44:32Z" level=debug msg="Running pre-apply-config hook"
time="2024-12-31T06:44:32Z" level=debug msg="Applying MIG device configuration..."
time="2024-12-31T06:44:35Z" level=debug msg="Walking MigConfig for (devices=[1])"
time="2024-12-31T06:44:35Z" level=debug msg=" GPU 1: 0x20B710DE"
time="2024-12-31T06:44:35Z" level=debug msg=" MIG capable: true\n"
time="2024-12-31T06:44:35Z" level=debug msg=" Updating MIG config: map[1g.6gb:4]"
time="2024-12-31T06:44:35Z" level=error msg="Error getting GPU instance profile info for '1g.6gb': ERROR_NOT_SUPPORTED"
time="2024-12-31T06:44:37Z" level=debug msg="Running apply-exit hook"
time="2024-12-31T06:44:37Z" level=fatal msg="Error applying MIG configuration with hooks: error setting MIGConfig: error attempting multiple config orderings: all orderings failed"
Restarting any GPU clients previously shutdown on the host by restarting their systemd services
Starting kubelet.service
