@@ -8,6 +8,8 @@ The AMD GPU Operator uses the Kernel Module Management (KMM) Operator to deploy
8
8
- OS release version
9
9
- Kernel version
10
10
11
+
Users can prepare pre-compiled driver images in advance and import them into the cluster. KMM then skips the in-cluster driver build stage and uses those images directly to load the amdgpu kernel modules on the worker nodes.
12
+
11
13
## How KMM Selects Driver Images
12
14
13
15
KMM determines the appropriate driver image based on the combination of:
@@ -17,25 +19,42 @@ KMM determines the appropriate driver image based on the combination of:
17
19
18
20
### Image Tag Format
19
21
20
-
KMM looks for images with tags in these formats:
22
+
KMM looks up driver images by tag. The controller determines the image tag as follows:
23
+
24
+
1. Parse the node's `osImage` field to determine the OS and version (for example, `kubectl get node -oyaml | grep -i osImage`):
25
+
26
+
| osImage | OS | version |
27
+
|---------|-----------|-------------------|
28
+
|`Ubuntu 24.04.1 LTS`|`Ubuntu`|`24.04`|
29
+
|`Red Hat Enterprise Linux CoreOS 9.6.20250916-0 (Plow)`|`coreos`|`9.6`|
30
+
31
+
2. Read the node's `kernelVersion` field to determine the kernel version (for example, `kubectl get node -oyaml | grep -i kernelVersion`).
32
+
3. Read the user-configured amdgpu driver version from the `DeviceConfig` field `spec.driver.version`.
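
The `osImage` parsing in step 1 can be illustrated with a minimal Python sketch. This is not the operator's actual implementation: the function name `parse_os_image` and the matching rules are assumptions inferred only from the two table rows above, and other distributions may need different handling.

```python
import re


def parse_os_image(os_image: str) -> tuple[str, str]:
    """Return (os_name, "major.minor") for a node's osImage string.

    Hypothetical helper: the rules are inferred from the two example
    rows in the table, not taken from the operator's source code.
    """
    # The first "digits.digits" pair is treated as the OS version.
    match = re.search(r"(\d+)\.(\d+)", os_image)
    if match is None:
        raise ValueError(f"no version found in osImage: {os_image!r}")
    version = f"{match.group(1)}.{match.group(2)}"

    # Assumption: any osImage mentioning CoreOS maps to the name "coreos".
    if "coreos" in os_image.lower():
        return "coreos", version

    # Otherwise use the first word as the OS name, e.g. "Ubuntu".
    return os_image.split()[0], version


if __name__ == "__main__":
    for sample in (
        "Ubuntu 24.04.1 LTS",
        "Red Hat Enterprise Linux CoreOS 9.6.20250916-0 (Plow)",
    ):
        print(sample, "->", parse_os_image(sample))
```

The kernel version and the driver version from steps 2 and 3 are read as-is, so they need no parsing in this sketch.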
When a DeviceConfig is created with driver management enabled (`spec.driver.enable=true`), KMM will:
28
41
29
42
1. Check if a matching driver image exists in the registry
30
43
2. If not found, build the driver image in-cluster using the AMD GPU Operator's Dockerfile
31
44
3. If found, directly use the existing image to install the driver
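
The decision flow above can be summarized in a few lines of Python. This is only an illustration of the logic as described here: `plan_driver_action`, the returned strings, and the placeholder tags are hypothetical, and the real registry lookup is performed by KMM rather than by user code.

```python
def plan_driver_action(wanted_tag: str, registry_tags: set[str]) -> str:
    """Mirror the steps above: look up the tag, then reuse or build."""
    if wanted_tag in registry_tags:
        # A matching pre-compiled image exists, so no build is needed.
        return "use-existing-image"
    # No match: fall back to building the driver image in-cluster.
    return "build-in-cluster"


if __name__ == "__main__":
    available = {"tag-a", "tag-b"}  # placeholder tags, not real image tags
    print(plan_driver_action("tag-a", available))
    print(plan_driver_action("tag-c", available))
```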
32
45
33
46
## Building Pre-compiled Driver Images
34
47
35
-
### Dockerfile Example
48
+
### Ubuntu
49
+
50
+
Follow these steps to build a pre-compiled driver image. Make sure your system meets the [ROCm Linux system requirements](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html).
51
+
52
+
1. Prepare the Dockerfile
36
53
37
54
```dockerfile
38
-
FROM ubuntu:$$VERSION as builder
55
+
ARG OS_VERSION
56
+
FROM ubuntu:${OS_VERSION} as builder
57
+
ARG OS_CODENAME
39
58
ARG KERNEL_FULL_VERSION
40
59
ARG DRIVERS_VERSION
41
60
ARG REPO_URL
@@ -57,15 +76,16 @@ RUN apt-get update && apt-get install -y bc \
57
76
RUN mkdir --parents --mode=0755 /etc/apt/keyrings
58
77
RUN wget ${REPO_URL}/rocm/rocm.gpg.key -O - | \
59
78
gpg --dearmor | tee /etc/apt/keyrings/rocm.gpg > /dev/null
60
-
RUN echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] ${REPO_URL}/amdgpu/${DRIVERS_VERSION}/ubuntu $$DRIVER_LABEL main" \
79
+
RUN echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] ${REPO_URL}/amdgpu/${DRIVERS_VERSION}/ubuntu ${OS_CODENAME} main" \
61
80
| tee /etc/apt/sources.list.d/amdgpu.list
62
81
63
82
# Install and configure driver
64
83
RUN apt-get update && apt-get install -y amdgpu-dkms
Follow these steps to build a pre-compiled driver image for an OpenShift cluster. Make sure your RHEL version and driver version meet the [ROCm Linux system requirements](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html).
142
+
143
+
1. Collect System Information
144
+
145
+
Collect system information from the OpenShift build node before configuring the build process:
Create the following YAML file. The full example assumes that you use the OpenShift internal image registry and that the build config is saved in the `default` namespace.
179
+
180
+
* If you want to configure the build in another namespace, change the namespace accordingly in the example steps.
181
+
* If you want to use another image registry, replace the `spec.output` part with this:
182
+
183
+
```yaml
184
+
spec:
185
+
output:
186
+
pushSecret:
187
+
name: docker-auth
188
+
to:
189
+
kind: DockerImage
190
+
# follow the Image Tag Format section to get your image ta
* Log in to the OpenShift web console with your username and password
281
+
* Select `Builds`, then select `BuildConfigs` in the navigation bar
282
+
* Click `Create BuildConfig`, then select YAML view and copy in the YAML file created in the last step
283
+
* Select the `BuildConfig` in the list, click `Actions`, then select `Start Build`
284
+
* Select `Builds` on the current `BuildConfig` page; a new build should have been triggered and be in running status.
285
+
* Wait for it to complete. You can also monitor the progress in the `Logs` section; at the end it should show that the push was successful.
286
+
* Delete the `BuildConfig` if needed.
287
+
* Option 2 - Command Line Interface (CLI):
288
+
* Create the `BuildConfig` by using the YAML file created in the last step: `oc apply -f build-config.yaml`
289
+
* Start the build: `oc start-build amd-gpu-operator-build`
290
+
* Check the build status: `oc get build` and `oc get pods | grep build`
291
+
* Wait for it to complete; the logs should show that the push was successful
292
+
* Delete the `BuildConfig` if needed: `oc delete -f build-config.yaml`
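
The CLI option can also be scripted. The sketch below is a hedged illustration, not part of the operator: it assembles the same `oc` commands listed above and prints them in dry-run mode by default. The helper names `build_commands` and `run_all` are hypothetical, and the file name `build-config.yaml` and build name `amd-gpu-operator-build` are simply the ones used in the steps above.

```python
import shlex
import subprocess


def build_commands(config_path: str, build_name: str) -> list[list[str]]:
    """Return the oc invocations from the CLI steps, in order."""
    return [
        ["oc", "apply", "-f", config_path],   # create the BuildConfig
        ["oc", "start-build", build_name],    # start the build
        ["oc", "get", "build"],               # check the build status
    ]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(build_commands("build-config.yaml", "amd-gpu-operator-build"))
```

Running the script as-is only prints the commands; passing `dry_run=False` executes them against whichever cluster the local `oc` client is logged in to.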
293
+
119
294
## Using Pre-compiled Images
120
295
121
-
Configure your DeviceConfig to use the pre-compiled images:
296
+
In the previous section, [Building Pre-compiled Driver Images](#building-pre-compiled-driver-images), we pushed the driver image to `registry.example.com/amdgpu-driver`. Now you can configure your `DeviceConfig` to use the pre-compiled image:
122
297
123
298
```yaml
124
299
apiVersion: amd.com/v1alpha1
@@ -129,21 +304,14 @@ metadata:
129
304
spec:
130
305
driver:
131
306
# Registry path without tag - operator manages tags
132
-
image: registry.example.com/amdgpu-driver
307
+
# If you use the OpenShift internal image registry, the operator selects the internal image registry URL by default
>**Important**: Do not include the image tag in the `image` field - the operator automatically appends the appropriate tag based on the node's OS and kernel version.