Commit 0aefcc1

Merge pull request #100595 from ousleyp/CNV-67095-pan
CNV#67095: Docs on setup vGPU reach a dead end on step 3 of mdev creation
2 parents 8db8367 + 87aaab2 commit 0aefcc1

1 file changed: 96 additions and 69 deletions

@@ -1,32 +1,101 @@
 // Module included in the following assemblies:
 //
-// * virt/virtual_machines/advanced_vm_management/virt-configuring-virtual-gpus.adoc
+// * virt/managing_vms/advanced_vm_management/virt-configuring-virtual-gpus.adoc
 
 :_mod-docs-content-type: PROCEDURE
 [id="virt-creating-exposing-mediated-devices_{context}"]
 = Creating and exposing mediated devices
 
-As an administrator, you can create mediated devices and expose them to the cluster by editing the `HyperConverged` custom resource (CR).
+As an administrator, you can create mediated devices and expose them to the cluster by editing the `HyperConverged` custom resource (CR). Before you edit the CR, explore a worker node to find the configuration values that are specific to your hardware devices.
 
 .Prerequisites
 
-* You have installed the {oc-first}.
+* You installed the {oc-first}.
 * You enabled the Input-Output Memory Management Unit (IOMMU) driver.
 * If your hardware vendor provides drivers, you installed them on the nodes where you want to create mediated devices.
 ** If you use NVIDIA cards, you link:https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/openshift-virtualization.html[installed the NVIDIA GRID driver].
 
+// [IMPORTANT]
+// ====
+// Before {VirtProductName} 4.14, the `mediatedDeviceTypes` field was named `mediatedDevicesTypes`. Ensure that you use the correct field name when configuring mediated devices.
+// ====
+
 .Procedure
 
+. Identify the name selector and resource name values for the mediated devices by exploring a worker node:
+
+.. Start a debugging session with the worker node by using the `oc debug` command. For example:
++
+[source,terminal]
+----
+$ oc debug node/node-11.redhat.com
+----
+
+.. Change the root directory of the shell process to the file system of the host node by running the following command:
++
+[source,terminal]
+----
+# chroot /host
+----
+
+.. Navigate to the `mdev_bus` directory and view its contents. Each subdirectory name is a PCI address of a physical GPU. For example:
++
+[source,terminal]
+----
+# cd sys/class/mdev_bus && ls
+----
++
+Example output:
++
+[source,terminal]
+----
+0000:4b:00.4
+----
+
+.. Go to the directory for your physical device and list the supported mediated device types as defined by the hardware vendor. For example:
++
+[source,terminal]
+----
+# cd 0000:4b:00.4 && ls mdev_supported_types
+----
++
+Example output:
++
+[source,terminal]
+----
+nvidia-742 nvidia-744 nvidia-746 nvidia-748 nvidia-750 nvidia-752
+nvidia-743 nvidia-745 nvidia-747 nvidia-749 nvidia-751 nvidia-753
+----
+
+.. Select the mediated device type that you want to use and identify its name selector value by viewing the contents of its `name` file. For example:
++
+[source,terminal]
+----
+# cat nvidia-745/name
+----
++
+Example output:
++
+[source,terminal]
+----
+NVIDIA A2-2Q
+----
+
 . Open the `HyperConverged` CR in your default editor by running the following command:
 +
 [source,terminal,subs="attributes+"]
 ----
 $ oc edit hyperconverged kubevirt-hyperconverged -n {CNVNamespace}
 ----
+
+. Create and expose the mediated devices by updating the configuration:
+
+.. Create mediated devices by adding them to the `spec.mediatedDevicesConfiguration` stanza.
+
+.. Expose the mediated devices to the cluster by adding the `mdevNameSelector` and `resourceName` values to the `spec.permittedHostDevices.mediatedDevices` stanza. The `resourceName` value is based on the `mdevNameSelector` value, but you use underscores instead of spaces.
++
+Example `HyperConverged` CR:
 +
-.Example configuration file with mediated devices configured
-[%collapsible]
-====
 [source,yaml,subs="attributes+"]
 ----
 apiVersion: hco.kubevirt.io/v1
@@ -37,87 +106,45 @@ metadata:
 spec:
   mediatedDevicesConfiguration:
     mediatedDeviceTypes:
-    - nvidia-231
+    - nvidia-745
     nodeMediatedDeviceTypes:
     - mediatedDeviceTypes:
-      - nvidia-233
+      - nvidia-746
       nodeSelector:
         kubernetes.io/hostname: node-11.redhat.com
   permittedHostDevices:
     mediatedDevices:
-    - mdevNameSelector: GRID T4-2Q
-      resourceName: nvidia.com/GRID_T4-2Q
-    - mdevNameSelector: GRID T4-8Q
-      resourceName: nvidia.com/GRID_T4-8Q
-# ...
------
-====
-
-. Create mediated devices by adding them to the `spec.mediatedDevicesConfiguration` stanza:
-+
-.Example YAML snippet
-[source,yaml]
------
-# ...
-spec:
-  mediatedDevicesConfiguration:
-    mediatedDeviceTypes: <1>
-    - <device_type>
-    nodeMediatedDeviceTypes: <2>
-    - mediatedDeviceTypes: <3>
-      - <device_type>
-      nodeSelector: <4>
-        <node_selector_key>: <node_selector_value>
+    - mdevNameSelector: NVIDIA A2-2Q
+      resourceName: nvidia.com/NVIDIA_A2-2Q
+    - mdevNameSelector: NVIDIA A2-4Q
+      resourceName: nvidia.com/NVIDIA_A2-4Q
 # ...
 ----
-<1> Required: Configures global settings for the cluster.
-<2> Optional: Overrides the global configuration for a specific node or group of nodes. Must be used with the global `mediatedDeviceTypes` configuration.
-<3> Required if you use `nodeMediatedDeviceTypes`. Overrides the global `mediatedDeviceTypes` configuration for the specified nodes.
-<4> Required if you use `nodeMediatedDeviceTypes`. Must include a `key:value` pair.
 +
-[IMPORTANT]
-====
-Before {VirtProductName} 4.14, the `mediatedDeviceTypes` field was named `mediatedDevicesTypes`. Ensure that you use the correct field name when configuring mediated devices.
-====
+where:
 
-. Identify the name selector and resource name values for the devices that you want to expose to the cluster. You will add these values to the `HyperConverged` CR in the next step.
-.. Find the `resourceName` value by running the following command:
-+
-[source,terminal]
------
-$ oc get $NODE -o json \
-| jq '.status.allocatable \
-| with_entries(select(.key | startswith("nvidia.com/"))) \
-| with_entries(select(.value != "0"))'
------
+`mediatedDeviceTypes`:: Specifies global settings for the cluster and is required.
 
-.. Find the `mdevNameSelector` value by viewing the contents of `/sys/bus/pci/devices/<slot>:<bus>:<domain>.<function>/mdev_supported_types/<type>/name`, substituting the correct values for your system.
-+
-For example, the name file for the `nvidia-231` type contains the selector string `GRID T4-2Q`. Using `GRID T4-2Q` as the `mdevNameSelector` value allows nodes to use the `nvidia-231` type.
+`nodeMediatedDeviceTypes`:: Specifies global configuration overrides for a specific node or group of nodes and is optional. Must be used with the global `mediatedDeviceTypes` configuration.
 
-. Expose the mediated devices to the cluster by adding the `mdevNameSelector` and `resourceName` values to the
-`spec.permittedHostDevices.mediatedDevices` stanza of the `HyperConverged` CR:
-+
-.Example YAML snippet
-[source,yaml]
------
-# ...
-  permittedHostDevices:
-    mediatedDevices:
-    - mdevNameSelector: GRID T4-2Q <1>
-      resourceName: nvidia.com/GRID_T4-2Q <2>
-# ...
------
-<1> Exposes the mediated devices that map to this value on the host.
-<2> Matches the resource name that is allocated on the node.
+`mediatedDeviceTypes`:: Specifies an override to the global `mediatedDeviceTypes` configuration for the specified nodes. Required if you use `nodeMediatedDeviceTypes`.
+
+`nodeSelector`:: Specifies the node selector and must include a `key:value` pair. Required if you use `nodeMediatedDeviceTypes`.
+
+`mdevNameSelector`:: Specifies the mediated devices that map to this value on the host.
+
+`resourceName`:: Specifies the matching resource name that is allocated on the node.
 
 . Save your changes and exit the editor.
 
 .Verification
 
-* Optional: Confirm that a device was added to a specific node by running the following command:
+* Confirm that the virtual GPU is attached to the node by running the following command:
 +
 [source,terminal]
 ----
-$ oc describe node <node_name>
+$ oc get node <node_name> -o json \
+| jq '.status.allocatable \
+| with_entries(select(.key | startswith("nvidia.com/"))) \
+| with_entries(select(.value != "0"))'
 ----
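
The node exploration that this change adds as step 1 of the procedure can also be done in a single pass. The following is a minimal sketch and is not part of the commit: `list_mdev_types` is a hypothetical helper name, and it assumes the sysfs layout that the procedure describes, `<pci_address>/mdev_supported_types/<type>/name` under the `mdev_bus` directory. It uses absolute paths, so it does not depend on the current directory of the debug shell.

```shell
# Hypothetical helper (not from the commit): print one tab-separated line per
# supported mediated device type: <pci_address> <type> <name>.
# Assumes the layout <root>/<pci_address>/mdev_supported_types/<type>/name.
list_mdev_types() {
    root=${1:-/sys/class/mdev_bus}
    for name_file in "$root"/*/mdev_supported_types/*/name; do
        # Skip the unexpanded glob pattern when no device or type exists.
        [ -r "$name_file" ] || continue
        type_dir=${name_file%/name}                   # .../<pci>/mdev_supported_types/<type>
        type=${type_dir##*/}                          # <type>, for example nvidia-745
        pci_dir=${type_dir%/mdev_supported_types/*}   # .../<pci>
        pci=${pci_dir##*/}                            # <pci>, for example 0000:4b:00.4
        printf '%s\t%s\t%s\n' "$pci" "$type" "$(cat "$name_file")"
    done
}
```

Run in the `chroot /host` shell, `list_mdev_types` with no argument reads `/sys/class/mdev_bus`. The third column is the value to use as `mdevNameSelector`, and the second column is the value for `mediatedDeviceTypes`.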

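
The new step 3 states that the `resourceName` value is based on the `mdevNameSelector` value, with underscores instead of spaces. A minimal sketch of that rule follows. `resource_name_for` is a hypothetical helper name, and the `nvidia.com` default prefix is an assumption taken from the example `HyperConverged` CR in the diff, not a rule that the commit states.

```shell
# Hypothetical helper (not from the commit): derive a resourceName from an
# mdevNameSelector by replacing each space with an underscore.
# The "nvidia.com" default prefix is an assumption based on the example CR.
resource_name_for() {
    selector=$1
    vendor=${2:-nvidia.com}
    printf '%s/%s\n' "$vendor" "$(printf '%s' "$selector" | tr ' ' '_')"
}
```

For the selector from the example output, `resource_name_for 'NVIDIA A2-2Q'` prints `nvidia.com/NVIDIA_A2-2Q`, which matches the `resourceName` entry in the example CR.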