|
| 1 | +// Module included in the following assemblies: |
| 2 | +// |
| 3 | +// * networking/networking_operators/nw-dpu-running-workloads.adoc |
| 4 | + |
| 5 | +:_mod-docs-content-type: PROCEDURE |
| 6 | +[id="nw-dpu-monitoring-status_{context}"] |
| 7 | += Monitoring the status of DPUs |
| 8 | + |
| 9 | +[role="_abstract"] |
| 10 | +You can monitor the DPU infrastructure status to check the current state and health of your DPU devices across the cluster. |
| 11 | + |
| 12 | +The `oc get dpu` command lists each detected DPU and its current state. Follow this procedure to check node health and then the status of each DPU. |
| 15 | + |
| 16 | +.Prerequisites |
| 17 | + |
| 18 | +* The OpenShift CLI (`oc`) is installed. |
| 19 | +* An account with `cluster-admin` privileges is available. |
| 20 | +* The DPU Operator is installed. |
| 21 | +
|
| 22 | +.Procedure |
| 23 | + |
| 24 | +. Run the following command to check the overall health of your nodes: |
| 25 | ++ |
| 26 | +[source,terminal] |
| 27 | +---- |
| 28 | +$ oc get nodes |
| 29 | +---- |
| 30 | ++ |
| 31 | +The example output provides a list of all nodes in the cluster along with their status. Ensure that all nodes are in the `Ready` state before proceeding. |
| 32 | ++ |
| 33 | +[source,terminal] |
| 34 | +---- |
| 35 | +NAME STATUS ROLES AGE VERSION |
| 36 | +ocpcluster-master-1 Ready master 10d v1.32.9 |
| 37 | +ocpcluster-master-2 Ready master 10d v1.32.9 |
| 38 | +ocpcluster-master-3 Ready master 10d v1.32.9 |
| 39 | +ocpcluster-dpu-ipu-219 Ready worker 42h v1.32.9 |
| 40 | +ocpcluster-dpu-marvell-41 Ready worker 3d23h v1.32.9 |
| 41 | +ocpcluster-dpu-ptl-243 Ready worker 3d23h v1.32.9 |
| 42 | +worker-host-ipu-219 Ready worker 3d19h v1.32.9 |
| 43 | +worker-host-marvell-41 Ready worker 4d v1.32.9 |
| 44 | +worker-host-ptl-243 Ready worker 3d23h v1.32.9 |
| 45 | +---- |
| 46 | ++ |
| 47 | +This output shows three master nodes and three host worker nodes. The host worker nodes are identified by the `worker-host` prefix, for example, `worker-host-ipu-219`. Each host worker node contains a DPU, which appears as a separate node identified by the `ocpcluster-dpu` prefix, for example, `ocpcluster-dpu-ipu-219`. |
| 48 | + |
| 49 | +. Run the following command to report on the status of the DPUs: |
| 50 | ++ |
| 51 | +[source,terminal] |
| 52 | +---- |
| 53 | +$ oc get dpu |
| 54 | +---- |
| 55 | ++ |
| 56 | +The example output provides a list of detected DPUs. |
| 57 | ++ |
| 58 | +[source,terminal] |
| 59 | +---- |
| 60 | +NAME DPU PRODUCT DPU SIDE NODE NAME STATUS |
| 61 | +030001163eec00ff-host Intel Netsec Accelerator false worker-host-ptl-243 True |
| 62 | +d4-e5-c9-00-ec-3v-dpu Intel Netsec Accelerator true worker-dpu-ptl-243 True |
| 63 | +intel-ipu-0000-06-00.0-host Intel IPU E2100 false worker-host-ipu-219 False |
| 64 | +intel-ipu-dpu Intel IPU E2100 true worker-dpu-ipu-219 False |
| 65 | +marvell-dpu-0000-87-00.0-host Marvell DPU false worker-host-marvell-41 True |
| 66 | +marvell-dpu-ipu Marvell DPU true worker-dpu-marvell-41 True |
| 67 | +---- |
| 68 | +* `DPU PRODUCT`: The vendor or type of the DPU, for example, Intel or Marvell. |
| 69 | +* `DPU SIDE`: Whether the entry represents the host side (`false`) or the DPU side (`true`) of the device. Each physical DPU is listed twice, once for each side. |
| 70 | +* `NODE NAME`: The name of the node where the DPU is located. This is the host worker node for `false` entries and the DPU node for `true` entries. |
| 71 | +* `STATUS`: Whether the DPU is functioning correctly (`True`) or has issues (`False`). |
| 72 | ++ |
| 73 | +[NOTE] |
| 74 | +==== |
| 75 | +Run `oc get dpu -o yaml` to get more details about the status. |
| 76 | +==== |
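
The checks in this procedure can be narrowed to only the entries that need attention. The following is a minimal sketch, not part of the DPU Operator: the helper names `unhealthy_dpus` and `not_ready_nodes` are hypothetical, and the sketch assumes the default table layout shown in the example outputs, where `STATUS` is the last column of `oc get dpu` and the second column of `oc get nodes`.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of oc or the DPU Operator).
# Reads `oc get dpu` table output on stdin and prints the NAME of every
# entry whose last column (STATUS) is not "True". The header row is skipped.
unhealthy_dpus() {
  awk 'NR > 1 && NF > 0 && $NF != "True" { print $1 }'
}

# Hypothetical helper (not part of oc).
# Reads `oc get nodes` table output on stdin and prints the NAME of every
# node whose second column (STATUS) is not exactly "Ready". A cordoned node
# reported as "Ready,SchedulingDisabled" is therefore also printed.
not_ready_nodes() {
  awk 'NR > 1 && NF > 0 && $2 != "Ready" { print $1 }'
}
```

After sourcing the functions, pipe the command output into them, for example `oc get dpu | unhealthy_dpus` or `oc get nodes | not_ready_nodes`. With the example `oc get dpu` output from this procedure, `unhealthy_dpus` prints `intel-ipu-0000-06-00.0-host` and `intel-ipu-dpu`.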