
Commit 8cddd20

Merge pull request #97359 from StephenJamesSmith/TELCODOCS-2140
TELCODOCS-2140
2 parents 737f89f + 1ecbac5 commit 8cddd20

10 files changed: +1213 −0 lines changed

_topic_maps/_topic_map.yml

Lines changed: 2 additions & 0 deletions
@@ -3576,6 +3576,8 @@ Topics:
   File: gaudi-ai-accelerator
 - Name: Remote Direct Memory Access (RDMA)
   File: rdma-remote-direct-memory-access
+- Name: Dynamic Accelerator Slicer (DAS) Operator
+  File: das-about-dynamic-accelerator-slicer-operator
 ---
 Name: Backup and restore
 Dir: backup_and_restore
Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
:_mod-docs-content-type: ASSEMBLY
[id="das-about-dynamic-accelerator-slicer-operator"]
= Dynamic Accelerator Slicer (DAS) Operator
include::_attributes/common-attributes.adoc[]
:context: das-about-dynamic-accelerator-slicer-operator

toc::[]

:FeatureName: Dynamic Accelerator Slicer Operator

include::snippets/technology-preview.adoc[]

The Dynamic Accelerator Slicer (DAS) Operator allows you to dynamically slice GPU accelerators in {product-title}, instead of relying on statically sliced GPUs that are defined when the node boots. You can slice GPUs on demand based on specific workload requirements, which ensures efficient resource utilization.

Dynamic slicing is useful if you do not know in advance all the accelerator partitions that are needed on every node in the cluster.

The DAS Operator currently includes a reference implementation for NVIDIA Multi-Instance GPU (MIG) and is designed to support additional technologies, such as NVIDIA MPS or GPUs from other vendors, in the future.

.Limitations

The following limitations apply when using the Dynamic Accelerator Slicer Operator:

* You must identify potential incompatibilities and ensure that the system works with your GPU drivers and operating systems.

* The Operator works only with specific MIG-compatible NVIDIA GPUs and drivers, such as the H100 and A100.

* The Operator cannot use only a subset of the GPUs on a node.

* The NVIDIA device plugin cannot be used together with the Dynamic Accelerator Slicer Operator to manage the GPU resources of a cluster.

[NOTE]
====
The DAS Operator is designed to work with MIG-enabled GPUs. It allocates MIG slices instead of whole GPUs. Installing the DAS Operator prevents the use of the standard resource request provided by the NVIDIA device plugin, such as `nvidia.com/gpu: "1"`, for allocating an entire GPU.
====
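
For example, a workload that previously requested an entire GPU through the NVIDIA device plugin instead requests a MIG slice resource. The following snippet is a minimal illustration that uses the `nvidia.com/mig-1g.5gb` resource shown later in this assembly; the exact resource name depends on the MIG profile available on your hardware:

[source,yaml]
----
# Whole-GPU request through the NVIDIA device plugin (not available with the DAS Operator):
#   resources:
#     limits:
#       nvidia.com/gpu: "1"
# MIG slice request managed by the DAS Operator:
resources:
  limits:
    nvidia.com/mig-1g.5gb: "1"
----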

//Installing the Dynamic Accelerator Slicer Operator
include::modules/das-operator-installing.adoc[leveloffset=+1]

//Installing the Dynamic Accelerator Slicer Operator using the web console
include::modules/das-operator-installing-web-console.adoc[leveloffset=+2]
[role="_additional-resources"]
.Additional resources
* xref:../security/cert_manager_operator/cert-manager-operator-install.adoc#cert-manager-operator-install[{cert-manager-operator}]
* xref:../hardware_enablement/psap-node-feature-discovery-operator.adoc#psap-node-feature-discovery-operator[Node Feature Discovery (NFD) Operator]
* link:https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/index.html[NVIDIA GPU Operator]
* link:https://docs.redhat.com/en/documentation/openshift_container_platform/4.19/html/specialized_hardware_and_driver_enablement/psap-node-feature-discovery-operator#creating-nfd-cr-web-console_psap-node-feature-discovery-operator[NodeFeatureDiscovery CR]

//Installing the Dynamic Accelerator Slicer Operator using the CLI
include::modules/das-operator-installing-cli.adoc[leveloffset=+2]
[role="_additional-resources"]
.Additional resources
* xref:../security/cert_manager_operator/cert-manager-operator-install.adoc#cert-manager-operator-install[{cert-manager-operator}]
* xref:../hardware_enablement/psap-node-feature-discovery-operator.adoc#psap-node-feature-discovery-operator[Node Feature Discovery (NFD) Operator]
* link:https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/index.html[NVIDIA GPU Operator]
* link:https://docs.redhat.com/en/documentation/openshift_container_platform/4.19/html/specialized_hardware_and_driver_enablement/psap-node-feature-discovery-operator#creating-nfd-cr-cli_psap-node-feature-discovery-operator[NodeFeatureDiscovery CR]

//Uninstalling the Dynamic Accelerator Slicer Operator
include::modules/das-operator-uninstalling.adoc[leveloffset=+1]

//Uninstalling the Dynamic Accelerator Slicer Operator using the web console
include::modules/das-operator-uninstalling-web-console.adoc[leveloffset=+2]

//Uninstalling the Dynamic Accelerator Slicer Operator using the CLI
include::modules/das-operator-uninstalling-cli.adoc[leveloffset=+2]

//Deploying GPU workloads with the Dynamic Accelerator Slicer Operator
include::modules/das-operator-deploying-workloads.adoc[leveloffset=+1]

//Troubleshooting DAS Operator
include::modules/das-operator-troubleshooting.adoc[leveloffset=+1]

[role="_additional-resources"]
.Additional resources
* link:https://github.com/kubernetes/kubernetes/issues/128043[Kubernetes issue #128043]
* xref:../hardware_enablement/psap-node-feature-discovery-operator.adoc#psap-node-feature-discovery-operator[Node Feature Discovery Operator]
* link:https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/troubleshooting.html[NVIDIA GPU Operator troubleshooting]
Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@
// Module included in the following assemblies:
//
// * operators/user/das-dynamic-accelerator-slicer-operator.adoc
//
:_mod-docs-content-type: PROCEDURE
[id="das-operator-deploying-workloads_{context}"]
= Deploying GPU workloads with the Dynamic Accelerator Slicer Operator

You can deploy workloads that request GPU slices managed by the Dynamic Accelerator Slicer (DAS) Operator. The Operator dynamically partitions GPU accelerators and schedules workloads onto the available GPU slices.

.Prerequisites

* You have MIG-supported GPU hardware available in your cluster.
* The NVIDIA GPU Operator is installed and the `ClusterPolicy` shows a **Ready** state.
* You have installed the DAS Operator.

.Procedure

. Create a namespace by running the following command:
+
[source,terminal]
----
$ oc new-project cuda-workloads
----

. Save the following deployment, which requests GPU resources through the NVIDIA MIG resource `nvidia.com/mig-1g.5gb`, to a file named `cuda-vectoradd-deployment.yaml`:
+
[source,yaml]
----
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cuda-vectoradd
spec:
  replicas: 2
  selector:
    matchLabels:
      app: cuda-vectoradd
  template:
    metadata:
      labels:
        app: cuda-vectoradd
    spec:
      restartPolicy: Always
      containers:
      - name: cuda-vectoradd
        image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubi8
        resources:
          limits:
            nvidia.com/mig-1g.5gb: "1"
        command:
        - sh
        - -c
        - |
          env && /cuda-samples/vectorAdd && sleep 3600
----

. Apply the deployment configuration by running the following command:
+
[source,terminal]
----
$ oc apply -f cuda-vectoradd-deployment.yaml
----

. Verify that the deployment is created and its pods are scheduled by running the following command:
+
[source,terminal]
----
$ oc get deployment cuda-vectoradd
----
+
.Example output
[source,terminal]
----
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
cuda-vectoradd   2/2     2            2           2m
----

. Check the status of the pods by running the following command:
+
[source,terminal]
----
$ oc get pods -l app=cuda-vectoradd
----
+
.Example output
[source,terminal]
----
NAME                              READY   STATUS    RESTARTS   AGE
cuda-vectoradd-6b8c7d4f9b-abc12   1/1     Running   0          2m
cuda-vectoradd-6b8c7d4f9b-def34   1/1     Running   0          2m
----

.Verification

. Check that `AllocationClaim` resources were created for your deployment pods by running the following command:
+
[source,terminal]
----
$ oc get allocationclaims -n das-operator
----
+
.Example output
[source,terminal]
----
NAME                                                                                            AGE
13950288-57df-4ab5-82bc-6138f646633e-harpatil000034jma-qh5fm-worker-f-57md9-cuda-vectoradd-0   2m
ce997b60-a0b8-4ea4-9107-cf59b425d049-harpatil000034jma-qh5fm-worker-f-fl4wg-cuda-vectoradd-0   2m
----

. Verify that the GPU slices are properly allocated by checking the resource allocation of one of the pods. Run the following command:
+
[source,terminal]
----
$ oc describe pod -l app=cuda-vectoradd
----
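+
The `Limits` section of the output should show the allocated MIG slice resource rather than a whole GPU. The following excerpt is abridged and illustrative for this deployment; it assumes the standard `oc describe pod` output format:
+
[source,terminal]
----
    Limits:
      nvidia.com/mig-1g.5gb:  1
----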

. Check the logs to verify that the CUDA sample application runs successfully by running the following command:
+
[source,terminal]
----
$ oc logs -l app=cuda-vectoradd
----
+
.Example output
[source,terminal]
----
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
----

. Check the environment variables to verify that the GPU devices are properly exposed to the container by running the following command:
+
[source,terminal]
----
$ oc exec deployment/cuda-vectoradd -- env | grep -E "(NVIDIA_VISIBLE_DEVICES|CUDA_VISIBLE_DEVICES)"
----
+
.Example output
[source,terminal]
----
NVIDIA_VISIBLE_DEVICES=MIG-d8ac9850-d92d-5474-b238-0afeabac1652
CUDA_VISIBLE_DEVICES=MIG-d8ac9850-d92d-5474-b238-0afeabac1652
----
+
These environment variables indicate that the GPU MIG slice has been properly allocated and is visible to the CUDA runtime within the container.
