From f2ae9b7457c889fec2212060d82b4b9f0f39cbb6 Mon Sep 17 00:00:00 2001
From: Aleksei Sviridkin <f@lex.la>
Date: Thu, 28 May 2026 20:54:35 +0300
Subject: [PATCH] docs(operations): add containerized GPU workloads guide
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Document the new container variant of cozystack.gpu-operator, paired with
cozystack/cozystack#2766. Covers the apt-installed-driver-and-toolkit
Linux shape that the variant targets: when to pick it over the
passthrough and vGPU variants, prerequisites (host driver + host
nvidia-container-toolkit registered with containerd via
nvidia-ctk runtime configure, validated with nvidia-smi over
kubectl debug), the host-driver reuse path (driver.enabled=false, so the
operator uses the pre-installed driver at its standard location with no
driverInstallDir override needed on a stock apt install), the Talos
caveat with a pointer to the values-native-talos.yaml reference, install
steps, a sample CUDA pod for verification, the variant comparison
matrix, and a note on why stacking HAMi directly on the container
variant on the management cluster is not a supported combination yet
(both register nvidia.com/gpu).

Lands under operations/ — symmetric with virtualization/gpu.md (VM
passthrough on management cluster) and kubernetes/gpu-sharing.md (HAMi
in tenant Kubernetes addons).

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
---
 .../operations/gpu-container-workloads.md     | 138 ++++++++++++++++++
 1 file changed, 138 insertions(+)
 create mode 100644 content/en/docs/next/operations/gpu-container-workloads.md

diff --git a/content/en/docs/next/operations/gpu-container-workloads.md b/content/en/docs/next/operations/gpu-container-workloads.md
new file mode 100644
index 00000000..129fb92a
--- /dev/null
+++ b/content/en/docs/next/operations/gpu-container-workloads.md
@@ -0,0 +1,138 @@
+---
+title: "Running Containerized GPU Workloads"
+linkTitle: "GPU Containers"
+description: "Run CUDA pods and other containerized GPU workloads on Cozystack management nodes that ship the NVIDIA driver and container toolkit via the distro package manager."
+weight: 160
+---
+
+This page covers running GPU workloads in regular Kubernetes pods (CUDA, ML training, inference) on Cozystack management cluster nodes. It targets the typical Linux GPU node shape — `apt`-installed NVIDIA driver plus `nvidia-container-toolkit` on Ubuntu/Debian — and uses the `container` variant of the `cozystack.gpu-operator` package. Other distros with an equivalent driver + toolkit package layout should work the same way but are not regularly tested.
+
+If instead you want to pass whole GPUs to KubeVirt VMs, see [GPU Passthrough](/docs/next/virtualization/gpu/) and [GPU Sharing with HAMi](/docs/next/kubernetes/gpu-sharing/) (HAMi provides fractional sharing in tenant Kubernetes clusters; stacking it directly on the `container` variant on the management cluster is not a supported combination yet — see [Fractional GPU sharing](#fractional-gpu-sharing) below).
+
+## When to pick this variant
+
+The `cozystack.gpu-operator` package exposes three architectural variants. Pick `container` when **all** of the following are true:
+
+- The host already runs the NVIDIA driver, installed via the distro package manager (`apt install nvidia-driver-*` on Ubuntu/Debian; other distros with an equivalent driver package should work the same way but are not regularly tested). The operator must not load its own kernel module.
+- The host already has `nvidia-container-toolkit` installed (`apt install nvidia-container-toolkit`) and registered with containerd. The operator must not deploy its own toolkit DaemonSet — that would overwrite the `/etc/containerd/config.toml` the host configured (via `nvidia-ctk runtime configure`), breaking the host runtime wiring.
+- You want GPUs exposed to containers as `nvidia.com/gpu`, not passed through to KubeVirt VMs.
+
+The other two variants exist for the opposite host shape: `default` (passthrough) unbinds the host driver and binds `vfio-pci` for VM passthrough, and `vgpu` requires the proprietary NVIDIA vGPU host driver plus a license server. Neither path produces a working setup on a host that already ships the driver and container toolkit through apt — the operator and the host install fight each other.
+
+## Prerequisites
+
+- A Cozystack management cluster with at least one GPU-enabled node.
+- The GPU node runs Ubuntu or Debian with the NVIDIA driver installed via the distro package manager (other distros with an equivalent driver + toolkit package layout should work the same way but are not regularly tested). Verify with `nvidia-smi` over SSH or `kubectl debug node/<node-name>` — it must enumerate the physical GPUs and report a working driver version.
+- `nvidia-container-toolkit` installed on the same node and registered with containerd. `apt install nvidia-container-toolkit` lays down binaries only — it does not configure containerd. Register the runtime explicitly:
+
+  ```bash
+  sudo nvidia-ctk runtime configure --runtime=containerd
+  sudo systemctl restart containerd
+  grep nvidia /etc/containerd/config.toml   # must show the runtime entry
+  ```
+
+- The GPU node must not carry a `nvidia.com/gpu.workload.config` label left over from the passthrough setup (`kubectl label node <node-name> nvidia.com/gpu.workload.config-` to remove). The `container` variant relies on the upstream default `container` workload for unlabeled nodes; a leftover `vm-passthrough` label overrides that per-node and the device plugin will not serve the GPU.
+- `kubectl` configured against the management cluster.
+
+With `driver.enabled=false` the operator uses the pre-installed host driver at its standard location, so on a stock Ubuntu/Debian install no `hostPaths.driverInstallDir` override is needed. Talos installs the driver under a non-standard prefix, so the operator does not find it at the default location and requires a different starting point — see `packages/system/gpu-operator/examples/values-native-talos.yaml` in the [cozystack repo](https://github.com/cozystack/cozystack) for a working reference with the compat DaemonSet and the matching `driverInstallDir` override.
+
+## 1. Install the GPU Operator (container variant)
+
+**Do not** add `cozystack.gpu-operator` to `bundles.enabledPackages` for this variant. The platform Helm chart's optional-package template hardcodes `spec.variant: default` for every name in `enabledPackages` and reconciles the resulting `Package` CR under Helm ownership — any user `Package` CR with `variant: container` is overwritten on the next reconcile. Apply the `Package` CR directly instead; the cozystack platform controller installs it without the bundle entry.
+
+Apply a `Package` CR with `variant: container`:
+
+```yaml
+apiVersion: cozystack.io/v1alpha1
+kind: Package
+metadata:
+  name: cozystack.gpu-operator
+spec:
+  variant: container
+```
+
+```bash
+kubectl apply -f gpu-operator-container.yaml
+```
+
+The platform controller resolves the variant against the `PackageSource` (`packages/core/platform/sources/gpu-operator.yaml`), pulls `values.yaml` + `values-container.yaml` from the OCI repository, and installs the chart into `cozy-gpu-operator`.
+
+## 2. Verify the operator is healthy
+
+All pods in the `cozy-gpu-operator` namespace should reach `Running`:
+
+```bash
+kubectl get pods --namespace cozy-gpu-operator
+```
+
+Example output (pod names will vary):
+
+```console
+NAME                                                          READY   STATUS    RESTARTS   AGE
+gpu-feature-discovery-7jpzv                                   1/1     Running   0          2m
+gpu-operator-7976b5b8fb-xqg2z                                 1/1     Running   0          3m
+nvidia-cuda-validator-tjkfh                                   0/1     Completed 0          2m
+nvidia-dcgm-exporter-rmpfg                                    1/1     Running   0          2m
+nvidia-device-plugin-daemonset-cqj9w                          1/1     Running   0          2m
+nvidia-operator-validator-q5n4k                               1/1     Running   0          3m
+```
+
+The `container` variant does **not** spawn `nvidia-driver-daemonset`, `nvidia-container-toolkit-daemonset`, or `nvidia-vfio-manager` — all three are pinned off by design.
+
+The node should advertise `nvidia.com/gpu` as an allocatable resource:
+
+```bash
+kubectl describe node <node-name>
+```
+
+```console
+...
+Capacity:
+  ...
+  nvidia.com/gpu:         2
+  ...
+Allocatable:
+  ...
+  nvidia.com/gpu:         2
+...
+```
+
+## 3. Run a sample CUDA pod
+
+Create a pod that requests one GPU and runs `nvidia-smi`:
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: cuda-smoke
+spec:
+  restartPolicy: OnFailure
+  containers:
+  - name: cuda
+    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
+    command: ["nvidia-smi"]
+    resources:
+      limits:
+        nvidia.com/gpu: 1
+```
+
+```bash
+kubectl apply -f cuda-smoke.yaml
+kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/cuda-smoke --timeout=5m
+kubectl logs cuda-smoke
+```
+
+The output should enumerate the GPU(s) visible to the pod and report the driver version that the host runs.
+
+## Fractional GPU sharing
+
+The `container` variant exposes whole GPUs through the upstream NVIDIA device plugin. For fractional sharing (per-pod memory and compute quotas), see [GPU Sharing with HAMi](/docs/next/kubernetes/gpu-sharing/) — currently documented for tenant Kubernetes clusters, where enabling HAMi automatically disables the GPU Operator's built-in device plugin to avoid resource-registration conflicts. Stacking the `cozystack.hami` package directly on top of the `container` variant on the management cluster is not a supported combination yet: this variant pins the NVIDIA device plugin on, and HAMi ships its own device plugin, so the two would both register `nvidia.com/gpu`. The `cozystack.hami` PackageSource only declares `dependsOn: cozystack.gpu-operator` for install ordering — it does not disable the operator's device plugin the way the tenant `kubernetes` app chart does.
+
+## Variant comparison
+
+| Workload shape | Variant | Host driver | Host container toolkit | Notes |
+| --- | --- | --- | --- | --- |
+| Containers (CUDA pods, ML) | `container` | required | required | This page |
+| Whole GPU to one VM | `default` | must NOT be loaded — operator binds `vfio-pci` | not used | [GPU Passthrough](/docs/next/virtualization/gpu/) |
+| Sliced GPU to multiple VMs | `vgpu` | proprietary NVIDIA vGPU host driver | not used | Requires NVIDIA vGPU license + a Delegated License Service endpoint |