From f2ae9b7457c889fec2212060d82b4b9f0f39cbb6 Mon Sep 17 00:00:00 2001 From: Aleksei Sviridkin Date: Thu, 28 May 2026 20:54:35 +0300 Subject: [PATCH] docs(operations): add containerized GPU workloads guide MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Document the new container variant of cozystack.gpu-operator, paired with cozystack/cozystack#2766. Covers the apt-installed-driver-and-toolkit Linux shape that the variant targets: when to pick it over the passthrough and vGPU variants, prerequisites (host driver + host nvidia-container-toolkit registered with containerd via nvidia-ctk runtime configure, validated with nvidia-smi over kubectl debug), the host-driver reuse path (driver.enabled=false, so the operator uses the pre-installed driver at its standard location with no driverInstallDir override needed on a stock apt install), the Talos caveat with a pointer to the values-native-talos.yaml reference, install steps, a sample CUDA pod for verification, the variant comparison matrix, and a note on why stacking HAMi directly on the container variant on the management cluster is not a supported combination yet (both register nvidia.com/gpu). Lands under operations/ — symmetric with virtualization/gpu.md (VM passthrough on management cluster) and kubernetes/gpu-sharing.md (HAMi in tenant Kubernetes addons). Assisted-By: Claude Signed-off-by: Aleksei Sviridkin --- .../operations/gpu-container-workloads.md | 138 ++++++++++++++++++ 1 file changed, 138 insertions(+) create mode 100644 content/en/docs/next/operations/gpu-container-workloads.md diff --git a/content/en/docs/next/operations/gpu-container-workloads.md b/content/en/docs/next/operations/gpu-container-workloads.md new file mode 100644 index 00000000..129fb92a --- /dev/null +++ b/content/en/docs/next/operations/gpu-container-workloads.md @@ -0,0 +1,138 @@ +--- +title: "Running Containerized GPU Workloads" +linkTitle: "GPU Containers" +description: "Run CUDA pods and other containerized GPU workloads on Cozystack management nodes that ship the NVIDIA driver and container toolkit via the distro package manager." +weight: 160 +--- + +This page covers running GPU workloads in regular Kubernetes pods (CUDA, ML training, inference) on Cozystack management cluster nodes. It targets the typical Linux GPU node shape — `apt`-installed NVIDIA driver plus `nvidia-container-toolkit` on Ubuntu/Debian — and uses the `container` variant of the `cozystack.gpu-operator` package. Other distros with an equivalent driver + toolkit package layout should work the same way but are not regularly tested. + +If instead you want to pass whole GPUs to KubeVirt VMs, see [GPU Passthrough](/docs/next/virtualization/gpu/) and [GPU Sharing with HAMi](/docs/next/kubernetes/gpu-sharing/) (HAMi provides fractional sharing in tenant Kubernetes clusters; stacking it directly on the `container` variant on the management cluster is not a supported combination yet — see [Fractional GPU sharing](#fractional-gpu-sharing) below). + +## When to pick this variant + +The `cozystack.gpu-operator` package exposes three architectural variants. Pick `container` when **all** of the following are true: + +- The host already runs the NVIDIA driver, installed via the distro package manager (`apt install nvidia-driver-*` on Ubuntu/Debian; other distros with an equivalent driver package should work the same way but are not regularly tested). The operator must not load its own kernel module. +- The host already has `nvidia-container-toolkit` installed (`apt install nvidia-container-toolkit`) and registered with containerd. The operator must not deploy its own toolkit DaemonSet — that would overwrite the `/etc/containerd/config.toml` the host configured (via `nvidia-ctk runtime configure`), breaking the host runtime wiring. +- You want GPUs exposed to containers as `nvidia.com/gpu`, not passed through to KubeVirt VMs. + +The other two variants exist for the opposite host shape: `default` (passthrough) unbinds the host driver and binds `vfio-pci` for VM passthrough, and `vgpu` requires the proprietary NVIDIA vGPU host driver plus a license server. Neither path produces a working setup on a host that already ships the driver and container toolkit through apt — the operator and the host install fight each other. + +## Prerequisites + +- A Cozystack management cluster with at least one GPU-enabled node. +- The GPU node runs Ubuntu or Debian with the NVIDIA driver installed via the distro package manager (other distros with an equivalent driver + toolkit package layout should work the same way but are not regularly tested). Verify with `nvidia-smi` over SSH or `kubectl debug node/` — it must enumerate the physical GPUs and report a working driver version. +- `nvidia-container-toolkit` installed on the same node and registered with containerd. `apt install nvidia-container-toolkit` lays down binaries only — it does not configure containerd. Register the runtime explicitly: + + ```bash + sudo nvidia-ctk runtime configure --runtime=containerd + sudo systemctl restart containerd + grep nvidia /etc/containerd/config.toml # must show the runtime entry + ``` + +- The GPU node must not carry a `nvidia.com/gpu.workload.config` label left over from the passthrough setup (`kubectl label node nvidia.com/gpu.workload.config-` to remove). The `container` variant relies on the upstream default `container` workload for unlabeled nodes; a leftover `vm-passthrough` label overrides that per-node and the device plugin will not serve the GPU. +- `kubectl` configured against the management cluster. + +With `driver.enabled=false` the operator uses the pre-installed host driver at its standard location, so on a stock Ubuntu/Debian install no `hostPaths.driverInstallDir` override is needed. Talos installs the driver under a non-standard prefix, so the operator does not find it at the default location and requires a different starting point — see `packages/system/gpu-operator/examples/values-native-talos.yaml` in the [cozystack repo](https://github.com/cozystack/cozystack) for a working reference with the compat DaemonSet and the matching `driverInstallDir` override. + +## 1. Install the GPU Operator (container variant) + +**Do not** add `cozystack.gpu-operator` to `bundles.enabledPackages` for this variant. The platform Helm chart's optional-package template hardcodes `spec.variant: default` for every name in `enabledPackages` and reconciles the resulting `Package` CR under Helm ownership — any user `Package` CR with `variant: container` is overwritten on the next reconcile. Apply the `Package` CR directly instead; the cozystack platform controller installs it without the bundle entry. + +Apply a `Package` CR with `variant: container`: + +```yaml +apiVersion: cozystack.io/v1alpha1 +kind: Package +metadata: + name: cozystack.gpu-operator +spec: + variant: container +``` + +```bash +kubectl apply -f gpu-operator-container.yaml +``` + +The platform controller resolves the variant against the `PackageSource` (`packages/core/platform/sources/gpu-operator.yaml`), pulls `values.yaml` + `values-container.yaml` from the OCI repository, and installs the chart into `cozy-gpu-operator`. + +## 2. Verify the operator is healthy + +All pods in the `cozy-gpu-operator` namespace should reach `Running`: + +```bash +kubectl get pods --namespace cozy-gpu-operator +``` + +Example output (pod names will vary): + +```console +NAME READY STATUS RESTARTS AGE +gpu-feature-discovery-7jpzv 1/1 Running 0 2m +gpu-operator-7976b5b8fb-xqg2z 1/1 Running 0 3m +nvidia-cuda-validator-tjkfh 0/1 Completed 0 2m +nvidia-dcgm-exporter-rmpfg 1/1 Running 0 2m +nvidia-device-plugin-daemonset-cqj9w 1/1 Running 0 2m +nvidia-operator-validator-q5n4k 1/1 Running 0 3m +``` + +The `container` variant does **not** spawn `nvidia-driver-daemonset`, `nvidia-container-toolkit-daemonset`, or `nvidia-vfio-manager` — all three are pinned off by design. + +The node should advertise `nvidia.com/gpu` as an allocatable resource: + +```bash +kubectl describe node +``` + +```console +... +Capacity: + ... + nvidia.com/gpu: 2 + ... +Allocatable: + ... + nvidia.com/gpu: 2 +... +``` + +## 3. Run a sample CUDA pod + +Create a pod that requests one GPU and runs `nvidia-smi`: + +```yaml +apiVersion: v1 +kind: Pod +metadata: + name: cuda-smoke +spec: + restartPolicy: OnFailure + containers: + - name: cuda + image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04 + command: ["nvidia-smi"] + resources: + limits: + nvidia.com/gpu: 1 +``` + +```bash +kubectl apply -f cuda-smoke.yaml +kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/cuda-smoke --timeout=5m +kubectl logs cuda-smoke +``` + +The output should enumerate the GPU(s) visible to the pod and report the driver version that the host runs. + +## Fractional GPU sharing + +The `container` variant exposes whole GPUs through the upstream NVIDIA device plugin. For fractional sharing (per-pod memory and compute quotas), see [GPU Sharing with HAMi](/docs/next/kubernetes/gpu-sharing/) — currently documented for tenant Kubernetes clusters, where enabling HAMi automatically disables the GPU Operator's built-in device plugin to avoid resource-registration conflicts. Stacking the `cozystack.hami` package directly on top of the `container` variant on the management cluster is not a supported combination yet: this variant pins the NVIDIA device plugin on, and HAMi ships its own device plugin, so the two would both register `nvidia.com/gpu`. The `cozystack.hami` PackageSource only declares `dependsOn: cozystack.gpu-operator` for install ordering — it does not disable the operator's device plugin the way the tenant `kubernetes` app chart does. + +## Variant comparison + +| Workload shape | Variant | Host driver | Host container toolkit | Notes | +| --- | --- | --- | --- | --- | +| Containers (CUDA pods, ML) | `container` | required | required | This page | +| Whole GPU to one VM | `default` | must NOT be loaded — operator binds `vfio-pci` | not used | [GPU Passthrough](/docs/next/virtualization/gpu/) | +| Sliced GPU to multiple VMs | `vgpu` | proprietary NVIDIA vGPU host driver | not used | Requires NVIDIA vGPU license + a Delegated License Service endpoint |