Skip to content

initial workspace#147

Merged
bennyz merged 13 commits into
centos-automotive-suite:mainfrom
bennyz:app-building
Mar 24, 2026
Merged

initial workspace#147
bennyz merged 13 commits into
centos-automotive-suite:mainfrom
bennyz:app-building

Conversation

@bennyz
Copy link
Copy Markdown
Contributor

@bennyz bennyz commented Mar 9, 2026

Summary by CodeRabbit

  • New Features

    • Full workspace lifecycle: create/list/show/delete, start/stop, sync (plan & upload), exec, interactive shell, and deploy.
  • CLI

    • New workspace command; new --workspace and repeatable --extra-repo flags; optional wait-for-Running behavior and workspace-aware build flags.
  • Build Integration

    • Workspace-aware builds with extra-repo forwarding, workspace reference, persistent build-cache option, and lease propagation.
  • Workspace Infrastructure

    • Workspace CRD/controller: automatic PVC/Pod provisioning, phases, tmpfs option, storage/node-selector and resource defaults.
  • Tooling

    • New developer toolchain image with entrypoint and wrapper scripts, rootless Podman support, Jumpstarter and ccache configured.
  • Security

    • Updated SCC/RBAC handling and workspace service account support.
  • Tests

    • New API, client, and controller tests covering workspace flows.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 9, 2026

Caution

Review failed

Pull request was closed or merged during review

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a Workspace CRD, controller, API endpoints, client and CLI; integrates workspaces into build flows (extra repos, persistent cache, build-cache PVCs); deploys workspace infra and RBAC/SCC updates; provides a toolchain Containerfile, container wrapper and workspace entry scripts; updates Tekton params, build scripts, tests, and CLI wiring.

Changes

Cohort / File(s) Summary
Workspace API & server
internal/buildapi/workspace.go, internal/buildapi/server.go, internal/buildapi/workspace_test.go
New /v1/workspaces REST surface (create/list/get/delete/start/stop/sync/plan/exec/shell/deploy/lease); workspace resolution for builds; in-pod file server, tar upload/extract, exec/shell streaming, deploy via Jumpstarter; auth/validation and tests.
Build API client & types
internal/buildapi/client/client.go, internal/buildapi/types.go, internal/buildapi/client/client_test.go
Client methods for workspace lifecycle, sync/upload, exec/deploy streaming, and WebSocket shell; BuildRequest adds extraRepos and workspace; client tests for start/stop and error cases.
K8s Workspace CRD & deepcopies
api/v1alpha1/workspace_types.go, api/v1alpha1/zz_generated.deepcopy.go, config/crd/kustomization.yaml
Adds Workspace CRD types, status/print-columns, deepcopy implementations, scheme registration, and includes CRD in kustomize.
Workspace controller & tests
internal/controller/workspace/controller.go, internal/controller/workspace/controller_test.go
New controller ensures PVC + Pod, handles stopped state, tmpfs, client-config mounts, lease/device annotations, status mapping; SetupWithManager and unit tests.
OperatorConfig & operator controller changes
api/v1alpha1/operatorconfig_types.go, internal/controller/operatorconfig/controller.go, internal/controller/operatorconfig/resources.go
Adds WorkspacesConfig defaults/getters and OSBuilds.StorageClass; deploys/cleans workspace infra and SCC/RoleBindings; switches to k8s.io/utils/ptr; RBAC updates.
ImageBuild controller integration
internal/controller/imagebuild/controller.go, api/v1alpha1/imagebuild_types.go
ImageBuild gains buildCachePVC and workspace; controller propagates build lease back to Workspace; Tekton param use-persistent-cache and shared-workspace PVC binding.
CLI: caib workspace & wiring
cmd/caib/workspace/workspace.go, cmd/caib/root.go, cmd/caib/*
Adds caib workspace command and subcommands (create/list/show/delete/start/stop/sync/exec/shell/deploy), flags, polling/wait and upload/tty helpers; registers command.
Build CLI & runtime wiring
cmd/caib/buildcmd/build.go, cmd/caib/image/image.go, cmd/caib/runtime_wiring.go, cmd/caib/main.go, cmd/caib/buildcmd/logs.go
Threads --extra-repo and --workspace through runtime/options and build requests; optionally calls SetWorkspaceLease after build completion.
Tekton tasks & build scripts
internal/common/tasks/tasks.go, internal/common/tasks/scripts/build_image.sh, internal/common/tasks/scripts/find_manifest.sh, internal/common/tasks/scripts/push_artifact.sh
Adds use-persistent-cache task/pipeline param; persistent BUILD_DIR handling and cache logic; manifest/partition copy filters; reduced debug output.
Toolchain container & entry scripts
Containerfile.toolchain, hack/container-wrapper.sh, hack/workspace-entrypoint.sh, bundle.Dockerfile
New multi-stage toolchain Containerfile, container wrapper and workspace entrypoint for rootless workspaces; installs toolchain, configures ccache/podman, and updates image label.
RBAC / kustomize / SCC files
config/rbac/*, config/rbac/scc_role.yaml, config/rbac/scc_role_binding.yaml, config/rbac/kustomization.yaml
Adds workspace ServiceAccount/SCC constructs and rebuilds RBAC; removes old scc_role and binding from kustomize resources; HAProxy route target-port tweak.
Misc scripts & packaging
hack/deploy-catalog.sh, hack/container-wrapper.sh, hack/workspace-entrypoint.sh
mktemp temp-file naming tweak in deploy script; add container wrapper and workspace entrypoint scripts; minor packaging label update.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI
    participant APIServer
    participant K8s as Kubernetes
    participant Pod

    rect rgba(100,150,200,0.5)
    Note over User,APIServer: Create workspace
    User->>CLI: caib workspace create (params)
    CLI->>APIServer: POST /v1/workspaces
    APIServer->>K8s: create Workspace CR (and optional Secret)
    APIServer-->>CLI: 201 WorkspaceResponse
    end

    rect rgba(150,100,200,0.5)
    Note over K8s,Pod: Controller provisions resources
    K8s->>K8s: Workspace Controller observes CR
    K8s->>K8s: Ensure PVC exists (create if needed)
    K8s->>K8s: Create Pod mounting PVC (toolchain image)
    K8s-->>APIServer: Status updates (Running/Failed)
    end

    rect rgba(200,150,100,0.5)
    Note over User,Pod: Sync / Shell / Exec flows
    User->>CLI: caib workspace sync / shell / exec
    CLI->>APIServer: POST /v1/workspaces/:name/sync or WS /shell or POST /exec
    APIServer->>Pod: podExec / SPDY attach / copy tar / start tunnel
    Pod-->>APIServer: streamed output / success
    APIServer-->>CLI: proxied stream/200 OK
    end
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested reviewers

  • bkhizgiy

Poem

🐰 I nibble PVCs and plant a pod,
I hop through tars and stream each clod,
Toolchains sprout where workspaces rest,
Ccache grins and builds its best —
A tiny hop, and CI’s impressed!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 21.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'initial workspace' is vague and generic, using a non-descriptive term that doesn't clearly convey the specific changes in this substantial PR. Use a more descriptive title that captures the main intent—e.g., 'Add Workspace CRD and controller implementation' or 'Implement developer workspace management feature'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🧹 Nitpick comments (2)
internal/buildapi/workspace_test.go (1)

61-112: These specs don't actually exercise the validation branches in their names.

The workspace group applies authMiddleware() before the handlers run, so each case returns 401 before JSON binding or field validation happens. As written, these tests only prove that auth runs first; if you want coverage for missing name / bad JSON / missing command / missing deploy fields, inject an authenticated request or stub auth out in this suite.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace_test.go` around lines 61 - 112, The tests in the
"Create Workspace"/"Exec Workspace"/"Deploy Workspace" contexts are failing to
exercise JSON binding and validation because authMiddleware() runs first and
returns 401; update these specs to authenticate the requests (or stub/bypass
authMiddleware) so the handler runs and validation is executed: either add a
valid auth header/token to the requests created for WorkspaceRequest,
WorkspaceExecRequest, and WorkspaceDeployRequest (in the It blocks) or
replace/mock authMiddleware for this suite with a no-op middleware so
server.router.ServeHTTP reaches the handler and triggers JSON parsing and field
validation.
cmd/caib/workspace/workspace.go (1)

376-387: sync materializes the full archive in RAM.

For larger repos this makes the CLI pay an O(repo size) memory spike before the upload even starts. A streaming tar writer plus a counting/progress reader keeps the UX the same without buffering the whole workspace locally.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/caib/workspace/workspace.go` around lines 376 - 387, The current code
buffers the entire tar into bytes.Buffer via tarTrackedFiles then builds a
progress reader from buf, causing an O(repo size) memory spike; instead stream
the tar through an io.Pipe: create pr, pw := io.Pipe(), launch
tarTrackedFiles(absDir, files, pw) in a goroutine that closes pw (and calls
pw.CloseWithError on error) and capture its error, then wrap the pipe reader
with newProgressReader (or a counting reader) and pass that reader directly into
client.SyncWorkspace inside ExecuteWithReauth (keeping SyncWorkspace,
ExecuteWithReauth, tarTrackedFiles and newProgressReader references), ensuring
goroutine errors are propagated back so the upload fails if tar creation fails.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/caib/workspace/workspace.go`:
- Around line 460-466: The code currently flattens cmdParts into a single shell
string (cmdParts -> command via strings.Join) causing quotes and metacharacters
to be lost; update the code that builds WorkspaceExecRequest to preserve
argument boundaries by passing the original cmdParts slice (e.g., set a
CommandArgs/Args field to cmdParts or change WorkspaceExecRequest.Command to a
[]string) instead of joining into one string, and adjust any
serialization/consumer logic to accept the argv form so arguments are
transmitted verbatim.

In `@Containerfile.toolchain`:
- Around line 21-22: The Containerfile installs Jumpstarter into /root/.local
and sets ENV PATH to reference that, which is inaccessible to random non-root
UIDs on OpenShift; change the RUN that invokes the installer (the line using
"curl ... install.sh | bash -s -- --source main") to pass the install script's
--dir parameter (e.g., --dir /opt/jumpstarter) so the CLI is installed into a
system-accessible location, ensure the container creates and sets appropriate
permissions for /opt/jumpstarter during build, and update the ENV PATH (the
existing ENV PATH="/root/.local/jumpstarter/bin:${PATH}") to point to
/opt/jumpstarter/bin so non-root users can execute the Jumpstarter binary.

In `@internal/buildapi/workspace.go`:
- Around line 669-730: The helpers copyToPod and execInPod currently call
StreamWithContext(context.Background(), ...), ignoring request cancellation;
change both function signatures to accept a context.Context (e.g., copyToPod(ctx
context.Context, restCfg *rest.Config, ...) and execInPod(ctx context.Context,
restCfg *rest.Config, ...)), replace the context.Background() passed to
remotecommand.Executor.StreamWithContext with the incoming ctx, and update all
callers (where sync/exec/deploy handle an http request) to pass
c.Request.Context() into copyToPod and execInPod so SPDY streams honor client
disconnects and deadlines.
- Around line 357-365: The handler currently reads the entire request body into
memory via io.ReadAll (in sync) which can OOM; instead enforce the same max
upload size and stream the body directly into copyToPod: replace io.ReadAll
usage with a size-limited reader (e.g. wrap c.Request.Body with
http.MaxBytesReader or io.LimitReader using your existing maxUploadSize
constant) and pass that reader into copyToPod(restCfg, namespace, podName,
workspaceContainerName, limitedReader, "/workspace/src/"); remove buffering into
tarData and ensure errors from exceeding the limit are handled and returned to
the client.
- Around line 167-175: The code currently reuses ImageBuild.Status.LeaseID when
req.FromBuild is set without verifying ownership; update the call in the handler
to pass the caller identity (requester) into resolveLeaseFromBuild and change
resolveLeaseFromBuild to fetch the ImageBuild and validate the
automotive.sdv.cloud.redhat.com/requested-by annotation matches the requester
before returning ImageBuild.Status.LeaseID, returning an error if it doesn’t
match so unauthorized callers cannot attach another build's lease.
- Around line 177-199: Move creation of the Jumpstarter secret (jmpClientSecret
/ secret and the k8sClient.Create call that currently runs when req.ClientConfig
!= "") to after you perform the duplicate-name/uniqueness checks and after you
successfully create the Workspace CR (the k8sClient.Create(ws) path); do not
treat k8serrors.IsAlreadyExists as a silent success—ensure the name conflict is
surfaced during uniqueness checks and fail early. If you must create the secret
before persisting the Workspace, then delete the secret if k8sClient.Create(ws)
returns an error (call k8sClient.Delete with the same secret) to avoid orphaning
sensitive data; otherwise, set an OwnerReference on the secret (owner = the
created Workspace object) immediately after the Workspace creation so Kubernetes
will garbage-collect the secret with the Workspace. Ensure all references to
jmpClientSecret, secret, and k8sClient.Create are updated accordingly.
- Around line 598-632: The code copies wsConfig.Resources into res via
wsConfig.Resources.DeepCopy(), but DeepCopy can preserve nil Requests/Limits
maps and later assignments to res.Requests[...] or res.Limits[...] will panic;
after creating res (or after wsConfig.Resources.DeepCopy()), ensure res.Requests
and res.Limits are non-nil by initializing them to empty corev1.ResourceList{}
when nil (e.g., check res.Requests == nil and res.Limits == nil) before any use
in the CPU and memory handling paths (affects the res variable created from
corev1.ResourceRequirements and the use sites where
res.Requests[corev1.ResourceCPU], res.Limits[corev1.ResourceCPU],
res.Requests[corev1.ResourceMemory], res.Limits[corev1.ResourceMemory] are
assigned).

In `@internal/controller/workspace/controller.go`:
- Around line 103-127: The PVC creation uses GenerateName (pvcSuffix) which can
create duplicate PVCs if r.Status().Patch fails and ws.Status.PVCName stays
empty; modify the reconcile in this block to first search for an existing PVC
owned by the Workspace (look up PVCs in ws.Namespace with a deterministic name
pattern or by ownerReference) and reuse it instead of always creating a new one,
or switch to a deterministic name (e.g., build pvcName := ws.Name + pvcSuffix)
and use that for the PVC ObjectMeta.Name so retries address the same resource;
ensure you still call controllerutil.SetControllerReference(ws, pvc, r.Scheme)
before r.Create and only create when the lookup does not return an existing
owned PVC, then set ws.Status.PVCName and patch status as before.
- Around line 50-52: The reconcile currently swallows transient errors from
ensurePVC (and ensurePod) by only calling r.setStatus(...) and returning no
error, which prevents controller-runtime from retrying; change the flow so you
still call r.setStatus(ctx, ws, "Failed", fmt.Sprintf("PVC error: %v", err))
(and similarly for Pod errors) but then return the original error (or a wrapped
error) from Reconcile so controller-runtime will requeue and retry; update the
branches handling ensurePVC and ensurePod failures to set status first and then
return the error instead of returning only the result of setStatus.
- Around line 160-163: The current code uses user-controlled ws.Spec.Image
(falling back to automotivev1alpha1.DefaultToolchainImage) and launches pods as
privileged root; change this to validate ws.Spec.Image against an
operator-controlled allowlist (deny anything not explicitly allowed and fall
back to DefaultToolchainImage), and remove Privileged=true and RunAsUser=0 from
the pod's SecurityContext (or set a non-root uid and ensure
allowPrivilegeEscalation=false). Locate the image selection (image :=
ws.Spec.Image / DefaultToolchainImage) and the pod spec creation code (the block
around lines 213-235 that sets SecurityContext/Privileged) and implement: image
validation against the allowlist, enforce a safe non-root SecurityContext, and
ensure the pod runs under a restricted service account with no privileged
capabilities.

---

Nitpick comments:
In `@cmd/caib/workspace/workspace.go`:
- Around line 376-387: The current code buffers the entire tar into bytes.Buffer
via tarTrackedFiles then builds a progress reader from buf, causing an O(repo
size) memory spike; instead stream the tar through an io.Pipe: create pr, pw :=
io.Pipe(), launch tarTrackedFiles(absDir, files, pw) in a goroutine that closes
pw (and calls pw.CloseWithError on error) and capture its error, then wrap the
pipe reader with newProgressReader (or a counting reader) and pass that reader
directly into client.SyncWorkspace inside ExecuteWithReauth (keeping
SyncWorkspace, ExecuteWithReauth, tarTrackedFiles and newProgressReader
references), ensuring goroutine errors are propagated back so the upload fails
if tar creation fails.

In `@internal/buildapi/workspace_test.go`:
- Around line 61-112: The tests in the "Create Workspace"/"Exec
Workspace"/"Deploy Workspace" contexts are failing to exercise JSON binding and
validation because authMiddleware() runs first and returns 401; update these
specs to authenticate the requests (or stub/bypass authMiddleware) so the
handler runs and validation is executed: either add a valid auth header/token to
the requests created for WorkspaceRequest, WorkspaceExecRequest, and
WorkspaceDeployRequest (in the It blocks) or replace/mock authMiddleware for
this suite with a no-op middleware so server.router.ServeHTTP reaches the
handler and triggers JSON parsing and field validation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d373c1c1-fe1d-4985-af7b-5731e8ba75b8

📥 Commits

Reviewing files that changed from the base of the PR and between c556cff and 7aff4d1.

⛔ Files ignored due to path filters (3)
  • config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_workspaces.yaml is excluded by !config/crd/bases/**
  • config/rbac/role.yaml is excluded by !config/rbac/role.yaml
📒 Files selected for processing (16)
  • Containerfile.toolchain
  • api/v1alpha1/operatorconfig_types.go
  • api/v1alpha1/workspace_types.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • cmd/caib/root.go
  • cmd/caib/workspace/workspace.go
  • cmd/main.go
  • config/crd/kustomization.yaml
  • internal/buildapi/client/client.go
  • internal/buildapi/server.go
  • internal/buildapi/workspace.go
  • internal/buildapi/workspace_test.go
  • internal/controller/imagebuild/controller.go
  • internal/controller/operatorconfig/controller.go
  • internal/controller/operatorconfig/resources.go
  • internal/controller/workspace/controller.go

Comment on lines +460 to +466
// Everything after the workspace name (or after --) is the command
cmdParts := args[1:]
if len(cmdParts) == 0 {
handleError(fmt.Errorf("no command specified; use: caib workspace exec <name> -- <command>"))
}
command := strings.Join(cmdParts, " ")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Flattening cmdParts with spaces loses the original argv.

Commands with spaces, quotes, or shell metacharacters do not round-trip reliably once the args are rejoined. Preserve argument boundaries when building WorkspaceExecRequest.Command, or move the API to an argv form instead of a shell string.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/caib/workspace/workspace.go` around lines 460 - 466, The code currently
flattens cmdParts into a single shell string (cmdParts -> command via
strings.Join) causing quotes and metacharacters to be lost; update the code that
builds WorkspaceExecRequest to preserve argument boundaries by passing the
original cmdParts slice (e.g., set a CommandArgs/Args field to cmdParts or
change WorkspaceExecRequest.Command to a []string) instead of joining into one
string, and adjust any serialization/consumer logic to accept the argv form so
arguments are transmitted verbatim.

Comment thread Containerfile.toolchain
Comment thread internal/buildapi/workspace.go
Comment on lines +177 to +199
// Create secret for Jumpstarter client config if provided
var jmpClientSecret string
if req.ClientConfig != "" {
clientConfigBytes, decErr := base64.StdEncoding.DecodeString(req.ClientConfig)
if decErr != nil {
c.JSON(http.StatusBadRequest, gin.H{"error": "clientConfig must be base64 encoded"})
return
}
jmpClientSecret = req.Name + "-jumpstarter-client"
secret := &corev1.Secret{
ObjectMeta: metav1.ObjectMeta{
Name: jmpClientSecret,
Namespace: namespace,
},
Type: corev1.SecretTypeOpaque,
Data: map[string][]byte{
"client.yaml": clientConfigBytes,
},
}
if createErr := k8sClient.Create(c.Request.Context(), secret); createErr != nil && !k8serrors.IsAlreadyExists(createErr) {
c.JSON(http.StatusInternalServerError, gin.H{"error": fmt.Sprintf("failed to create client config secret: %v", createErr)})
return
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Create the Jumpstarter secret only after the workspace request is known-good.

The secret is created before the duplicate-name check and before the Workspace CR write, and AlreadyExists is treated as success. A conflict or later create failure leaves sensitive client config orphaned, and a stale secret can be silently rebound to a new workspace with the same name. Move secret creation after the uniqueness/resource checks, delete it again if k8sClient.Create(ws) fails, and owner-reference it after a successful create.

Also applies to: 202-243

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 177 - 199, Move creation of the
Jumpstarter secret (jmpClientSecret / secret and the k8sClient.Create call that
currently runs when req.ClientConfig != "") to after you perform the
duplicate-name/uniqueness checks and after you successfully create the Workspace
CR (the k8sClient.Create(ws) path); do not treat k8serrors.IsAlreadyExists as a
silent success—ensure the name conflict is surfaced during uniqueness checks and
fail early. If you must create the secret before persisting the Workspace, then
delete the secret if k8sClient.Create(ws) returns an error (call
k8sClient.Delete with the same secret) to avoid orphaning sensitive data;
otherwise, set an OwnerReference on the secret (owner = the created Workspace
object) immediately after the Workspace creation so Kubernetes will
garbage-collect the secret with the Workspace. Ensure all references to
jmpClientSecret, secret, and k8sClient.Create are updated accordingly.

Comment thread internal/buildapi/workspace.go
Comment thread internal/buildapi/workspace.go
Comment thread internal/buildapi/workspace.go Outdated
Comment thread internal/controller/workspace/controller.go
Comment thread internal/controller/workspace/controller.go
Comment thread internal/controller/workspace/controller.go
{
APIGroups: []string{"security.openshift.io"},
Resources: []string{"securitycontextconstraints"},
ResourceNames: []string{"privileged"},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need the workspaces to be privileged?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find another way (yet) to allow users to sudo dnf install

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need to think about this, as this might be a problem with custom images

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like OpenShift 4.20 might have a solution for this, I'll check

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

♻️ Duplicate comments (6)
Containerfile.toolchain (1)

77-78: ⚠️ Potential issue | 🟠 Major

Install Jumpstarter into a non-root-accessible path.

Line 78 points to /root/.local/jumpstarter/bin, which is not usable for random non-root UIDs in OpenShift-style runtimes. This was already flagged previously and still appears unresolved.

Suggested fix
-RUN curl -fsSL https://raw.githubusercontent.com/jumpstarter-dev/jumpstarter/main/python/install.sh | bash -s -- --source main
-ENV PATH="/root/.local/jumpstarter/bin:${PATH}"
+RUN curl -fsSL https://raw.githubusercontent.com/jumpstarter-dev/jumpstarter/main/python/install.sh | \
+    bash -s -- --source main --dir /opt/jumpstarter
+ENV PATH="/opt/jumpstarter/bin:${PATH}"
#!/bin/bash
set -euo pipefail
# Verify installer supports custom target directory and current default path behavior
curl -fsSL https://raw.githubusercontent.com/jumpstarter-dev/jumpstarter/main/python/install.sh | rg -n -- '--dir|/root/.local|jumpstarter'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Containerfile.toolchain` around lines 77 - 78, The Dockerfile currently pipes
the installer via the RUN line that executes "curl ... install.sh | bash -s --
--source main" and then sets ENV PATH to "/root/.local/jumpstarter/bin:${PATH}",
which breaks non-root OpenShift UIDs; update the RUN invocation to pass the
installer a non-root target (e.g. add "--dir /opt/jumpstarter" or another
directory writable by arbitrary container users) and change the subsequent ENV
PATH entry to point to that directory's bin (for example ENV
PATH="/opt/jumpstarter/bin:${PATH}"), and ensure the RUN step creates the
directory and sets permissive ownership/mode so non-root users can execute the
installed binaries (adjust commands surrounding the "curl ... install.sh" RUN
and the ENV PATH assignment).
cmd/caib/workspace/workspace.go (2)

463-469: ⚠️ Potential issue | 🟠 Major

Preserve argv boundaries for workspace exec.

strings.Join(cmdParts, " ") still changes the command whenever an argument contains spaces, quotes, or shell metacharacters. Transport the original cmdParts slice through the API instead of flattening it into one shell string.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/caib/workspace/workspace.go` around lines 463 - 469, The code flattens
exec arguments with strings.Join(cmdParts, " ") which loses argv boundaries;
instead, keep and pass the original []string cmdParts through the API (don't
create the string variable command) so each argument, quotes and metacharacters
are preserved. Update any function/method signatures and payloads that currently
accept the single command string (look for uses of the variable command and
functions handling workspace exec) to accept []string (e.g., cmdParts or args)
and adjust call sites/serialization accordingly so the exec path transmits and
executes the argument slice intact.

421-433: ⚠️ Potential issue | 🔴 Critical

Reject or preserve symlinks during workspace sync.

os.Open plus f.Stat() dereferences tracked symlinks. A repo containing something like foo -> ~/.ssh/id_rsa will upload the key contents to the cluster under foo, and even benign symlinks no longer round-trip correctly. Use os.Lstat first and either emit a symlink entry or fail closed.

🔒 Suggested hardening
 	for _, relPath := range files {
 		absPath := filepath.Join(baseDir, relPath)
-		f, err := os.Open(absPath)
-		if err != nil {
-			continue // file may have been deleted since ls-files
-		}
-		fi, err := f.Stat()
+		fi, err := os.Lstat(absPath)
 		if err != nil {
-			_ = f.Close()
-			continue
+			if os.IsNotExist(err) {
+				continue
+			}
+			return fmt.Errorf("stat %s: %w", relPath, err)
+		}
+		if fi.Mode()&os.ModeSymlink != 0 {
+			return fmt.Errorf("tracked symlink %s is not supported by sync", relPath)
 		}
 		if !fi.Mode().IsRegular() {
-			_ = f.Close()
 			continue
 		}
+		f, err := os.Open(absPath)
+		if err != nil {
+			if os.IsNotExist(err) {
+				continue
+			}
+			return fmt.Errorf("open %s: %w", relPath, err)
+		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/caib/workspace/workspace.go` around lines 421 - 433, The loop currently
uses os.Open + f.Stat which dereferences symlinks; change to use
os.Lstat(absPath) to detect os.FileMode&os.ModeSymlink first, and if it's a
symlink call os.Readlink(absPath) and either emit a symlink entry (store the
link target and mark the entry as symlink) or return an error to fail closed;
only for non-symlinks proceed to open the file (os.Open) and use
f.Stat()/fi.Mode().IsRegular as before. Update handling around variables
absPath, relPath, files and ensure closed file handles are preserved.
internal/controller/workspace/controller.go (3)

50-52: ⚠️ Potential issue | 🟠 Major

Return reconcile errors after setting failure status.

Transient PVC/Pod errors are swallowed here. If status patch succeeds, the controller may stop retrying and workspace can stay stuck.

Also applies to: 58-60

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/workspace/controller.go` around lines 50 - 52, The
current code calls r.setStatus(...) and returns ctrl.Result{} which swallows the
original reconcile error; change both the ensurePVC block (r.ensurePVC(ctx, ws))
and the similar block at lines 58-60 to set the workspace status via
r.setStatus(...) and then return the original error (or a wrapped version) so
the controller requeues and retries; locate the logic around r.ensurePVC and the
other ensure-* call and make the result return the error instead of an empty
ctrl.Result.

103-107: ⚠️ Potential issue | 🟠 Major

Use deterministic PVC naming (or lookup) to avoid duplicate claims.

GenerateName plus status-patch retries can create multiple PVCs for the same workspace when status isn’t persisted.

Also applies to: 124-127

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/workspace/controller.go` around lines 103 - 107, The PVC
creation uses GenerateName which can produce duplicate PVCs on retries; change
to deterministic naming and idempotent creation: compute a stable PVC name
(e.g., based on ws.Name and ws.UID or a constant pvcSuffix) instead of using
GenerateName, then try to Get the PVC by that name (or List by a unique
label/ownerRef) and only Create if not found; update the pvc variable and any
ownerReference/labels consistently (references: the pvc variable, GenerateName
usage, and the workspace `ws` object) so repeated status-patch retries do not
create duplicate PVCs.

160-163: ⚠️ Potential issue | 🔴 Critical

Do not run user-selected workspace images as privileged root.

A caller-controlled image is launched with RunAsUser=0 and Privileged=true. This is a critical privilege-escalation path.

Also applies to: 228-248

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/workspace/controller.go` around lines 160 - 163, The code
currently treats any caller-selected image the same as the default and ends up
launching it as root with Privileged=true; change the pod/security context logic
so that if image != automotivev1alpha1.DefaultToolchainImage (i.e., a
user-supplied image) you do NOT set RunAsUser=0 and do NOT enable privileged
mode — set securityContext.RunAsUser to a non-root UID (e.g., 1000) and
securityContext.Privileged to false; keep the existing root/privileged behavior
only for the trusted DefaultToolchainImage. Update the code paths that construct
the Pod spec/securityContext (the logic that uses image := ws.Spec.Image and any
function that applies the container.SecurityContext or pod.Spec.SecurityContext)
accordingly.
🧹 Nitpick comments (6)
Containerfile.toolchain (1)

57-57: Clean DNF cache in the same EPEL install layer.

Line 57 installs EPEL in a separate layer without cleanup, so metadata/artifacts from that layer still bloat the final image.

Suggested change
-RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm
+RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm \
+    && dnf clean all
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Containerfile.toolchain` at line 57, The Docker layer installing EPEL (the
RUN dnf install -y
https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm
instruction) leaves package metadata in a separate layer; modify that RUN to
perform the install and then immediately clean DNF caches in the same layer
(e.g., run dnf install ... && dnf clean all && rm -rf /var/cache/dnf) so
metadata/artifacts are removed before the layer is committed.
internal/buildapi/workspace_test.go (1)

32-44: The suite currently stops at auth middleware.

The 401 checks prove the routes exist, but the create/exec/deploy cases never reach JSON binding or field validation, and /v1/workspaces/:name/shell is still missing from the auth matrix. Add at least one authenticated 400-path test per handler plus a shell auth test so the new API surface is actually guarded against regressions.

Also applies to: 61-113

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace_test.go` around lines 32 - 44, Current auth-only
tests stop at middleware and don't exercise handler-level JSON
binding/validation; update the test suite in workspace_test.go so each route in
the Authentication matrix (e.g., POST /v1/workspaces, POST
/v1/workspaces/my-app/sync, POST /v1/workspaces/my-app/exec, POST
/v1/workspaces/my-app/deploy, plus add GET/DELETE entries) has one authenticated
request that includes valid auth (set the same auth header/token used elsewhere
in tests) but invalid or empty JSON to assert a 400 response from the handler
(this ensures create/exec/deploy paths hit binding/validation logic), and add
the missing /v1/workspaces/:name/shell route to the matrix with both
unauthenticated 401 and an authenticated 400 test; apply the same pattern to the
other test blocks referenced (lines ~61-113) so each handler has at least one
authenticated 400-path test.
api/v1alpha1/zz_generated.deepcopy.go (1)

1-1403: Prefer excluding generated deepcopy artifacts from this PR.

This file is generated output; keep PR scope to source changes where possible and rely on generation in the standard build/deploy flow.

Based on learnings: “Generated files that are not committed should be ignored in PRs… do not include api/v1alpha1/zz_generated.deepcopy.go … ensure PRs only contain source changes.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1alpha1/zz_generated.deepcopy.go` around lines 1 - 1403, The diff adds
the generated file api/v1alpha1/zz_generated.deepcopy.go which should not be
committed; remove this file from the PR (undo the commit or git rm --cached if
already committed), update repository rules so generated deepcopy files are not
committed (add api/v1alpha1/zz_generated.deepcopy.go to .gitignore or adjust
existing ignore patterns), and ensure the generation step (e.g., controller-gen
/ make generate) remains the authoritative source so CI/build regenerates the
file as needed.
api/v1alpha1/workspace_types.go (1)

35-45: Add CRD validation markers for owner and pvcSize.

Consider +kubebuilder:validation:MinLength=1 for owner and a quantity pattern (or enum policy) for pvcSize to reject invalid specs at admission time instead of at reconcile time.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1alpha1/workspace_types.go` around lines 35 - 45, Add kubebuilder
validation markers to the Workspace spec fields: annotate the Owner field
(Owner) with +kubebuilder:validation:MinLength=1 to prevent empty owner strings,
and annotate PVCSize (PVCSize) with a validation marker that enforces allowed
volume quantity formats (either a pattern regex like
+kubebuilder:validation:Pattern=`^([0-9]+)(Ei|Pi|Ti|Gi|Mi|Ki|E|P|T|G|M|K|Ei|Pi|Ti|Gi|Mi|Ki|e|p|t|g|m|k)?$`
or an enum of accepted sizes) so invalid PVC sizes are rejected at admission
time; add these comments immediately above the corresponding struct fields in
the Workspace type.
api/v1alpha1/operatorconfig_types.go (2)

50-58: Consider pinning a specific version for DefaultToolchainImage.

Using the :latest tag can lead to unpredictable behavior when the image is updated upstream. The other default images in this file (e.g., DefaultAutomotiveImageBuilderImage) use pinned versions like :1.1.11. Consider following the same pattern for consistency and reproducibility.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1alpha1/operatorconfig_types.go` around lines 50 - 58, The
DefaultToolchainImage constant currently uses the unstable ":latest" tag; change
the DefaultToolchainImage value to a specific, pinned image tag (e.g., match the
project's image versioning like ":1.1.11" or the known stable release for
quay.io/centos-sig-automotive/autosd-toolchain) to ensure reproducibility—edit
the DefaultToolchainImage constant in operatorconfig_types.go to replace
":latest" with the chosen pinned tag.

329-331: Consider adding validation for PVCSize format.

The PVCSize field accepts any string, but it should conform to Kubernetes quantity format (e.g., 10Gi, 500Mi). While this is consistent with OSBuildsConfig.PVCSize, adding a kubebuilder validation pattern would catch invalid values at admission time.

Example validation annotation
 	// PVCSize is the size for persistent volume claims created for workspace storage
+	// +kubebuilder:validation:Pattern=`^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$`
 	// +optional
 	PVCSize string `json:"pvcSize,omitempty"`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1alpha1/operatorconfig_types.go` around lines 329 - 331, The PVCSize
field currently accepts any string; add a kubebuilder validation annotation
above the PVCSize field to enforce Kubernetes quantity format (e.g., 10Gi,
500Mi). Update the struct field for PVCSize in operatorconfig_types.go by adding
a +kubebuilder:validation:Pattern annotation that matches Kubernetes resource
quantity syntax and keep the existing json tag and optional comment; ensure the
pattern is broad enough to accept common suffixes (Ki, Mi, Gi, Ti, Pi, Ei, or
plain integer) used by other fields like OSBuildsConfig.PVCSize for consistency.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/caib/workspace/workspace.go`:
- Around line 385-390: The progress reader `pr` (created via
`newProgressReader`) is built once before calling
`caibcommon.ExecuteWithReauth`, so if `ExecuteWithReauth` retries the callback
(after re-auth or kubeconfig fallback) subsequent `client.SyncWorkspace` calls
read an already-consumed stream; move creation of the tar reader into the
callback passed to `ExecuteWithReauth` so that a fresh reader (and fresh
`newProgressReader`/buffer) is created on each attempt; update the closure that
calls `client.SyncWorkspace(context.Background(), name, pr)` to recreate the
buffer and `pr` from the original data inside the function scope using the same
inputs (`serverURL`, `authToken`, `insecureSkipTLS`, `name`) before invoking
`SyncWorkspace`.

In `@Containerfile.toolchain`:
- Around line 73-74: The Dockerfile currently appends a global NOPASSWD sudo
rule (the RUN echo 'ALL ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers) which makes
the container overly privileged; instead remove or replace that global rule with
a least-privilege sudoers entry for a named non-root user, create that user and
set ownership, and add a USER directive to switch to that non-root user (e.g.,
create user via useradd/adduser, restrict sudo rights to that username in
/etc/sudoers.d/<name> without NOPASSWD unless explicitly required, and add USER
<username> before interactive/workspace layers around where USER is currently
missing) so the image does not default to root and sudo is limited to the
specific account.
- Around line 22-35: The repo and dnf install steps currently disable RPM
signature checks; update the repository file created by the heredoc for
${DISTRO} to set gpgcheck=1 (instead of gpgcheck=0) and remove the --nogpgcheck
flag from the RUN dnf install invocation so individual RPM signatures are
verified; do not set repo_gpgcheck=1 (leave it unset or 0) since the nightly
repo does not publish signed repomd.xml.asc, and ensure the GPG key imported at
the later step remains available for validation.

In `@internal/buildapi/server.go`:
- Around line 1533-1536: The current fire-and-forget podExec call that runs
"python3 -m http.server ... &" can leak processes and cause port conflicts;
modify the logic around
podExec/repoPath/ws.Status.PodName/workspaceContainerName to (1) check for an
existing server on the target port inside the workspace container (e.g., with
netstat/ss or a curl health check) and reuse it if healthy, (2) if no server
exists, start the server in a managed way that records the server PID to a known
file in the repo (e.g., /tmp/extra_repo_http.pid) instead of launching an
unchecked background process, and (3) ensure cleanup by killing the PID and
removing the PID file from the same codepath that tears down the workspace (add
explicit cleanup in your workspace shutdown/cleanup function using podExec to
run kill and rm against the PID file); also add a simple health-check invocation
after start to verify the server is up and fail the start if it doesn't respond.
- Around line 1516-1522: After fetching the Workspace (ws) with k8sClient.Get
and checking ws.Status.Phase, enforce ownership by comparing ws.Spec.Owner to
the authenticated requester (use the existing auth principal variable from the
handler/context — e.g. authUser, requesterID, or extract from ctx) and return a
forbidden error if they differ; apply the same ownership check before any
podExec invocation so that podExec is only called when ws.Spec.Owner ==
authenticated requester. Ensure the error returned is clear (forbidden/not
owner) and do not proceed to podExec when ownership fails.
- Around line 1534-1536: The current code builds a shell command string
embedding the untrusted repoPath (cmd := []string{"/bin/sh", "-c",
fmt.Sprintf("cd %s && python3 -m http.server %d > /dev/null 2>&1 &", repoPath,
port)}) which risks shell injection; instead construct a safe argv slice and
avoid /bin/sh -c: call podExec with an exec form like
{"python3","-m","http.server", strconv.Itoa(port), "--directory", repoPath} (no
shell interpolation) so repoPath is passed as a literal argument; keep using
podExec, workspaceContainerName, restCfg, namespace and ws.Status.PodName but
remove the fmt.Sprintf into a shell and eliminate redirection/backgrounding
handled by the exec call or podExec's API.

In `@internal/controller/operatorconfig/resources.go`:
- Around line 640-644: The RBAC rule granting broad verbs on the "workspaces" CR
should be tightened: in the APIGroups/Resources block that currently lists
"workspaces" and Verbs including "create" and "delete", remove create/delete
(and other unnecessary write verbs) and restrict to read + allowed mutation for
finalizers (e.g., "get","list","watch","patch"); then add a separate rule for
the status subresource ("workspaces/status") with verbs "get","patch","update"
so controllers can modify status; update the APIGroups/Resources/Verbs entries
accordingly (refer to the block that defines APIGroups:
[]string{"automotive.sdv.cloud.redhat.com"} and Resources:
[]string{"workspaces"}).

In `@internal/controller/workspace/controller.go`:
- Around line 98-113: The PVC creation uses resource.MustParse which will panic
on invalid ws.Spec.PVCSize; replace that call with resource.ParseQuantity, check
the returned (quantity, err) and if err != nil return a reconcile error (or
requeue) from the reconcile method so the controller does not crash; update the
code around the PersistentVolumeClaim.Spec.Resources.Requests assignment (the
pvc variable creation in controller.go) to use the parsed quantity on success
and handle/propagate the parse error with an appropriate reconcile.Result and
error.

---

Duplicate comments:
In `@cmd/caib/workspace/workspace.go`:
- Around line 463-469: The code flattens exec arguments with
strings.Join(cmdParts, " ") which loses argv boundaries; instead, keep and pass
the original []string cmdParts through the API (don't create the string variable
command) so each argument, quotes and metacharacters are preserved. Update any
function/method signatures and payloads that currently accept the single command
string (look for uses of the variable command and functions handling workspace
exec) to accept []string (e.g., cmdParts or args) and adjust call
sites/serialization accordingly so the exec path transmits and executes the
argument slice intact.
- Around line 421-433: The loop currently uses os.Open + f.Stat which
dereferences symlinks; change to use os.Lstat(absPath) to detect
os.FileMode&os.ModeSymlink first, and if it's a symlink call
os.Readlink(absPath) and either emit a symlink entry (store the link target and
mark the entry as symlink) or return an error to fail closed; only for
non-symlinks proceed to open the file (os.Open) and use
f.Stat()/fi.Mode().IsRegular as before. Update handling around variables
absPath, relPath, files and ensure closed file handles are preserved.

In `@Containerfile.toolchain`:
- Around line 77-78: The Dockerfile currently pipes the installer via the RUN
line that executes "curl ... install.sh | bash -s -- --source main" and then
sets ENV PATH to "/root/.local/jumpstarter/bin:${PATH}", which breaks non-root
OpenShift UIDs; update the RUN invocation to pass the installer a non-root
target (e.g. add "--dir /opt/jumpstarter" or another directory writable by
arbitrary container users) and change the subsequent ENV PATH entry to point to
that directory's bin (for example ENV PATH="/opt/jumpstarter/bin:${PATH}"), and
ensure the RUN step creates the directory and sets permissive ownership/mode so
non-root users can execute the installed binaries (adjust commands surrounding
the "curl ... install.sh" RUN and the ENV PATH assignment).

In `@internal/controller/workspace/controller.go`:
- Around line 50-52: The current code calls r.setStatus(...) and returns
ctrl.Result{} which swallows the original reconcile error; change both the
ensurePVC block (r.ensurePVC(ctx, ws)) and the similar block at lines 58-60 to
set the workspace status via r.setStatus(...) and then return the original error
(or a wrapped version) so the controller requeues and retries; locate the logic
around r.ensurePVC and the other ensure-* call and make the result return the
error instead of an empty ctrl.Result.
- Around line 103-107: The PVC creation uses GenerateName which can produce
duplicate PVCs on retries; change to deterministic naming and idempotent
creation: compute a stable PVC name (e.g., based on ws.Name and ws.UID or a
constant pvcSuffix) instead of using GenerateName, then try to Get the PVC by
that name (or List by a unique label/ownerRef) and only Create if not found;
update the pvc variable and any ownerReference/labels consistently (references:
the pvc variable, GenerateName usage, and the workspace `ws` object) so repeated
status-patch retries do not create duplicate PVCs.
- Around line 160-163: The code currently treats any caller-selected image the
same as the default and ends up launching it as root with Privileged=true;
change the pod/security context logic so that if image !=
automotivev1alpha1.DefaultToolchainImage (i.e., a user-supplied image) you do
NOT set RunAsUser=0 and do NOT enable privileged mode — set
securityContext.RunAsUser to a non-root UID (e.g., 1000) and
securityContext.Privileged to false; keep the existing root/privileged behavior
only for the trusted DefaultToolchainImage. Update the code paths that construct
the Pod spec/securityContext (the logic that uses image := ws.Spec.Image and any
function that applies the container.SecurityContext or pod.Spec.SecurityContext)
accordingly.

---

Nitpick comments:
In `@api/v1alpha1/operatorconfig_types.go`:
- Around line 50-58: The DefaultToolchainImage constant currently uses the
unstable ":latest" tag; change the DefaultToolchainImage value to a specific,
pinned image tag (e.g., match the project's image versioning like ":1.1.11" or
the known stable release for quay.io/centos-sig-automotive/autosd-toolchain) to
ensure reproducibility—edit the DefaultToolchainImage constant in
operatorconfig_types.go to replace ":latest" with the chosen pinned tag.
- Around line 329-331: The PVCSize field currently accepts any string; add a
kubebuilder validation annotation above the PVCSize field to enforce Kubernetes
quantity format (e.g., 10Gi, 500Mi). Update the struct field for PVCSize in
operatorconfig_types.go by adding a +kubebuilder:validation:Pattern annotation
that matches Kubernetes resource quantity syntax and keep the existing json tag
and optional comment; ensure the pattern is broad enough to accept common
suffixes (Ki, Mi, Gi, Ti, Pi, Ei, or plain integer) used by other fields like
OSBuildsConfig.PVCSize for consistency.

In `@api/v1alpha1/workspace_types.go`:
- Around line 35-45: Add kubebuilder validation markers to the Workspace spec
fields: annotate the Owner field (Owner) with
+kubebuilder:validation:MinLength=1 to prevent empty owner strings, and annotate
PVCSize (PVCSize) with a validation marker that enforces allowed volume quantity
formats (either a pattern regex like
+kubebuilder:validation:Pattern=`^([0-9]+)(Ei|Pi|Ti|Gi|Mi|Ki|E|P|T|G|M|K|Ei|Pi|Ti|Gi|Mi|Ki|e|p|t|g|m|k)?$`
or an enum of accepted sizes) so invalid PVC sizes are rejected at admission
time; add these comments immediately above the corresponding struct fields in
the Workspace type.

In `@api/v1alpha1/zz_generated.deepcopy.go`:
- Around line 1-1403: The diff adds the generated file
api/v1alpha1/zz_generated.deepcopy.go which should not be committed; remove this
file from the PR (undo the commit or git rm --cached if already committed),
update repository rules so generated deepcopy files are not committed (add
api/v1alpha1/zz_generated.deepcopy.go to .gitignore or adjust existing ignore
patterns), and ensure the generation step (e.g., controller-gen / make generate)
remains the authoritative source so CI/build regenerates the file as needed.

In `@Containerfile.toolchain`:
- Line 57: The Docker layer installing EPEL (the RUN dnf install -y
https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm
instruction) leaves package metadata in a separate layer; modify that RUN to
perform the install and then immediately clean DNF caches in the same layer
(e.g., run dnf install ... && dnf clean all && rm -rf /var/cache/dnf) so
metadata/artifacts are removed before the layer is committed.

In `@internal/buildapi/workspace_test.go`:
- Around line 32-44: Current auth-only tests stop at middleware and don't
exercise handler-level JSON binding/validation; update the test suite in
workspace_test.go so each route in the Authentication matrix (e.g., POST
/v1/workspaces, POST /v1/workspaces/my-app/sync, POST
/v1/workspaces/my-app/exec, POST /v1/workspaces/my-app/deploy, plus add
GET/DELETE entries) has one authenticated request that includes valid auth (set
the same auth header/token used elsewhere in tests) but invalid or empty JSON to
assert a 400 response from the handler (this ensures create/exec/deploy paths
hit binding/validation logic), and add the missing /v1/workspaces/:name/shell
route to the matrix with both unauthenticated 401 and an authenticated 400 test;
apply the same pattern to the other test blocks referenced (lines ~61-113) so
each handler has at least one authenticated 400-path test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cfec265b-20f0-4a3e-ac03-2d36c64dc09c

📥 Commits

Reviewing files that changed from the base of the PR and between 7aff4d1 and 0bee9dc.

⛔ Files ignored due to path filters (3)
  • config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_workspaces.yaml is excluded by !config/crd/bases/**
  • config/rbac/role.yaml is excluded by !config/rbac/role.yaml
📒 Files selected for processing (22)
  • Containerfile.toolchain
  • api/v1alpha1/operatorconfig_types.go
  • api/v1alpha1/workspace_types.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • bundle.Dockerfile
  • cmd/caib/buildcmd/build.go
  • cmd/caib/image/image.go
  • cmd/caib/main.go
  • cmd/caib/root.go
  • cmd/caib/runtime_wiring.go
  • cmd/caib/workspace/workspace.go
  • cmd/main.go
  • config/crd/kustomization.yaml
  • internal/buildapi/client/client.go
  • internal/buildapi/server.go
  • internal/buildapi/types.go
  • internal/buildapi/workspace.go
  • internal/buildapi/workspace_test.go
  • internal/controller/imagebuild/controller.go
  • internal/controller/operatorconfig/controller.go
  • internal/controller/operatorconfig/resources.go
  • internal/controller/workspace/controller.go
✅ Files skipped from review due to trivial changes (1)
  • bundle.Dockerfile
🚧 Files skipped from review as they are similar to previous changes (4)
  • internal/controller/imagebuild/controller.go
  • cmd/main.go
  • cmd/caib/root.go
  • internal/buildapi/workspace.go

Comment thread cmd/caib/workspace/workspace.go
Comment thread Containerfile.toolchain
Comment thread Containerfile.toolchain
Comment thread internal/buildapi/server.go
Comment thread internal/buildapi/server.go Outdated
Comment on lines +1534 to +1536
cmd := []string{"/bin/sh", "-c",
fmt.Sprintf("cd %s && python3 -m http.server %d > /dev/null 2>&1 &", repoPath, port)}
if err := podExec(ctx, restCfg, namespace, ws.Status.PodName, workspaceContainerName, cmd, io.Discard); err != nil {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Avoid shell injection via untrusted repoPath.

Line 1535 interpolates user-controlled path into /bin/sh -c. This is command-injection prone.

🔧 Safer command construction
- cmd := []string{"/bin/sh", "-c",
-   fmt.Sprintf("cd %s && python3 -m http.server %d > /dev/null 2>&1 &", repoPath, port)}
+ cmd := []string{
+   "/bin/sh", "-c",
+   "cd -- \"$1\" && nohup python3 -m http.server \"$2\" >/dev/null 2>&1 &",
+   "--", repoPath, fmt.Sprintf("%d", port),
+ }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cmd := []string{"/bin/sh", "-c",
fmt.Sprintf("cd %s && python3 -m http.server %d > /dev/null 2>&1 &", repoPath, port)}
if err := podExec(ctx, restCfg, namespace, ws.Status.PodName, workspaceContainerName, cmd, io.Discard); err != nil {
cmd := []string{
"/bin/sh", "-c",
"cd -- \"$1\" && nohup python3 -m http.server \"$2\" >/dev/null 2>&1 &",
"--", repoPath, fmt.Sprintf("%d", port),
}
if err := podExec(ctx, restCfg, namespace, ws.Status.PodName, workspaceContainerName, cmd, io.Discard); err != nil {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/server.go` around lines 1534 - 1536, The current code
builds a shell command string embedding the untrusted repoPath (cmd :=
[]string{"/bin/sh", "-c", fmt.Sprintf("cd %s && python3 -m http.server %d >
/dev/null 2>&1 &", repoPath, port)}) which risks shell injection; instead
construct a safe argv slice and avoid /bin/sh -c: call podExec with an exec form
like {"python3","-m","http.server", strconv.Itoa(port), "--directory", repoPath}
(no shell interpolation) so repoPath is passed as a literal argument; keep using
podExec, workspaceContainerName, restCfg, namespace and ws.Status.PodName but
remove the fmt.Sprintf into a shell and eliminate redirection/backgrounding
handled by the exec call or podExec's API.

Comment thread internal/controller/operatorconfig/resources.go
Comment thread internal/controller/workspace/controller.go
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
api/v1alpha1/zz_generated.deepcopy.go (1)

1-1403: 🛠️ Refactor suggestion | 🟠 Major

Generated file should not be committed.

This file is auto-generated by controller-gen (line 19: "Code generated by controller-gen. DO NOT EDIT.") and will be regenerated at deploy/build time via make generate manifests. Including it in the PR adds noise and risks merge conflicts.

Remove this file from the PR and add it to .gitignore if not already present.

Based on learnings: "Generated files that are not committed should be ignored in PRs. Specifically, do not include api/v1alpha1/zz_generated.deepcopy.go (regenerated at deploy/build time via make generate manifests)."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1alpha1/zz_generated.deepcopy.go` around lines 1 - 1403, The committed
autogenerated deepcopy file ("zz_generated.deepcopy.go" containing the header
"Code generated by controller-gen. DO NOT EDIT.") should be removed from the PR;
delete that file from your branch and add an entry to .gitignore to prevent
committing it in future, ensuring the file is produced by the build step (make
generate manifests) instead of being tracked; verify no other generated files
with the same header are staged before pushing.
♻️ Duplicate comments (19)
internal/controller/operatorconfig/resources.go (2)

640-654: ⚠️ Potential issue | 🟠 Major

Workspace CR RBAC is broader than reconciliation needs.

Granting create/delete on workspaces in this controller role increases blast radius; split read/watch + status/finalizer writes unless create/delete is truly required by this controller path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/operatorconfig/resources.go` around lines 640 - 654, The
role grants create/delete on the "workspaces" resource which is broader than
needed; modify the PolicyRule that has APIGroups
"automotive.sdv.cloud.redhat.com" and Resources "workspaces" to remove "create"
and "delete" (leave only "get", "list", "watch"), and keep the separate
PolicyRule entries for "workspaces/status" and "workspaces/finalizers" with
their update/patch verbs unchanged; if this controller truly needs
create/delete, instead add a narrowly-scoped rule or a separate role for that
specific path rather than granting create/delete in the main workspaces rule.

856-874: ⚠️ Potential issue | 🟠 Major

Binding workspace SA to the privileged SCC should be narrowed or made opt-in.

This defaults all workspace pods to privileged SCC capability, which is a significant security posture expansion.

Also applies to: 877-901

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/operatorconfig/resources.go` around lines 856 - 874, The
code unconditionally grants the workspace service account the "privileged" SCC
by creating a ClusterRole named workspaceServiceAccountName+"-privileged" in
buildWorkspaceSCCClusterRole; change this to avoid default privileged access by
either narrowing the granted permissions (remove or restrict ResourceNames
"privileged" and/or use a less-privileged SCC) and making the creation/binding
opt-in via an OperatorConfig flag or annotation checked during reconciliation,
or create a separate ClusterRole that is only bound when an explicit config
field (e.g., EnablePrivilegedSCC) is true; update the reconciler that creates
the ClusterRole/ClusterRoleBinding to honor that flag and ensure
workspaceServiceAccountName is only given the privileged SCC when explicitly
enabled.
Containerfile.toolchain (3)

77-78: ⚠️ Potential issue | 🟠 Major

Jumpstarter install path is tied to /root, which breaks random-UID runtime access.

Installing into /root/.local/jumpstarter makes the binary inaccessible for non-root OpenShift UIDs even if PATH is set.

Suggested fix
-RUN curl -fsSL https://raw.githubusercontent.com/jumpstarter-dev/jumpstarter/main/python/install.sh | bash -s -- --source main
-ENV PATH="/root/.local/jumpstarter/bin:${PATH}"
+RUN curl -fsSL https://raw.githubusercontent.com/jumpstarter-dev/jumpstarter/main/python/install.sh | bash -s -- --source main --dir /opt/jumpstarter
+ENV PATH="/opt/jumpstarter/bin:${PATH}"
#!/bin/bash
# Verify install script supports --dir (read-only check).
curl -fsSL https://raw.githubusercontent.com/jumpstarter-dev/jumpstarter/main/python/install.sh | rg -n -- '--dir|install dir|destination'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Containerfile.toolchain` around lines 77 - 78, The install puts Jumpstarter
under /root/.local/jumpstarter which prevents non-root/random UIDs from
executing it; modify the RUN that invokes the install script (the curl | bash -s
-- command) to install into a system-wide, non-root-readable path (e.g.,
/usr/local or /opt) by passing the install script's directory flag (e.g., --dir
or equivalent) or by moving the installed artifacts after install, and update
the ENV PATH line (currently setting PATH for /root/.local/jumpstarter) to point
to the new location so the binary is accessible to arbitrary container UIDs;
locate the invocation using the curl install command and the ENV PATH line to
make the change.

73-74: ⚠️ Potential issue | 🟠 Major

Runtime privilege model is too broad (root default + global NOPASSWD sudo).

Line 74 gives blanket sudo to everyone and the image still defaults to root (no USER), which is an avoidable escalation surface for workspace pods.

Suggested least-privilege direction
-# Allow any UID to use sudo
-RUN echo 'ALL ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers
+RUN useradd -m -u 10001 workspace && \
+    printf 'workspace ALL=(root) NOPASSWD: /usr/bin/podman, /usr/bin/mock\n' > /etc/sudoers.d/workspace && \
+    chmod 0440 /etc/sudoers.d/workspace
+USER 10001

Also applies to: 93-93

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Containerfile.toolchain` around lines 73 - 74, Remove the global NOPASSWD
sudo entry and stop defaulting to root: replace the RUN echo 'ALL ALL=(ALL)
NOPASSWD: ALL' >> /etc/sudoers step with a least-privilege approach (e.g.,
create a specific user/group, add a constrained sudoers entry in /etc/sudoers.d
for that user or group allowing only required commands without NOPASSWD, or
require password). Also add a USER directive to switch from root to that
non-root user at the end of the Containerfile (use the created username in the
USER line) so the image does not default to root. Ensure references to
/etc/sudoers are replaced by a scoped /etc/sudoers.d file and avoid blanket ALL
ALL=(ALL) NOPASSWD: ALL entries.

22-35: ⚠️ Potential issue | 🟠 Major

Re-enable RPM signature verification for AutoSD packages.

Line 26 (gpgcheck=0) plus Line 34 (--nogpgcheck) disables package authenticity checks and weakens supply-chain guarantees.

Suggested hardening
 RUN cat <<EOF > /etc/yum.repos.d/${DISTRO}.repo
 [${DISTRO}]
 name=${DISTRO}
 baseurl=${BASE_URL}/repos/${DISTRO}/compose/${DISTRO}/\$basearch/os/
-gpgcheck=0
+gpgcheck=1
 enabled=1
 skip_if_unavailable=1
 EOF

 RUN dnf install \
     --installroot ${ROOTFS} \
     --setopt=reposdir=/etc/yum.repos.d --repo ${DISTRO} --releasever 10 \
-    --setopt install_weak_deps=false --nodocs --nogpgcheck -y \
+    --setopt install_weak_deps=false --nodocs -y \
     ${RELEASE_PKG} ${PACKAGES} \
     && dnf --installroot ${ROOTFS} clean all
For AutoSD nightly repositories, what are the documented values for `gpgcheck` and `repo_gpgcheck`, and is signed repodata (`repomd.xml.asc`) currently published?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Containerfile.toolchain` around lines 22 - 35, Update the repo and dnf
install invocations to enable RPM signature checks: set gpgcheck=1 in the
here-doc repo file (replace gpgcheck=0), remove the --nogpgcheck flag from the
dnf install command, and add --setopt=repo_gpgcheck=1 (and/or
--setopt=gpgcheck=1) to the dnf args so both package and repodata signing are
enforced; reference the DISTRO/BASE_URL repo block and the dnf install line that
uses ROOTFS, RELEASE_PKG and PACKAGES. Also ensure the repository publishes
signed repodata (repomd.xml.asc) so the enabled repo_gpgcheck succeeds.
cmd/caib/workspace/workspace.go (2)

463-469: ⚠️ Potential issue | 🟠 Major

Keep the original argv instead of joining it back into a shell string.

strings.Join(cmdParts, " ") loses argument boundaries and quoting, so commands containing spaces or shell metacharacters do not round-trip reliably. Send argv verbatim instead of flattening it into a single Command string.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/caib/workspace/workspace.go` around lines 463 - 469, The code currently
flattens the argument slice into a single shell string using strings.Join into
the variable command, which loses quoting and argument boundaries; instead
preserve the original argv by keeping cmdParts (the []string) and pass it
verbatim to the executor (e.g., use cmdParts as the args for whatever run/exec
function you call) rather than creating the command string; remove or stop using
the joined command variable and ensure the existing len(cmdParts) check and
handleError call remain.

385-391: ⚠️ Potential issue | 🟠 Major

Recreate the sync reader inside each re-auth attempt.

ExecuteWithReauth can call the callback more than once, but pr is created once and is exhausted after the first SyncWorkspace call. Any retry uploads an empty or truncated tar stream instead of the original archive. Capture the tar bytes once, then build a fresh bytes.Reader/progressReader inside the callback.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/caib/workspace/workspace.go` around lines 385 - 391, The current code
creates pr once from buf and passes it into ExecuteWithReauth, but
ExecuteWithReauth may call the callback multiple times so pr is exhausted after
the first call; fix by first capturing the archive bytes (copy buf.Bytes() into
a byte slice and compute totalBytes once), then inside the ExecuteWithReauth
callback recreate a fresh bytes.Reader and a fresh progress reader (use
bytes.NewReader(...) and newProgressReader(...)) on each attempt and call
client.SyncWorkspace with that fresh pr, ensuring pr.finish() is invoked after
each attempt (e.g., defer pr.finish() inside the callback) so retries upload the
full tar stream.
internal/controller/workspace/controller.go (4)

50-60: ⚠️ Potential issue | 🟠 Major

Return the PVC/Pod error after updating status.

If ensurePVC or ensurePod fails and setStatus succeeds, reconcile returns nil and controller-runtime stops retrying. Transient failures will strand the workspace in Failed until some unrelated event happens. Set the status, then return the original error so the request is requeued.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/workspace/controller.go` around lines 50 - 60, When
ensurePVC or ensurePod fails, the code currently calls r.setStatus(...) and
returns its result so the original error is dropped; change those branches to
first call r.setStatus(ctx, ws, "Failed", fmt.Sprintf(...)) (ignoring its return
value or logging it) and then return ctrl.Result{}, err so the controller
requeues; update the failure handling around ensurePVC and ensurePod in the
reconcile flow (references: ensurePVC, ensurePod, setStatus) to set status and
then return the original error instead of returning the result of setStatus.

98-112: ⚠️ Potential issue | 🔴 Critical

Parse spec.pvcSize without panicking the controller.

resource.MustParse(pvcSize) turns a bad Workspace spec into a controller crash. Use resource.ParseQuantity and surface the validation error through reconcile instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/workspace/controller.go` around lines 98 - 112, The
controller currently calls resource.MustParse(pvcSize) when building the
PersistentVolumeClaim (pvc), which will panic on an invalid spec and crash the
controller; replace this with resource.ParseQuantity(pvcSize) inside the
reconcile path (where pvcSize is derived) and handle the returned (qty, err): if
err != nil, record an event on the Workspace, return a non-nil error or a
requeue request from the reconcile (so the error surfaces to reconciliation and
is retryable) instead of letting the process panic; update the pvc construction
to use the parsed quantity variable and ensure functions/variables referenced
(pvcSize, pvc, resource.MustParse -> resource.ParseQuantity) are adjusted
accordingly.

160-248: ⚠️ Potential issue | 🔴 Critical

Don’t launch caller-selected images as privileged root.

ws.Spec.Image is user-controlled, and this pod runs it with RunAsUser: 0 and Privileged: true under the shared workspace service account. That makes workspace creation a cluster-privileged code-execution path. Restrict the image to operator-controlled values and drop privileged/root unless there is a hard requirement.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/workspace/controller.go` around lines 160 - 248, The pod
currently runs user-controlled ws.Spec.Image as root with privileged=true;
validate or restrict images and drop privileged/root: reject or ignore any
caller-supplied ws.Spec.Image unless it matches an operator-managed allowlist
(fallback to automotivev1alpha1.DefaultToolchainImage) and remove the
container.SecurityContext.Privileged=true and PodSecurityContext RunAsUser:
int64Ptr(0) (use a non-root uid, e.g. int64Ptr(1000) or remove RunAsUser) so
containerName under the workspaceServiceAccountName no longer runs as privileged
root.

103-127: ⚠️ Potential issue | 🟠 Major

Use a stable PVC identity across retries.

Creating the claim with GenerateName means a successful Create followed by a failed/conflicting status patch will allocate a brand-new PVC on the next reconcile. Use a deterministic PVC name, or first look up an owned claim before creating another one.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/workspace/controller.go` around lines 103 - 127, The PVC
is created with GenerateName which can produce duplicates across retries;
instead ensure a stable identity by first checking for an existing owned PVC and
only creating one with a deterministic Name if none exists. Update the reconcile
code around pvc, r.Create, controllerutil.SetControllerReference and
r.Status().Patch to: 1) list/get PVCs in ws.Namespace filtered by Name (or by
label/ownerRef) to find a PVC owned by the Workspace (ws) and if found set
ws.Status.PVCName and patch status; 2) if not found, create the PVC using a
deterministic Name (e.g., ws.Name + pvcSuffix) rather than GenerateName, call
controllerutil.SetControllerReference(ws, pvc, r.Scheme) and r.Create(ctx, pvc),
then set ws.Status.PVCName and patch. This prevents allocating a new PVC after
transient failures.
internal/buildapi/server.go (2)

1493-1537: ⚠️ Potential issue | 🔴 Critical

Authorize extraRepos against the caller before any pod exec.

resolveExtraRepos fetches any named Workspace and immediately execs into its pod, while createBuild only resolves requestedBy afterwards. Any authenticated caller can therefore point extraRepos at another user's workspace. Resolve the requester first and reject when ws.Spec.Owner does not match before calling podExec.

Also applies to: 1593-1605

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/server.go` around lines 1493 - 1537, The function
resolveExtraRepos currently fetches Workspaces and execs into their pods before
verifying the requestor; update the flow to resolve and validate the caller (the
requestedBy user) before any podExec by retrieving the requestor identity (as
done in createBuild) and comparing it to ws.Spec.Owner for each workspace entry,
returning an unauthorized error if they differ; apply the same owner-check
authorization to the other similar block handling extraRepos (the one around the
later pod exec) so no podExec runs for workspaces not owned by the requester.

1533-1537: ⚠️ Potential issue | 🔴 Critical

Don’t start the repo server via interpolated /bin/sh -c.

repoPath is inserted directly into a shell command, and the backgrounded http.server is neither health-checked nor cleaned up. That leaves both a command-injection path and flaky/duplicate listeners across repeated builds. Prefer an argv-based exec plus explicit reuse/cleanup of the serving process.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/server.go` around lines 1533 - 1537, Replace the
shell-interpolated, backgrounded command with an argv-based exec that uses
Python's --directory flag and explicit PID handling: call podExec with cmd set
to []string{"python3", "-m", "http.server", strconv.Itoa(port), "--directory",
repoPath} (use the existing repoPath, port, workspaceContainerName and wsName
identifiers to locate code), and implement logic to check for and
reuse/terminate an existing server by reading/writing a PID file (e.g.
repoPath/.http_server.pid) before starting; after starting capture and persist
the child PID so future builds can health-check and clean it up, and ensure
errors from podExec are returned as before.
internal/buildapi/workspace.go (6)

703-703: ⚠️ Potential issue | 🟠 Major

copyToPod ignores request cancellation.

StreamWithContext(context.Background(), ...) ignores HTTP request cancellation. A disconnected client or expired deadline won't stop the tar extraction. Accept a context.Context parameter and use it here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` at line 703, The copyToPod function currently
calls executor.StreamWithContext(context.Background(), streamOpts) which ignores
request cancellation; change copyToPod to accept a context.Context parameter
(e.g., ctx) and replace the use of context.Background() with that ctx when
calling executor.StreamWithContext, and propagate the new ctx through any
callers (and tests) that invoke copyToPod so HTTP request cancellation or
deadlines will cancel the tar extraction.

168-176: ⚠️ Potential issue | 🟠 Major

fromBuild still needs an ownership check before reusing the lease.

The resolveLeaseFromBuild function (lines 667-676) fetches the ImageBuild but does not verify the caller owns it. A user who knows another user's build name can attach their board lease to a new workspace. Pass requester into resolveLeaseFromBuild and reject builds where the automotive.sdv.cloud.redhat.com/requested-by annotation doesn't match.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 168 - 176, The call path reusing
a lease from an ImageBuild must enforce ownership: change the caller that sets
leaseID (the block using req.FromBuild) to pass the current requester into
resolveLeaseFromBuild, then update resolveLeaseFromBuild to fetch the ImageBuild
and verify the automative.sdv.cloud.redhat.com/requested-by annotation equals
the requester; if it does not match, return an error so the caller returns a
400. Ensure the error message clearly states the ownership mismatch and keep the
function signature consistent (resolveLeaseFromBuild(ctx, k8sClient, namespace,
buildName, requester) or similar) and update all call sites accordingly.

178-217: ⚠️ Potential issue | 🟠 Major

Secret creation still occurs before workspace uniqueness check.

The Jumpstarter client config secret is created at lines 197-200 before the workspace existence check at lines 203-217. If a workspace with the same name already exists, the secret is orphaned. If the secret already exists (IsAlreadyExists at line 197), it's silently reused—potentially binding stale credentials to a new workspace.

Move secret creation after the uniqueness check, or set an OwnerReference on the secret after successful workspace creation so Kubernetes garbage-collects it with the workspace.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 178 - 217, The secret creation
for Jumpstarter client config (jmpClientSecret, corev1.Secret and
k8sClient.Create) happens before the workspace existence check (existing :=
&automotivev1alpha1.Workspace{} and k8sClient.Get) which can orphan or
incorrectly reuse secrets; move the secret creation block to after the
uniqueness check succeeds (i.e., after verifying k8sClient.Get returns
not-found) or, if you must create it earlier, attach an OwnerReference to the
Secret referencing the created Workspace (use the Workspace object's UID and
controller=true) immediately after creating the Workspace so Kubernetes will
garbage-collect the Secret when the Workspace is deleted; ensure you still
handle IsAlreadyExists from k8sClient.Create appropriately (treat it as an error
or update the secret) to avoid silently reusing stale credentials.

366-373: ⚠️ Potential issue | 🟠 Major

sync still reads unbounded tarball into memory.

io.ReadAll(c.Request.Body) gives any authenticated caller an OOM lever against the API pod. Apply a size limit (e.g., http.MaxBytesReader) and stream directly into copyToPod instead of buffering.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 366 - 373, The handler currently
reads the entire request body into memory using io.ReadAll(c.Request.Body),
which allows an attacker to OOM the API pod; instead wrap c.Request.Body with a
size limiter (e.g., http.MaxBytesReader or an io.LimitReader) before passing it
to copyToPod so the tarball is streamed rather than buffered. Replace the
io.ReadAll + bytes.NewReader(tarData) flow by creating a limited reader from
c.Request.Body (check and return a 413 Payload Too Large on limit exceeded) and
pass that reader directly to copyToPod; ensure you keep references to copyToPod,
c.Request.Body, and workspaceContainerName when making the change and close the
request body appropriately.

612-614: ⚠️ Potential issue | 🟠 Major

DeepCopy() can preserve nil Requests/Limits maps, causing panic.

After line 613, res is overwritten with wsConfig.Resources.DeepCopy(). If the source has nil Requests or Limits, the assignments at lines 626-627 and 640-641 will panic.

🐛 Proposed fix
 	if wsConfig != nil && wsConfig.Resources != nil {
 		res = wsConfig.Resources.DeepCopy()
 	}
+	if res.Requests == nil {
+		res.Requests = corev1.ResourceList{}
+	}
+	if res.Limits == nil {
+		res.Limits = corev1.ResourceList{}
+	}
 
 	if cpu != "" {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 612 - 614, The DeepCopy of
wsConfig.Resources can produce a res whose CPU/Memory Requests or Limits maps
are nil and later assignments to res.Requests[...] or res.Limits[...] will
panic; after doing res = wsConfig.Resources.DeepCopy() (or immediately after
creating res) check each map field (res.Requests and res.Limits) for nil and
initialize them to empty maps if nil before any indexed assignments, or perform
the indexed assignments against a non-nil temporary map and then assign it back
to res.Requests/res.Limits to ensure safety when writing into those maps.

765-766: ⚠️ Potential issue | 🟠 Major

execInPod uses context.Background(), ignoring request cancellation.

The sync, exec, and deploy handlers won't respect client disconnects or HTTP deadlines. Thread c.Request.Context() through podExec instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 765 - 766, execInPod currently
calls podExec with context.Background(), so request cancellation and deadlines
are ignored; change execInPod to accept a context.Context (e.g., func
execInPod(ctx context.Context, restCfg *rest.Config, namespace, podName,
containerName string, cmd []string, w http.ResponseWriter) error) and pass that
ctx to podExec instead of context.Background(); update all call sites (the sync,
exec, deploy handlers) to pass the HTTP request context (c.Request.Context())
into execInPod so podExec and its I/O respect client disconnects and timeouts.
🧹 Nitpick comments (1)
internal/buildapi/workspace.go (1)

416-418: WebSocket upgrader disables origin checking.

CheckOrigin: func(_ *http.Request) bool { return true } allows any origin to establish WebSocket connections. While the endpoint is protected by authMiddleware, this weakens defense-in-depth if the API is ever exposed to browsers.

Consider validating the Origin header against expected values if browser-based access is possible.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 416 - 418, The websocket
upgrader currently disables origin checks via the wsUpgrader variable
(CheckOrigin: func(_ *http.Request) bool { return true }), which is unsafe;
change CheckOrigin to validate the request's Origin header against a whitelist
(or at least ensure it matches the configured API/frontend host) and return true
only for allowed origins, referencing wsUpgrader and its CheckOrigin function so
the upgrade path enforces origin validation rather than unconditionally allowing
all origins.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/caib/workspace/workspace.go`:
- Around line 421-450: The loop currently opens each path then calls Stat which
follows symlinks; instead use os.Lstat(absPath) to get file info without
following symlinks, detect symlinks via fi.Mode()&os.ModeSymlink, and for
symlinks call os.Readlink(absPath) and pass the link target as the second arg to
tar.FileInfoHeader (e.g., tar.FileInfoHeader(fi, linkTarget)) so the header is
written as a symlink entry (write header with tw.WriteHeader but do not open or
copy file contents). For non-symlinks, open the file as before, call
tar.FileInfoHeader(fi, "") and io.Copy(tw, f), and ensure any opened file is
closed in all paths; update references around tar.FileInfoHeader, os.Lstat,
os.Readlink, ModeSymlink, tw.WriteHeader, and io.Copy.

In `@internal/buildapi/workspace.go`:
- Line 300: The call to getClientFromRequest currently ignores its error which
can leave k8sClient nil and cause a panic when calling k8sClient.Delete; update
the caller to capture both return values (k8sClient, err :=
getClientFromRequest(c)), check err immediately, and handle it by logging and
returning an appropriate HTTP error response (or early return) before using
k8sClient; ensure subsequent code that calls k8sClient.Delete only runs when err
is nil and k8sClient is non-nil so the function is safe against nil dereference.
- Around line 566-588: The deployScript currently interpolates sshPassword,
req.ArtifactPath and req.DestPath directly into the shell string (variables:
sshPassword, req.ArtifactPath, req.DestPath, deployScript, sshOpts, cmd),
creating a shell-injection risk; fix it by removing direct interpolation and
instead pass those values as environment variables (e.g. SSH_PASS,
ARTIFACT_PATH, DEST_PATH, SSH_OPTS) and reference them inside the script with
robust quoting (e.g. "$SSH_PASS", "$ARTIFACT_PATH", "$DEST_PATH"), or better yet
build the exec.Cmd with Cmd.Env containing those variables and avoid embedding
secrets in the command line so sshpass and rsync read quoted env vars rather
than raw unescaped user input.

In `@internal/controller/workspace/controller.go`:
- Around line 98-112: The controller is using hardcoded defaults (e.g., pvcSize
fallback to DefaultWorkspacePVCSize, image fallback to DefaultToolchainImage,
resources via resourcesOrDefaults) and ignores OperatorConfig.Spec.Workspaces
policies; update the code paths that build the PVC, PodSpec, and resource
handling (look for pvcSize, ws.Spec.PVCSize, resourcesOrDefaults, image
creation, and any PodSpec mutations) to first read and merge
OperatorConfig.Spec.Workspaces for defaults and policy: use
OperatorConfig.Workspaces.PVCSize if ws.Spec.PVCSize is empty, apply
OperatorConfig.Workspaces.DefaultToolchainImage when ws image is unset, enforce
MaxResources by validating/clamping user-supplied resources in
resourcesOrDefaults, and copy operator-level NodeSelector, Tolerations, and
TmpfsBuildDir into the resulting Pod/PVC specs so admin policies are applied
everywhere the controller constructs workspace resources.

---

Outside diff comments:
In `@api/v1alpha1/zz_generated.deepcopy.go`:
- Around line 1-1403: The committed autogenerated deepcopy file
("zz_generated.deepcopy.go" containing the header "Code generated by
controller-gen. DO NOT EDIT.") should be removed from the PR; delete that file
from your branch and add an entry to .gitignore to prevent committing it in
future, ensuring the file is produced by the build step (make generate
manifests) instead of being tracked; verify no other generated files with the
same header are staged before pushing.

---

Duplicate comments:
In `@cmd/caib/workspace/workspace.go`:
- Around line 463-469: The code currently flattens the argument slice into a
single shell string using strings.Join into the variable command, which loses
quoting and argument boundaries; instead preserve the original argv by keeping
cmdParts (the []string) and pass it verbatim to the executor (e.g., use cmdParts
as the args for whatever run/exec function you call) rather than creating the
command string; remove or stop using the joined command variable and ensure the
existing len(cmdParts) check and handleError call remain.
- Around line 385-391: The current code creates pr once from buf and passes it
into ExecuteWithReauth, but ExecuteWithReauth may call the callback multiple
times so pr is exhausted after the first call; fix by first capturing the
archive bytes (copy buf.Bytes() into a byte slice and compute totalBytes once),
then inside the ExecuteWithReauth callback recreate a fresh bytes.Reader and a
fresh progress reader (use bytes.NewReader(...) and newProgressReader(...)) on
each attempt and call client.SyncWorkspace with that fresh pr, ensuring
pr.finish() is invoked after each attempt (e.g., defer pr.finish() inside the
callback) so retries upload the full tar stream.

In `@Containerfile.toolchain`:
- Around line 77-78: The install puts Jumpstarter under /root/.local/jumpstarter
which prevents non-root/random UIDs from executing it; modify the RUN that
invokes the install script (the curl | bash -s -- command) to install into a
system-wide, non-root-readable path (e.g., /usr/local or /opt) by passing the
install script's directory flag (e.g., --dir or equivalent) or by moving the
installed artifacts after install, and update the ENV PATH line (currently
setting PATH for /root/.local/jumpstarter) to point to the new location so the
binary is accessible to arbitrary container UIDs; locate the invocation using
the curl install command and the ENV PATH line to make the change.
- Around line 73-74: Remove the global NOPASSWD sudo entry and stop defaulting
to root: replace the RUN echo 'ALL ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers step
with a least-privilege approach (e.g., create a specific user/group, add a
constrained sudoers entry in /etc/sudoers.d for that user or group allowing only
required commands without NOPASSWD, or require password). Also add a USER
directive to switch from root to that non-root user at the end of the
Containerfile (use the created username in the USER line) so the image does not
default to root. Ensure references to /etc/sudoers are replaced by a scoped
/etc/sudoers.d file and avoid blanket ALL ALL=(ALL) NOPASSWD: ALL entries.
- Around line 22-35: Update the repo and dnf install invocations to enable RPM
signature checks: set gpgcheck=1 in the here-doc repo file (replace gpgcheck=0),
remove the --nogpgcheck flag from the dnf install command, and add
--setopt=repo_gpgcheck=1 (and/or --setopt=gpgcheck=1) to the dnf args so both
package and repodata signing are enforced; reference the DISTRO/BASE_URL repo
block and the dnf install line that uses ROOTFS, RELEASE_PKG and PACKAGES. Also
ensure the repository publishes signed repodata (repomd.xml.asc) so the enabled
repo_gpgcheck succeeds.

In `@internal/buildapi/server.go`:
- Around line 1493-1537: The function resolveExtraRepos currently fetches
Workspaces and execs into their pods before verifying the requestor; update the
flow to resolve and validate the caller (the requestedBy user) before any
podExec by retrieving the requestor identity (as done in createBuild) and
comparing it to ws.Spec.Owner for each workspace entry, returning an
unauthorized error if they differ; apply the same owner-check authorization to
the other similar block handling extraRepos (the one around the later pod exec)
so no podExec runs for workspaces not owned by the requester.
- Around line 1533-1537: Replace the shell-interpolated, backgrounded command
with an argv-based exec that uses Python's --directory flag and explicit PID
handling: call podExec with cmd set to []string{"python3", "-m", "http.server",
strconv.Itoa(port), "--directory", repoPath} (use the existing repoPath, port,
workspaceContainerName and wsName identifiers to locate code), and implement
logic to check for and reuse/terminate an existing server by reading/writing a
PID file (e.g. repoPath/.http_server.pid) before starting; after starting
capture and persist the child PID so future builds can health-check and clean it
up, and ensure errors from podExec are returned as before.

In `@internal/buildapi/workspace.go`:
- Line 703: The copyToPod function currently calls
executor.StreamWithContext(context.Background(), streamOpts) which ignores
request cancellation; change copyToPod to accept a context.Context parameter
(e.g., ctx) and replace the use of context.Background() with that ctx when
calling executor.StreamWithContext, and propagate the new ctx through any
callers (and tests) that invoke copyToPod so HTTP request cancellation or
deadlines will cancel the tar extraction.
- Around line 168-176: The call path reusing a lease from an ImageBuild must
enforce ownership: change the caller that sets leaseID (the block using
req.FromBuild) to pass the current requester into resolveLeaseFromBuild, then
update resolveLeaseFromBuild to fetch the ImageBuild and verify the
automative.sdv.cloud.redhat.com/requested-by annotation equals the requester; if
it does not match, return an error so the caller returns a 400. Ensure the error
message clearly states the ownership mismatch and keep the function signature
consistent (resolveLeaseFromBuild(ctx, k8sClient, namespace, buildName,
requester) or similar) and update all call sites accordingly.
- Around line 178-217: The secret creation for Jumpstarter client config
(jmpClientSecret, corev1.Secret and k8sClient.Create) happens before the
workspace existence check (existing := &automotivev1alpha1.Workspace{} and
k8sClient.Get) which can orphan or incorrectly reuse secrets; move the secret
creation block to after the uniqueness check succeeds (i.e., after verifying
k8sClient.Get returns not-found) or, if you must create it earlier, attach an
OwnerReference to the Secret referencing the created Workspace (use the
Workspace object's UID and controller=true) immediately after creating the
Workspace so Kubernetes will garbage-collect the Secret when the Workspace is
deleted; ensure you still handle IsAlreadyExists from k8sClient.Create
appropriately (treat it as an error or update the secret) to avoid silently
reusing stale credentials.
- Around line 366-373: The handler currently reads the entire request body into
memory using io.ReadAll(c.Request.Body), which allows an attacker to OOM the API
pod; instead wrap c.Request.Body with a size limiter (e.g., http.MaxBytesReader
or an io.LimitReader) before passing it to copyToPod so the tarball is streamed
rather than buffered. Replace the io.ReadAll + bytes.NewReader(tarData) flow by
creating a limited reader from c.Request.Body (check and return a 413 Payload
Too Large on limit exceeded) and pass that reader directly to copyToPod; ensure
you keep references to copyToPod, c.Request.Body, and workspaceContainerName
when making the change and close the request body appropriately.
- Around line 612-614: The DeepCopy of wsConfig.Resources can produce a res
whose CPU/Memory Requests or Limits maps are nil and later assignments to
res.Requests[...] or res.Limits[...] will panic; after doing res =
wsConfig.Resources.DeepCopy() (or immediately after creating res) check each map
field (res.Requests and res.Limits) for nil and initialize them to empty maps if
nil before any indexed assignments, or perform the indexed assignments against a
non-nil temporary map and then assign it back to res.Requests/res.Limits to
ensure safety when writing into those maps.
- Around line 765-766: execInPod currently calls podExec with
context.Background(), so request cancellation and deadlines are ignored; change
execInPod to accept a context.Context (e.g., func execInPod(ctx context.Context,
restCfg *rest.Config, namespace, podName, containerName string, cmd []string, w
http.ResponseWriter) error) and pass that ctx to podExec instead of
context.Background(); update all call sites (the sync, exec, deploy handlers) to
pass the HTTP request context (c.Request.Context()) into execInPod so podExec
and its I/O respect client disconnects and timeouts.

In `@internal/controller/operatorconfig/resources.go`:
- Around line 640-654: The role grants create/delete on the "workspaces"
resource which is broader than needed; modify the PolicyRule that has APIGroups
"automotive.sdv.cloud.redhat.com" and Resources "workspaces" to remove "create"
and "delete" (leave only "get", "list", "watch"), and keep the separate
PolicyRule entries for "workspaces/status" and "workspaces/finalizers" with
their update/patch verbs unchanged; if this controller truly needs
create/delete, instead add a narrowly-scoped rule or a separate role for that
specific path rather than granting create/delete in the main workspaces rule.
- Around line 856-874: The code unconditionally grants the workspace service
account the "privileged" SCC by creating a ClusterRole named
workspaceServiceAccountName+"-privileged" in buildWorkspaceSCCClusterRole;
change this to avoid default privileged access by either narrowing the granted
permissions (remove or restrict ResourceNames "privileged" and/or use a
less-privileged SCC) and making the creation/binding opt-in via an
OperatorConfig flag or annotation checked during reconciliation, or create a
separate ClusterRole that is only bound when an explicit config field (e.g.,
EnablePrivilegedSCC) is true; update the reconciler that creates the
ClusterRole/ClusterRoleBinding to honor that flag and ensure
workspaceServiceAccountName is only given the privileged SCC when explicitly
enabled.

In `@internal/controller/workspace/controller.go`:
- Around line 50-60: When ensurePVC or ensurePod fails, the code currently calls
r.setStatus(...) and returns its result so the original error is dropped; change
those branches to first call r.setStatus(ctx, ws, "Failed", fmt.Sprintf(...))
(ignoring its return value or logging it) and then return ctrl.Result{}, err so
the controller requeues; update the failure handling around ensurePVC and
ensurePod in the reconcile flow (references: ensurePVC, ensurePod, setStatus) to
set status and then return the original error instead of returning the result of
setStatus.
- Around line 98-112: The controller currently calls resource.MustParse(pvcSize)
when building the PersistentVolumeClaim (pvc), which will panic on an invalid
spec and crash the controller; replace this with resource.ParseQuantity(pvcSize)
inside the reconcile path (where pvcSize is derived) and handle the returned
(qty, err): if err != nil, record an event on the Workspace, return a non-nil
error or a requeue request from the reconcile (so the error surfaces to
reconciliation and is retryable) instead of letting the process panic; update
the pvc construction to use the parsed quantity variable and ensure
functions/variables referenced (pvcSize, pvc, resource.MustParse ->
resource.ParseQuantity) are adjusted accordingly.
- Around line 160-248: The pod currently runs user-controlled ws.Spec.Image as
root with privileged=true; validate or restrict images and drop privileged/root:
reject or ignore any caller-supplied ws.Spec.Image unless it matches an
operator-managed allowlist (fallback to
automotivev1alpha1.DefaultToolchainImage) and remove the
container.SecurityContext.Privileged=true and PodSecurityContext RunAsUser:
int64Ptr(0) (use a non-root uid, e.g. int64Ptr(1000) or remove RunAsUser) so
containerName under the workspaceServiceAccountName no longer runs as privileged
root.
- Around line 103-127: The PVC is created with GenerateName which can produce
duplicates across retries; instead ensure a stable identity by first checking
for an existing owned PVC and only creating one with a deterministic Name if
none exists. Update the reconcile code around pvc, r.Create,
controllerutil.SetControllerReference and r.Status().Patch to: 1) list/get PVCs
in ws.Namespace filtered by Name (or by label/ownerRef) to find a PVC owned by
the Workspace (ws) and if found set ws.Status.PVCName and patch status; 2) if
not found, create the PVC using a deterministic Name (e.g., ws.Name + pvcSuffix)
rather than GenerateName, call controllerutil.SetControllerReference(ws, pvc,
r.Scheme) and r.Create(ctx, pvc), then set ws.Status.PVCName and patch. This
prevents allocating a new PVC after transient failures.

---

Nitpick comments:
In `@internal/buildapi/workspace.go`:
- Around line 416-418: The websocket upgrader currently disables origin checks
via the wsUpgrader variable (CheckOrigin: func(_ *http.Request) bool { return
true }), which is unsafe; change CheckOrigin to validate the request's Origin
header against a whitelist (or at least ensure it matches the configured
API/frontend host) and return true only for allowed origins, referencing
wsUpgrader and its CheckOrigin function so the upgrade path enforces origin
validation rather than unconditionally allowing all origins.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 37fc93b2-d346-41eb-b875-bc74233cf57f

📥 Commits

Reviewing files that changed from the base of the PR and between 0bee9dc and a782e85.

⛔ Files ignored due to path filters (3)
  • config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_workspaces.yaml is excluded by !config/crd/bases/**
  • config/rbac/role.yaml is excluded by !config/rbac/role.yaml
📒 Files selected for processing (22)
  • Containerfile.toolchain
  • api/v1alpha1/operatorconfig_types.go
  • api/v1alpha1/workspace_types.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • bundle.Dockerfile
  • cmd/caib/buildcmd/build.go
  • cmd/caib/image/image.go
  • cmd/caib/main.go
  • cmd/caib/root.go
  • cmd/caib/runtime_wiring.go
  • cmd/caib/workspace/workspace.go
  • cmd/main.go
  • config/crd/kustomization.yaml
  • internal/buildapi/client/client.go
  • internal/buildapi/server.go
  • internal/buildapi/types.go
  • internal/buildapi/workspace.go
  • internal/buildapi/workspace_test.go
  • internal/controller/imagebuild/controller.go
  • internal/controller/operatorconfig/controller.go
  • internal/controller/operatorconfig/resources.go
  • internal/controller/workspace/controller.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • internal/buildapi/workspace_test.go
  • internal/controller/imagebuild/controller.go
  • cmd/caib/runtime_wiring.go
  • cmd/caib/image/image.go
  • cmd/main.go

Comment thread cmd/caib/workspace/workspace.go Outdated
Comment thread internal/buildapi/workspace.go Outdated
Comment thread internal/buildapi/workspace.go
Comment on lines +98 to +112
pvcSize := ws.Spec.PVCSize
if pvcSize == "" {
pvcSize = automotivev1alpha1.DefaultWorkspacePVCSize
}

pvc := &corev1.PersistentVolumeClaim{
ObjectMeta: metav1.ObjectMeta{
GenerateName: ws.Name + pvcSuffix + "-",
Namespace: ws.Namespace,
},
Spec: corev1.PersistentVolumeClaimSpec{
AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
Resources: corev1.VolumeResourceRequirements{
Requests: corev1.ResourceList{
corev1.ResourceStorage: resource.MustParse(pvcSize),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Wire OperatorConfig.Spec.Workspaces into this controller instead of hardcoded defaults.

The new workspace config surface is never read here: PVC size falls back to DefaultWorkspacePVCSize, the image falls back to DefaultToolchainImage, resources default in resourcesOrDefaults, user-supplied resources are never checked against MaxResources, and operator-level NodeSelector, Tolerations, and TmpfsBuildDir policy are ignored. Right now those admin settings are effectively dead config.

Also applies to: 152-276, 302-315

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/workspace/controller.go` around lines 98 - 112, The
controller is using hardcoded defaults (e.g., pvcSize fallback to
DefaultWorkspacePVCSize, image fallback to DefaultToolchainImage, resources via
resourcesOrDefaults) and ignores OperatorConfig.Spec.Workspaces policies; update
the code paths that build the PVC, PodSpec, and resource handling (look for
pvcSize, ws.Spec.PVCSize, resourcesOrDefaults, image creation, and any PodSpec
mutations) to first read and merge OperatorConfig.Spec.Workspaces for defaults
and policy: use OperatorConfig.Workspaces.PVCSize if ws.Spec.PVCSize is empty,
apply OperatorConfig.Workspaces.DefaultToolchainImage when ws image is unset,
enforce MaxResources by validating/clamping user-supplied resources in
resourcesOrDefaults, and copy operator-level NodeSelector, Tolerations, and
TmpfsBuildDir into the resulting Pod/PVC specs so admin policies are applied
everywhere the controller constructs workspace resources.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (6)
internal/controller/workspace/controller.go (3)

174-176: ⚠️ Potential issue | 🔴 Critical

Don't run caller-supplied workspace images as privileged root.

ws.Spec.Image comes straight from the CR, and this pod starts it with RunAsUser=0, Privileged=true, and the ado-workspace service account. That's a cluster-compromise path, not just a workspace escape.

Also applies to: 241-262

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/workspace/controller.go` around lines 174 - 176, The
controller currently sets image := ws.Spec.Image and allows arbitrary
CR-provided images to run as root/privileged; change this to deny or override
caller-supplied images and harden the PodSecurityContext: only allow
ws.Spec.Image when it matches an approved/whitelist (or else use
automotivev1alpha1.DefaultToolchainImage), and in the Pod spec ensure
securityContext.Privileged is false, set securityContext.RunAsNonRoot = true
and/or securityContext.RunAsUser to a non-zero UID, and avoid using the elevated
ado-workspace service account for untrusted images; update the code paths
referencing image (the image variable and the Pod creation logic around
RunAsUser/Privileged) to enforce these checks and overrides.

109-129: ⚠️ Potential issue | 🟠 Major

Apply OperatorConfig.Spec.Workspaces in this controller.

Right now PVC size, default image, tmpfs behavior, node placement, and resource policy all come from hardcoded defaults or the CR itself because this controller never reads OperatorConfig.Spec.Workspaces. That makes the new admin config surface ineffective for sparse Workspace specs.

Also applies to: 174-177, 206-218, 259-285

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/workspace/controller.go` around lines 109 - 129, The
controller currently ignores OperatorConfig.Spec.Workspaces and uses hardcoded
defaults (e.g., DefaultWorkspacePVCSize) or values directly from ws.Spec; update
the reconcile logic to first read the OperatorConfig
(OperatorConfig.Spec.Workspaces) and merge its workspace defaults into Workspace
creation so that fields like pvcSize (ws.Spec.PVCSize), default image, tmpfs
settings, node placement, and resource policy fallback to OperatorConfig values
when the Workspace CR omits them; locate the PVC creation
(pvcSize/DefaultWorkspacePVCSize usage), image/tmpfs/node placement/resource
policy handling in the same controller (references: pvcSize,
DefaultWorkspacePVCSize, ws.Spec.*, and the workspace-related create/update code
paths) and apply the merged defaults before constructing the PVC, PodSpec, and
resource objects so the OperatorConfig defaults are honored.

109-124: ⚠️ Potential issue | 🔴 Critical

Avoid panicking on invalid spec.pvcSize.

resource.MustParse lets one bad size value crash the controller. Parse the quantity and return a reconcile error instead.

Suggested fix
 	pvcSize := ws.Spec.PVCSize
 	if pvcSize == "" {
 		pvcSize = automotivev1alpha1.DefaultWorkspacePVCSize
 	}
+	qty, parseErr := resource.ParseQuantity(pvcSize)
+	if parseErr != nil {
+		return fmt.Errorf("invalid workspace PVC size %q: %w", pvcSize, parseErr)
+	}
 
 	pvc := &corev1.PersistentVolumeClaim{
 		ObjectMeta: metav1.ObjectMeta{
 			Name:      pvcName,
 			Namespace: ws.Namespace,
@@
 			AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
 			Resources: corev1.VolumeResourceRequirements{
 				Requests: corev1.ResourceList{
-					corev1.ResourceStorage: resource.MustParse(pvcSize),
+					corev1.ResourceStorage: qty,
 				},
 			},
 		},
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/workspace/controller.go` around lines 109 - 124, The code
uses resource.MustParse(pvcSize) when building the PersistentVolumeClaim which
will panic on an invalid PVC size; replace MustParse with
resource.ParseQuantity(pvcSize), check the returned error, and if parsing fails
return a reconcile error (or appropriate error from the current reconcile
function) instead of continuing; update the PVC creation path around the
pvcSize/pvc variable construction (referencing pvcSize, pvcName, and the
workspace reconcile function in controller.go) to use the parsed quantity on
corev1.ResourceStorage when parsing succeeds.
internal/buildapi/server.go (3)

1517-1538: ⚠️ Potential issue | 🔴 Critical

Enforce workspace ownership before podExec.

Because createBuild resolves extra repos before it computes requestedBy, this helper never verifies ws.Spec.Owner. Any authenticated caller who knows another workspace name can make the API exec inside that pod. Pass the requester into resolveExtraRepos and reject mismatched owners before the pod lookup/exec.

Also applies to: 1695-1711

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/server.go` around lines 1517 - 1538, createBuild currently
calls resolveExtraRepos before computing requestedBy, so the helper never
enforces workspace ownership and allows anyone knowing a workspace name to call
podExec; fix by changing createBuild to compute the requester (requestedBy)
first and pass that requester into resolveExtraRepos, then explicitly verify
ws.Spec.Owner == requestedBy (or reject) before any pod lookup/podExec calls
(e.g., before using ws.Status.PodName and calling podExec); update
resolveExtraRepos signature to accept the requester and enforce owner checks in
the code paths referenced around podExec and the duplicate block at the later
location.

1535-1537: ⚠️ Potential issue | 🔴 Critical

Do not interpolate repoPath into /bin/sh -c.

repoPath is caller-controlled input. Embedding it in the shell command lets a crafted path execute arbitrary commands in the workspace pod.

Suggested fix
-		cmd := []string{"/bin/sh", "-c",
-			fmt.Sprintf("cd %s && python3 -m http.server %d > /dev/null 2>&1 &", repoPath, port)}
+		cmd := []string{
+			"/bin/sh", "-c",
+			"cd -- \"$1\" && nohup python3 -m http.server \"$2\" >/dev/null 2>&1 &",
+			"--", repoPath, fmt.Sprintf("%d", port),
+		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/server.go` around lines 1535 - 1537, The code currently
injects caller-controlled repoPath into a shell string (cmd) which permits
command injection; instead, avoid using "/bin/sh -c" with interpolated paths:
call the python binary directly (e.g., build cmd as []string{"python3", "-m",
"http.server", strconv.Itoa(port)}) and pass repoPath to podExec via a dedicated
working-directory option if podExec supports it, or—if podExec cannot set
cwd—validate/sanitize repoPath (use path/filepath.Clean, ensure no shell
metacharacters, enforce allowed prefix or whitelist) before using it and never
concatenate it into a shell command; update the code paths referencing repoPath,
podExec, workspaceContainerName and cmd accordingly.

1534-1539: ⚠️ Potential issue | 🟠 Major

Manage extra-repo servers explicitly.

This fire-and-forget daemonization leaks processes and silently hits port conflicts on repeated builds against the same workspace. podExec succeeding here only means the shell returned, not that the server actually bound the port.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/server.go` around lines 1534 - 1539, The current
fire-and-forget shell command used in cmd and executed via podExec (targeting
workspaceContainerName / ws.Status.PodName) daemonizes python3 -m http.server
and can leak processes or silently fail to bind the port; change the exec
command so the server is started with explicit lifecycle and readiness checks:
before starting, kill any stale PID (e.g. /tmp/httpserver.pid) if present, start
the server and write its PID to a known file, then verify the port is actually
listening (using ss/netstat or a short curl loop) and return an error from
podExec if the bind fails (and clean up the PID/process), so that the call to
podExec only succeeds when the HTTP server truly bound port; update error
handling around podExec to propagate that failure and reference cmd, port,
repoPath, wsName, podExec, workspaceContainerName, and ws.Status.PodName when
implementing these changes.
🧹 Nitpick comments (2)
internal/common/tasks/scripts/find_manifest.sh (1)

28-35: Prefer warning on per-file copy failures instead of fully silent ignores.

The new filtering is good, but 2>/dev/null || true can hide actionable failures. A warning per failed file keeps behavior resilient while improving diagnosability.

🛠️ Suggested tweak
-    cp -rv "$item" /manifest-work/ 2>/dev/null || true
+    if ! cp -rv "$item" /manifest-work/ 2>/dev/null; then
+      echo "WARN: failed to copy '$item' into /manifest-work/" >&2
+    fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/common/tasks/scripts/find_manifest.sh` around lines 28 - 35, The
per-file copy currently swallows all errors with `cp -rv "$item" /manifest-work/
2>/dev/null || true`, losing diagnostics; update the loop in find_manifest.sh so
each failed `cp` emits a warning to stderr (including the source path and the cp
exit code or error text) instead of silencing it completely — e.g., run cp
without redirecting stderr, capture its exit status for each `item`, and on
non-zero exit print a concise warning message (use the same loop and the `cp -rv
"$item" /manifest-work/` invocation, then check `$?` and `echo` or `printf` to
stderr).
api/v1alpha1/operatorconfig_types.go (1)

359-362: Add a GetBuildCacheSize() fallback to match the documented default.

BuildCacheSize documents a default ("20Gi"), but this file doesn’t provide a getter/default path like GetPVCSize() and GetDefaultArchitecture(). Centralizing that fallback here reduces drift across callers.

♻️ Suggested refactor
 const (
@@
 	// DefaultWorkspacePVCSize is the default PVC size for workspace storage
 	DefaultWorkspacePVCSize = "10Gi"
+
+	// DefaultWorkspaceBuildCacheSize is the default PVC size for workspace build cache
+	DefaultWorkspaceBuildCacheSize = "20Gi"
 )
@@
 func (c *WorkspacesConfig) GetNodeSelector() map[string]string {
 	if c != nil {
 		return c.NodeSelector
 	}
 	return nil
 }
+
+// GetBuildCacheSize returns the workspace build cache PVC size, falling back to the default
+func (c *WorkspacesConfig) GetBuildCacheSize() string {
+	if c != nil && c.BuildCacheSize != "" {
+		return c.BuildCacheSize
+	}
+	return DefaultWorkspaceBuildCacheSize
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1alpha1/operatorconfig_types.go` around lines 359 - 362, Add a getter
that returns the documented default for BuildCacheSize when empty: implement
GetBuildCacheSize() on the same type that holds BuildCacheSize (mirror the
pattern used by GetPVCSize() and GetDefaultArchitecture()), so it returns the
BuildCacheSize field if non-empty and otherwise returns the literal default
"20Gi". Update callers to use GetBuildCacheSize() instead of reading
BuildCacheSize directly to centralize the fallback logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/buildapi/server.go`:
- Around line 1615-1645: The PVC creation currently uses
resource.MustParse(cacheSize) which will panic on invalid operator input; change
this to parse the quantity with resource.ParseQuantity (or resource.Parse) when
reading wsConfig.BuildCacheSize, check the returned error, and handle it by
returning a proper API error response (or falling back to the default "20Gi")
before constructing the PVC Spec so no panic can occur; update the code paths
around wsConfig.BuildCacheSize, cacheSize, and the PVC Spec/Resources usage to
use the validated quantity variable instead of resource.MustParse.

In `@internal/controller/imagebuild/controller.go`:
- Around line 754-767: The current branch replaces the entire shared-workspace
with the BuildCachePVC (variable sharedWorkspaceBinding), which discards
per-build upload content; instead keep a per-build shared workspace and mount
the cache PVC under a separate workspace name. Change the logic in controller.go
so that when imageBuild.Spec.BuildCachePVC is set you do NOT set
sharedWorkspaceBinding to the PVC; instead preserve the existing
shared-workspace binding (e.g., from VolumeClaimTemplate or an EmptyDir) and add
a second tektonv1.WorkspaceBinding with Name "build-cache" whose
PersistentVolumeClaim.ClaimName is imageBuild.Spec.BuildCachePVC (or, if you
prefer to prohibit the unsafe combination, return an error when both
BuildCachePVC and InputFilesServer/PVCName are present). Update any Task/TaskRun
bindings to reference the new "build-cache" workspace rather than mounting the
cache at "/workspace/shared".

---

Duplicate comments:
In `@internal/buildapi/server.go`:
- Around line 1517-1538: createBuild currently calls resolveExtraRepos before
computing requestedBy, so the helper never enforces workspace ownership and
allows anyone knowing a workspace name to call podExec; fix by changing
createBuild to compute the requester (requestedBy) first and pass that requester
into resolveExtraRepos, then explicitly verify ws.Spec.Owner == requestedBy (or
reject) before any pod lookup/podExec calls (e.g., before using
ws.Status.PodName and calling podExec); update resolveExtraRepos signature to
accept the requester and enforce owner checks in the code paths referenced
around podExec and the duplicate block at the later location.
- Around line 1535-1537: The code currently injects caller-controlled repoPath
into a shell string (cmd) which permits command injection; instead, avoid using
"/bin/sh -c" with interpolated paths: call the python binary directly (e.g.,
build cmd as []string{"python3", "-m", "http.server", strconv.Itoa(port)}) and
pass repoPath to podExec via a dedicated working-directory option if podExec
supports it, or—if podExec cannot set cwd—validate/sanitize repoPath (use
path/filepath.Clean, ensure no shell metacharacters, enforce allowed prefix or
whitelist) before using it and never concatenate it into a shell command; update
the code paths referencing repoPath, podExec, workspaceContainerName and cmd
accordingly.
- Around line 1534-1539: The current fire-and-forget shell command used in cmd
and executed via podExec (targeting workspaceContainerName / ws.Status.PodName)
daemonizes python3 -m http.server and can leak processes or silently fail to
bind the port; change the exec command so the server is started with explicit
lifecycle and readiness checks: before starting, kill any stale PID (e.g.
/tmp/httpserver.pid) if present, start the server and write its PID to a known
file, then verify the port is actually listening (using ss/netstat or a short
curl loop) and return an error from podExec if the bind fails (and clean up the
PID/process), so that the call to podExec only succeeds when the HTTP server
truly bound port; update error handling around podExec to propagate that failure
and reference cmd, port, repoPath, wsName, podExec, workspaceContainerName, and
ws.Status.PodName when implementing these changes.

In `@internal/controller/workspace/controller.go`:
- Around line 174-176: The controller currently sets image := ws.Spec.Image and
allows arbitrary CR-provided images to run as root/privileged; change this to
deny or override caller-supplied images and harden the PodSecurityContext: only
allow ws.Spec.Image when it matches an approved/whitelist (or else use
automotivev1alpha1.DefaultToolchainImage), and in the Pod spec ensure
securityContext.Privileged is false, set securityContext.RunAsNonRoot = true
and/or securityContext.RunAsUser to a non-zero UID, and avoid using the elevated
ado-workspace service account for untrusted images; update the code paths
referencing image (the image variable and the Pod creation logic around
RunAsUser/Privileged) to enforce these checks and overrides.
- Around line 109-129: The controller currently ignores
OperatorConfig.Spec.Workspaces and uses hardcoded defaults (e.g.,
DefaultWorkspacePVCSize) or values directly from ws.Spec; update the reconcile
logic to first read the OperatorConfig (OperatorConfig.Spec.Workspaces) and
merge its workspace defaults into Workspace creation so that fields like pvcSize
(ws.Spec.PVCSize), default image, tmpfs settings, node placement, and resource
policy fallback to OperatorConfig values when the Workspace CR omits them;
locate the PVC creation (pvcSize/DefaultWorkspacePVCSize usage),
image/tmpfs/node placement/resource policy handling in the same controller
(references: pvcSize, DefaultWorkspacePVCSize, ws.Spec.*, and the
workspace-related create/update code paths) and apply the merged defaults before
constructing the PVC, PodSpec, and resource objects so the OperatorConfig
defaults are honored.
- Around line 109-124: The code uses resource.MustParse(pvcSize) when building
the PersistentVolumeClaim which will panic on an invalid PVC size; replace
MustParse with resource.ParseQuantity(pvcSize), check the returned error, and if
parsing fails return a reconcile error (or appropriate error from the current
reconcile function) instead of continuing; update the PVC creation path around
the pvcSize/pvc variable construction (referencing pvcSize, pvcName, and the
workspace reconcile function in controller.go) to use the parsed quantity on
corev1.ResourceStorage when parsing succeeds.

---

Nitpick comments:
In `@api/v1alpha1/operatorconfig_types.go`:
- Around line 359-362: Add a getter that returns the documented default for
BuildCacheSize when empty: implement GetBuildCacheSize() on the same type that
holds BuildCacheSize (mirror the pattern used by GetPVCSize() and
GetDefaultArchitecture()), so it returns the BuildCacheSize field if non-empty
and otherwise returns the literal default "20Gi". Update callers to use
GetBuildCacheSize() instead of reading BuildCacheSize directly to centralize the
fallback logic.

In `@internal/common/tasks/scripts/find_manifest.sh`:
- Around line 28-35: The per-file copy currently swallows all errors with `cp
-rv "$item" /manifest-work/ 2>/dev/null || true`, losing diagnostics; update the
loop in find_manifest.sh so each failed `cp` emits a warning to stderr
(including the source path and the cp exit code or error text) instead of
silencing it completely — e.g., run cp without redirecting stderr, capture its
exit status for each `item`, and on non-zero exit print a concise warning
message (use the same loop and the `cp -rv "$item" /manifest-work/` invocation,
then check `$?` and `echo` or `printf` to stderr).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 45057903-3168-40ab-9df2-0033e61847f7

📥 Commits

Reviewing files that changed from the base of the PR and between a782e85 and 3b94a37.

⛔ Files ignored due to path filters (3)
  • config/crd/bases/automotive.sdv.cloud.redhat.com_imagebuilds.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_workspaces.yaml is excluded by !config/crd/bases/**
📒 Files selected for processing (17)
  • api/v1alpha1/imagebuild_types.go
  • api/v1alpha1/operatorconfig_types.go
  • api/v1alpha1/workspace_types.go
  • cmd/caib/buildcmd/build.go
  • cmd/caib/image/image.go
  • cmd/caib/main.go
  • cmd/caib/runtime_wiring.go
  • cmd/caib/workspace/workspace.go
  • internal/buildapi/server.go
  • internal/buildapi/types.go
  • internal/buildapi/workspace.go
  • internal/common/tasks/scripts/build_image.sh
  • internal/common/tasks/scripts/find_manifest.sh
  • internal/common/tasks/scripts/push_artifact.sh
  • internal/common/tasks/tasks.go
  • internal/controller/imagebuild/controller.go
  • internal/controller/workspace/controller.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • cmd/caib/workspace/workspace.go
  • api/v1alpha1/workspace_types.go
  • internal/buildapi/workspace.go
  • cmd/caib/buildcmd/build.go

Comment thread internal/buildapi/server.go Outdated
Comment on lines 754 to +767
// Determine the shared-workspace binding:
// - If BuildCachePVC is set, use it as the shared workspace for build cache persistence
// - If InputFilesServer is enabled and a PVC already exists (from upload phase), use it
// - Otherwise, use VolumeClaimTemplate to create a new PVC with proper zone affinity
var sharedWorkspaceBinding tektonv1.WorkspaceBinding
if imageBuild.Spec.GetInputFilesServer() && imageBuild.Status.PVCName != "" {
if imageBuild.Spec.BuildCachePVC != "" {
log.Info("Using build-cache PVC as shared workspace", "pvc", imageBuild.Spec.BuildCachePVC)
sharedWorkspaceBinding = tektonv1.WorkspaceBinding{
Name: "shared-workspace",
PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{
ClaimName: imageBuild.Spec.BuildCachePVC,
},
}
} else if imageBuild.Spec.GetInputFilesServer() && imageBuild.Status.PVCName != "" {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't use the build-cache claim as the whole shared-workspace.

BuildCachePVC and InputFilesServer can both be set, but this branch replaces the upload PVC with the cache PVC. That makes uploaded source_path content disappear from the build, and same-workspace builds also end up racing over one /workspace/shared root because the build script cleans that root on every run. Keep a per-build shared workspace and mount the persistent claim only under build-cache/, or reject the combination up front.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/imagebuild/controller.go` around lines 754 - 767, The
current branch replaces the entire shared-workspace with the BuildCachePVC
(variable sharedWorkspaceBinding), which discards per-build upload content;
instead keep a per-build shared workspace and mount the cache PVC under a
separate workspace name. Change the logic in controller.go so that when
imageBuild.Spec.BuildCachePVC is set you do NOT set sharedWorkspaceBinding to
the PVC; instead preserve the existing shared-workspace binding (e.g., from
VolumeClaimTemplate or an EmptyDir) and add a second tektonv1.WorkspaceBinding
with Name "build-cache" whose PersistentVolumeClaim.ClaimName is
imageBuild.Spec.BuildCachePVC (or, if you prefer to prohibit the unsafe
combination, return an error when both BuildCachePVC and
InputFilesServer/PVCName are present). Update any Task/TaskRun bindings to
reference the new "build-cache" workspace rather than mounting the cache at
"/workspace/shared".

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (2)
internal/buildapi/workspace.go (2)

624-626: ⚠️ Potential issue | 🟠 Major

Reinitialize Requests and Limits after DeepCopy().

DeepCopy() preserves nil maps. If workspaces.resources is {} or only partially populated, the assignments below still panic on a nil map.

Suggested fix
 	if wsConfig != nil && wsConfig.Resources != nil {
 		res = wsConfig.Resources.DeepCopy()
 	}
+	if res.Requests == nil {
+		res.Requests = corev1.ResourceList{}
+	}
+	if res.Limits == nil {
+		res.Limits = corev1.ResourceList{}
+	}
 
 	if cpu != "" {

Also applies to: 628-653

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 624 - 626, The DeepCopy() call
for wsConfig.Resources can leave Requests and Limits maps nil, causing panics
when code later assigns into them; after res = wsConfig.Resources.DeepCopy()
(and similarly in the other copy sites), check and reinitialize res.Requests and
res.Limits to empty maps when nil (e.g., if res.Requests == nil { res.Requests =
make(map[string]resource.Quantity) } and same for res.Limits) before any
assignments; update the code paths that mutate res (look for usages of
res.Requests and res.Limits in this function) to ensure they always operate on
non-nil maps.

375-382: ⚠️ Potential issue | 🟠 Major

Keep sync uploads size-bounded instead of io.ReadAll-ing them.

Even if buffering is needed for the SPDY EOF behavior, this still turns /sync into an easy OOM path. copyToPod already accepts an io.Reader; cap the body with the existing upload limit and stream or spool it instead of holding the whole tarball in RAM.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 375 - 382, Replace the
io.ReadAll buffering of c.Request.Body with a bounded stream: wrap the request
body in an io.LimitReader using the existing upload limit (e.g.
maxUploadSize/uploadLimit) and stream it directly into copyToPod instead of
buffering into tarData; if you need to detect an exceeded limit, copy up to
limit+1 (or copy to io.Discard after the limit) and return an HTTP 413 if more
data was present. Update the call site that currently reads tarData and calls
copyToPod to pass the limited reader (c.Request.Body wrapped) and adjust error
responses accordingly (use 413 for too-large, 500 for other read/copy errors) so
/sync no longer OOMs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/buildapi/workspace.go`:
- Around line 155-160: The code currently ignores the error returned by
loadOperatorConfigFn and treats any failure as no config; update the logic
around the call to loadOperatorConfigFn(c.Request.Context(), k8sClient,
namespace) to capture its error return, and if the error is non-nil and not a
NotFound error (use the k8s errors helper, e.g. k8serrors.IsNotFound(err)),
return an internal server error (HTTP 500) or propagate the error to fail
closed; only when err is nil proceed to set wsConfig =
operatorConfig.Spec.Workspaces, and when err is a NotFound keep wsConfig nil.
- Around line 425-427: The wsUpgrader currently allows all origins via
wsUpgrader.CheckOrigin which opens the WebSocket shell to CSRF; replace the
permissive CheckOrigin with a strict same-origin check that validates
r.Header.Get("Origin") (parse and compare its host and scheme) against r.Host
(and expected scheme) or against a configured allowlist, and return false by
default on mismatch; update the wsUpgrader variable (and any usage in the
WebSocket handler that uses a.authMiddleware()) to use this stricter CheckOrigin
so only requests from the same origin or explicit allowed origins can upgrade.

In `@internal/common/tasks/scripts/build_image.sh`:
- Around line 59-69: The persistent cache cleanup leaves "$BUILD_DIR/image.json"
intact and that stale manifest gets copied later; update build_image.sh to
explicitly remove any persistent image manifest before publishing by deleting
"$BUILD_DIR/image.json" (or "$WORKSPACE_PATH/$BUILD_DIR/image.json" as
appropriate) during the cleanup stage that uses WORKSPACE_PATH and BUILD_DIR so
no old image.json can be carried forward.

In `@internal/controller/operatorconfig/controller.go`:
- Around line 875-889: The delete of the OpenShift-only
SecurityContextConstraints in cleanupWorkspaceInfra should not fail on a
non-OpenShift cluster; update the SCC deletion to either first check
detectOpenShift(ctx) (only call r.Delete for scc when running on OpenShift) or
treat a no-match error from r.Delete as non-fatal by using isNoMatchError(err)
alongside errors.IsNotFound(err) before returning an error; specifically adjust
the block that constructs scc (workspaceSCCName) and calls r.Delete so that
detectOpenShift or isNoMatchError guards the failure path.

In `@internal/controller/operatorconfig/resources.go`:
- Around line 332-334: The Route is currently targeting the service port name
"http" (RoutePort.TargetPort set via intstr.FromString("http")), which bypasses
the oauth-proxy sidecar; change the RoutePort.TargetPort to point to the service
port used by the oauth-proxy (e.g. intstr.FromString("proxy") or the actual
proxy port name) so traffic is routed through the proxy container and preserves
the OpenShift OAuth redirect path.

In `@internal/controller/workspace/controller.go`:
- Around line 254-257: The shell redirection in the Command literal in
controller.go uses the Bash-only operator "&>/dev/null" which breaks when
/bin/sh is not Bash; update the command string used where Command:
[]string{"/bin/sh", "-c", ...} (the fragment containing "if ! id workspace
&>/dev/null; then useradd -u 1000 -d /workspace -s /bin/bash workspace; fi") to
use the POSIX-compatible redirection ">/dev/null 2>&1" instead so the existence
check works under dash/ash and other /bin/sh implementations.

---

Duplicate comments:
In `@internal/buildapi/workspace.go`:
- Around line 624-626: The DeepCopy() call for wsConfig.Resources can leave
Requests and Limits maps nil, causing panics when code later assigns into them;
after res = wsConfig.Resources.DeepCopy() (and similarly in the other copy
sites), check and reinitialize res.Requests and res.Limits to empty maps when
nil (e.g., if res.Requests == nil { res.Requests =
make(map[string]resource.Quantity) } and same for res.Limits) before any
assignments; update the code paths that mutate res (look for usages of
res.Requests and res.Limits in this function) to ensure they always operate on
non-nil maps.
- Around line 375-382: Replace the io.ReadAll buffering of c.Request.Body with a
bounded stream: wrap the request body in an io.LimitReader using the existing
upload limit (e.g. maxUploadSize/uploadLimit) and stream it directly into
copyToPod instead of buffering into tarData; if you need to detect an exceeded
limit, copy up to limit+1 (or copy to io.Discard after the limit) and return an
HTTP 413 if more data was present. Update the call site that currently reads
tarData and calls copyToPod to pass the limited reader (c.Request.Body wrapped)
and adjust error responses accordingly (use 413 for too-large, 500 for other
read/copy errors) so /sync no longer OOMs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6ef2ce0e-6d01-46b4-9b8c-4bae443890a2

📥 Commits

Reviewing files that changed from the base of the PR and between 3b94a37 and fa62c10.

⛔ Files ignored due to path filters (2)
  • config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yaml is excluded by !config/crd/bases/**
  • config/rbac/role.yaml is excluded by !config/rbac/role.yaml
📒 Files selected for processing (8)
  • api/v1alpha1/operatorconfig_types.go
  • config/rbac/scc_role.yaml
  • config/rbac/scc_role_binding.yaml
  • internal/buildapi/workspace.go
  • internal/common/tasks/scripts/build_image.sh
  • internal/controller/operatorconfig/controller.go
  • internal/controller/operatorconfig/resources.go
  • internal/controller/workspace/controller.go

Comment thread internal/buildapi/workspace.go
Comment thread internal/buildapi/workspace.go
Comment on lines +59 to +69
echo "Cleaning up stale artifacts from persistent workspace..."
for item in "$WORKSPACE_PATH"/*; do
[ "$(basename "$item")" = "build-cache" ] && continue
rm -rf "$item"
done

for orphan in "$BUILD_DIR"/image_output--*; do
[ -e "$orphan" ] || break
echo "Removing orphaned build output: $(basename "$orphan")"
rm -rf "$orphan"
done
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Drop stale image.json from the persistent cache before publishing artifacts.

The new cleanup keeps "$BUILD_DIR/image.json" around, and Line 530 copies it for every build mode. disk builds do not obviously regenerate that file, so a persistent-cache run can publish the previous build's manifest as the current artifact metadata.

Suggested fix
   for orphan in "$BUILD_DIR"/image_output--*; do
     [ -e "$orphan" ] || break
     echo "Removing orphaned build output: $(basename "$orphan")"
     rm -rf "$orphan"
   done
+  rm -f "$BUILD_DIR/image.json"

Also applies to: 530-530

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/common/tasks/scripts/build_image.sh` around lines 59 - 69, The
persistent cache cleanup leaves "$BUILD_DIR/image.json" intact and that stale
manifest gets copied later; update build_image.sh to explicitly remove any
persistent image manifest before publishing by deleting "$BUILD_DIR/image.json"
(or "$WORKSPACE_PATH/$BUILD_DIR/image.json" as appropriate) during the cleanup
stage that uses WORKSPACE_PATH and BUILD_DIR so no old image.json can be carried
forward.

Comment thread internal/controller/operatorconfig/controller.go
Comment on lines 332 to 334
Port: &routev1.RoutePort{
TargetPort: intstr.FromString("proxy"),
TargetPort: intstr.FromString("http"),
},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Keep the OpenShift Route on the proxy service port.

Pointing the Route at "http" sends traffic straight to the build-api container on 8080 and bypasses the oauth-proxy sidecar completely. That regresses the OpenShift auth path this deployment still sets up via the proxy container, service port, and OAuth redirect annotation.

Suggested fix
 			Port: &routev1.RoutePort{
-				TargetPort: intstr.FromString("http"),
+				TargetPort: intstr.FromString("proxy"),
 			},
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Port: &routev1.RoutePort{
TargetPort: intstr.FromString("proxy"),
TargetPort: intstr.FromString("http"),
},
Port: &routev1.RoutePort{
TargetPort: intstr.FromString("proxy"),
},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/operatorconfig/resources.go` around lines 332 - 334, The
Route is currently targeting the service port name "http" (RoutePort.TargetPort
set via intstr.FromString("http")), which bypasses the oauth-proxy sidecar;
change the RoutePort.TargetPort to point to the service port used by the
oauth-proxy (e.g. intstr.FromString("proxy") or the actual proxy port name) so
traffic is routed through the proxy container and preserves the OpenShift OAuth
redirect path.

Comment thread internal/controller/workspace/controller.go Outdated
@bennyz bennyz force-pushed the app-building branch 5 times, most recently from 4ad1147 to 143d549 Compare March 21, 2026 09:03
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (8)
internal/common/tasks/scripts/build_image.sh (1)

59-69: ⚠️ Potential issue | 🟠 Major

Drop stale image.json before reusing the persistent cache.

The cleanup keeps "$BUILD_DIR/image.json" around, and Line 528 republishes whatever is already there. disk mode does not write a fresh manifest through COMMON_BUILD_ARGS, so a warm-cache run can attach stale manifest metadata to the current artifact.

Suggested fix
   for orphan in "$BUILD_DIR"/image_output--*; do
     [ -e "$orphan" ] || break
     echo "Removing orphaned build output: $(basename "$orphan")"
     rm -rf "$orphan"
   done
+  rm -f "$BUILD_DIR/image.json"

Also applies to: 528-528

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/common/tasks/scripts/build_image.sh` around lines 59 - 69, The
cleanup currently preserves "$BUILD_DIR/image.json" which can lead to reusing
stale manifests; update the cleanup in build_image.sh to explicitly remove the
persistent manifest before reusing the cache by deleting "$BUILD_DIR/image.json"
(or any manifest file) prior to the reuse logic so disk-mode runs using
COMMON_BUILD_ARGS don't republish old metadata—ensure the removal happens
alongside the existing workspace and orphan cleanup that references
WORKSPACE_PATH and BUILD_DIR.
internal/controller/operatorconfig/controller.go (1)

894-898: ⚠️ Potential issue | 🟠 Major

Ignore NoMatch when deleting the workspace SCC.

Line 896 still deletes an OpenShift-only SecurityContextConstraints unconditionally. On a Kubernetes cluster, that no-match error bubbles out of cleanupOSBuilds() and can block finalizer removal. Gate this behind detectOpenShift() or treat isNoMatchError(err) as non-fatal, like the Route cleanup path.

Suggested fix
-	scc := &securityv1.SecurityContextConstraints{}
-	scc.Name = workspaceSCCName
-	if err := r.Delete(ctx, scc); err != nil && !errors.IsNotFound(err) {
-		return fmt.Errorf("failed to delete workspace SCC: %w", err)
-	}
+	if r.detectOpenShift(ctx, config.Namespace) {
+		scc := &securityv1.SecurityContextConstraints{}
+		scc.Name = workspaceSCCName
+		if err := r.Delete(ctx, scc); err != nil && !errors.IsNotFound(err) && !isNoMatchError(err) {
+			return fmt.Errorf("failed to delete workspace SCC: %w", err)
+		}
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/operatorconfig/controller.go` around lines 894 - 898, The
deletion of the OpenShift-only SecurityContextConstraints
(securityv1.SecurityContextConstraints with name workspaceSCCName via r.Delete
called from cleanupOSBuilds()) must not surface a "NoMatch" error on vanilla
Kubernetes; either gate the SCC delete behind detectOpenShift() or treat
isNoMatchError(err) as non-fatal (similar to the Route cleanup path) so
cleanupOSBuilds() ignores NoMatch and only returns real errors.
internal/controller/operatorconfig/resources.go (1)

645-660: ⚠️ Potential issue | 🟠 Major

Drop the extra write verbs from the root workspaces rule.

The separate workspaces/status and workspaces/finalizers rules already cover the controller's write paths. Leaving create/delete on workspaces gives this reconciler unnecessary power over user workspaces.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/operatorconfig/resources.go` around lines 645 - 660, The
RBAC rule for the root "workspaces" resource grants unnecessary write
permissions; remove the extra write verbs from the APIGroups entry that lists
Resources: []string{"workspaces"} by deleting "create" and "delete" from its
Verbs slice so that the root rule only has read/list/watch verbs (e.g.,
"get","list","watch"); leave the separate "workspaces/status" and
"workspaces/finalizers" rules untouched (they cover controller write
operations).
internal/buildapi/server.go (3)

1642-1674: ⚠️ Potential issue | 🟠 Major

Parse BuildCacheSize defensively instead of using MustParse.

Workspaces.BuildCacheSize is operator-configurable. A bad value here drops this request onto the panic/recovery path instead of returning a normal validation error.

🛡️ Suggested change
 	if wsConfig != nil {
 		if wsConfig.BuildCacheSize != "" {
 			cacheSize = wsConfig.BuildCacheSize
 		}
 		if sc := wsConfig.GetStorageClass(); sc != "" {
 			storageClassName = &sc
 		}
 	}
+	qty, err := resource.ParseQuantity(cacheSize)
+	if err != nil {
+		return "", fmt.Errorf("invalid workspace build cache size %q: %w", cacheSize, err)
+	}
 
 	pvc := &corev1.PersistentVolumeClaim{
@@
 			Resources: corev1.VolumeResourceRequirements{
 				Requests: corev1.ResourceList{
-					corev1.ResourceStorage: resource.MustParse(cacheSize),
+					corev1.ResourceStorage: qty,
 				},
 			},
 		},
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/server.go` around lines 1642 - 1674, The code currently
uses resource.MustParse(cacheSize) when building the PersistentVolumeClaim (pvc)
which will panic on invalid operator-configured Workspaces.BuildCacheSize;
replace this with resource.ParseQuantity(cacheSize) (or resource.ParseQuantity
on wsConfig.BuildCacheSize) and handle the returned (quantity, error) before
constructing the PersistentVolumeClaimSpec so you can return a validation error
(or reconcile error) instead of panicking; update the pvc creation to use the
parsed Quantity variable and ensure error handling returns a clear error (or
logs and requeues) referencing BuildCacheSize, pvc/PersistentVolumeClaimSpec,
and resource.MustParse to locate the change.

1723-1739: ⚠️ Potential issue | 🔴 Critical

Resolve the requester before expanding ExtraRepos.

This path does workspace lookup and pod access before requestedBy is available, so resolveExtraRepos has no way to enforce ws.Spec.Owner. Any authenticated caller can currently point --extra-repo at another user's running workspace by name.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/server.go` around lines 1723 - 1739, Resolve the requestor
identity before expanding ExtraRepos: move or add the requester resolution step
(the code that sets req.RequestedBy / calls whatever resolves the requester from
the request context) to run before the block that calls resolveExtraRepos, or
change resolveExtraRepos(signature) to accept the resolved requester and use it
to enforce ws.Spec.Owner; then call
getClientFromRequest/getRESTConfigFromRequest and a.resolveExtraRepos only after
req.RequestedBy is set so extra-repo expansion cannot access another user's
workspace. Ensure you reference resolveExtraRepos and req.ExtraRepos when making
the change and enforce the owner check inside resolveExtraRepos if you add the
requester parameter.

1534-1540: ⚠️ Potential issue | 🔴 Critical

Don't launch the repo server through an interpolated shell command.

repoPath is inserted directly into /bin/sh -c, which is command-injection prone. The fire-and-forget background process is also never health-checked or cleaned up, so repeated builds can return dead or conflicting repo URLs even when podExec itself succeeds.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/server.go` around lines 1534 - 1540, The current code
constructs a shell command with repoPath interpolated into "/bin/sh -c" (cmd
variable) which is command-injection prone and leaves a fire-and-forget process
unmanaged; fix by invoking the HTTP server without a shell (e.g. call python3
via exec form: cmd := []string{"python3","-m","http.server", fmt.Sprintf("%d",
port), "--directory", repoPath}) so repoPath is passed as an argument, then
update podExec usage to use that cmd (and workspaceContainerName/wsName
references) and implement lifecycle/health management: before starting
check/terminate any existing repo server (e.g. stored PID file or probe the
port), start the server in a way you can monitor (write its PID to a known file
inside repoPath), and perform a quick health-check (connect to localhost:port)
before returning so repeated builds don't return stale/conflicting URLs.
cmd/caib/workspace/workspace.go (2)

603-613: ⚠️ Potential issue | 🟡 Minor

Keep the exec command as argv instead of flattening it.

Joining cmdParts back into one string loses quoting and argument boundaries, so commands with spaces or shell metacharacters do not round-trip reliably. Send argv verbatim in WorkspaceExecRequest instead of rebuilding a shell string here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/caib/workspace/workspace.go` around lines 603 - 613, The code flattens
cmdParts into a single string (variable command) which loses argument quoting;
instead stop using strings.Join and pass the argv slice verbatim into the
request: remove creation of the string variable command and set the
buildapitypes.WorkspaceExecRequest to use the original cmdParts slice (e.g.,
WorkspaceExecRequest.Command = cmdParts or the appropriate slice field name) so
exec receives argv unchanged; update any downstream usage that expected a string
accordingly.

502-508: ⚠️ Potential issue | 🟠 Major

Create a fresh upload reader for each re-auth attempt.

ExecuteWithReauth can rerun the callback, but pr is built once from buf. Any retry after the first read uploads a truncated or empty tar stream.

♻️ Suggested change
 	totalBytes := int64(buf.Len())
-	pr := newProgressReader(&buf, totalBytes)
+	archive := append([]byte(nil), buf.Bytes()...)
 
 	err := caibcommon.ExecuteWithReauth(serverURL, &authToken, insecureSkipTLS, func(client *buildapiclient.Client) error {
-		return client.SyncWorkspace(context.Background(), name, pr)
+		pr := newProgressReader(bytes.NewReader(archive), totalBytes)
+		err := client.SyncWorkspace(context.Background(), name, pr)
+		pr.finish()
+		return err
 	})
-	pr.finish()
 	if err != nil {
 		handleError(fmt.Errorf("failed to sync workspace: %w", err))
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/caib/workspace/workspace.go` around lines 502 - 508, The progress reader
`pr` is created once from `buf` but `ExecuteWithReauth` may retry the callback,
so subsequent attempts read an already-consumed stream; fix by constructing a
fresh progress reader for each attempt inside the callback passed to
`ExecuteWithReauth` (use the original buffer bytes and `totalBytes` to create a
new reader via `newProgressReader` on each call) and ensure `pr.finish()` is
invoked for that attempt (e.g., defer `pr.finish()` inside the callback). Update
the callback that calls `client.SyncWorkspace` to create and finish its own `pr`
rather than using the `pr` created once outside.
🧹 Nitpick comments (2)
Containerfile.toolchain (2)

103-103: Consider tightening workspace directory permissions.

Line 103 sets chmod 777, granting world-writable permissions. While this accommodates OpenShift's arbitrary UIDs, it allows any process in the container to modify workspace content. If the workspace Pod's security context sets a predictable fsGroup, you can use 775 or 770 and rely on group ownership instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Containerfile.toolchain` at line 103, The RUN command using "chmod 777
/workspace /workspace/src /workspace/cache" makes the workspace world-writable;
change the RUN step that creates the directories (the line with "RUN mkdir -p
/workspace/src /workspace/cache && chmod 777 ...") to tighten permissions (e.g.,
use 775 or 770) and, if relying on a Pod fsGroup, set appropriate group
ownership with chown so group access works for arbitrary UIDs; ensure the
mkdir/chmod/chown sequence preserves OpenShift compatibility while removing
global write permissions.

57-57: Clean DNF cache after EPEL installation.

The EPEL installation leaves package metadata cached, increasing the image size unnecessarily.

🧹 Proposed fix to reduce image size
-RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm
+RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm && dnf clean all
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Containerfile.toolchain` at line 57, After installing EPEL in the RUN that
currently executes "dnf install -y
https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm", chain
DNF cache cleanup to avoid leaving package metadata in the image; update the RUN
that contains this command to run dnf clean all (and remove /var/cache/dnf)
immediately after the install so the image doesn’t keep cached package metadata.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Containerfile.toolchain`:
- Line 10: Update the GPG key ARG by replacing the incorrect value used in the
Containerfile: change the build arg named ARG GPG_KEY (currently
"RPM-GPG-KEY-centosofficial") to the Automotive SIG key
"RPM-GPG-KEY-CentOS-SIG-Automotive" so the centos-release-autosd/AutoSD
repository uses the correct key.

In `@internal/buildapi/client/client_test.go`:
- Around line 233-249: The test is checking URL escaping using the decoded
r.URL.Path; update the assertion to inspect the raw/escaped path instead by
replacing r.URL.Path with r.URL.EscapedPath() (or r.RequestURI) inside the
handler in the Context("workspaceAction with URL-unsafe names") test so the
StartWorkspace client call (created by New) is validated against the
percent-encoded path "/v1/workspaces/my%20app/start" rather than the decoded
"/v1/workspaces/my app/start".

In `@internal/buildapi/client/client.go`:
- Around line 923-926: The current code constructs a zero-value websocket.Dialer
(dialer := websocket.Dialer{}) which drops Gorilla's default Proxy and
HandshakeTimeout; instead clone websocket.DefaultDialer first and then overlay
TLS and proxy settings from the HTTP transport: get t :=
c.httpClient.Transport.(*http.Transport) (if ok), copy dialer :=
*websocket.DefaultDialer, set dialer.TLSClientConfig = t.TLSClientConfig if
non-nil and ensure dialer.Proxy is set from t.Proxy (or
http.ProxyFromEnvironment) so proxy resolution and HandshakeTimeout are
preserved for shell sessions.
- Around line 738-915: The code double-escapes workspace names by using
url.PathEscape + path.Join before calling c.resolve; replace those constructions
with url.JoinPath (or URL.JoinPath) to compose the path segments without
pre-escaping so the workspace name is encoded exactly once. Update endpoint
construction in GetWorkspace, StartWorkspace, StopWorkspace, workspaceAction,
DeleteWorkspace, SyncPlan, SyncWorkspace, workspaceStreamRequest, and
ShellWorkspace to call url.JoinPath("/v1/workspaces", name, ...) (or the
appropriate variants) and pass that result to c.resolve, removing manual
url.PathEscape usage.

In `@internal/buildapi/server.go`:
- Around line 1624-1680: The current list-then-GenerateName flow
(buildCacheLabels, pvc with GenerateName, k8sClient.Create) is TOCTOU; replace
it with a deterministic PVC name (e.g. wsName+"-build-cache") and create-or-get
semantics: construct the PVC Spec but set a fixed ObjectMeta.Name instead of
GenerateName, attempt k8sClient.Create(ctx, pvc), and if Create returns
AlreadyExists call k8sClient.Get to retrieve and return the existing PVC name
(or otherwise reconcile/update the existing PVC’s spec/storageClass/size if
needed); ensure you still set the same OwnerReferences and labels so lookups by
workspace remain deterministic.

---

Duplicate comments:
In `@cmd/caib/workspace/workspace.go`:
- Around line 603-613: The code flattens cmdParts into a single string (variable
command) which loses argument quoting; instead stop using strings.Join and pass
the argv slice verbatim into the request: remove creation of the string variable
command and set the buildapitypes.WorkspaceExecRequest to use the original
cmdParts slice (e.g., WorkspaceExecRequest.Command = cmdParts or the appropriate
slice field name) so exec receives argv unchanged; update any downstream usage
that expected a string accordingly.
- Around line 502-508: The progress reader `pr` is created once from `buf` but
`ExecuteWithReauth` may retry the callback, so subsequent attempts read an
already-consumed stream; fix by constructing a fresh progress reader for each
attempt inside the callback passed to `ExecuteWithReauth` (use the original
buffer bytes and `totalBytes` to create a new reader via `newProgressReader` on
each call) and ensure `pr.finish()` is invoked for that attempt (e.g., defer
`pr.finish()` inside the callback). Update the callback that calls
`client.SyncWorkspace` to create and finish its own `pr` rather than using the
`pr` created once outside.

In `@internal/buildapi/server.go`:
- Around line 1642-1674: The code currently uses resource.MustParse(cacheSize)
when building the PersistentVolumeClaim (pvc) which will panic on invalid
operator-configured Workspaces.BuildCacheSize; replace this with
resource.ParseQuantity(cacheSize) (or resource.ParseQuantity on
wsConfig.BuildCacheSize) and handle the returned (quantity, error) before
constructing the PersistentVolumeClaimSpec so you can return a validation error
(or reconcile error) instead of panicking; update the pvc creation to use the
parsed Quantity variable and ensure error handling returns a clear error (or
logs and requeues) referencing BuildCacheSize, pvc/PersistentVolumeClaimSpec,
and resource.MustParse to locate the change.
- Around line 1723-1739: Resolve the requestor identity before expanding
ExtraRepos: move or add the requester resolution step (the code that sets
req.RequestedBy / calls whatever resolves the requester from the request
context) to run before the block that calls resolveExtraRepos, or change
resolveExtraRepos(signature) to accept the resolved requester and use it to
enforce ws.Spec.Owner; then call getClientFromRequest/getRESTConfigFromRequest
and a.resolveExtraRepos only after req.RequestedBy is set so extra-repo
expansion cannot access another user's workspace. Ensure you reference
resolveExtraRepos and req.ExtraRepos when making the change and enforce the
owner check inside resolveExtraRepos if you add the requester parameter.
- Around line 1534-1540: The current code constructs a shell command with
repoPath interpolated into "/bin/sh -c" (cmd variable) which is
command-injection prone and leaves a fire-and-forget process unmanaged; fix by
invoking the HTTP server without a shell (e.g. call python3 via exec form: cmd
:= []string{"python3","-m","http.server", fmt.Sprintf("%d", port),
"--directory", repoPath}) so repoPath is passed as an argument, then update
podExec usage to use that cmd (and workspaceContainerName/wsName references) and
implement lifecycle/health management: before starting check/terminate any
existing repo server (e.g. stored PID file or probe the port), start the server
in a way you can monitor (write its PID to a known file inside repoPath), and
perform a quick health-check (connect to localhost:port) before returning so
repeated builds don't return stale/conflicting URLs.

In `@internal/common/tasks/scripts/build_image.sh`:
- Around line 59-69: The cleanup currently preserves "$BUILD_DIR/image.json"
which can lead to reusing stale manifests; update the cleanup in build_image.sh
to explicitly remove the persistent manifest before reusing the cache by
deleting "$BUILD_DIR/image.json" (or any manifest file) prior to the reuse logic
so disk-mode runs using COMMON_BUILD_ARGS don't republish old metadata—ensure
the removal happens alongside the existing workspace and orphan cleanup that
references WORKSPACE_PATH and BUILD_DIR.

In `@internal/controller/operatorconfig/controller.go`:
- Around line 894-898: The deletion of the OpenShift-only
SecurityContextConstraints (securityv1.SecurityContextConstraints with name
workspaceSCCName via r.Delete called from cleanupOSBuilds()) must not surface a
"NoMatch" error on vanilla Kubernetes; either gate the SCC delete behind
detectOpenShift() or treat isNoMatchError(err) as non-fatal (similar to the
Route cleanup path) so cleanupOSBuilds() ignores NoMatch and only returns real
errors.

In `@internal/controller/operatorconfig/resources.go`:
- Around line 645-660: The RBAC rule for the root "workspaces" resource grants
unnecessary write permissions; remove the extra write verbs from the APIGroups
entry that lists Resources: []string{"workspaces"} by deleting "create" and
"delete" from its Verbs slice so that the root rule only has read/list/watch
verbs (e.g., "get","list","watch"); leave the separate "workspaces/status" and
"workspaces/finalizers" rules untouched (they cover controller write
operations).

---

Nitpick comments:
In `@Containerfile.toolchain`:
- Line 103: The RUN command using "chmod 777 /workspace /workspace/src
/workspace/cache" makes the workspace world-writable; change the RUN step that
creates the directories (the line with "RUN mkdir -p /workspace/src
/workspace/cache && chmod 777 ...") to tighten permissions (e.g., use 775 or
770) and, if relying on a Pod fsGroup, set appropriate group ownership with
chown so group access works for arbitrary UIDs; ensure the mkdir/chmod/chown
sequence preserves OpenShift compatibility while removing global write
permissions.
- Line 57: After installing EPEL in the RUN that currently executes "dnf install
-y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm",
chain DNF cache cleanup to avoid leaving package metadata in the image; update
the RUN that contains this command to run dnf clean all (and remove
/var/cache/dnf) immediately after the install so the image doesn’t keep cached
package metadata.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ebcd16a8-e975-4b57-99be-3273409dae00

📥 Commits

Reviewing files that changed from the base of the PR and between 3b94a37 and 143d549.

⛔ Files ignored due to path filters (4)
  • config/crd/bases/automotive.sdv.cloud.redhat.com_imagebuilds.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_workspaces.yaml is excluded by !config/crd/bases/**
  • config/rbac/role.yaml is excluded by !config/rbac/role.yaml
📒 Files selected for processing (32)
  • Containerfile.toolchain
  • api/v1alpha1/imagebuild_types.go
  • api/v1alpha1/operatorconfig_types.go
  • api/v1alpha1/workspace_types.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • bundle.Dockerfile
  • cmd/caib/buildcmd/build.go
  • cmd/caib/image/image.go
  • cmd/caib/main.go
  • cmd/caib/root.go
  • cmd/caib/runtime_wiring.go
  • cmd/caib/workspace/workspace.go
  • cmd/main.go
  • config/crd/kustomization.yaml
  • config/rbac/scc_role.yaml
  • config/rbac/scc_role_binding.yaml
  • hack/deploy-catalog.sh
  • internal/buildapi/client/client.go
  • internal/buildapi/client/client_test.go
  • internal/buildapi/server.go
  • internal/buildapi/types.go
  • internal/buildapi/workspace.go
  • internal/buildapi/workspace_test.go
  • internal/common/tasks/scripts/build_image.sh
  • internal/common/tasks/scripts/find_manifest.sh
  • internal/common/tasks/scripts/push_artifact.sh
  • internal/common/tasks/tasks.go
  • internal/controller/imagebuild/controller.go
  • internal/controller/operatorconfig/controller.go
  • internal/controller/operatorconfig/resources.go
  • internal/controller/workspace/controller.go
  • internal/controller/workspace/controller_test.go
✅ Files skipped from review due to trivial changes (6)
  • bundle.Dockerfile
  • cmd/caib/main.go
  • config/crd/kustomization.yaml
  • config/rbac/scc_role.yaml
  • internal/buildapi/workspace_test.go
  • api/v1alpha1/zz_generated.deepcopy.go
🚧 Files skipped from review as they are similar to previous changes (12)
  • config/rbac/scc_role_binding.yaml
  • cmd/main.go
  • internal/common/tasks/scripts/find_manifest.sh
  • api/v1alpha1/imagebuild_types.go
  • cmd/caib/buildcmd/build.go
  • internal/common/tasks/tasks.go
  • internal/common/tasks/scripts/push_artifact.sh
  • api/v1alpha1/workspace_types.go
  • api/v1alpha1/operatorconfig_types.go
  • internal/controller/imagebuild/controller.go
  • internal/controller/workspace/controller.go
  • internal/buildapi/workspace.go

Comment thread Containerfile.toolchain Outdated
Comment on lines +233 to +249
Context("workspaceAction with URL-unsafe names", func() {
It("should properly escape workspace names in the URL path", func() {
mockServer = httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// The name "my app" should be escaped to "my%20app"
Expect(r.URL.Path).To(Equal("/v1/workspaces/my%20app/start"))

w.Header().Set("Content-Type", "application/json")
_ = json.NewEncoder(w).Encode(buildapi.WorkspaceResponse{Name: "my app"})
}))

var err error
apiClient, err = New(mockServer.URL)
Expect(err).NotTo(HaveOccurred())

resp, err := apiClient.StartWorkspace(context.Background(), "my app")
Expect(err).NotTo(HaveOccurred())
Expect(resp.Name).To(Equal("my app"))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

In Go net/http server handlers, is Request.URL.Path decoded while RequestURI or URL.EscapedPath() preserves the original percent-encoding?

💡 Result:

Yes, in Go net/http server handlers, r.Request.URL.Path is percent-decoded, while r.RequestURI preserves the original percent-encoded request-target from the HTTP request line, and r.Request.URL.EscapedPath preserves (or reconstructs a valid escaping of) the original percent-encoding.

Citations:


Use r.URL.EscapedPath() to verify URL escaping in the test.

The test currently uses r.URL.Path, which Go's net/http decodes before the handler sees it. This means the test would fail for properly-escaped URLs but incorrectly pass for double-escaped ones. Use r.URL.EscapedPath() or r.RequestURI instead to capture the original percent-encoding and properly validate that the client is escaping correctly.

Suggested fix
-				Expect(r.URL.Path).To(Equal("/v1/workspaces/my%20app/start"))
+				Expect(r.URL.EscapedPath()).To(Equal("/v1/workspaces/my%20app/start"))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/client/client_test.go` around lines 233 - 249, The test is
checking URL escaping using the decoded r.URL.Path; update the assertion to
inspect the raw/escaped path instead by replacing r.URL.Path with
r.URL.EscapedPath() (or r.RequestURI) inside the handler in the
Context("workspaceAction with URL-unsafe names") test so the StartWorkspace
client call (created by New) is validated against the percent-encoded path
"/v1/workspaces/my%20app/start" rather than the decoded "/v1/workspaces/my
app/start".

Comment on lines +738 to +915
endpoint := c.resolve(path.Join("/v1/workspaces", url.PathEscape(name)))
req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
if err != nil {
return nil, err
}
if c.authToken != "" {
req.Header.Set("Authorization", "Bearer "+c.authToken)
}
resp, err := c.httpClient.Do(req)
if err != nil {
return nil, err
}
defer func() { _ = resp.Body.Close() }()
if resp.StatusCode != http.StatusOK {
b, _ := io.ReadAll(io.LimitReader(resp.Body, 1024))
return nil, fmt.Errorf("get workspace failed: %s: %s", resp.Status, string(b))
}
var ws buildapi.WorkspaceResponse
if err := json.NewDecoder(resp.Body).Decode(&ws); err != nil {
return nil, fmt.Errorf("decoding response: %w", err)
}
return &ws, nil
}

// StartWorkspace starts a stopped workspace.
func (c *Client) StartWorkspace(ctx context.Context, name string) (*buildapi.WorkspaceResponse, error) {
return c.workspaceAction(ctx, name, "start")
}

// StopWorkspace stops a running workspace, preserving its storage.
func (c *Client) StopWorkspace(ctx context.Context, name string) (*buildapi.WorkspaceResponse, error) {
return c.workspaceAction(ctx, name, "stop")
}

// workspaceAction performs a POST to /v1/workspaces/:name/:action and decodes a WorkspaceResponse.
func (c *Client) workspaceAction(ctx context.Context, name, action string) (*buildapi.WorkspaceResponse, error) {
endpoint := c.resolve(path.Join("/v1/workspaces", url.PathEscape(name), action))
req, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, nil)
if err != nil {
return nil, err
}
if c.authToken != "" {
req.Header.Set("Authorization", "Bearer "+c.authToken)
}
resp, err := c.httpClient.Do(req)
if err != nil {
return nil, err
}
defer func() { _ = resp.Body.Close() }()
if resp.StatusCode != http.StatusOK {
b, _ := io.ReadAll(io.LimitReader(resp.Body, 1024))
return nil, fmt.Errorf("%s workspace failed: %s: %s", action, resp.Status, string(b))
}
var ws buildapi.WorkspaceResponse
if err := json.NewDecoder(resp.Body).Decode(&ws); err != nil {
return nil, fmt.Errorf("decoding response: %w", err)
}
return &ws, nil
}

// DeleteWorkspace deletes a workspace by name.
func (c *Client) DeleteWorkspace(ctx context.Context, name string) error {
endpoint := c.resolve(path.Join("/v1/workspaces", url.PathEscape(name)))
req, err := http.NewRequestWithContext(ctx, http.MethodDelete, endpoint, nil)
if err != nil {
return err
}
if c.authToken != "" {
req.Header.Set("Authorization", "Bearer "+c.authToken)
}
resp, err := c.httpClient.Do(req)
if err != nil {
return err
}
defer func() { _ = resp.Body.Close() }()
if resp.StatusCode != http.StatusOK {
b, _ := io.ReadAll(io.LimitReader(resp.Body, 1024))
return fmt.Errorf("delete workspace failed: %s: %s", resp.Status, string(b))
}
return nil
}

// SyncPlan sends a file manifest and returns which files need uploading.
func (c *Client) SyncPlan(ctx context.Context, name string, req buildapi.SyncPlanRequest) (*buildapi.SyncPlanResponse, error) {
body, err := json.Marshal(req)
if err != nil {
return nil, err
}
endpoint := c.resolve(path.Join("/v1/workspaces", url.PathEscape(name), "sync", "plan"))
httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, bytes.NewReader(body))
if err != nil {
return nil, err
}
httpReq.Header.Set("Content-Type", "application/json")
if c.authToken != "" {
httpReq.Header.Set("Authorization", "Bearer "+c.authToken)
}
resp, err := c.httpClient.Do(httpReq)
if err != nil {
return nil, err
}
defer func() { _ = resp.Body.Close() }()
if resp.StatusCode != http.StatusOK {
b, _ := io.ReadAll(io.LimitReader(resp.Body, 1024))
return nil, fmt.Errorf("sync plan failed: %s: %s", resp.Status, string(b))
}
var plan buildapi.SyncPlanResponse
if err := json.NewDecoder(resp.Body).Decode(&plan); err != nil {
return nil, fmt.Errorf("decoding response: %w", err)
}
return &plan, nil
}

// SyncWorkspace uploads a tar stream to a workspace.
func (c *Client) SyncWorkspace(ctx context.Context, name string, body io.Reader) error {
endpoint := c.resolve(path.Join("/v1/workspaces", url.PathEscape(name), "sync"))
req, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, body)
if err != nil {
return err
}
req.Header.Set("Content-Type", "application/octet-stream")
if c.authToken != "" {
req.Header.Set("Authorization", "Bearer "+c.authToken)
}
resp, err := c.httpClient.Do(req)
if err != nil {
return err
}
defer func() { _ = resp.Body.Close() }()
if resp.StatusCode != http.StatusOK {
b, _ := io.ReadAll(io.LimitReader(resp.Body, 1024))
return fmt.Errorf("sync workspace failed: %s: %s", resp.Status, string(b))
}
return nil
}

// workspaceStreamRequest performs a POST with a JSON body and returns the streaming response body.
func (c *Client) workspaceStreamRequest(ctx context.Context, name, action string, reqBody interface{}) (io.ReadCloser, error) {
body, err := json.Marshal(reqBody)
if err != nil {
return nil, err
}
endpoint := c.resolve(path.Join("/v1/workspaces", url.PathEscape(name), action))
httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, bytes.NewReader(body))
if err != nil {
return nil, err
}
httpReq.Header.Set("Content-Type", "application/json")
if c.authToken != "" {
httpReq.Header.Set("Authorization", "Bearer "+c.authToken)
}
resp, err := c.httpClient.Do(httpReq)
if err != nil {
return nil, err
}
if resp.StatusCode != http.StatusOK {
b, _ := io.ReadAll(io.LimitReader(resp.Body, 1024))
_ = resp.Body.Close()
return nil, fmt.Errorf("%s workspace failed: %s: %s", action, resp.Status, string(b))
}
return resp.Body, nil
}

// ExecWorkspace executes a command in a workspace and returns the streaming output.
func (c *Client) ExecWorkspace(ctx context.Context, name string, req buildapi.WorkspaceExecRequest) (io.ReadCloser, error) {
return c.workspaceStreamRequest(ctx, name, "exec", req)
}

// DeployWorkspace deploys an artifact from a workspace to a board and returns the streaming output.
func (c *Client) DeployWorkspace(ctx context.Context, name string, req buildapi.WorkspaceDeployRequest) (io.ReadCloser, error) {
return c.workspaceStreamRequest(ctx, name, "deploy", req)
}

// ShellWorkspace opens an interactive shell via WebSocket.
// Returns the raw WebSocket connection for the caller to read/write.
func (c *Client) ShellWorkspace(ctx context.Context, name string) (*websocket.Conn, error) {
endpoint := c.resolve(path.Join("/v1/workspaces", url.PathEscape(name), "shell"))
// Convert http(s) to ws(s)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Go net/url, if url.URL.Path already contains a percent-escaped segment like "my%20app" and RawPath is empty, does URL.String() emit "my%2520app"? Is url.JoinPath the correct way to assemble escaped path segments?

💡 Result:

Yes, if u.Path contains a percent-encoded segment like "my%20app" (which decodes to "my app") and u.RawPath is empty, then u.String emits "my%2520app". This is because Path is always stored decoded, String calls EscapedPath, and EscapedPath re-escapes the decoded Path using the path segment rules (encodePath), which escapes space as %20, but since Path has the literal '%' followed by non-hex '2' and '0', the '%' is escaped to %25, resulting in %2520. To preserve an already-escaped path like "/my%20app" without double-escaping, set both u.Path = url.PathUnescape("/my%20app") ("/my app") and u.RawPath = "/my%20app". Then EscapedPath validates RawPath as matching the escaped form of Path and uses it directly, avoiding re-escaping. For assembling escaped path segments correctly (treating each as a single segment), url.JoinPath (Go 1.19+) is the correct way: it properly escapes each elem with PathEscape and joins them, handling the segment boundaries properly. Manual concatenation into Path leads to incorrect splitting/escaping as shown in issues.

Citations:


🏁 Script executed:

cd /tmp && find . -name "client.go" -path "*/buildapi/client/*" 2>/dev/null | head -5

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 73


🏁 Script executed:

rg -A 10 "func \(c \*Client\) resolve" --type go

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 722


🏁 Script executed:

rg -B 5 -A 20 "type Client struct" internal/buildapi/client/client.go

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 867


🏁 Script executed:

rg -C 5 "url.PathEscape" internal/buildapi/client/client.go | head -100

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 3629


🏁 Script executed:

cat -n internal/buildapi/client/client.go | sed -n '1,150p' | tail -50

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 2127


🏁 Script executed:

# Check if there are tests for the client to understand the expected behavior
fd -e "test.go" -e "_test.go" internal/buildapi/client/

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 496


🏁 Script executed:

rg -B 3 -A 3 "New(" internal/buildapi/client/client.go | head -30

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 149


🏁 Script executed:

# Check how the Client is instantiated elsewhere to understand the baseURL format
rg "client\.New\(" --type go -A 2 -B 2 | head -50

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 3927


🏁 Script executed:

# Also check the resolve() implementation one more time with more context
rg -B 2 -A 15 "func \(c \*Client\) resolve" internal/buildapi/client/client.go

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 646


Fix double-escaping of workspace names in URL path segments.

These call sites build a percent-escaped segment and then pass it to resolve(), which stores it in url.URL.Path. When URL.String() is called, the percent sign in the literal string my%20app gets escaped again, producing .../my%2520app/... on the wire. Since this PR adds URL-unsafe-name coverage, switch to url.JoinPath() for all affected workspace endpoints to avoid pre-escaping:

Suggested pattern
-	endpoint := c.resolve(path.Join("/v1/workspaces", url.PathEscape(name)))
+	endpoint, err := url.JoinPath(c.baseURL.String(), "/v1/workspaces", name)
+	if err != nil {
+		return nil, err
+	}

Apply to all workspace methods that combine PathEscape with resolve(): GetWorkspace, StartWorkspace, StopWorkspace, workspaceAction, DeleteWorkspace, SyncPlan, SyncWorkspace, workspaceStreamRequest, and ShellWorkspace.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/client/client.go` around lines 738 - 915, The code
double-escapes workspace names by using url.PathEscape + path.Join before
calling c.resolve; replace those constructions with url.JoinPath (or
URL.JoinPath) to compose the path segments without pre-escaping so the workspace
name is encoded exactly once. Update endpoint construction in GetWorkspace,
StartWorkspace, StopWorkspace, workspaceAction, DeleteWorkspace, SyncPlan,
SyncWorkspace, workspaceStreamRequest, and ShellWorkspace to call
url.JoinPath("/v1/workspaces", name, ...) (or the appropriate variants) and pass
that result to c.resolve, removing manual url.PathEscape usage.

Comment on lines +923 to +926
dialer := websocket.Dialer{}
if t, ok := c.httpClient.Transport.(*http.Transport); ok && t.TLSClientConfig != nil {
dialer.TLSClientConfig = t.TLSClientConfig
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

What defaults does github.com/gorilla/websocket.DefaultDialer set compared with a zero-value websocket.Dialer? Does the zero value omit ProxyFromEnvironment and the default handshake timeout?

💡 Result:

The github.com/gorilla/websocket.DefaultDialer sets Proxy to http.ProxyFromEnvironment and HandshakeTimeout to 45 * time.Second. All other fields use their zero values. A zero-value websocket.Dialer has nil for all pointer/func fields (NetDial, NetDialContext, Proxy, etc. are nil; TLSClientConfig is nil), zero for time.Duration (HandshakeTimeout is 0), empty slices/arrays, and zero for ints. Yes, the zero-value omits ProxyFromEnvironment (Proxy is nil) and has no handshake timeout (HandshakeTimeout is 0). DefaultDialer explicitly sets these two fields to non-zero values while leaving others at zero.

Citations:


🏁 Script executed:

# First, let's verify the exact code in the file at the specified lines
cat -n internal/buildapi/client/client.go | sed -n '920,930p'

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 486


🏁 Script executed:

# Also check the surrounding context to understand what this code does
cat -n internal/buildapi/client/client.go | sed -n '910,950p'

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 1351


🏁 Script executed:

# Let's also check if websocket is imported and how
head -50 internal/buildapi/client/client.go | grep -E "import|websocket"

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 131


Use websocket.DefaultDialer to preserve proxy resolution and handshake timeout for shell sessions.

A zero-value websocket.Dialer drops Gorilla's default Proxy (set to http.ProxyFromEnvironment) and HandshakeTimeout (set to 45 seconds), causing shell connections to hang when proxies are configured and lose reliable timeout protection. Clone websocket.DefaultDialer first, then overlay the HTTP transport's TLS and proxy settings.

Suggested fix
-	dialer := websocket.Dialer{}
-	if t, ok := c.httpClient.Transport.(*http.Transport); ok && t.TLSClientConfig != nil {
-		dialer.TLSClientConfig = t.TLSClientConfig
-	}
+	dialer := *websocket.DefaultDialer
+	if t, ok := c.httpClient.Transport.(*http.Transport); ok {
+		dialer.TLSClientConfig = t.TLSClientConfig
+		if t.Proxy != nil {
+			dialer.Proxy = t.Proxy
+		}
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/client/client.go` around lines 923 - 926, The current code
constructs a zero-value websocket.Dialer (dialer := websocket.Dialer{}) which
drops Gorilla's default Proxy and HandshakeTimeout; instead clone
websocket.DefaultDialer first and then overlay TLS and proxy settings from the
HTTP transport: get t := c.httpClient.Transport.(*http.Transport) (if ok), copy
dialer := *websocket.DefaultDialer, set dialer.TLSClientConfig =
t.TLSClientConfig if non-nil and ensure dialer.Proxy is set from t.Proxy (or
http.ProxyFromEnvironment) so proxy resolution and HandshakeTimeout are
preserved for shell sessions.

Comment on lines +1624 to +1680
// Find existing build-cache PVC via labels, or create a new one
buildCacheLabels := map[string]string{
"automotive.sdv.cloud.redhat.com/workspace": wsName,
"app.kubernetes.io/component": "build-cache",
}
pvcList := &corev1.PersistentVolumeClaimList{}
if err := k8sClient.List(ctx, pvcList,
client.InNamespace(namespace),
client.MatchingLabels(buildCacheLabels),
); err != nil {
return "", fmt.Errorf("listing build-cache PVCs: %w", err)
}
for i := range pvcList.Items {
if pvcList.Items[i].DeletionTimestamp == nil {
return pvcList.Items[i].Name, nil
}
}

// Determine cache PVC size and storage class from OperatorConfig
cacheSize := "20Gi"
var storageClassName *string
if wsConfig != nil {
if wsConfig.BuildCacheSize != "" {
cacheSize = wsConfig.BuildCacheSize
}
if sc := wsConfig.GetStorageClass(); sc != "" {
storageClassName = &sc
}
}

pvc := &corev1.PersistentVolumeClaim{
ObjectMeta: metav1.ObjectMeta{
GenerateName: wsName + "-build-cache-",
Namespace: namespace,
Labels: buildCacheLabels,
OwnerReferences: []metav1.OwnerReference{
{
APIVersion: automotivev1alpha1.GroupVersion.String(),
Kind: "Workspace",
Name: ws.Name,
UID: ws.UID,
},
},
},
Spec: corev1.PersistentVolumeClaimSpec{
AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
StorageClassName: storageClassName,
Resources: corev1.VolumeResourceRequirements{
Requests: corev1.ResourceList{
corev1.ResourceStorage: resource.MustParse(cacheSize),
},
},
},
}
if err := k8sClient.Create(ctx, pvc); err != nil {
return "", fmt.Errorf("creating build-cache PVC: %w", err)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use one deterministic build-cache PVC per workspace.

The list-then-GenerateName flow is TOCTOU: two concurrent builds for the same workspace can both miss the existing claim and create separate caches. That fragments warm-cache state and leaves extra PVCs behind.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/server.go` around lines 1624 - 1680, The current
list-then-GenerateName flow (buildCacheLabels, pvc with GenerateName,
k8sClient.Create) is TOCTOU; replace it with a deterministic PVC name (e.g.
wsName+"-build-cache") and create-or-get semantics: construct the PVC Spec but
set a fixed ObjectMeta.Name instead of GenerateName, attempt
k8sClient.Create(ctx, pvc), and if Create returns AlreadyExists call
k8sClient.Get to retrieve and return the existing PVC name (or otherwise
reconcile/update the existing PVC’s spec/storageClass/size if needed); ensure
you still set the same OwnerReferences and labels so lookups by workspace remain
deterministic.

Comment thread internal/buildapi/server.go
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (3)
internal/buildapi/client/client_test.go (1)

235-237: ⚠️ Potential issue | 🟡 Minor

Use r.URL.EscapedPath() to verify URL escaping.

r.URL.Path is decoded by net/http, so the assertion will see /v1/workspaces/my app/start rather than the escaped form. This means the test passes regardless of whether the client actually escapes the URL correctly.

,

Suggested fix
-				Expect(r.URL.Path).To(Equal("/v1/workspaces/my%20app/start"))
+				Expect(r.URL.EscapedPath()).To(Equal("/v1/workspaces/my%20app/start"))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/client/client_test.go` around lines 235 - 237, The test's
mock server handler is asserting on r.URL.Path which is decoded by net/http;
change the assertion to use r.URL.EscapedPath() so the test verifies the client
is sending the escaped path (update the Expect call inside the
httptest.NewServer handler where r is inspected, e.g., the handler that
currently checks "/v1/workspaces/my%20app/start"). Ensure any other path
assertions in the same test use EscapedPath() as well.
internal/controller/imagebuild/controller.go (1)

765-772: ⚠️ Potential issue | 🟠 Major

BuildCachePVC still replaces entire shared-workspace — past concern unaddressed.

The logic binds BuildCachePVC directly as shared-workspace, which overwrites any per-build upload content (from InputFilesServer). If both BuildCachePVC and InputFilesServer are set, uploaded source_path files will not be visible to the build. Consider either:

  1. Rejecting the combination up-front with a validation error, or
  2. Mounting BuildCachePVC as a separate workspace (e.g., build-cache) while keeping the per-build shared workspace intact.

,

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/imagebuild/controller.go` around lines 765 - 772, The
current logic binds imageBuild.Spec.BuildCachePVC directly to the
"shared-workspace" (sharedWorkspaceBinding), which overwrites per-build uploads
from InputFilesServer; change this so that when both
imageBuild.Spec.BuildCachePVC and InputFilesServer are set you do not replace
"shared-workspace" but instead add a separate workspace binding (e.g., Name:
"build-cache") using the PVC (PersistentVolumeClaimVolumeSource with ClaimName:
imageBuild.Spec.BuildCachePVC) and leave the existing "shared-workspace" binding
for per-build uploads; update the code paths that construct
sharedWorkspaceBinding and any workspace list to append the new build-cache
binding instead of assigning it to "shared-workspace", or alternatively add
validation to reject the combination up-front if you prefer explicit failure.
internal/controller/operatorconfig/controller.go (1)

894-898: ⚠️ Potential issue | 🟠 Major

SCC cleanup may fail on non-OpenShift clusters.

cleanupWorkspaceInfra deletes the SecurityContextConstraints object unconditionally. On non-OpenShift clusters, this will return a "no matches for kind" error since the SCC API doesn't exist. Add isNoMatchError(err) to the error check to avoid blocking cleanup.

Suggested fix
 	scc := &securityv1.SecurityContextConstraints{}
 	scc.Name = workspaceSCCName
-	if err := r.Delete(ctx, scc); err != nil && !errors.IsNotFound(err) {
+	if err := r.Delete(ctx, scc); err != nil && !errors.IsNotFound(err) && !isNoMatchError(err) {
 		return fmt.Errorf("failed to delete workspace SCC: %w", err)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/operatorconfig/controller.go` around lines 894 - 898, The
SCC deletion in cleanupWorkspaceInfra unconditionally calls r.Delete on a
securityv1.SecurityContextConstraints (named by workspaceSCCName) and treats any
error other than IsNotFound as fatal; update the error check to also ignore "no
matches for kind" errors by adding isNoMatchError(err) to the conditional (i.e.,
treat both errors.IsNotFound(err) and isNoMatchError(err) as non-fatal) so
cleanup proceeds on non-OpenShift clusters.
🧹 Nitpick comments (4)
internal/buildapi/workspace_test.go (1)

63-87: Consider adding authenticated happy-path tests.

The current tests validate that routes exist and require authentication, but don't test actual endpoint behavior with valid credentials. Consider adding tests that mock the Kubernetes client to verify successful create/exec/deploy operations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace_test.go` around lines 63 - 87, Add authenticated
happy-path tests for the Create Workspace flow: create a test in the same
Describe/Context ("Create Workspace") that builds a valid WorkspaceRequest JSON,
sets a valid auth header (matching how server expects authentication on
server.router), and calls server.router.ServeHTTP to exercise the handler
instead of only unauthenticated checks; mock the Kubernetes client used by the
workspace handlers (the mock implementation your codebase uses for k8s
interactions) to return successful responses for create/exec/deploy operations
and assert the handler returns the expected success status (e.g., HTTP 201/200)
and response body. Ensure the test references WorkspaceRequest,
server.router.ServeHTTP, and the route "/v1/workspaces" so it targets the same
handler, and injects the mocked k8s client into the server/test setup before
invoking the request.
internal/controller/operatorconfig/resources.go (1)

956-1003: Workspace SCC grants extensive privileges — verify necessity.

The SCC allows AllowPrivilegedContainer: true, AllowedCapabilities: ALL, and RunAsUserStrategyRunAsAny. Based on the context snippet showing workspace pods run with Privileged: true and RunAsUser: 0 for sudo dnf install support, this appears intentional but represents a significant security surface.

Consider documenting these requirements and exploring alternatives like UserNamespaceLevel: NamespaceLevelRequirePod (which is set) with rootless podman for future improvements.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/operatorconfig/resources.go` around lines 956 - 1003, The
workspace SCC created by buildWorkspaceSCC currently grants broad privileges
(AllowPrivilegedContainer: true, AllowedCapabilities: "ALL",
RunAsUserStrategyRunAsAny) which increases risk; add an inline comment in
buildWorkspaceSCC documenting why each broad setting is required for workspace
pods (e.g., privileged true and UID 0 for sudo dnf installs), enumerate the
exact capabilities needed instead of "ALL" if possible, and add a TODO/issue
pointer to tighten RunAsUserStrategyRunAsAny to a restricted range or use
rootless alternatives later; keep UserNamespaceLevel: NamespaceLevelRequirePod
but note it as a mitigation in the comment and also add the same rationale to
the operator's user-facing docs/config so reviewers know these are intentional
choices.
internal/controller/workspace/controller.go (1)

378-388: resourcesOrDefaults uses MustParse which can panic.

While the values are hardcoded constants, using MustParse in runtime code is generally discouraged. Since these are compile-time constants, this is acceptable but could be refactored to use package-level variables initialized at startup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/workspace/controller.go` around lines 378 - 388,
resourcesOrDefaults currently calls resource.MustParse which can panic at
runtime; refactor to use package-level variables (e.g., defaultCPUQty,
defaultMemoryQty, defaultResources) initialized once at startup using
resource.ParseQuantity (handle parse errors in init() by logging/failing early)
and then return those prebuilt defaults from resourcesOrDefaults when r is nil;
update the function to reference the package-level defaultResources and keep the
existing behavior for non-nil r.
internal/controller/workspace/controller_test.go (1)

359-409: Verify buildPod handles nil operatorConfig in test.

Line 377 passes nil for operatorConfig, which exercises the fallback path in buildPod. This is good coverage. Consider adding a test case with a non-nil operatorConfig to verify the config merging logic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/workspace/controller_test.go` around lines 359 - 409,
Test currently only calls Reconciler.buildPod with a nil operatorConfig and
doesn't exercise the config-merge path; add a parallel subtest that constructs a
non-nil operatorConfig (set fields that would interact with Workspace, e.g.,
Architecture override, Image, TmpfsBuildDir or extra env/volumes) and call
r.buildPod(ws, operatorConfig) to assert merged behavior (node affinity uses the
resulting architecture, JMP_LEASE env still present, tmpfs volume
preserved/overridden as expected, image resolved correctly). Use the same
TestBuildPod_PreservesWorkspaceConfig test to add this case (or a new
TestBuildPod_WithOperatorConfig) and reference Reconciler.buildPod and
operatorConfig merge logic when adding assertions. Ensure the new assertions
verify both preserved Workspace fields and operatorConfig overrides.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Containerfile.toolchain`:
- Line 57: The RUN that installs EPEL in Containerfile.toolchain currently
leaves cached metadata in the image; update the RUN that contains "dnf install
-y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm" to
run dnf clean all as part of the same RUN (e.g., append && dnf clean all) so the
cache is removed in the same layer and image size is reduced.

In `@internal/controller/workspace/controller.go`:
- Around line 306-310: Add a concise explanatory comment immediately above the
SecurityContext block (the struct setting Privileged: ptr.To(true),
AllowPrivilegeEscalation: ptr.To(true), ProcMount:
ptr.To(corev1.UnmaskedProcMount)) describing why privileged mode is required:
enable podman-in-container with rootless user (uid 1000), allow overlayfs mounts
for persistent package installation, and permit user namespace setup; reference
the existing container command justification so readers know this mirrors that
rationale.

---

Duplicate comments:
In `@internal/buildapi/client/client_test.go`:
- Around line 235-237: The test's mock server handler is asserting on r.URL.Path
which is decoded by net/http; change the assertion to use r.URL.EscapedPath() so
the test verifies the client is sending the escaped path (update the Expect call
inside the httptest.NewServer handler where r is inspected, e.g., the handler
that currently checks "/v1/workspaces/my%20app/start"). Ensure any other path
assertions in the same test use EscapedPath() as well.

In `@internal/controller/imagebuild/controller.go`:
- Around line 765-772: The current logic binds imageBuild.Spec.BuildCachePVC
directly to the "shared-workspace" (sharedWorkspaceBinding), which overwrites
per-build uploads from InputFilesServer; change this so that when both
imageBuild.Spec.BuildCachePVC and InputFilesServer are set you do not replace
"shared-workspace" but instead add a separate workspace binding (e.g., Name:
"build-cache") using the PVC (PersistentVolumeClaimVolumeSource with ClaimName:
imageBuild.Spec.BuildCachePVC) and leave the existing "shared-workspace" binding
for per-build uploads; update the code paths that construct
sharedWorkspaceBinding and any workspace list to append the new build-cache
binding instead of assigning it to "shared-workspace", or alternatively add
validation to reject the combination up-front if you prefer explicit failure.

In `@internal/controller/operatorconfig/controller.go`:
- Around line 894-898: The SCC deletion in cleanupWorkspaceInfra unconditionally
calls r.Delete on a securityv1.SecurityContextConstraints (named by
workspaceSCCName) and treats any error other than IsNotFound as fatal; update
the error check to also ignore "no matches for kind" errors by adding
isNoMatchError(err) to the conditional (i.e., treat both errors.IsNotFound(err)
and isNoMatchError(err) as non-fatal) so cleanup proceeds on non-OpenShift
clusters.

---

Nitpick comments:
In `@internal/buildapi/workspace_test.go`:
- Around line 63-87: Add authenticated happy-path tests for the Create Workspace
flow: create a test in the same Describe/Context ("Create Workspace") that
builds a valid WorkspaceRequest JSON, sets a valid auth header (matching how
server expects authentication on server.router), and calls
server.router.ServeHTTP to exercise the handler instead of only unauthenticated
checks; mock the Kubernetes client used by the workspace handlers (the mock
implementation your codebase uses for k8s interactions) to return successful
responses for create/exec/deploy operations and assert the handler returns the
expected success status (e.g., HTTP 201/200) and response body. Ensure the test
references WorkspaceRequest, server.router.ServeHTTP, and the route
"/v1/workspaces" so it targets the same handler, and injects the mocked k8s
client into the server/test setup before invoking the request.

In `@internal/controller/operatorconfig/resources.go`:
- Around line 956-1003: The workspace SCC created by buildWorkspaceSCC currently
grants broad privileges (AllowPrivilegedContainer: true, AllowedCapabilities:
"ALL", RunAsUserStrategyRunAsAny) which increases risk; add an inline comment in
buildWorkspaceSCC documenting why each broad setting is required for workspace
pods (e.g., privileged true and UID 0 for sudo dnf installs), enumerate the
exact capabilities needed instead of "ALL" if possible, and add a TODO/issue
pointer to tighten RunAsUserStrategyRunAsAny to a restricted range or use
rootless alternatives later; keep UserNamespaceLevel: NamespaceLevelRequirePod
but note it as a mitigation in the comment and also add the same rationale to
the operator's user-facing docs/config so reviewers know these are intentional
choices.

In `@internal/controller/workspace/controller_test.go`:
- Around line 359-409: Test currently only calls Reconciler.buildPod with a nil
operatorConfig and doesn't exercise the config-merge path; add a parallel
subtest that constructs a non-nil operatorConfig (set fields that would interact
with Workspace, e.g., Architecture override, Image, TmpfsBuildDir or extra
env/volumes) and call r.buildPod(ws, operatorConfig) to assert merged behavior
(node affinity uses the resulting architecture, JMP_LEASE env still present,
tmpfs volume preserved/overridden as expected, image resolved correctly). Use
the same TestBuildPod_PreservesWorkspaceConfig test to add this case (or a new
TestBuildPod_WithOperatorConfig) and reference Reconciler.buildPod and
operatorConfig merge logic when adding assertions. Ensure the new assertions
verify both preserved Workspace fields and operatorConfig overrides.

In `@internal/controller/workspace/controller.go`:
- Around line 378-388: resourcesOrDefaults currently calls resource.MustParse
which can panic at runtime; refactor to use package-level variables (e.g.,
defaultCPUQty, defaultMemoryQty, defaultResources) initialized once at startup
using resource.ParseQuantity (handle parse errors in init() by logging/failing
early) and then return those prebuilt defaults from resourcesOrDefaults when r
is nil; update the function to reference the package-level defaultResources and
keep the existing behavior for non-nil r.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ee38f7a9-10e3-41ad-b1db-cef2e537002c

📥 Commits

Reviewing files that changed from the base of the PR and between 3b94a37 and 38cc1ac.

⛔ Files ignored due to path filters (4)
  • config/crd/bases/automotive.sdv.cloud.redhat.com_imagebuilds.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_workspaces.yaml is excluded by !config/crd/bases/**
  • config/rbac/role.yaml is excluded by !config/rbac/role.yaml
📒 Files selected for processing (32)
  • Containerfile.toolchain
  • api/v1alpha1/imagebuild_types.go
  • api/v1alpha1/operatorconfig_types.go
  • api/v1alpha1/workspace_types.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • bundle.Dockerfile
  • cmd/caib/buildcmd/build.go
  • cmd/caib/image/image.go
  • cmd/caib/main.go
  • cmd/caib/root.go
  • cmd/caib/runtime_wiring.go
  • cmd/caib/workspace/workspace.go
  • cmd/main.go
  • config/crd/kustomization.yaml
  • config/rbac/scc_role.yaml
  • config/rbac/scc_role_binding.yaml
  • hack/deploy-catalog.sh
  • internal/buildapi/client/client.go
  • internal/buildapi/client/client_test.go
  • internal/buildapi/server.go
  • internal/buildapi/types.go
  • internal/buildapi/workspace.go
  • internal/buildapi/workspace_test.go
  • internal/common/tasks/scripts/build_image.sh
  • internal/common/tasks/scripts/find_manifest.sh
  • internal/common/tasks/scripts/push_artifact.sh
  • internal/common/tasks/tasks.go
  • internal/controller/imagebuild/controller.go
  • internal/controller/operatorconfig/controller.go
  • internal/controller/operatorconfig/resources.go
  • internal/controller/workspace/controller.go
  • internal/controller/workspace/controller_test.go
✅ Files skipped from review due to trivial changes (8)
  • cmd/caib/main.go
  • config/rbac/scc_role_binding.yaml
  • bundle.Dockerfile
  • config/crd/kustomization.yaml
  • hack/deploy-catalog.sh
  • config/rbac/scc_role.yaml
  • internal/buildapi/types.go
  • internal/common/tasks/tasks.go
🚧 Files skipped from review as they are similar to previous changes (10)
  • cmd/main.go
  • internal/common/tasks/scripts/push_artifact.sh
  • api/v1alpha1/imagebuild_types.go
  • cmd/caib/image/image.go
  • api/v1alpha1/workspace_types.go
  • api/v1alpha1/operatorconfig_types.go
  • internal/common/tasks/scripts/find_manifest.sh
  • internal/buildapi/server.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • internal/common/tasks/scripts/build_image.sh

Comment thread Containerfile.toolchain
FROM autosd-base

# Enable EPEL for mock
RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add dnf clean all after EPEL installation to reduce image size.

Static analysis flagged this line. The EPEL install doesn't clean up afterwards, leaving cached metadata in the layer.

Proposed fix
 # Enable EPEL for mock
-RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm
+RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm && dnf clean all
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm
# Enable EPEL for mock
RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm && dnf clean all
🧰 Tools
🪛 Trivy (0.69.3)

[error] 57-57: 'dnf clean all' missing

'dnf clean all' is missed: dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm

Rule: DS-0019

Learn more

(IaC/Dockerfile)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Containerfile.toolchain` at line 57, The RUN that installs EPEL in
Containerfile.toolchain currently leaves cached metadata in the image; update
the RUN that contains "dnf install -y
https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm" to run
dnf clean all as part of the same RUN (e.g., append && dnf clean all) so the
cache is removed in the same layer and image size is reduced.

Comment on lines +306 to +310
SecurityContext: &corev1.SecurityContext{
Privileged: ptr.To(true),
AllowPrivilegeEscalation: ptr.To(true),
ProcMount: ptr.To(corev1.UnmaskedProcMount),
},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the workspace SCC and service account are properly configured
rg -n "workspaceSCCName\|ado-workspace" internal/controller/operatorconfig/

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 73


🏁 Script executed:

# Examine the workspace controller file around the mentioned lines
head -n 350 internal/controller/workspace/controller.go | tail -n 100

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 4657


🏁 Script executed:

# Search for SCC (SecurityContextConstraints) definitions and ado-workspace references
rg -i "scc|securitycontextconstraints|ado-workspace|privileged" --type go -n | head -100

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 15039


🏁 Script executed:

# Look for YAML manifests or configs that might define the SCC
find . -type f \( -name "*.yaml" -o -name "*.yml" \) -exec grep -l "ado-workspace-scc\|SecurityContextConstraints" {} \;

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 209


🏁 Script executed:

# Examine the SCC role file
cat ./config/rbac/scc_role.yaml

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 347


🏁 Script executed:

# Search for any workspace-related SCC or service account configuration
find ./config -type f \( -name "*.yaml" -o -name "*.yml" \) -exec grep -l "workspace\|ado-workspace" {} \;

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 501


🏁 Script executed:

# Look for documentation or comments about privileged pod requirements
grep -r "podman\|overlayfs\|privilege" config/ --include="*.yaml" --include="*.yml" -B 2 -A 2

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 1478


🏁 Script executed:

# View the SCC role binding to see which service accounts it applies to
cat ./config/rbac/scc_role_binding.yaml

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 363


🏁 Script executed:

# Search for SecurityContextConstraints resource definitions in the entire repo
find . -type f \( -name "*.yaml" -o -name "*.yml" \) -exec grep -l "SecurityContextConstraints\|kind.*SCC" {} \; | grep -v vendor

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 73


🏁 Script executed:

# Search for any code that creates or configures the ado-workspace-scc
rg "ado-workspace-scc" --type go -B 3 -A 3

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 733


🏁 Script executed:

# Find where workspaceSCCName is used
rg "workspaceSCCName" --type go -B 5 -A 5

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 3819


🏁 Script executed:

# Search for SCC creation or configuration code
rg "SecurityContextConstraints\|CreateSecurityContextConstraints" --type go -B 3 -A 10

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 73


🏁 Script executed:

# Look for any helm charts or kustomize overlays that might define the SCC
find . -type f \( -name "*.yaml" -o -name "*.yml" \) -path "*/kustomize/*" -o -path "*/helm/*" | head -20

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 73


🏁 Script executed:

# Get the complete buildWorkspaceSCC function
rg "func.*buildWorkspaceSCC" --type go -A 50

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 9926


🏁 Script executed:

# Also check what imports are needed for securityv1 to understand the SCC type
head -n 50 internal/controller/operatorconfig/resources.go | grep -E "^import|securityv1"

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 150


Add explanatory comment to SecurityContext documenting privileged mode requirement.

The SCC (ado-workspace-scc) is correctly configured with AllowPrivilegedContainer=true and RunAsUserStrategyRunAsAny. However, add a comment to the SecurityContext block (lines 306-310) explaining the purpose: privileged mode is required for podman-in-container with rootless user support (uid 1000), overlayfs mounts for persistent package installation, and user namespace setup. The justification is documented in the container command but should also be noted at the SecurityContext definition for clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/workspace/controller.go` around lines 306 - 310, Add a
concise explanatory comment immediately above the SecurityContext block (the
struct setting Privileged: ptr.To(true), AllowPrivilegeEscalation: ptr.To(true),
ProcMount: ptr.To(corev1.UnmaskedProcMount)) describing why privileged mode is
required: enable podman-in-container with rootless user (uid 1000), allow
overlayfs mounts for persistent package installation, and permit user namespace
setup; reference the existing container command justification so readers know
this mirrors that rationale.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (5)
internal/buildapi/workspace.go (3)

220-243: ⚠️ Potential issue | 🟠 Major

Create Jumpstarter secret only after workspace uniqueness check succeeds.

The secret is created (lines 229-242) before the workspace existence check (lines 246-259). If a workspace with the same name already exists, the secret is orphaned. Additionally, treating AlreadyExists as success (line 239) could silently bind a stale secret to a new workspace.

Move secret creation after the uniqueness check passes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 220 - 243, Move the Jumpstarter
secret creation (the block that decodes req.ClientConfig, sets jmpClientSecret,
builds the corev1.Secret and calls k8sClient.Create) so it runs only after the
workspace uniqueness check completes successfully; locate the current secret
logic referencing jmpClientSecret, req.ClientConfig, corev1.Secret and
k8sClient.Create and relocate it to after the existing workspace existence check
(the code that queries for an existing workspace/name). Also stop treating
k8serrors.IsAlreadyExists as success for this secret — if Create returns
AlreadyExists, return an error response instead of proceeding, to avoid binding
a stale secret to a new workspace.

193-198: ⚠️ Potential issue | 🟠 Major

Fail closed if OperatorConfig cannot be loaded.

Line 194 ignores the error from loadOperatorConfigFn. A transient read failure (not NotFound) silently skips workspace policy enforcement (maxResources, tmpfsBuildDir, storage class). Only NotFound should result in nil config; other errors should return 500.

🔧 Suggested fix
 	// Load workspace configuration from OperatorConfig
-	operatorConfig, _ := loadOperatorConfigFn(c.Request.Context(), k8sClient, namespace)
+	operatorConfig, err := loadOperatorConfigFn(c.Request.Context(), k8sClient, namespace)
+	if err != nil && !k8serrors.IsNotFound(err) {
+		c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to load operator configuration"})
+		return
+	}
 	var wsConfig *automotivev1alpha1.WorkspacesConfig
 	if operatorConfig != nil {
 		wsConfig = operatorConfig.Spec.Workspaces
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 193 - 198, The code ignores the
error returned by loadOperatorConfigFn causing silent failure; update the call
to loadOperatorConfigFn (used with c.Request.Context(), k8sClient, namespace) to
capture and inspect the error, treat a NotFound error as a nil operatorConfig
(and continue), but for any other error return an HTTP 500 response (or surface
the error upstream) so workspace policy enforcement (wsConfig /
automotivev1alpha1.WorkspacesConfig) is not skipped; use the Kubernetes helper
(errors.IsNotFound) to distinguish NotFound from real errors and ensure
operatorConfig is only nil when NotFound.

812-819: ⚠️ Potential issue | 🟠 Major

Initialize Requests and Limits maps after DeepCopy, not before.

Lines 813-816 create res with initialized maps, but line 818 replaces res entirely with wsConfig.Resources.DeepCopy(). If the OperatorConfig has resources: {} or only partial configuration, the DeepCopy can return a struct with nil Requests or Limits maps, causing panics on lines 831-832 and 845-846.

🔧 Suggested fix
 	// Start from OperatorConfig defaults, then overlay user values
-	res := &corev1.ResourceRequirements{
-		Requests: corev1.ResourceList{},
-		Limits:   corev1.ResourceList{},
-	}
+	res := &corev1.ResourceRequirements{}
 	if wsConfig != nil && wsConfig.Resources != nil {
 		res = wsConfig.Resources.DeepCopy()
 	}
+	if res.Requests == nil {
+		res.Requests = corev1.ResourceList{}
+	}
+	if res.Limits == nil {
+		res.Limits = corev1.ResourceList{}
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 812 - 819, The current code
initializes res with empty Requests/Limits then overwrites it with
wsConfig.Resources.DeepCopy(), which can leave Requests or Limits nil and cause
panics when later accessed; change the logic in the workspace resource-building
code so that when wsConfig != nil && wsConfig.Resources != nil you assign res =
wsConfig.Resources.DeepCopy() and then ensure res.Requests and res.Limits are
non-nil (allocate empty corev1.ResourceList maps if either is nil) before
returning/using res; reference the res variable, wsConfig.Resources.DeepCopy(),
and the later accesses to res.Requests/res.Limits to locate where to add the nil
checks and map initializations.
cmd/caib/workspace/workspace.go (2)

501-518: ⚠️ Potential issue | 🟠 Major

Sync body not retryable across re-auth attempts.

The progressReader is created once outside ExecuteWithReauth, wrapping a bytes.Buffer. If the helper retries the callback (after re-authentication), subsequent SyncWorkspace calls read from an already-consumed buffer and upload empty/truncated archives.

🔧 Suggested fix
 func uploadFiles(name, absDir string, files []string) {
 	var buf bytes.Buffer
 	if err := tarTrackedFiles(absDir, files, &buf); err != nil {
 		handleError(fmt.Errorf("failed to create tar archive: %w", err))
 	}
 
 	totalBytes := int64(buf.Len())
-	pr := newProgressReader(&buf, totalBytes)
+	archive := buf.Bytes()
 
 	err := caibcommon.ExecuteWithReauth(serverURL, &authToken, insecureSkipTLS, func(client *buildapiclient.Client) error {
-		return client.SyncWorkspace(context.Background(), name, pr)
+		pr := newProgressReader(bytes.NewReader(archive), totalBytes)
+		err := client.SyncWorkspace(context.Background(), name, pr)
+		pr.finish()
+		return err
 	})
-	pr.finish()
 	if err != nil {
 		handleError(fmt.Errorf("failed to sync workspace: %w", err))
 	}
 	fmt.Println("Files synced")
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/caib/workspace/workspace.go` around lines 501 - 518, The current
uploadFiles creates a single progressReader over a bytes.Buffer before
ExecuteWithReauth, so retries will re-read an already-consumed reader; instead
stash the tar bytes (e.g., tarData := buf.Bytes(), totalBytes :=
int64(len(tarData))) and create a fresh reader+progressReader inside the
ExecuteWithReauth callback on each attempt (use bytes.NewReader(tarData) and
newProgressReader), assign that progressReader to an outer variable (e.g.,
finalPR) so you can call finalPR.finish() after ExecuteWithReauth returns, and
pass the freshly created pr into client.SyncWorkspace; update references to
newProgressReader and SyncWorkspace accordingly.

608-614: ⚠️ Potential issue | 🟡 Minor

Flattening command arguments loses argument boundaries.

Joining cmdParts with spaces (line 613) breaks commands that have arguments containing spaces, quotes, or shell metacharacters. For example, exec my-app -- echo "hello world" would send echo hello world as the command string.

Consider preserving the argument array in the API request or using proper shell escaping.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/caib/workspace/workspace.go` around lines 608 - 614, The current code
flattens cmdParts into a single string (variable command) which loses argument
boundaries; instead keep cmdParts as a []string and pass that slice to the
API/executor rather than using strings.Join. Update the logic around
cmdParts/command in workspace.go (the block that builds command and uses
handleError) to stop creating the single command string and to propagate the
[]string (cmdParts) through the call path (or, if the backend requires a shell
string, perform proper shell-quoting/escaping there rather than naive joining)
so quoted arguments and spaces are preserved.
🧹 Nitpick comments (2)
internal/common/tasks/scripts/push_artifact.sh (1)

237-238: Comment is misleading: aib-manifest.yml is not added separately below.

The comment states the manifest is "added separately below," but there's no code in this script that adds aib-manifest.yml as a separate layer or includes it elsewhere. This aligns with find_manifest.sh (lines 31-32), which also excludes aib-manifest.yml from the shared workspace copy, treating it as a build artifact that shouldn't be pushed to the registry.

If the intent is simply to exclude it (consistent with find_manifest.sh), consider updating the comment:

📝 Suggested comment clarification
-    # Skip .size sidecar files and aib-manifest.yml (added separately below)
+    # Skip .size sidecar files and aib-manifest.yml (build artifact, not a partition layer)
     case "$part_file" in *.size|aib-manifest.yml) continue ;; esac
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/common/tasks/scripts/push_artifact.sh` around lines 237 - 238,
Update the misleading comment on the case that skips files: it currently claims
"aib-manifest.yml (added separately below)" even though nothing in this script
adds that manifest later; change the comment to state we are skipping .size
sidecar files and excluding aib-manifest.yml (treating it as a build artifact
that should not be pushed) so it matches the behavior in find_manifest.sh;
locate the case statement that checks "$part_file" (case "$part_file" in
*.size|aib-manifest.yml) and edit the comment text accordingly.
api/v1alpha1/operatorconfig_types.go (1)

422-425: Remember to regenerate CRD manifests.

After modifying types in api/v1alpha1/, run make generate manifests to regenerate DeepCopy methods and CRDs. As per coding guidelines: "After modifying types in api/v1alpha1/, run make generate manifests to regenerate DeepCopy methods and CRDs".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1alpha1/operatorconfig_types.go` around lines 422 - 425, You added a
new/changed field Workspaces *WorkspacesConfig in the OperatorConfig types, so
you must regenerate the autogenerated DeepCopy methods and CRD manifests; run
the project codegen (e.g., run the repository's codegen command: make generate
manifests) to update DeepCopy, client, listers and the CRD YAMLs so the changes
to WorkspacesConfig and OperatorConfig are reflected in the generated artifacts.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@cmd/caib/workspace/workspace.go`:
- Around line 501-518: The current uploadFiles creates a single progressReader
over a bytes.Buffer before ExecuteWithReauth, so retries will re-read an
already-consumed reader; instead stash the tar bytes (e.g., tarData :=
buf.Bytes(), totalBytes := int64(len(tarData))) and create a fresh
reader+progressReader inside the ExecuteWithReauth callback on each attempt (use
bytes.NewReader(tarData) and newProgressReader), assign that progressReader to
an outer variable (e.g., finalPR) so you can call finalPR.finish() after
ExecuteWithReauth returns, and pass the freshly created pr into
client.SyncWorkspace; update references to newProgressReader and SyncWorkspace
accordingly.
- Around line 608-614: The current code flattens cmdParts into a single string
(variable command) which loses argument boundaries; instead keep cmdParts as a
[]string and pass that slice to the API/executor rather than using strings.Join.
Update the logic around cmdParts/command in workspace.go (the block that builds
command and uses handleError) to stop creating the single command string and to
propagate the []string (cmdParts) through the call path (or, if the backend
requires a shell string, perform proper shell-quoting/escaping there rather than
naive joining) so quoted arguments and spaces are preserved.

In `@internal/buildapi/workspace.go`:
- Around line 220-243: Move the Jumpstarter secret creation (the block that
decodes req.ClientConfig, sets jmpClientSecret, builds the corev1.Secret and
calls k8sClient.Create) so it runs only after the workspace uniqueness check
completes successfully; locate the current secret logic referencing
jmpClientSecret, req.ClientConfig, corev1.Secret and k8sClient.Create and
relocate it to after the existing workspace existence check (the code that
queries for an existing workspace/name). Also stop treating
k8serrors.IsAlreadyExists as success for this secret — if Create returns
AlreadyExists, return an error response instead of proceeding, to avoid binding
a stale secret to a new workspace.
- Around line 193-198: The code ignores the error returned by
loadOperatorConfigFn causing silent failure; update the call to
loadOperatorConfigFn (used with c.Request.Context(), k8sClient, namespace) to
capture and inspect the error, treat a NotFound error as a nil operatorConfig
(and continue), but for any other error return an HTTP 500 response (or surface
the error upstream) so workspace policy enforcement (wsConfig /
automotivev1alpha1.WorkspacesConfig) is not skipped; use the Kubernetes helper
(errors.IsNotFound) to distinguish NotFound from real errors and ensure
operatorConfig is only nil when NotFound.
- Around line 812-819: The current code initializes res with empty
Requests/Limits then overwrites it with wsConfig.Resources.DeepCopy(), which can
leave Requests or Limits nil and cause panics when later accessed; change the
logic in the workspace resource-building code so that when wsConfig != nil &&
wsConfig.Resources != nil you assign res = wsConfig.Resources.DeepCopy() and
then ensure res.Requests and res.Limits are non-nil (allocate empty
corev1.ResourceList maps if either is nil) before returning/using res; reference
the res variable, wsConfig.Resources.DeepCopy(), and the later accesses to
res.Requests/res.Limits to locate where to add the nil checks and map
initializations.

---

Nitpick comments:
In `@api/v1alpha1/operatorconfig_types.go`:
- Around line 422-425: You added a new/changed field Workspaces
*WorkspacesConfig in the OperatorConfig types, so you must regenerate the
autogenerated DeepCopy methods and CRD manifests; run the project codegen (e.g.,
run the repository's codegen command: make generate manifests) to update
DeepCopy, client, listers and the CRD YAMLs so the changes to WorkspacesConfig
and OperatorConfig are reflected in the generated artifacts.

In `@internal/common/tasks/scripts/push_artifact.sh`:
- Around line 237-238: Update the misleading comment on the case that skips
files: it currently claims "aib-manifest.yml (added separately below)" even
though nothing in this script adds that manifest later; change the comment to
state we are skipping .size sidecar files and excluding aib-manifest.yml
(treating it as a build artifact that should not be pushed) so it matches the
behavior in find_manifest.sh; locate the case statement that checks "$part_file"
(case "$part_file" in *.size|aib-manifest.yml) and edit the comment text
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9058dd81-66c1-4541-a139-0e81b6b207d5

📥 Commits

Reviewing files that changed from the base of the PR and between 38cc1ac and 3933c6c.

⛔ Files ignored due to path filters (4)
  • config/crd/bases/automotive.sdv.cloud.redhat.com_imagebuilds.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_workspaces.yaml is excluded by !config/crd/bases/**
  • config/rbac/role.yaml is excluded by !config/rbac/role.yaml
📒 Files selected for processing (34)
  • Containerfile.toolchain
  • api/v1alpha1/imagebuild_types.go
  • api/v1alpha1/operatorconfig_types.go
  • api/v1alpha1/workspace_types.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • bundle.Dockerfile
  • cmd/caib/buildcmd/build.go
  • cmd/caib/image/image.go
  • cmd/caib/main.go
  • cmd/caib/root.go
  • cmd/caib/runtime_wiring.go
  • cmd/caib/workspace/workspace.go
  • cmd/main.go
  • config/crd/kustomization.yaml
  • config/rbac/scc_role.yaml
  • config/rbac/scc_role_binding.yaml
  • hack/container-wrapper.sh
  • hack/deploy-catalog.sh
  • hack/workspace-entrypoint.sh
  • internal/buildapi/client/client.go
  • internal/buildapi/client/client_test.go
  • internal/buildapi/server.go
  • internal/buildapi/types.go
  • internal/buildapi/workspace.go
  • internal/buildapi/workspace_test.go
  • internal/common/tasks/scripts/build_image.sh
  • internal/common/tasks/scripts/find_manifest.sh
  • internal/common/tasks/scripts/push_artifact.sh
  • internal/common/tasks/tasks.go
  • internal/controller/imagebuild/controller.go
  • internal/controller/operatorconfig/controller.go
  • internal/controller/operatorconfig/resources.go
  • internal/controller/workspace/controller.go
  • internal/controller/workspace/controller_test.go
✅ Files skipped from review due to trivial changes (11)
  • cmd/caib/main.go
  • bundle.Dockerfile
  • config/crd/kustomization.yaml
  • hack/deploy-catalog.sh
  • hack/container-wrapper.sh
  • cmd/caib/root.go
  • config/rbac/scc_role.yaml
  • internal/buildapi/client/client_test.go
  • internal/common/tasks/tasks.go
  • internal/controller/workspace/controller_test.go
  • api/v1alpha1/workspace_types.go
🚧 Files skipped from review as they are similar to previous changes (11)
  • config/rbac/scc_role_binding.yaml
  • api/v1alpha1/imagebuild_types.go
  • internal/buildapi/workspace_test.go
  • cmd/caib/runtime_wiring.go
  • cmd/caib/image/image.go
  • internal/controller/operatorconfig/controller.go
  • internal/buildapi/server.go
  • internal/common/tasks/scripts/find_manifest.sh
  • internal/controller/operatorconfig/resources.go
  • internal/buildapi/client/client.go
  • api/v1alpha1/zz_generated.deepcopy.go

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (11)
internal/common/tasks/scripts/build_image.sh (1)

59-69: ⚠️ Potential issue | 🟠 Major

Delete cached image.json before reusing BUILD_DIR.

The persistent-cache cleanup removes orphaned output directories but leaves $BUILD_DIR/image.json. Line 528 then republishes that file for every build, so a cached disk run can expose the previous build's manifest.

🔧 Proposed fix
   for orphan in "$BUILD_DIR"/image_output--*; do
     [ -e "$orphan" ] || break
     echo "Removing orphaned build output: $(basename "$orphan")"
     rm -rf "$orphan"
   done
+  rm -f "$BUILD_DIR/image.json"

Also applies to: 528-528

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/common/tasks/scripts/build_image.sh` around lines 59 - 69, The
cleanup loop currently leaves a cached manifest at $BUILD_DIR/image.json which
can be reused across builds; update the script to explicitly remove
"$BUILD_DIR/image.json" before reusing BUILD_DIR: detect if the file exists
(reference BUILD_DIR and image.json) and rm -f it (or equivalent) before the
republish step so stale manifests aren't carried forward; ensure this removal
happens alongside the existing orphan removal logic that iterates over
image_output--*.
internal/controller/imagebuild/controller.go (1)

760-773: ⚠️ Potential issue | 🟠 Major

Don't swap out uploaded inputs for the cache PVC.

When BuildCachePVC is set, this branch stops mounting imageBuild.Status.PVCName, so uploaded source_path content disappears from the build. It also points different builds at the same /workspace/shared root even though internal/common/tasks/scripts/build_image.sh now cleans that root on each run.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/imagebuild/controller.go` around lines 760 - 773, The
current branch selects BuildCachePVC before checking uploaded inputs, causing
uploaded content (imageBuild.Status.PVCName) to be replaced by the cache PVC;
change the selection order so that when imageBuild.Spec.GetInputFilesServer() is
true and imageBuild.Status.PVCName is non-empty you bind that PVC to
sharedWorkspaceBinding first (use the PersistentVolumeClaim with ClaimName =
imageBuild.Status.PVCName), otherwise fall back to
imageBuild.Spec.BuildCachePVC, and only then to the VolumeClaimTemplate path —
update the conditional logic around sharedWorkspaceBinding in the controller.go
block that references imageBuild.Spec.BuildCachePVC,
imageBuild.Spec.GetInputFilesServer(), and imageBuild.Status.PVCName.
Containerfile.toolchain (2)

22-38: ⚠️ Potential issue | 🟠 Major

Don't build the rootfs with signature checks disabled.

Line 26 and Line 34 bypass RPM verification, so the later key import never protects this installroot. That turns the nightly repo fetch into an unsigned supply-chain hop for the toolchain image.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Containerfile.toolchain` around lines 22 - 38, The install currently disables
signature verification (repo file gpgcheck and the dnf --nogpgcheck flag) and
imports the GPG key only after install, so import the key before installing and
enable verification: write the repo with gpgcheck=1, remove the --nogpgcheck
flag from the RUN dnf install invocation, move the rpm --root=${ROOTFS} --import
${ROOTFS}/etc/pki/rpm-gpg/${GPG_KEY} call to run before the dnf install, and
fail the build if the import fails (remove the "|| true") so the installroot is
protected by the imported key.

90-91: ⚠️ Potential issue | 🟠 Major

Expose jmp from a non-root path.

Line 91 only adds /root/.local/jumpstarter/bin to PATH, but Line 37 in hack/workspace-entrypoint.sh runs jmp as UID/GID 1000. That user cannot traverse /root, so workspace startup silently skips client activation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Containerfile.toolchain` around lines 90 - 91, The image exposes jmp only
under /root/.local/jumpstarter/bin which UID/GID 1000 in
hack/workspace-entrypoint.sh cannot access; change the install or PATH so the
jmp binary is placed in or visible from a non-root location. Update the RUN/curl
install invocation or set an install prefix (so jumpstarter is installed to a
shared location such as /usr/local/share/jumpstarter or
/home/jumpstarter/.local) and update the ENV PATH (replace or add to the
existing ENV PATH "/usr/local/bin" or the chosen non-root directory) so the jmp
executable is accessible to UID 1000; ensure references to jmp and the ENV PATH
change correspond to the install command and the workspace-entrypoint.sh usage.
internal/controller/operatorconfig/controller.go (1)

908-912: ⚠️ Potential issue | 🟠 Major

Ignore SCC NoMatch during workspace cleanup.

cleanupOSBuilds() now reaches this path unconditionally, but Line 910 still treats the OpenShift-only SecurityContextConstraints delete as fatal. On non-OpenShift clusters that can wedge cleanup/finalizer removal even though the SCC API was never available.

🔧 Proposed fix
-	if err := r.Delete(ctx, scc); err != nil && !errors.IsNotFound(err) {
+	if err := r.Delete(ctx, scc); err != nil && !errors.IsNotFound(err) && !isNoMatchError(err) {
 		return fmt.Errorf("failed to delete workspace SCC: %w", err)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/operatorconfig/controller.go` around lines 908 - 912, The
delete of SecurityContextConstraints in cleanupOSBuilds currently treats any
error other than NotFound as fatal; update the error handling around
r.Delete(ctx, scc) (the SecurityContextConstraints delete for workspaceSCCName)
to also ignore API "NoMatch" errors on clusters without the SCC resource.
Specifically, change the if condition to treat both errors.IsNotFound(err) and
the API "no match" case (use the helper apierrors.IsNoMatchError(err) /
errors.IsNoMatchError equivalent from k8s apimachinery) as non-fatal so cleanup
can proceed on non-OpenShift clusters.
internal/buildapi/workspace.go (3)

220-243: ⚠️ Potential issue | 🟠 Major

Jumpstarter secret created before workspace validation can be orphaned.

The client config secret is created (lines 229-242) before checking if the workspace already exists (lines 246-259). If workspace creation fails or conflicts, the secret remains orphaned with no owner reference for garbage collection.

Move secret creation after workspace CR creation succeeds, then set an owner reference.

🔧 Suggested approach
  1. Move secret creation to after successful k8sClient.Create(ctx, ws)
  2. Set OwnerReferences on the secret pointing to the workspace
  3. If secret creation fails, delete the workspace or add it to the cleanup path
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 220 - 243, The secret creation
using req.ClientConfig (jmpClientSecret and Secret object) must be moved to
after the workspace CR is successfully created (after the call that creates ws
via k8sClient.Create(ctx, ws)); set the Secret.ObjectMeta.OwnerReferences to
point to the created workspace (use the workspace UID and appropriate
APIVersion/Kind) so Kubernetes garbage-collects it, and if the secret creation
fails, rollback by deleting the workspace (k8sClient.Delete(ctx, ws)) or include
the workspace in the existing cleanup/error path; ensure you still decode
req.ClientConfig and only attempt Secret creation when ws creation returned no
error.

812-819: ⚠️ Potential issue | 🟠 Major

Nil map panic after DeepCopy.

Lines 813-816 initialize empty maps, but line 818 reassigns res to a DeepCopy() which can have nil Requests/Limits maps (e.g., if OperatorConfig sets resources: {}). The subsequent assignments at lines 831-832 and 845-846 will panic.

🐛 Proposed fix
 	// Start from OperatorConfig defaults, then overlay user values
 	res := &corev1.ResourceRequirements{
 		Requests: corev1.ResourceList{},
 		Limits:   corev1.ResourceList{},
 	}
 	if wsConfig != nil && wsConfig.Resources != nil {
 		res = wsConfig.Resources.DeepCopy()
+		if res.Requests == nil {
+			res.Requests = corev1.ResourceList{}
+		}
+		if res.Limits == nil {
+			res.Limits = corev1.ResourceList{}
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 812 - 819, The code assigns res
= wsConfig.Resources.DeepCopy() which may produce a ResourceRequirements whose
Requests or Limits maps are nil, causing panics later when writing to them;
after the DeepCopy call (when wsConfig != nil && wsConfig.Resources != nil)
ensure res.Requests and res.Limits are non-nil maps (initialize empty
corev1.ResourceList{} if nil) before any subsequent assignments. Update the
handling around res, wsConfig, and Resources.DeepCopy() so you never write into
a nil Requests or Limits map.

193-198: ⚠️ Potential issue | 🟠 Major

Silently ignoring OperatorConfig load errors allows policy bypass.

Same issue as in server.go: transient API errors fall back to nil config, skipping workspace policy enforcement (maxResources, tmpfsBuildDir, storage class defaults).

🛡️ Proposed fix
 	// Load workspace configuration from OperatorConfig
-	operatorConfig, _ := loadOperatorConfigFn(c.Request.Context(), k8sClient, namespace)
+	operatorConfig, err := loadOperatorConfigFn(c.Request.Context(), k8sClient, namespace)
+	if err != nil && !k8serrors.IsNotFound(err) {
+		c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to load operator config"})
+		return
+	}
 	var wsConfig *automotivev1alpha1.WorkspacesConfig
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 193 - 198, The code currently
ignores the error returned by loadOperatorConfigFn which lets transient API
failures silently disable workspace policies; update the logic that calls
loadOperatorConfigFn to check the returned error (from loadOperatorConfigFn),
and if non-nil return or propagate a wrapped error (instead of proceeding with
operatorConfig == nil), so caller can fail fast and enforce policies; ensure the
subsequent wsConfig assignment (using operatorConfig.Spec.Workspaces) only runs
when operatorConfig is non-nil and no error occurred, and add a clear error
message referencing loadOperatorConfigFn/operatorConfig/wsConfig to aid
debugging.
internal/buildapi/server.go (2)

1626-1686: ⚠️ Potential issue | 🟠 Major

TOCTOU: GenerateName can leak duplicate build-cache PVCs.

The list-then-create flow with GenerateName (lines 1631-1642, 1660-1686) is not atomic. If two concurrent builds for the same workspace both miss the existing PVC in the list, each creates a separate cache. This fragments warm-cache state and orphans extra PVCs.

Use a deterministic PVC name (e.g., <workspace>-build-cache) with create-or-get semantics.

🔧 Suggested approach
-	pvc := &corev1.PersistentVolumeClaim{
-		ObjectMeta: metav1.ObjectMeta{
-			GenerateName: wsName + "-build-cache-",
+	buildCachePVCName := wsName + "-build-cache"
+	pvc := &corev1.PersistentVolumeClaim{
+		ObjectMeta: metav1.ObjectMeta{
+			Name:      buildCachePVCName,
 			Namespace: namespace,
 			Labels:    buildCacheLabels,
 ...
 	if err := k8sClient.Create(ctx, pvc); err != nil {
+		if k8serrors.IsAlreadyExists(err) {
+			return buildCachePVCName, nil
+		}
 		return "", fmt.Errorf("creating build-cache PVC: %w", err)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/server.go` around lines 1626 - 1686, The current
list-then-create flow with GenerateName can create duplicate PVCs under race;
change to a deterministic PVC name (e.g., wsName + "-build-cache") instead of
using GenerateName in the PVC object and implement create-or-get semantics:
construct the PVC with Name set (not GenerateName), call k8sClient.Create(ctx,
pvc) and if it returns an AlreadyExists error (errors.IsAlreadyExists), fetch
the existing PVC via k8sClient.Get and return its Name (ensuring
DeletionTimestamp == nil); where Create succeeds return the new PVC name; handle
other errors normally. Refer to the PVC object creation and the GenerateName
usage and the surrounding create/List logic to locate the code to modify.

1536-1542: ⚠️ Potential issue | 🔴 Critical

Shell injection via untrusted repoPath.

The repoPath from user input is interpolated directly into a shell command without sanitization. A malicious path like foo; rm -rf / would execute arbitrary commands in the workspace pod.

The safeFilename function exists (lines 441-464) for similar validation. Apply it to repoPath, or use the shellQuote helper from workspace.go.

🛡️ Proposed fix using shellQuote
+// shellQuote returns s wrapped in POSIX single quotes with embedded single
+// quotes escaped, safe for interpolation into a /bin/sh -c script.
+func shellQuote(s string) string {
+	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
+}
+
 // Start HTTP server in the workspace (background, fire-and-forget).
 // Redirect shell's own FDs first so runc exec doesn't block waiting for SPDY pipes.
 cmd := []string{"/bin/sh", "-c",
-	fmt.Sprintf("exec 0</dev/null 1>/dev/null 2>/dev/null; cd %s && python3 -m http.server %d &", repoPath, port)}
+	fmt.Sprintf("exec 0</dev/null 1>/dev/null 2>/dev/null; cd %s && python3 -m http.server %d &", shellQuote(repoPath), port)}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/server.go` around lines 1536 - 1542, The repoPath is being
interpolated into a shell -c command unsanitized, allowing shell injection;
before building cmd in the HTTP server start block, sanitize or escape repoPath
using the existing helpers (call shellQuote(repoPath) from workspace.go or
validate with safeFilename(repoPath)) and use that sanitized value in the
fmt.Sprintf that constructs cmd (instead of raw repoPath); keep the existing
podExec call and error wrapping (fmt.Errorf("starting HTTP server in workspace
%q: %w", wsName, err)) but ensure repoPath is sanitized so the spawned shell
cannot execute injected commands.
internal/controller/workspace/controller.go (1)

195-202: ⚠️ Potential issue | 🟠 Major

User-supplied images run with elevated privileges.

The pod runs with Privileged: true and accepts arbitrary images from ws.Spec.Image. While HostUsers: false provides some isolation via user namespaces, a malicious image can still escape to the node if kernel vulnerabilities exist.

Consider restricting ws.Spec.Image to an operator-controlled allowlist, or at minimum validate the image against a configurable prefix (e.g., registry.redhat.io/automotive/).

Also applies to: 282-286

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/workspace/controller.go` around lines 195 - 202, The code
currently accepts ws.Spec.Image directly (see ws.Spec.Image usage and fallback
to WorkspacesConfig.GetToolchainImage) which allows arbitrary images; change the
logic to validate any user-supplied ws.Spec.Image against an operator-controlled
allowlist or configurable prefix (e.g.,
operatorConfig.Spec.Workspaces.ImagePrefix or AllowedImages) before accepting
it: if ws.Spec.Image is non-empty, call a validation helper (e.g.,
IsImageAllowed(image, operatorConfig.Spec.Workspaces)) and only use the image
when validation passes; otherwise fall back to
WorkspacesConfig.GetToolchainImage or return an error/deny the request. Apply
the same validation wherever ws.Spec.Image is read (also at the other occurrence
around lines 282-286).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/buildapi/server.go`:
- Around line 1567-1571: The code currently ignores errors from
loadOperatorConfigFn and treats any error as a nil operatorConfig; change the
logic in the handler that calls loadOperatorConfigFn(ctx, k8sClient, namespace)
to inspect the returned error: if err == nil set operatorConfig and wsConfig as
before; if the error is a NotFound (use the appropriate k8s API check e.g.
apierrors.IsNotFound) treat operatorConfig as nil and continue; for any other
error return a 500/internal server error response so policy enforcement is not
bypassed. Keep the same symbols (loadOperatorConfigFn, operatorConfig, wsConfig)
so the fix is applied in the same call site.

In `@internal/controller/imagebuild/controller.go`:
- Around line 789-792: The storage-class fallback added for the
VolumeClaimTemplate branch wasn't applied to the upload-backed PVC code path;
update getOrCreateWorkspacePVC to mirror the same logic used where
storageClassName is set (check imageBuild.Spec.StorageClass first, else fall
back to operatorConfig.Spec.OSBuilds.StorageClass) and use that storageClassName
pointer when constructing the PVC (and when setting PVC.Spec.StorageClassName).
Locate getOrCreateWorkspacePVC and the PVC construction/creation logic and
ensure it references the same storageClassName resolution as the
VolumeClaimTemplate branch (using the same variable names like storageClassName
and imageBuild.Spec.StorageClass / operatorConfig.Spec.OSBuilds.StorageClass).
- Line 105: The RBAC annotation currently grants only "get;update" for
workspaces but updateWorkspaceLease() uses r.Patch(ctx, ws, patch) which
requires the "patch" verb; update the kubebuilder RBAC comment line that
references workspaces (the line containing
"+kubebuilder:rbac:groups=automotive.sdv.cloud.redhat.com,resources=workspaces,verbs=...")
to include "patch" in the verbs list so PATCH requests from r.Patch are
authorized.

---

Duplicate comments:
In `@Containerfile.toolchain`:
- Around line 22-38: The install currently disables signature verification (repo
file gpgcheck and the dnf --nogpgcheck flag) and imports the GPG key only after
install, so import the key before installing and enable verification: write the
repo with gpgcheck=1, remove the --nogpgcheck flag from the RUN dnf install
invocation, move the rpm --root=${ROOTFS} --import
${ROOTFS}/etc/pki/rpm-gpg/${GPG_KEY} call to run before the dnf install, and
fail the build if the import fails (remove the "|| true") so the installroot is
protected by the imported key.
- Around line 90-91: The image exposes jmp only under
/root/.local/jumpstarter/bin which UID/GID 1000 in hack/workspace-entrypoint.sh
cannot access; change the install or PATH so the jmp binary is placed in or
visible from a non-root location. Update the RUN/curl install invocation or set
an install prefix (so jumpstarter is installed to a shared location such as
/usr/local/share/jumpstarter or /home/jumpstarter/.local) and update the ENV
PATH (replace or add to the existing ENV PATH "/usr/local/bin" or the chosen
non-root directory) so the jmp executable is accessible to UID 1000; ensure
references to jmp and the ENV PATH change correspond to the install command and
the workspace-entrypoint.sh usage.

In `@internal/buildapi/server.go`:
- Around line 1626-1686: The current list-then-create flow with GenerateName can
create duplicate PVCs under race; change to a deterministic PVC name (e.g.,
wsName + "-build-cache") instead of using GenerateName in the PVC object and
implement create-or-get semantics: construct the PVC with Name set (not
GenerateName), call k8sClient.Create(ctx, pvc) and if it returns an
AlreadyExists error (errors.IsAlreadyExists), fetch the existing PVC via
k8sClient.Get and return its Name (ensuring DeletionTimestamp == nil); where
Create succeeds return the new PVC name; handle other errors normally. Refer to
the PVC object creation and the GenerateName usage and the surrounding
create/List logic to locate the code to modify.
- Around line 1536-1542: The repoPath is being interpolated into a shell -c
command unsanitized, allowing shell injection; before building cmd in the HTTP
server start block, sanitize or escape repoPath using the existing helpers (call
shellQuote(repoPath) from workspace.go or validate with safeFilename(repoPath))
and use that sanitized value in the fmt.Sprintf that constructs cmd (instead of
raw repoPath); keep the existing podExec call and error wrapping
(fmt.Errorf("starting HTTP server in workspace %q: %w", wsName, err)) but ensure
repoPath is sanitized so the spawned shell cannot execute injected commands.

In `@internal/buildapi/workspace.go`:
- Around line 220-243: The secret creation using req.ClientConfig
(jmpClientSecret and Secret object) must be moved to after the workspace CR is
successfully created (after the call that creates ws via k8sClient.Create(ctx,
ws)); set the Secret.ObjectMeta.OwnerReferences to point to the created
workspace (use the workspace UID and appropriate APIVersion/Kind) so Kubernetes
garbage-collects it, and if the secret creation fails, rollback by deleting the
workspace (k8sClient.Delete(ctx, ws)) or include the workspace in the existing
cleanup/error path; ensure you still decode req.ClientConfig and only attempt
Secret creation when ws creation returned no error.
- Around line 812-819: The code assigns res = wsConfig.Resources.DeepCopy()
which may produce a ResourceRequirements whose Requests or Limits maps are nil,
causing panics later when writing to them; after the DeepCopy call (when
wsConfig != nil && wsConfig.Resources != nil) ensure res.Requests and res.Limits
are non-nil maps (initialize empty corev1.ResourceList{} if nil) before any
subsequent assignments. Update the handling around res, wsConfig, and
Resources.DeepCopy() so you never write into a nil Requests or Limits map.
- Around line 193-198: The code currently ignores the error returned by
loadOperatorConfigFn which lets transient API failures silently disable
workspace policies; update the logic that calls loadOperatorConfigFn to check
the returned error (from loadOperatorConfigFn), and if non-nil return or
propagate a wrapped error (instead of proceeding with operatorConfig == nil), so
caller can fail fast and enforce policies; ensure the subsequent wsConfig
assignment (using operatorConfig.Spec.Workspaces) only runs when operatorConfig
is non-nil and no error occurred, and add a clear error message referencing
loadOperatorConfigFn/operatorConfig/wsConfig to aid debugging.

In `@internal/common/tasks/scripts/build_image.sh`:
- Around line 59-69: The cleanup loop currently leaves a cached manifest at
$BUILD_DIR/image.json which can be reused across builds; update the script to
explicitly remove "$BUILD_DIR/image.json" before reusing BUILD_DIR: detect if
the file exists (reference BUILD_DIR and image.json) and rm -f it (or
equivalent) before the republish step so stale manifests aren't carried forward;
ensure this removal happens alongside the existing orphan removal logic that
iterates over image_output--*.

In `@internal/controller/imagebuild/controller.go`:
- Around line 760-773: The current branch selects BuildCachePVC before checking
uploaded inputs, causing uploaded content (imageBuild.Status.PVCName) to be
replaced by the cache PVC; change the selection order so that when
imageBuild.Spec.GetInputFilesServer() is true and imageBuild.Status.PVCName is
non-empty you bind that PVC to sharedWorkspaceBinding first (use the
PersistentVolumeClaim with ClaimName = imageBuild.Status.PVCName), otherwise
fall back to imageBuild.Spec.BuildCachePVC, and only then to the
VolumeClaimTemplate path — update the conditional logic around
sharedWorkspaceBinding in the controller.go block that references
imageBuild.Spec.BuildCachePVC, imageBuild.Spec.GetInputFilesServer(), and
imageBuild.Status.PVCName.

In `@internal/controller/operatorconfig/controller.go`:
- Around line 908-912: The delete of SecurityContextConstraints in
cleanupOSBuilds currently treats any error other than NotFound as fatal; update
the error handling around r.Delete(ctx, scc) (the SecurityContextConstraints
delete for workspaceSCCName) to also ignore API "NoMatch" errors on clusters
without the SCC resource. Specifically, change the if condition to treat both
errors.IsNotFound(err) and the API "no match" case (use the helper
apierrors.IsNoMatchError(err) / errors.IsNoMatchError equivalent from k8s
apimachinery) as non-fatal so cleanup can proceed on non-OpenShift clusters.

In `@internal/controller/workspace/controller.go`:
- Around line 195-202: The code currently accepts ws.Spec.Image directly (see
ws.Spec.Image usage and fallback to WorkspacesConfig.GetToolchainImage) which
allows arbitrary images; change the logic to validate any user-supplied
ws.Spec.Image against an operator-controlled allowlist or configurable prefix
(e.g., operatorConfig.Spec.Workspaces.ImagePrefix or AllowedImages) before
accepting it: if ws.Spec.Image is non-empty, call a validation helper (e.g.,
IsImageAllowed(image, operatorConfig.Spec.Workspaces)) and only use the image
when validation passes; otherwise fall back to
WorkspacesConfig.GetToolchainImage or return an error/deny the request. Apply
the same validation wherever ws.Spec.Image is read (also at the other occurrence
around lines 282-286).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 290b544a-3d3c-456f-a157-93c8cdf4d46e

📥 Commits

Reviewing files that changed from the base of the PR and between 3933c6c and e2d9c98.

⛔ Files ignored due to path filters (4)
  • config/crd/bases/automotive.sdv.cloud.redhat.com_imagebuilds.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_workspaces.yaml is excluded by !config/crd/bases/**
  • config/rbac/role.yaml is excluded by !config/rbac/role.yaml
📒 Files selected for processing (34)
  • Containerfile.toolchain
  • api/v1alpha1/imagebuild_types.go
  • api/v1alpha1/operatorconfig_types.go
  • api/v1alpha1/workspace_types.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • bundle.Dockerfile
  • cmd/caib/buildcmd/build.go
  • cmd/caib/image/image.go
  • cmd/caib/main.go
  • cmd/caib/root.go
  • cmd/caib/runtime_wiring.go
  • cmd/caib/workspace/workspace.go
  • cmd/main.go
  • config/crd/kustomization.yaml
  • config/rbac/scc_role.yaml
  • config/rbac/scc_role_binding.yaml
  • hack/container-wrapper.sh
  • hack/deploy-catalog.sh
  • hack/workspace-entrypoint.sh
  • internal/buildapi/client/client.go
  • internal/buildapi/client/client_test.go
  • internal/buildapi/server.go
  • internal/buildapi/types.go
  • internal/buildapi/workspace.go
  • internal/buildapi/workspace_test.go
  • internal/common/tasks/scripts/build_image.sh
  • internal/common/tasks/scripts/find_manifest.sh
  • internal/common/tasks/scripts/push_artifact.sh
  • internal/common/tasks/tasks.go
  • internal/controller/imagebuild/controller.go
  • internal/controller/operatorconfig/controller.go
  • internal/controller/operatorconfig/resources.go
  • internal/controller/workspace/controller.go
  • internal/controller/workspace/controller_test.go
✅ Files skipped from review due to trivial changes (12)
  • bundle.Dockerfile
  • config/crd/kustomization.yaml
  • internal/common/tasks/scripts/push_artifact.sh
  • config/rbac/scc_role.yaml
  • cmd/caib/runtime_wiring.go
  • cmd/caib/root.go
  • hack/container-wrapper.sh
  • internal/buildapi/workspace_test.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • internal/common/tasks/tasks.go
  • api/v1alpha1/workspace_types.go
  • internal/buildapi/client/client.go
🚧 Files skipped from review as they are similar to previous changes (11)
  • cmd/caib/main.go
  • config/rbac/scc_role_binding.yaml
  • cmd/main.go
  • hack/deploy-catalog.sh
  • api/v1alpha1/imagebuild_types.go
  • cmd/caib/buildcmd/build.go
  • internal/buildapi/client/client_test.go
  • internal/common/tasks/scripts/find_manifest.sh
  • api/v1alpha1/operatorconfig_types.go
  • cmd/caib/workspace/workspace.go
  • internal/controller/operatorconfig/resources.go

Comment on lines +1567 to +1571
operatorConfig, _ := loadOperatorConfigFn(ctx, k8sClient, namespace)
var wsConfig *automotivev1alpha1.WorkspacesConfig
if operatorConfig != nil {
wsConfig = operatorConfig.Spec.Workspaces
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Silently ignoring OperatorConfig load errors allows policy bypass.

If the API server returns a transient error (timeout, connection reset), the code falls back to nil config instead of failing the request. This skips workspace policy enforcement (storage class, build cache size limits, etc.).

Only ignore NotFound; return 500 on other errors.

🛡️ Proposed fix
-	operatorConfig, _ := loadOperatorConfigFn(ctx, k8sClient, namespace)
+	operatorConfig, err := loadOperatorConfigFn(ctx, k8sClient, namespace)
+	if err != nil && !k8serrors.IsNotFound(err) {
+		return "", fmt.Errorf("loading operator config: %w", err)
+	}
 	var wsConfig *automotivev1alpha1.WorkspacesConfig
 	if operatorConfig != nil {
 		wsConfig = operatorConfig.Spec.Workspaces
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/server.go` around lines 1567 - 1571, The code currently
ignores errors from loadOperatorConfigFn and treats any error as a nil
operatorConfig; change the logic in the handler that calls
loadOperatorConfigFn(ctx, k8sClient, namespace) to inspect the returned error:
if err == nil set operatorConfig and wsConfig as before; if the error is a
NotFound (use the appropriate k8s API check e.g. apierrors.IsNotFound) treat
operatorConfig as nil and continue; for any other error return a 500/internal
server error response so policy enforcement is not bypassed. Keep the same
symbols (loadOperatorConfigFn, operatorConfig, wsConfig) so the fix is applied
in the same call site.

RestConfig *rest.Config
}

// +kubebuilder:rbac:groups=automotive.sdv.cloud.redhat.com,resources=workspaces,verbs=get;update
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Expected: the Workspace RBAC rule includes "patch" because updateWorkspaceLease() calls Patch().
rg -n 'resources=workspaces,verbs=' internal/controller/imagebuild/controller.go
rg -n 'Patch\(ctx, ws, patch\)' internal/controller/imagebuild/controller.go

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 246


🏁 Script executed:

#!/bin/bash
# Verify the function context at line 1457
sed -n '1440,1465p' internal/controller/imagebuild/controller.go

# Check all RBAC annotations in the file
rg -n '\+kubebuilder:rbac' internal/controller/imagebuild/controller.go

Repository: centos-automotive-suite/automotive-dev-operator

Length of output: 3004


🌐 Web query:

Kubernetes RBAC update vs patch verb difference

💡 Result:

In Kubernetes RBAC, the 'update' and 'patch' verbs authorize different HTTP methods for modifying resources. 'update' authorizes PUT requests, which replace the entire resource object (atomic full replacement, requiring the client to provide the complete current state with updated resourceVersion for optimistic concurrency). 'patch' authorizes PATCH requests, which apply partial changes to specific fields without needing the full object (supports strategic merge patch, JSON patch, etc., often used by kubectl apply, edit, set image, scale, etc.). Most kubectl update commands translate to PATCH under the hood, so 'patch' is commonly needed even for operations that conceptually "update". Pure PUT updates (rare in practice) require 'update'. For subresources like /scale, even PUT may require 'patch' permission on the subresource. To allow full modification flexibility, grant both 'update' and 'patch'.

Citations:


Grant patch on workspaces RBAC rule.

updateWorkspaceLease() calls r.Patch(ctx, ws, patch) at line 1457, but the kubebuilder RBAC annotation at line 105 only grants get;update. In Kubernetes RBAC, patch is a separate verb from update and authorizes PATCH HTTP requests. Without it, every lease write-back will be denied.

🔧 Proposed fix
-// +kubebuilder:rbac:groups=automotive.sdv.cloud.redhat.com,resources=workspaces,verbs=get;update
+// +kubebuilder:rbac:groups=automotive.sdv.cloud.redhat.com,resources=workspaces,verbs=get;update;patch

Also applies to: 1455–1457

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/imagebuild/controller.go` at line 105, The RBAC
annotation currently grants only "get;update" for workspaces but
updateWorkspaceLease() uses r.Patch(ctx, ws, patch) which requires the "patch"
verb; update the kubebuilder RBAC comment line that references workspaces (the
line containing
"+kubebuilder:rbac:groups=automotive.sdv.cloud.redhat.com,resources=workspaces,verbs=...")
to include "patch" in the verbs list so PATCH requests from r.Patch are
authorized.

Comment thread internal/controller/imagebuild/controller.go
@bennyz bennyz force-pushed the app-building branch 3 times, most recently from e21638c to b95f1ba Compare March 23, 2026 08:37
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (10)
internal/buildapi/client/client.go (2)

957-960: ⚠️ Potential issue | 🟠 Major

Clone websocket.DefaultDialer here.

A zero-value websocket.Dialer drops Gorilla's default proxy handling and handshake timeout. That makes shell sessions ignore proxy settings and removes the normal timeout protection.

What defaults does github.com/gorilla/websocket.DefaultDialer set compared with a zero-value websocket.Dialer?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/client/client.go` around lines 957 - 960, The code
currently creates a zero-value websocket.Dialer which loses Gorilla's defaults
(proxy handling and handshake timeout); instead initialize dialer by cloning
websocket.DefaultDialer (e.g., dialer := *websocket.DefaultDialer) and then, if
c.httpClient.Transport is an *http.Transport with non-nil TLSClientConfig,
assign that TLSClientConfig into the cloned dialer (dialer.TLSClientConfig =
t.TLSClientConfig) so you preserve default proxy/timeout behavior while applying
the custom TLS config.

770-915: ⚠️ Potential issue | 🟠 Major

Stop pre-escaping workspace names before resolve().

These methods pass url.PathEscape(name) into resolve(), which writes the value into url.URL.Path. URL.String() then escapes the percent sign again, so a workspace like my app goes out as my%2520app. Use url.JoinPath (or a helper that joins raw segments) for the workspace endpoints instead.

In Go net/url, if URL.Path already contains a percent-escaped segment like "my%20app" and RawPath is empty, does URL.String() emit "%2520"? Is url.JoinPath the correct way to compose path segments?

Also applies to: 947-950

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/client/client.go` around lines 770 - 915, The code
pre-escapes workspace names with url.PathEscape before calling c.resolve (seen
in GetWorkspace, workspaceAction, DeleteWorkspace, SyncPlan, SyncWorkspace,
workspaceStreamRequest), which causes double-escaping (e.g., "my app" ->
"my%2520app"); remove url.PathEscape(name) calls and instead build the path
using url.JoinPath (or a small helper that joins raw path segments) to compose
"/v1/workspaces", the unescaped name, and optional action/segments, then pass
that joined path into c.resolve so names are escaped exactly once.
internal/controller/workspace/controller.go (2)

195-202: ⚠️ Potential issue | 🔴 Critical

Don't run caller-selected images as privileged root.

ws.Spec.Image is taken verbatim and then launched with RunAsUser=0, Privileged=true, AllowPrivilegeEscalation=true, and the ado-workspace service account. That turns workspace creation into an arbitrary privileged-pod launcher. Restrict image selection to an operator-controlled allowlist/default before pod creation, or drop the privileged/root requirement.

Also applies to: 269-286

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/workspace/controller.go` around lines 195 - 202, The code
currently uses ws.Spec.Image directly (falling back to
wsConfig.GetToolchainImage()) and then creates pods with RunAsUser=0,
Privileged=true, AllowPrivilegeEscalation=true and the ado-workspace service
account, which allows arbitrary privileged images; fix by enforcing an
operator-controlled image allowlist or always using the operator default image
and never honoring arbitrary ws.Spec.Image (validate ws.Spec.Image against
operatorConfig.Spec.Workspaces allowlist before assignment to image), and remove
privileged/root execution by setting a non-zero runAsUser and disabling
Privileged and AllowPrivilegeEscalation in the pod securityContext (also audit
any use of the ado-workspace service account to ensure it has no elevated
permissions), and apply the same changes to the other similar block around the
code handling lines 269-286.

169-176: ⚠️ Potential issue | 🟠 Major

Apply the operator workspace policy when building the Pod.

This controller only uses OperatorConfig.Spec.Workspaces for GetToolchainImage(). resourcesOrDefaults(ws.Spec.Resources) still hardcodes resource defaults, NodeSelector comes only from ws.Spec, and Tolerations never get set, so most of the new workspace policy surface never affects the pods created here.

Also applies to: 195-202, 281-309, 354-363

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/workspace/controller.go` around lines 169 - 176, The Pod
construction in r.buildPod (called as r.buildPod(ws, operatorConfig)) ignores
OperatorConfig.Spec.Workspaces policy: update buildPod (and other places that
call it: the other blocks around the mentioned ranges) to merge workspace policy
from operatorConfig into the Pod spec — specifically use
OperatorConfig.Spec.Workspaces when resolving resources instead of
resourcesOrDefaults(ws.Spec.Resources), apply NodeSelector from the workspace
policy (falling back to ws.Spec.NodeSelector only if policy absent), and set
Tolerations from the workspace policy into the Pod (not leaving Tolerations
unset); also ensure GetToolchainImage() still reads from
OperatorConfig.Spec.Workspaces. Locate and change buildPod, any direct
resource/toleration/nodeSelector handling, and callers at the other indicated
blocks so the operator workspace policy is applied consistently.
Containerfile.toolchain (3)

57-57: ⚠️ Potential issue | 🟡 Minor

Clean EPEL metadata in the same layer.

This leaves DNF cache behind in the final image.

🧹 Proposed fix
-RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm
+RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm && dnf clean all
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Containerfile.toolchain` at line 57, The RUN step that installs the EPEL RPM
(the line running "dnf install -y
https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm") leaves
DNF caches in the layer; modify that RUN to clean DNF metadata/cache in the same
layer by chaining cleanup commands (e.g., run dnf install ... && dnf clean all
&& rm -rf /var/cache/dnf or equivalent) so caches are removed before the layer
is committed.

90-91: ⚠️ Potential issue | 🟠 Major

Install jmp somewhere the workspace user can traverse.

hack/workspace-entrypoint.sh switches to UID 1000 before running jmp config ..., but the PATH here points into /root/.local/jumpstarter/bin. That makes Jumpstarter setup and non-root workspace sessions lose the CLI.

What is the default install directory for jumpstarter's python/install.sh, and does it support --dir for installing into a system-wide path?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Containerfile.toolchain` around lines 90 - 91, The Containerfile currently
installs Jumpstarter into /root/.local/jumpstarter/bin and sets PATH there,
which breaks non-root workspace runs because hack/workspace-entrypoint.sh
switches to UID 1000 and then jmp config cannot be found; change the Dockerfile
steps so the installer is invoked with a system-writable install location (or
use the install script's --dir option) and set ENV PATH to that location (e.g.,
a system-wide bin under /usr/local) so jmp is executable by UID 1000, or if the
installer lacks --dir, move/copy the installed jmp binary into a system path
(e.g., /usr/local/bin) after installation and update ENV PATH accordingly so
hack/workspace-entrypoint.sh can run jmp config as the workspace user.

10-38: ⚠️ Potential issue | 🟠 Major

Verify AutoSD RPMs instead of bypassing signatures.

These lines disable package verification entirely, and the only key imported afterward is the CentOS official key rather than the Automotive SIG key used for centos-release-autosd. As written, none of the AutoSD packages installed here are authenticated, and the import on Line 38 happens only after the unchecked transaction.

For centos-release-autosd / AutoSD nightly, which RPM GPG key signs packages, and should package installs use gpgcheck while leaving repo_gpgcheck disabled?
internal/common/tasks/scripts/build_image.sh (1)

59-69: ⚠️ Potential issue | 🟠 Major

Drop stale image.json from the persistent cache cleanup.

disk mode does not rewrite $BUILD_DIR/image.json, so with persistent cache enabled the copy on Line 528 can publish the previous build's manifest as the current one. Remove the cached manifest alongside the orphaned image_output--* directories.

🧹 Proposed fix
   for orphan in "$BUILD_DIR"/image_output--*; do
     [ -e "$orphan" ] || break
     echo "Removing orphaned build output: $(basename "$orphan")"
     rm -rf "$orphan"
   done
+  rm -f "$BUILD_DIR/image.json"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/common/tasks/scripts/build_image.sh` around lines 59 - 69, The
persistent cache cleanup currently removes stale items in WORKSPACE_PATH and
orphaned image_output--* dirs but leaves a stale manifest; update the orphan
cleanup in build_image.sh to also remove the cached manifest file by deleting
$BUILD_DIR/image.json (alongside the existing loop that removes
image_output--*), ensuring the deletion runs only when the file exists so the
previous build's image.json cannot be published as the current manifest.
internal/controller/operatorconfig/controller.go (1)

908-912: ⚠️ Potential issue | 🟠 Major

Ignore NoMatch when deleting the workspace SCC.

cleanupWorkspaceInfra() still returns an error when the security.openshift.io API is absent. On a plain Kubernetes cluster that blocks OSBuilds cleanup and finalizer removal even though the SCC was never creatable there. Gate this delete behind detectOpenShift(...) or treat the no-match case as non-fatal.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/operatorconfig/controller.go` around lines 908 - 912,
cleanupWorkspaceInfra() is attempting to delete a
securityv1.SecurityContextConstraints named by workspaceSCCName but still errors
when the security.openshift.io API is absent; update the SCC delete so it either
first checks detectOpenShift(ctx) and skips the delete on non-OpenShift
clusters, or treat a NoMatch API discovery error as non-fatal by catching
metav1.IsNoMatchError (or the client discovery NoMatch) and returning nil
instead of an error after the Delete call; locate the delete logic in
cleanupWorkspaceInfra(), which constructs scc :=
&securityv1.SecurityContextConstraints{} and calls r.Delete(ctx, scc), and
adjust the error handling to ignore NoMatch (or gate the whole block with
detectOpenShift()).
internal/buildapi/workspace.go (1)

224-263: ⚠️ Potential issue | 🟡 Minor

Secret created before workspace existence check can leave orphaned secrets.

The Jumpstarter client config secret is created at line 243 before checking if a workspace with that name already exists (line 251). If the workspace already exists or creation fails later, the secret remains orphaned.

Consider reordering to:

  1. Check workspace existence first
  2. Create the Workspace CR
  3. Create the secret only on successful workspace creation, or set an OwnerReference on the secret pointing to the workspace so it gets garbage-collected automatically.
♻️ Suggested approach
+	// Check if workspace CR already exists FIRST
+	existing := &automotivev1alpha1.Workspace{}
+	err = k8sClient.Get(c.Request.Context(), client.ObjectKey{Namespace: namespace, Name: req.Name}, existing)
+	if err == nil {
+		if existing.DeletionTimestamp != nil {
+			c.JSON(http.StatusConflict, gin.H{"error": fmt.Sprintf("workspace %q is still terminating, try again shortly", req.Name)})
+		} else {
+			c.JSON(http.StatusConflict, gin.H{"error": fmt.Sprintf("workspace %q already exists", req.Name)})
+		}
+		return
+	}
+	if !k8serrors.IsNotFound(err) {
+		c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to check existing workspace"})
+		return
+	}
+
 	// Create secret for Jumpstarter client config if provided
 	var jmpClientSecret string
 	if req.ClientConfig != "" {
-		// ... secret creation ...
+		// ... secret creation (now after existence check) ...
 	}
-
-	// Check if workspace CR already exists
-	existing := &automotivev1alpha1.Workspace{}
-	// ... moved above ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 224 - 263, The code creates the
Jumpstarter client Secret (jmpClientSecret) before checking for an existing
Workspace CR (existing / k8sClient.Get) which can leave orphaned Secrets;
reorder and/or attach ownership: first perform the workspace existence check
using k8sClient.Get on automotivev1alpha1.Workspace (existing) and return early
if it exists, then create the Workspace CR and only after successful creation
decode req.ClientConfig and create the Secret, or if you must create the Secret
earlier, set an OwnerReference on the Secret pointing to the created Workspace
(so the Secret is garbage-collected) and handle create errors consistently (use
k8serrors.IsAlreadyExists when appropriate).
🧹 Nitpick comments (1)
cmd/caib/workspace/workspace.go (1)

784-786: Consider logging or limiting silent retries on API errors.

Transient errors during polling are silently continued (line 785). While this is reasonable for brief network hiccups, persistent failures could cause the 5-minute timeout to be reached without any feedback to the user.

Consider adding a counter or logging after multiple consecutive failures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/caib/workspace/workspace.go` around lines 784 - 786, The loop currently
swallows transient API errors at the "if err != nil { continue // transient
error, retry }" site with no visibility; add a consecutive-failure counter
(e.g., consecutiveErrs) incremented when err != nil and reset on success, emit a
warning via the existing logger after a threshold (e.g., 3) and either apply a
short backoff or return a descriptive error if the count exceeds a larger
threshold (e.g., for the 5-minute poll duration) so persistent failures are
surfaced; update the branch around that "continue" to increment/reset the
counter and call the logger with context (including err) when thresholds are
reached.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@Containerfile.toolchain`:
- Line 57: The RUN step that installs the EPEL RPM (the line running "dnf
install -y
https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm") leaves
DNF caches in the layer; modify that RUN to clean DNF metadata/cache in the same
layer by chaining cleanup commands (e.g., run dnf install ... && dnf clean all
&& rm -rf /var/cache/dnf or equivalent) so caches are removed before the layer
is committed.
- Around line 90-91: The Containerfile currently installs Jumpstarter into
/root/.local/jumpstarter/bin and sets PATH there, which breaks non-root
workspace runs because hack/workspace-entrypoint.sh switches to UID 1000 and
then jmp config cannot be found; change the Dockerfile steps so the installer is
invoked with a system-writable install location (or use the install script's
--dir option) and set ENV PATH to that location (e.g., a system-wide bin under
/usr/local) so jmp is executable by UID 1000, or if the installer lacks --dir,
move/copy the installed jmp binary into a system path (e.g., /usr/local/bin)
after installation and update ENV PATH accordingly so
hack/workspace-entrypoint.sh can run jmp config as the workspace user.

In `@internal/buildapi/client/client.go`:
- Around line 957-960: The code currently creates a zero-value websocket.Dialer
which loses Gorilla's defaults (proxy handling and handshake timeout); instead
initialize dialer by cloning websocket.DefaultDialer (e.g., dialer :=
*websocket.DefaultDialer) and then, if c.httpClient.Transport is an
*http.Transport with non-nil TLSClientConfig, assign that TLSClientConfig into
the cloned dialer (dialer.TLSClientConfig = t.TLSClientConfig) so you preserve
default proxy/timeout behavior while applying the custom TLS config.
- Around line 770-915: The code pre-escapes workspace names with url.PathEscape
before calling c.resolve (seen in GetWorkspace, workspaceAction,
DeleteWorkspace, SyncPlan, SyncWorkspace, workspaceStreamRequest), which causes
double-escaping (e.g., "my app" -> "my%2520app"); remove url.PathEscape(name)
calls and instead build the path using url.JoinPath (or a small helper that
joins raw path segments) to compose "/v1/workspaces", the unescaped name, and
optional action/segments, then pass that joined path into c.resolve so names are
escaped exactly once.

In `@internal/buildapi/workspace.go`:
- Around line 224-263: The code creates the Jumpstarter client Secret
(jmpClientSecret) before checking for an existing Workspace CR (existing /
k8sClient.Get) which can leave orphaned Secrets; reorder and/or attach
ownership: first perform the workspace existence check using k8sClient.Get on
automotivev1alpha1.Workspace (existing) and return early if it exists, then
create the Workspace CR and only after successful creation decode
req.ClientConfig and create the Secret, or if you must create the Secret
earlier, set an OwnerReference on the Secret pointing to the created Workspace
(so the Secret is garbage-collected) and handle create errors consistently (use
k8serrors.IsAlreadyExists when appropriate).

In `@internal/common/tasks/scripts/build_image.sh`:
- Around line 59-69: The persistent cache cleanup currently removes stale items
in WORKSPACE_PATH and orphaned image_output--* dirs but leaves a stale manifest;
update the orphan cleanup in build_image.sh to also remove the cached manifest
file by deleting $BUILD_DIR/image.json (alongside the existing loop that removes
image_output--*), ensuring the deletion runs only when the file exists so the
previous build's image.json cannot be published as the current manifest.

In `@internal/controller/operatorconfig/controller.go`:
- Around line 908-912: cleanupWorkspaceInfra() is attempting to delete a
securityv1.SecurityContextConstraints named by workspaceSCCName but still errors
when the security.openshift.io API is absent; update the SCC delete so it either
first checks detectOpenShift(ctx) and skips the delete on non-OpenShift
clusters, or treat a NoMatch API discovery error as non-fatal by catching
metav1.IsNoMatchError (or the client discovery NoMatch) and returning nil
instead of an error after the Delete call; locate the delete logic in
cleanupWorkspaceInfra(), which constructs scc :=
&securityv1.SecurityContextConstraints{} and calls r.Delete(ctx, scc), and
adjust the error handling to ignore NoMatch (or gate the whole block with
detectOpenShift()).

In `@internal/controller/workspace/controller.go`:
- Around line 195-202: The code currently uses ws.Spec.Image directly (falling
back to wsConfig.GetToolchainImage()) and then creates pods with RunAsUser=0,
Privileged=true, AllowPrivilegeEscalation=true and the ado-workspace service
account, which allows arbitrary privileged images; fix by enforcing an
operator-controlled image allowlist or always using the operator default image
and never honoring arbitrary ws.Spec.Image (validate ws.Spec.Image against
operatorConfig.Spec.Workspaces allowlist before assignment to image), and remove
privileged/root execution by setting a non-zero runAsUser and disabling
Privileged and AllowPrivilegeEscalation in the pod securityContext (also audit
any use of the ado-workspace service account to ensure it has no elevated
permissions), and apply the same changes to the other similar block around the
code handling lines 269-286.
- Around line 169-176: The Pod construction in r.buildPod (called as
r.buildPod(ws, operatorConfig)) ignores OperatorConfig.Spec.Workspaces policy:
update buildPod (and other places that call it: the other blocks around the
mentioned ranges) to merge workspace policy from operatorConfig into the Pod
spec — specifically use OperatorConfig.Spec.Workspaces when resolving resources
instead of resourcesOrDefaults(ws.Spec.Resources), apply NodeSelector from the
workspace policy (falling back to ws.Spec.NodeSelector only if policy absent),
and set Tolerations from the workspace policy into the Pod (not leaving
Tolerations unset); also ensure GetToolchainImage() still reads from
OperatorConfig.Spec.Workspaces. Locate and change buildPod, any direct
resource/toleration/nodeSelector handling, and callers at the other indicated
blocks so the operator workspace policy is applied consistently.

---

Nitpick comments:
In `@cmd/caib/workspace/workspace.go`:
- Around line 784-786: The loop currently swallows transient API errors at the
"if err != nil { continue // transient error, retry }" site with no visibility;
add a consecutive-failure counter (e.g., consecutiveErrs) incremented when err
!= nil and reset on success, emit a warning via the existing logger after a
threshold (e.g., 3) and either apply a short backoff or return a descriptive
error if the count exceeds a larger threshold (e.g., for the 5-minute poll
duration) so persistent failures are surfaced; update the branch around that
"continue" to increment/reset the counter and call the logger with context
(including err) when thresholds are reached.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 37605479-ca63-4cb6-831b-4f22f363b0eb

📥 Commits

Reviewing files that changed from the base of the PR and between e2d9c98 and 64a3787.

⛔ Files ignored due to path filters (4)
  • config/crd/bases/automotive.sdv.cloud.redhat.com_imagebuilds.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_workspaces.yaml is excluded by !config/crd/bases/**
  • config/rbac/role.yaml is excluded by !config/rbac/role.yaml
📒 Files selected for processing (34)
  • Containerfile.toolchain
  • api/v1alpha1/imagebuild_types.go
  • api/v1alpha1/operatorconfig_types.go
  • api/v1alpha1/workspace_types.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • bundle.Dockerfile
  • cmd/caib/buildcmd/build.go
  • cmd/caib/image/image.go
  • cmd/caib/main.go
  • cmd/caib/root.go
  • cmd/caib/runtime_wiring.go
  • cmd/caib/workspace/workspace.go
  • cmd/main.go
  • config/crd/kustomization.yaml
  • config/rbac/scc_role.yaml
  • config/rbac/scc_role_binding.yaml
  • hack/container-wrapper.sh
  • hack/deploy-catalog.sh
  • hack/workspace-entrypoint.sh
  • internal/buildapi/client/client.go
  • internal/buildapi/client/client_test.go
  • internal/buildapi/server.go
  • internal/buildapi/types.go
  • internal/buildapi/workspace.go
  • internal/buildapi/workspace_test.go
  • internal/common/tasks/scripts/build_image.sh
  • internal/common/tasks/scripts/find_manifest.sh
  • internal/common/tasks/scripts/push_artifact.sh
  • internal/common/tasks/tasks.go
  • internal/controller/imagebuild/controller.go
  • internal/controller/operatorconfig/controller.go
  • internal/controller/operatorconfig/resources.go
  • internal/controller/workspace/controller.go
  • internal/controller/workspace/controller_test.go
✅ Files skipped from review due to trivial changes (11)
  • config/crd/kustomization.yaml
  • cmd/caib/main.go
  • bundle.Dockerfile
  • internal/common/tasks/scripts/push_artifact.sh
  • hack/container-wrapper.sh
  • cmd/caib/root.go
  • internal/buildapi/types.go
  • config/rbac/scc_role.yaml
  • internal/common/tasks/tasks.go
  • internal/buildapi/client/client_test.go
  • api/v1alpha1/workspace_types.go
🚧 Files skipped from review as they are similar to previous changes (12)
  • config/rbac/scc_role_binding.yaml
  • api/v1alpha1/imagebuild_types.go
  • internal/common/tasks/scripts/find_manifest.sh
  • cmd/main.go
  • hack/deploy-catalog.sh
  • cmd/caib/image/image.go
  • cmd/caib/buildcmd/build.go
  • hack/workspace-entrypoint.sh
  • internal/controller/operatorconfig/resources.go
  • api/v1alpha1/operatorconfig_types.go
  • internal/buildapi/server.go
  • internal/controller/imagebuild/controller.go

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (12)
internal/controller/operatorconfig/controller.go (1)

908-912: ⚠️ Potential issue | 🟡 Minor

Handle isNoMatchError when deleting the workspace SCC on Kubernetes.

The SCC deletion at line 910 checks for errors.IsNotFound(err) but not isNoMatchError(err). On a non-OpenShift cluster, the securitycontextconstraints resource type doesn't exist, and the delete will fail with a "no matches for kind" error rather than NotFound. This will cause cleanup to fail on Kubernetes.

Suggested fix
 	scc := &securityv1.SecurityContextConstraints{}
 	scc.Name = workspaceSCCName
-	if err := r.Delete(ctx, scc); err != nil && !errors.IsNotFound(err) {
+	if err := r.Delete(ctx, scc); err != nil && !errors.IsNotFound(err) && !isNoMatchError(err) {
 		return fmt.Errorf("failed to delete workspace SCC: %w", err)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/operatorconfig/controller.go` around lines 908 - 912, The
deletion of the SecurityContextConstraints object (scc / workspaceSCCName) uses
r.Delete and only treats errors.IsNotFound(err) as ignorable; update the error
handling to also ignore "no matches for kind" errors by checking
isNoMatchError(err) (or apierrors.IsNoMatchError(err)) in the same conditional
so that a non-OpenShift Kubernetes cluster (where the SCC resource kind doesn't
exist) does not cause cleanup to fail.
cmd/caib/workspace/workspace.go (1)

611-617: ⚠️ Potential issue | 🟠 Major

Preserve argv boundaries in workspace exec.

strings.Join(cmdParts, " ") loses the original quoting and argument split. Commands with spaces or shell metacharacters do not round-trip correctly through WorkspaceExecRequest.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/caib/workspace/workspace.go` around lines 611 - 617, The code currently
flattens command arguments with strings.Join(cmdParts, " ") losing quoting and
argv boundaries; instead preserve the original argument slice (cmdParts) and
pass that slice through WorkspaceExecRequest (do not build a single string named
command) so the executor receives the exact argv array; update places that
reference the joined "command" to use the []string (e.g., cmdParts /
commandArgs) and ensure WorkspaceExecRequest and any downstream handling consume
an argument slice rather than a single combined string.
internal/controller/imagebuild/controller.go (3)

105-105: ⚠️ Potential issue | 🟠 Major

Grant patch on workspaces.

updateWorkspaceLease() uses r.Patch(...) on Lines 1455-1457, but this RBAC rule only grants get;update. Successful builds will still fail the lease write-back with a forbidden error.

🔧 Suggested change
-// +kubebuilder:rbac:groups=automotive.sdv.cloud.redhat.com,resources=workspaces,verbs=get;update
+// +kubebuilder:rbac:groups=automotive.sdv.cloud.redhat.com,resources=workspaces,verbs=get;update;patch

Run this to confirm the local mismatch between the RBAC annotation and the r.Patch call:

#!/bin/bash
rg -n '\+kubebuilder:rbac:groups=automotive\.sdv\.cloud\.redhat\.com,resources=workspaces,verbs=' internal/controller/imagebuild/controller.go
sed -n '1452,1458p' internal/controller/imagebuild/controller.go
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/imagebuild/controller.go` at line 105, The RBAC
annotation for workspaces is missing the "patch" verb while
updateWorkspaceLease() calls r.Patch(...); update the kubebuilder RBAC marker
that references group automotive.sdv.cloud.redhat.com and resource workspaces
(the "+kubebuilder:rbac:...resources=workspaces,verbs=...") to include "patch"
alongside existing verbs so the controller can perform r.Patch in
updateWorkspaceLease without Forbidden errors.

789-792: ⚠️ Potential issue | 🟠 Major

Mirror this fallback in getOrCreateWorkspacePVC().

This fixes the VolumeClaimTemplate branch, but upload-backed builds still go through getOrCreateWorkspacePVC(), which only honors imageBuild.Spec.StorageClass. Storage behavior will still differ whenever uploads are enabled.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/imagebuild/controller.go` around lines 789 - 792, Update
getOrCreateWorkspacePVC to mirror the same storageClass fallback as the
VolumeClaimTemplate branch: when imageBuild.Spec.StorageClass is non-empty use
that, otherwise fall back to operatorConfig.Spec.OSBuilds.StorageClass if
present; adjust the storageClassName/pointer logic in getOrCreateWorkspacePVC
(the same variable used in the VolumeClaimTemplate code path) so upload-backed
builds honor the operatorConfig default storage class as well.

760-772: ⚠️ Potential issue | 🟠 Major

Keep upload-backed builds on the upload PVC.

When BuildCachePVC is set, this branch replaces shared-workspace outright, so the PVC populated by createUploadPod() is never mounted into the build. Workspace builds that still use source_path uploads will miss their uploaded content unless the cache is mounted separately or this combination is rejected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/imagebuild/controller.go` around lines 760 - 772, The
current branch unconditionally replaces the "shared-workspace" binding when
imageBuild.Spec.BuildCachePVC is set, which drops any upload PVC produced by
createUploadPod() and loses uploaded source files; change the logic in the
workspace-binding section so that if an upload-backed build
(InputFilesServer/upload PVC created earlier by createUploadPod()) exists you do
not overwrite that binding — instead either (A) keep the existing
"shared-workspace" binding and add the cache PVC as a separate workspace binding
(e.g., name it "build-cache") using imageBuild.Spec.BuildCachePVC, or (B) only
use imageBuild.Spec.BuildCachePVC as "shared-workspace" when no upload
PVC/UploadFiles binding exists; update the code around shared-workspace,
createUploadPod(), and VolumeClaimTemplate handling accordingly.
internal/buildapi/server.go (4)

1572-1590: ⚠️ Potential issue | 🔴 Critical

Authorize --extra-repo against ws.Spec.Owner before podExec.

This path checks existence and phase, but never verifies workspace ownership. Any authenticated caller can point workspace-name:/path at someone else’s running workspace and have the API expose that content over HTTP. Thread the requester into resolveExtraRepos() and reject non-owners before starting the server.

Also applies to: 1595-1617

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/server.go` around lines 1572 - 1590, resolveExtraRepos
currently validates ExtraRepos format and workspace phase but does not verify
the requester is the workspace owner; update resolveExtraRepos to accept the
requester identity (e.g., caller user ID or principal) and for each workspace
name parsed from req.ExtraRepos load the Workspace object (using the existing
k8s client), compare workspace.Spec.Owner to the requester, and return an
authorization error if they differ before proceeding to start the pod/HTTP
server (podExec). Apply the same ownership check in the second occurrence
handling the other ExtraRepos block (the duplicate at 1595-1617), and reference
BuildRequest.ExtraRepos, resolveExtraRepos, Workspace.Spec.Owner and the podExec
startup point when making the change.

1643-1647: ⚠️ Potential issue | 🟠 Major

Don't swallow OperatorConfig read failures here.

Falling back to nil on every loadOperatorConfigFn error bypasses workspace defaults/policy on transient API failures. Only NotFound should be ignored; other errors should fail the request.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/server.go` around lines 1643 - 1647, The code currently
ignores all errors from loadOperatorConfigFn by using the blank identifier;
change it to capture the error (e.g., operatorConfig, err :=
loadOperatorConfigFn(ctx, k8sClient, namespace)) and only suppress NotFound
errors using the Kubernetes API machinery check (apierrors.IsNotFound(err)); for
any other err return or propagate the error from the handler, and only set
wsConfig = operatorConfig.Spec.Workspaces when operatorConfig != nil. Reference:
loadOperatorConfigFn, operatorConfig, wsConfig.

1612-1617: ⚠️ Potential issue | 🟠 Major

Don't treat background http.server launches as success.

Both branches fire python3 -m http.server ... & on fixed ports and immediately inject URLs if podExec itself succeeds. Port conflicts or short-lived server exits still look like success here, which can leave extra_repos/workspace_url pointing at stale or dead content.

Also applies to: 1689-1692

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/server.go` around lines 1612 - 1617, The current code
shells out with "python3 -m http.server %d &" via podExec (see variables cmd,
podExec, wsName, workspaceContainerName, port) which treats a successful podExec
as success even if the server immediately fails; change the command to start the
server and verify readiness before returning (for example: start the server
redirecting logs, wait briefly and probe localhost:%d with curl or wget inside
the same pod, and fail the podExec if the probe fails, printing the server log),
and apply the same change to the other launch site referenced (the other podExec
invocation around the second block).

1707-1716: ⚠️ Potential issue | 🟠 Major

Use a deterministic build-cache PVC name.

This list-then-GenerateName flow is TOCTOU. Two concurrent builds for the same workspace can both miss the existing claim and create separate caches, fragmenting warm state and leaving extra PVCs behind.

Also applies to: 1736-1762

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/server.go` around lines 1707 - 1716, The current
list-then-GenerateName flow (using pvcList, k8sClient.List, buildCacheLabels and
GenerateName) is TOCTOU and can create duplicate build-cache PVCs; instead
derive a deterministic PVC name from the workspace/labels (e.g., hash of
workspace ID + buildCacheLabels), then try k8sClient.Get for a
corev1.PersistentVolumeClaim with that Name and namespace; if Get returns
NotFound, create a PVC with that exact Name (not GenerateName) and if Create
returns AlreadyExists, re-Get and return the existing PVC name—remove the
list-and-scan logic and the GenerateName usage (also fix the similar code around
the 1736-1762 block) so callers always use the deterministic name and handle
create-race by retrying Get on AlreadyExists.
internal/buildapi/workspace.go (1)

224-263: ⚠️ Potential issue | 🟠 Major

Secret created before workspace uniqueness check can leave orphaned secrets.

The Jumpstarter client config secret is created (lines 226-247) before checking if the workspace CR already exists (lines 249-263). If the workspace already exists, the handler returns a conflict error but the secret remains. Move secret creation after the uniqueness check, or set an OwnerReference on the secret after successful workspace creation.

Suggested restructuring
+	// Check if workspace CR already exists first
+	existing := &automotivev1alpha1.Workspace{}
+	err = k8sClient.Get(c.Request.Context(), client.ObjectKey{Namespace: namespace, Name: req.Name}, existing)
+	if err == nil {
+		if existing.DeletionTimestamp != nil {
+			c.JSON(http.StatusConflict, gin.H{"error": fmt.Sprintf("workspace %q is still terminating, try again shortly", req.Name)})
+		} else {
+			c.JSON(http.StatusConflict, gin.H{"error": fmt.Sprintf("workspace %q already exists", req.Name)})
+		}
+		return
+	}
+	if !k8serrors.IsNotFound(err) {
+		c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to check existing workspace"})
+		return
+	}
+
 	// Create secret for Jumpstarter client config if provided
 	var jmpClientSecret string
 	if req.ClientConfig != "" {
         // ... secret creation ...
 	}
-
-	// Check if workspace CR already exists
-	existing := &automotivev1alpha1.Workspace{}
-	// ...

,

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 224 - 263, The secret for
Jumpstarter client config (jmpClientSecret / variable set in the
req.ClientConfig block that creates a corev1.Secret) is created before the
uniqueness check of the Workspace CR (existing :=
&automotivev1alpha1.Workspace{} and k8sClient.Get), which can leave orphaned
secrets on conflict; move the entire secret creation logic (the req.ClientConfig
base64 decode and k8sClient.Create of the corev1.Secret) so it runs only after
the workspace existence check succeeds (after verifying
!k8serrors.IsNotFound(err)), or alternatively ensure the created secret gets an
OwnerReference to the Workspace CR right after creating the Workspace so it is
garbage-collected on failure—update error handling (k8serrors.IsAlreadyExists)
accordingly.
internal/buildapi/client/client.go (2)

957-960: ⚠️ Potential issue | 🟠 Major

Use websocket.DefaultDialer to preserve proxy and timeout settings.

A zero-value websocket.Dialer{} drops Gorilla's defaults: Proxy (set to http.ProxyFromEnvironment) and HandshakeTimeout (45 seconds). This can cause shell connections to hang when proxies are configured.

Suggested fix
-	dialer := websocket.Dialer{}
-	if t, ok := c.httpClient.Transport.(*http.Transport); ok && t.TLSClientConfig != nil {
-		dialer.TLSClientConfig = t.TLSClientConfig
-	}
+	dialer := *websocket.DefaultDialer
+	if t, ok := c.httpClient.Transport.(*http.Transport); ok {
+		dialer.TLSClientConfig = t.TLSClientConfig
+		if t.Proxy != nil {
+			dialer.Proxy = t.Proxy
+		}
+	}

,

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/client/client.go` around lines 957 - 960, Replace the
zero-value websocket.Dialer with a copy of gorilla's defaults so Proxy and
HandshakeTimeout are preserved: initialize the dialer from
websocket.DefaultDialer (e.g., copy *websocket.DefaultDialer into your local
dialer variable), then if c.httpClient.Transport.(*http.Transport) has a non-nil
TLSClientConfig, assign that value to dialer.TLSClientConfig (leave other
DefaultDialer fields intact). Ensure you still reference the same checks around
c.httpClient.Transport and only overwrite TLSClientConfig on the copied dialer.

771-794: ⚠️ Potential issue | 🟠 Major

Double-escaping of workspace names in URL paths.

All workspace methods that handle names use url.PathEscape(name) before passing to resolve(), which assigns to url.URL.Path. When URL.String() is called, percent signs get re-encoded (e.g., my%20appmy%2520app). Use url.JoinPath() instead to properly compose path segments:

-endpoint := c.resolve(path.Join("/v1/workspaces", url.PathEscape(name)))
+endpoint, err := url.JoinPath(c.baseURL.String(), "/v1/workspaces", name)
+if err != nil {
+    return nil, err
+}

Apply this pattern to: GetWorkspace, workspaceAction, DeleteWorkspace, SyncPlan, SyncWorkspace, workspaceStreamRequest, and ShellWorkspace.

,

Also applies to: 806-830, 832-852, 854-883, 885-906, 908-933, 947-948

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/client/client.go` around lines 771 - 794, The code
double-escapes workspace names by calling url.PathEscape(name) before composing
the request path; replace composing logic to use url.JoinPath(...) (or
url.URL.JoinPath) to join "/v1/workspaces" and the raw name so percent-encoding
is applied once. Concretely, in GetWorkspace (and similarly in workspaceAction,
DeleteWorkspace, SyncPlan, SyncWorkspace, workspaceStreamRequest,
ShellWorkspace) remove url.PathEscape(name) and instead build the endpoint with
url.JoinPath("/v1/workspaces", name) (or use the URL receiver variant), handle
any returned error, then pass that joined path to c.resolve(...) so names like
"my app" become "%20" only once and are not double-encoded.
🧹 Nitpick comments (2)
api/v1alpha1/zz_generated.deepcopy.go (1)

1159-1163: Generated file should not be committed per repository conventions.

Per retrieved learnings, api/v1alpha1/zz_generated.deepcopy.go is regenerated at deploy/build time via make generate manifests and should not be included in PRs. Consider removing this file from the commit and adding it to .gitignore if not already present.

The generated code itself appears correct - it properly deep-copies the new Workspace, WorkspaceList, WorkspaceSpec, WorkspaceStatus, and WorkspacesConfig types with appropriate handling for reference types (pointers, maps, slices).

Based on learnings: "Generated files that are not committed should be ignored in PRs. Specifically, do not include api/v1alpha1/zz_generated.deepcopy.go..."

Also applies to: 1264-1403

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1alpha1/zz_generated.deepcopy.go` around lines 1159 - 1163, The
generated deepcopy file api/v1alpha1/zz_generated.deepcopy.go should not be
committed; remove this file from the PR commit and update repository ignore
rules so it isn’t tracked (e.g., add api/v1alpha1/zz_generated.deepcopy.go or a
pattern for generated files to .gitignore), then regenerate as part of CI/build
with make generate manifests; you can leave the generated DeepCopyInto/DeepCopy
functions in your working tree locally but ensure only source/type definitions
are committed and the generated file is excluded from version control.
internal/buildapi/client/client.go (1)

732-732: Inconsistent response body close error handling.

The new workspace methods silently discard resp.Body.Close() errors with _ = resp.Body.Close(), while existing methods (e.g., CreateBuild, GetBuild) log warnings. Consider aligning with the existing pattern for consistency.

Also applies to: 758-758, 784-784, 820-820, 846-846, 873-873, 900-900

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/client/client.go` at line 732, The new workspace methods
currently discard resp.Body.Close() errors (defer func() { _ = resp.Body.Close()
}()), causing inconsistency with existing methods like CreateBuild and GetBuild;
update each occurrence (the defer at lines shown) to mirror the existing
pattern: replace the discard with a defer that checks the error and logs a
warning (e.g., defer func() { if err := resp.Body.Close(); err != nil {
c.logger.Warnf("failed to close response body: %v", err) } }()), and apply this
change to all listed occurrences so all response body closes are logged
consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/caib/workspace/workspace.go`:
- Around line 275-280: If a user supplied an explicit --client file, fail fast
when ResolveJumpstarterClient cannot resolve it: after calling
caibcommon.ResolveJumpstarterClient(strings.TrimSpace(clientConfigFile)) check
for err and if clientConfigFile (trimmed) is non-empty, print a clear error
(including err) and exit with non-zero (or return the error) instead of silently
continuing; keep the existing successful-path behavior that sets clientConfigB64
from clientInfo.Data when err == nil. Reference ResolveJumpstarterClient,
clientConfigFile, clientInfo, clientConfigB64.
- Around line 523-539: computeManifest currently opens every path as a regular
file which drops dangling symlinks and hashes symlink targets instead of link
values; change computeManifest to call os.Lstat(absPath) and check
fileInfo.Mode()&os.ModeSymlink: if it's a symlink, call os.Readlink(absPath) and
hash the returned link target string (include dangling links), otherwise open
and io.Copy the file contents as before; keep the same manifest[relPath] =
hex.EncodeToString(h.Sum(nil)) behavior and preserve error
handling/continutation semantics so it stays consistent with tarTrackedFiles'
symlink-preserving behavior.

In `@internal/buildapi/workspace.go`:
- Around line 491-498: The loop that builds the hashing shell script iterates
req.Files and writes unchecked paths into scriptBuf (via scriptBuf.WriteString
and shellQuote), allowing path traversal like ../../etc/passwd; fix by
validating each path before using it: ensure it is not absolute (reject if
filepath.IsAbs), canonicalize with filepath.Clean and reject if it begins with
".." or otherwise escapes the workspace root, and return a 400 error for invalid
paths so only safe relative paths are included in the sha256sum lines.
- Around line 764-768: The echo line that writes "Deploying %s -> %s" uses
unquoted a.Src and a.Dest and can break the generated script if those paths
contain quotes or special chars; update the loop that builds rsyncCmds (the for
_, a := range req.Artifacts block) to pass quoted/escaped values (reuse
shellQuote(a.Src) and shellQuote(a.Dest) or an equivalent escaping helper) into
the fmt.Fprintf that emits the echo so the printed message is safe, while
leaving the existing rsync fmt.Fprintf as-is.

---

Duplicate comments:
In `@cmd/caib/workspace/workspace.go`:
- Around line 611-617: The code currently flattens command arguments with
strings.Join(cmdParts, " ") losing quoting and argv boundaries; instead preserve
the original argument slice (cmdParts) and pass that slice through
WorkspaceExecRequest (do not build a single string named command) so the
executor receives the exact argv array; update places that reference the joined
"command" to use the []string (e.g., cmdParts / commandArgs) and ensure
WorkspaceExecRequest and any downstream handling consume an argument slice
rather than a single combined string.

In `@internal/buildapi/client/client.go`:
- Around line 957-960: Replace the zero-value websocket.Dialer with a copy of
gorilla's defaults so Proxy and HandshakeTimeout are preserved: initialize the
dialer from websocket.DefaultDialer (e.g., copy *websocket.DefaultDialer into
your local dialer variable), then if c.httpClient.Transport.(*http.Transport)
has a non-nil TLSClientConfig, assign that value to dialer.TLSClientConfig
(leave other DefaultDialer fields intact). Ensure you still reference the same
checks around c.httpClient.Transport and only overwrite TLSClientConfig on the
copied dialer.
- Around line 771-794: The code double-escapes workspace names by calling
url.PathEscape(name) before composing the request path; replace composing logic
to use url.JoinPath(...) (or url.URL.JoinPath) to join "/v1/workspaces" and the
raw name so percent-encoding is applied once. Concretely, in GetWorkspace (and
similarly in workspaceAction, DeleteWorkspace, SyncPlan, SyncWorkspace,
workspaceStreamRequest, ShellWorkspace) remove url.PathEscape(name) and instead
build the endpoint with url.JoinPath("/v1/workspaces", name) (or use the URL
receiver variant), handle any returned error, then pass that joined path to
c.resolve(...) so names like "my app" become "%20" only once and are not
double-encoded.

In `@internal/buildapi/server.go`:
- Around line 1572-1590: resolveExtraRepos currently validates ExtraRepos format
and workspace phase but does not verify the requester is the workspace owner;
update resolveExtraRepos to accept the requester identity (e.g., caller user ID
or principal) and for each workspace name parsed from req.ExtraRepos load the
Workspace object (using the existing k8s client), compare workspace.Spec.Owner
to the requester, and return an authorization error if they differ before
proceeding to start the pod/HTTP server (podExec). Apply the same ownership
check in the second occurrence handling the other ExtraRepos block (the
duplicate at 1595-1617), and reference BuildRequest.ExtraRepos,
resolveExtraRepos, Workspace.Spec.Owner and the podExec startup point when
making the change.
- Around line 1643-1647: The code currently ignores all errors from
loadOperatorConfigFn by using the blank identifier; change it to capture the
error (e.g., operatorConfig, err := loadOperatorConfigFn(ctx, k8sClient,
namespace)) and only suppress NotFound errors using the Kubernetes API machinery
check (apierrors.IsNotFound(err)); for any other err return or propagate the
error from the handler, and only set wsConfig = operatorConfig.Spec.Workspaces
when operatorConfig != nil. Reference: loadOperatorConfigFn, operatorConfig,
wsConfig.
- Around line 1612-1617: The current code shells out with "python3 -m
http.server %d &" via podExec (see variables cmd, podExec, wsName,
workspaceContainerName, port) which treats a successful podExec as success even
if the server immediately fails; change the command to start the server and
verify readiness before returning (for example: start the server redirecting
logs, wait briefly and probe localhost:%d with curl or wget inside the same pod,
and fail the podExec if the probe fails, printing the server log), and apply the
same change to the other launch site referenced (the other podExec invocation
around the second block).
- Around line 1707-1716: The current list-then-GenerateName flow (using pvcList,
k8sClient.List, buildCacheLabels and GenerateName) is TOCTOU and can create
duplicate build-cache PVCs; instead derive a deterministic PVC name from the
workspace/labels (e.g., hash of workspace ID + buildCacheLabels), then try
k8sClient.Get for a corev1.PersistentVolumeClaim with that Name and namespace;
if Get returns NotFound, create a PVC with that exact Name (not GenerateName)
and if Create returns AlreadyExists, re-Get and return the existing PVC
name—remove the list-and-scan logic and the GenerateName usage (also fix the
similar code around the 1736-1762 block) so callers always use the deterministic
name and handle create-race by retrying Get on AlreadyExists.

In `@internal/buildapi/workspace.go`:
- Around line 224-263: The secret for Jumpstarter client config (jmpClientSecret
/ variable set in the req.ClientConfig block that creates a corev1.Secret) is
created before the uniqueness check of the Workspace CR (existing :=
&automotivev1alpha1.Workspace{} and k8sClient.Get), which can leave orphaned
secrets on conflict; move the entire secret creation logic (the req.ClientConfig
base64 decode and k8sClient.Create of the corev1.Secret) so it runs only after
the workspace existence check succeeds (after verifying
!k8serrors.IsNotFound(err)), or alternatively ensure the created secret gets an
OwnerReference to the Workspace CR right after creating the Workspace so it is
garbage-collected on failure—update error handling (k8serrors.IsAlreadyExists)
accordingly.

In `@internal/controller/imagebuild/controller.go`:
- Line 105: The RBAC annotation for workspaces is missing the "patch" verb while
updateWorkspaceLease() calls r.Patch(...); update the kubebuilder RBAC marker
that references group automotive.sdv.cloud.redhat.com and resource workspaces
(the "+kubebuilder:rbac:...resources=workspaces,verbs=...") to include "patch"
alongside existing verbs so the controller can perform r.Patch in
updateWorkspaceLease without Forbidden errors.
- Around line 789-792: Update getOrCreateWorkspacePVC to mirror the same
storageClass fallback as the VolumeClaimTemplate branch: when
imageBuild.Spec.StorageClass is non-empty use that, otherwise fall back to
operatorConfig.Spec.OSBuilds.StorageClass if present; adjust the
storageClassName/pointer logic in getOrCreateWorkspacePVC (the same variable
used in the VolumeClaimTemplate code path) so upload-backed builds honor the
operatorConfig default storage class as well.
- Around line 760-772: The current branch unconditionally replaces the
"shared-workspace" binding when imageBuild.Spec.BuildCachePVC is set, which
drops any upload PVC produced by createUploadPod() and loses uploaded source
files; change the logic in the workspace-binding section so that if an
upload-backed build (InputFilesServer/upload PVC created earlier by
createUploadPod()) exists you do not overwrite that binding — instead either (A)
keep the existing "shared-workspace" binding and add the cache PVC as a separate
workspace binding (e.g., name it "build-cache") using
imageBuild.Spec.BuildCachePVC, or (B) only use imageBuild.Spec.BuildCachePVC as
"shared-workspace" when no upload PVC/UploadFiles binding exists; update the
code around shared-workspace, createUploadPod(), and VolumeClaimTemplate
handling accordingly.

In `@internal/controller/operatorconfig/controller.go`:
- Around line 908-912: The deletion of the SecurityContextConstraints object
(scc / workspaceSCCName) uses r.Delete and only treats errors.IsNotFound(err) as
ignorable; update the error handling to also ignore "no matches for kind" errors
by checking isNoMatchError(err) (or apierrors.IsNoMatchError(err)) in the same
conditional so that a non-OpenShift Kubernetes cluster (where the SCC resource
kind doesn't exist) does not cause cleanup to fail.

---

Nitpick comments:
In `@api/v1alpha1/zz_generated.deepcopy.go`:
- Around line 1159-1163: The generated deepcopy file
api/v1alpha1/zz_generated.deepcopy.go should not be committed; remove this file
from the PR commit and update repository ignore rules so it isn’t tracked (e.g.,
add api/v1alpha1/zz_generated.deepcopy.go or a pattern for generated files to
.gitignore), then regenerate as part of CI/build with make generate manifests;
you can leave the generated DeepCopyInto/DeepCopy functions in your working tree
locally but ensure only source/type definitions are committed and the generated
file is excluded from version control.

In `@internal/buildapi/client/client.go`:
- Line 732: The new workspace methods currently discard resp.Body.Close() errors
(defer func() { _ = resp.Body.Close() }()), causing inconsistency with existing
methods like CreateBuild and GetBuild; update each occurrence (the defer at
lines shown) to mirror the existing pattern: replace the discard with a defer
that checks the error and logs a warning (e.g., defer func() { if err :=
resp.Body.Close(); err != nil { c.logger.Warnf("failed to close response body:
%v", err) } }()), and apply this change to all listed occurrences so all
response body closes are logged consistently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c40d4201-0728-4063-8522-fbfa3a7bbe36

📥 Commits

Reviewing files that changed from the base of the PR and between 64a3787 and b95f1ba.

⛔ Files ignored due to path filters (4)
  • config/crd/bases/automotive.sdv.cloud.redhat.com_imagebuilds.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yaml is excluded by !config/crd/bases/**
  • config/crd/bases/automotive.sdv.cloud.redhat.com_workspaces.yaml is excluded by !config/crd/bases/**
  • config/rbac/role.yaml is excluded by !config/rbac/role.yaml
📒 Files selected for processing (34)
  • Containerfile.toolchain
  • api/v1alpha1/imagebuild_types.go
  • api/v1alpha1/operatorconfig_types.go
  • api/v1alpha1/workspace_types.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • bundle.Dockerfile
  • cmd/caib/buildcmd/build.go
  • cmd/caib/image/image.go
  • cmd/caib/main.go
  • cmd/caib/root.go
  • cmd/caib/runtime_wiring.go
  • cmd/caib/workspace/workspace.go
  • cmd/main.go
  • config/crd/kustomization.yaml
  • config/rbac/scc_role.yaml
  • config/rbac/scc_role_binding.yaml
  • hack/container-wrapper.sh
  • hack/deploy-catalog.sh
  • hack/workspace-entrypoint.sh
  • internal/buildapi/client/client.go
  • internal/buildapi/client/client_test.go
  • internal/buildapi/server.go
  • internal/buildapi/types.go
  • internal/buildapi/workspace.go
  • internal/buildapi/workspace_test.go
  • internal/common/tasks/scripts/build_image.sh
  • internal/common/tasks/scripts/find_manifest.sh
  • internal/common/tasks/scripts/push_artifact.sh
  • internal/common/tasks/tasks.go
  • internal/controller/imagebuild/controller.go
  • internal/controller/operatorconfig/controller.go
  • internal/controller/operatorconfig/resources.go
  • internal/controller/workspace/controller.go
  • internal/controller/workspace/controller_test.go
✅ Files skipped from review due to trivial changes (10)
  • bundle.Dockerfile
  • cmd/caib/main.go
  • config/crd/kustomization.yaml
  • internal/common/tasks/scripts/push_artifact.sh
  • cmd/caib/root.go
  • hack/container-wrapper.sh
  • config/rbac/scc_role.yaml
  • internal/buildapi/workspace_test.go
  • internal/controller/workspace/controller.go
  • api/v1alpha1/workspace_types.go
🚧 Files skipped from review as they are similar to previous changes (11)
  • cmd/main.go
  • api/v1alpha1/imagebuild_types.go
  • hack/deploy-catalog.sh
  • config/rbac/scc_role_binding.yaml
  • cmd/caib/image/image.go
  • cmd/caib/buildcmd/build.go
  • internal/controller/workspace/controller_test.go
  • internal/common/tasks/tasks.go
  • internal/common/tasks/scripts/find_manifest.sh
  • api/v1alpha1/operatorconfig_types.go
  • internal/controller/operatorconfig/resources.go

Comment on lines +275 to +280
var clientConfigB64 string
clientInfo, err := caibcommon.ResolveJumpstarterClient(strings.TrimSpace(clientConfigFile))
if err == nil {
fmt.Printf("Using Jumpstarter client %q (endpoint: %s)\n", clientInfo.Name, clientInfo.Endpoint)
clientConfigB64 = base64.StdEncoding.EncodeToString(clientInfo.Data)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fail fast when an explicit --client file can't be resolved.

If the user passes --client and ResolveJumpstarterClient fails, this silently creates the workspace without any Jumpstarter config. That pushes the real error into later deploy/lease operations instead of failing at the flag that caused it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/caib/workspace/workspace.go` around lines 275 - 280, If a user supplied
an explicit --client file, fail fast when ResolveJumpstarterClient cannot
resolve it: after calling
caibcommon.ResolveJumpstarterClient(strings.TrimSpace(clientConfigFile)) check
for err and if clientConfigFile (trimmed) is non-empty, print a clear error
(including err) and exit with non-zero (or return the error) instead of silently
continuing; keep the existing successful-path behavior that sets clientConfigB64
from clientInfo.Data when err == nil. Reference ResolveJumpstarterClient,
clientConfigFile, clientInfo, clientConfigB64.

Comment on lines +523 to +539
func computeManifest(baseDir string, files []string) map[string]string {
manifest := make(map[string]string, len(files))
for _, relPath := range files {
absPath := filepath.Join(baseDir, relPath)
f, err := os.Open(absPath)
if err != nil {
continue // file may have been deleted since ls-files
}
h := sha256.New()
_, err = io.Copy(h, f)
_ = f.Close()
if err != nil {
continue
}
manifest[relPath] = hex.EncodeToString(h.Sum(nil))
}
return manifest
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Keep the delta-sync manifest consistent with symlink uploads.

tarTrackedFiles() now preserves symlinks, but computeManifest() still opens paths as regular files. Dangling symlinks get dropped from the manifest, and non-dangling ones hash the target contents instead of the link target, so incremental sync can miss symlink-only changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/caib/workspace/workspace.go` around lines 523 - 539, computeManifest
currently opens every path as a regular file which drops dangling symlinks and
hashes symlink targets instead of link values; change computeManifest to call
os.Lstat(absPath) and check fileInfo.Mode()&os.ModeSymlink: if it's a symlink,
call os.Readlink(absPath) and hash the returned link target string (include
dangling links), otherwise open and io.Copy the file contents as before; keep
the same manifest[relPath] = hex.EncodeToString(h.Sum(nil)) behavior and
preserve error handling/continutation semantics so it stays consistent with
tarTrackedFiles' symlink-preserving behavior.

Comment on lines +491 to +498
// Build a shell script that hashes only the files the client cares about.
// This avoids scanning build artifacts or other untracked content.
var scriptBuf strings.Builder
scriptBuf.WriteString("cd /workspace/src\n")
for path := range req.Files {
// Only hash regular files that exist; skip missing ones silently
scriptBuf.WriteString(fmt.Sprintf("[ -f %s ] && sha256sum %s\n", shellQuote(path), shellQuote(path)))
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Path traversal allows probing files outside /workspace/src.

The sync plan handler hashes files based on client-provided paths without sanitization. A client can send paths like ../../etc/passwd to probe for file existence outside the workspace directory. Consider validating that paths don't escape the workspace root:

for path := range req.Files {
    if strings.Contains(path, "..") || strings.HasPrefix(path, "/") {
        c.JSON(http.StatusBadRequest, gin.H{"error": fmt.Sprintf("invalid path %q: must be relative without ..", path)})
        return
    }
    // ...
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 491 - 498, The loop that builds
the hashing shell script iterates req.Files and writes unchecked paths into
scriptBuf (via scriptBuf.WriteString and shellQuote), allowing path traversal
like ../../etc/passwd; fix by validating each path before using it: ensure it is
not absolute (reject if filepath.IsAbs), canonicalize with filepath.Clean and
reject if it begins with ".." or otherwise escapes the workspace root, and
return a 400 error for invalid paths so only safe relative paths are included in
the sha256sum lines.

Comment on lines +764 to +768
for _, a := range req.Artifacts {
fmt.Fprintf(&rsyncCmds, "echo \"Deploying %s -> %s\"\n", a.Src, a.Dest)
fmt.Fprintf(&rsyncCmds, "rsync -avz --chmod=+x -e \"ssh -p $LOCAL_PORT $SSH_OPTS $SSH_KEY\" %s root@127.0.0.1:%s\n",
shellQuote(a.Src), shellQuote(a.Dest))
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Unquoted variables in echo can break on special characters.

The echo statement interpolates a.Src and a.Dest without quoting, which could break script syntax if paths contain ", $, or backticks. While the actual rsync commands correctly use shellQuote, the display output is vulnerable.

Suggested fix
 for _, a := range req.Artifacts {
-	fmt.Fprintf(&rsyncCmds, "echo \"Deploying %s -> %s\"\n", a.Src, a.Dest)
+	fmt.Fprintf(&rsyncCmds, "echo 'Deploying' %s '->' %s\n", shellQuote(a.Src), shellQuote(a.Dest))
 	fmt.Fprintf(&rsyncCmds, "rsync -avz --chmod=+x -e \"ssh -p $LOCAL_PORT $SSH_OPTS $SSH_KEY\" %s root@127.0.0.1:%s\n",
 		shellQuote(a.Src), shellQuote(a.Dest))
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 764 - 768, The echo line that
writes "Deploying %s -> %s" uses unquoted a.Src and a.Dest and can break the
generated script if those paths contain quotes or special chars; update the loop
that builds rsyncCmds (the for _, a := range req.Artifacts block) to pass
quoted/escaped values (reuse shellQuote(a.Src) and shellQuote(a.Dest) or an
equivalent escaping helper) into the fmt.Fprintf that emits the echo so the
printed message is safe, while leaving the existing rsync fmt.Fprintf as-is.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (6)
internal/buildapi/workspace.go (3)

801-805: ⚠️ Potential issue | 🟡 Minor

Quote the status echo line too.

The rsync command already uses shellQuote, but the progress echo still interpolates raw a.Src and a.Dest. A path containing ", $, or backticks can break the generated script before the copy starts.

Proposed fix
-		fmt.Fprintf(&rsyncCmds, "echo \"Deploying %s -> %s\"\n", a.Src, a.Dest)
+		fmt.Fprintf(&rsyncCmds, "echo 'Deploying' %s '->' %s\n", shellQuote(a.Src), shellQuote(a.Dest))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 801 - 805, The echo status line
in the loop that builds rsync commands uses unquoted a.Src and a.Dest and can
break scripts for paths containing quotes or special chars; update the loop that
writes into rsyncCmds (the for _, a := range req.Artifacts block) to pass the
same shell-quoted values used for the rsync call (i.e., call shellQuote(a.Src)
and shellQuote(a.Dest) when constructing the echo line) so both the
human-readable echo and the rsync arguments are safely quoted.

481-495: ⚠️ Potential issue | 🟠 Major

Avoid holding the whole sync archive in RAM.

The 512 MiB cap prevents unbounded growth, but io.ReadAll still allocates up to 512 MiB per request and defeats streaming. A couple of large syncs can still OOM the API pod; if the SPDY path needs EOF-stable input, spool to a temp file or another bounded stream instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 481 - 495, The current code
reads the entire upload into memory with io.ReadAll up to maxSyncSize, which can
still OOM; change to stream the request to a disk-backed temporary file instead:
create a temp file (os.CreateTemp), copy from c.Request.Body into it using
io.CopyN or io.LimitReader to enforce maxSyncSize+1, check the written size
(os.Stat or returned bytes) and return HTTP 413 if exceeded, then open the temp
file and pass an io.Reader (not a byte slice) to copyToPod (callsite referencing
copyToPod, workspaceContainerName, maxSyncSize), and ensure the temp file is
removed (defer os.Remove) after use so EOF-stable input is provided without
holding the archive in RAM.

225-245: ⚠️ Potential issue | 🟠 Major

Don’t persist the Jumpstarter secret before the workspace create is known-good.

A duplicate-name conflict or a later k8sClient.Create(ws) failure leaves sensitive client config orphaned, and treating AlreadyExists as success can silently bind stale secret data to a future workspace with the same name. Do the existence checks first, delete the secret again if workspace creation fails, and owner-reference it once the Workspace exists.

Also applies to: 250-264, 298-299

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/workspace.go` around lines 225 - 245, The code currently
creates the Jumpstarter secret (jmpClientSecret) before creating the Workspace
and treats k8sClient.Create errors with IsAlreadyExists as success; change this
to first check for an existing secret (using k8sClient.Get) and fail if one
already exists for that name, defer creating the secret until after Workspace
creation succeeds (or if you must create it early, ensure you delete it on
Workspace creation failure), and once the Workspace is created set an
ownerReference on the secret to the created Workspace and update it via
k8sClient.Update; update all occurrences that create or treat AlreadyExists as
success (references: jmpClientSecret variable, k8sClient.Create call, Workspace
creation flow) so no orphaned or stale secret can be bound to a later workspace.
internal/buildapi/client/client.go (1)

980-983: ⚠️ Potential issue | 🟠 Major

Start from websocket.DefaultDialer here.

A zero-value dialer drops Gorilla's default proxy resolution and handshake timeout, so shell sessions can hang indefinitely or ignore HTTP_PROXY/HTTPS_PROXY. Clone websocket.DefaultDialer first and then overlay the transport settings you want to preserve.

Proposed fix
-	dialer := websocket.Dialer{}
-	if t, ok := c.httpClient.Transport.(*http.Transport); ok && t.TLSClientConfig != nil {
-		dialer.TLSClientConfig = t.TLSClientConfig
-	}
+	dialer := *websocket.DefaultDialer
+	if t, ok := c.httpClient.Transport.(*http.Transport); ok {
+		dialer.TLSClientConfig = t.TLSClientConfig
+		dialer.Proxy = t.Proxy
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/buildapi/client/client.go` around lines 980 - 983, Replace the
zero-value websocket.Dialer initialization with a clone of
websocket.DefaultDialer so Gorilla's default proxy resolution and handshake
timeout are preserved: start by assigning dialer := *websocket.DefaultDialer,
then if c.httpClient.Transport is an *http.Transport and t.TLSClientConfig !=
nil set dialer.TLSClientConfig = t.TLSClientConfig (leave other DefaultDialer
fields untouched). This keeps existing behavior for proxy and handshake timeout
while applying the transport TLSClientConfig used in the current code path.
Containerfile.toolchain (2)

57-57: ⚠️ Potential issue | 🟡 Minor

Clean DNF metadata in the EPEL layer.

This is the only install step that leaves cache behind, so the final image keeps an avoidable layer of RPM metadata.

Proposed fix
-RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm
+RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm && dnf clean all
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Containerfile.toolchain` at line 57, The RUN dnf install -y
https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm layer
leaves RPM metadata in the image; modify that RUN to clean DNF caches in the
same instruction (e.g., append && dnf clean all && rm -rf /var/cache/dnf
/var/cache/yum) so metadata is removed before the layer is committed, ensuring
no leftover cache remains from the epel install.

10-10: ⚠️ Potential issue | 🟠 Major

Enable RPM verification in the bootstrap stage.

gpgcheck=0, --nogpgcheck, and the best-effort key import after the install mean this stage never authenticates the RPMs it pulls. Import the repo key before dnf install, turn gpgcheck back on, and drop the verification bypasses.

Also applies to: 22-38

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Containerfile.toolchain` at line 10, The bootstrap stage currently disables
RPM verification (ARG GPG_KEY="RPM-GPG-KEY-centosofficial", gpgcheck=0, and
--nogpgcheck) and imports the key only after installs; change the Dockerfile so
you import the repo GPG key referenced by ARG GPG_KEY before any dnf install,
remove gpgcheck=0 and all --nogpgcheck flags, and ensure gpgcheck is enabled
(gpgcheck=1) for repo definitions and yum/dnf calls; update the key import step
to use rpm --import or dnf config-manager --import pointing at ${GPG_KEY} so
installs verify RPM signatures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/caib/buildcmd/logs.go`:
- Around line 118-123: The lease writeback currently calls
api.SetWorkspaceLease(ctx, *h.opts.Workspace, leaseID) using the parent ctx
which can hang; wrap that call in a short timeout context (e.g., ctxWithTimeout,
cancel := context.WithTimeout(ctx, <shortDuration>); defer cancel()) and pass
ctxWithTimeout to api.SetWorkspaceLease so the PUT cannot block indefinitely;
reference h.opts.Workspace, leaseID and api.SetWorkspaceLease when applying the
change.

In `@internal/buildapi/workspace.go`:
- Around line 683-715: The websocket writes are performed concurrently by the
stdout pump goroutine (reading from stdoutR and calling wsConn.WriteMessage) and
the main goroutine that sends the close frame after executor.StreamWithContext
returns; protect and serialize writes to avoid gorilla/websocket panics by
funneling all writes through a single writer path or a mutex: either create a
dedicated writer goroutine that reads write requests (binary frames and the
final close message) from a channel and calls wsConn.WriteMessage, or add a
sync.Mutex (e.g., writeMu) and lock around every wsConn.WriteMessage call in the
stdout pump and the close-send logic; also ensure the stdout pump goroutine
exits (or is signaled) before sending the close frame (use a done channel or
waitgroup) so the final close is not raced with remaining binary writes
(references: stdoutR, wsConn.WriteMessage, executor.StreamWithContext,
stdoutW.Close, websocket.FormatCloseMessage).

---

Duplicate comments:
In `@Containerfile.toolchain`:
- Line 57: The RUN dnf install -y
https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm layer
leaves RPM metadata in the image; modify that RUN to clean DNF caches in the
same instruction (e.g., append && dnf clean all && rm -rf /var/cache/dnf
/var/cache/yum) so metadata is removed before the layer is committed, ensuring
no leftover cache remains from the epel install.
- Line 10: The bootstrap stage currently disables RPM verification (ARG
GPG_KEY="RPM-GPG-KEY-centosofficial", gpgcheck=0, and --nogpgcheck) and imports
the key only after installs; change the Dockerfile so you import the repo GPG
key referenced by ARG GPG_KEY before any dnf install, remove gpgcheck=0 and all
--nogpgcheck flags, and ensure gpgcheck is enabled (gpgcheck=1) for repo
definitions and yum/dnf calls; update the key import step to use rpm --import or
dnf config-manager --import pointing at ${GPG_KEY} so installs verify RPM
signatures.

In `@internal/buildapi/client/client.go`:
- Around line 980-983: Replace the zero-value websocket.Dialer initialization
with a clone of websocket.DefaultDialer so Gorilla's default proxy resolution
and handshake timeout are preserved: start by assigning dialer :=
*websocket.DefaultDialer, then if c.httpClient.Transport is an *http.Transport
and t.TLSClientConfig != nil set dialer.TLSClientConfig = t.TLSClientConfig
(leave other DefaultDialer fields untouched). This keeps existing behavior for
proxy and handshake timeout while applying the transport TLSClientConfig used in
the current code path.

In `@internal/buildapi/workspace.go`:
- Around line 801-805: The echo status line in the loop that builds rsync
commands uses unquoted a.Src and a.Dest and can break scripts for paths
containing quotes or special chars; update the loop that writes into rsyncCmds
(the for _, a := range req.Artifacts block) to pass the same shell-quoted values
used for the rsync call (i.e., call shellQuote(a.Src) and shellQuote(a.Dest)
when constructing the echo line) so both the human-readable echo and the rsync
arguments are safely quoted.
- Around line 481-495: The current code reads the entire upload into memory with
io.ReadAll up to maxSyncSize, which can still OOM; change to stream the request
to a disk-backed temporary file instead: create a temp file (os.CreateTemp),
copy from c.Request.Body into it using io.CopyN or io.LimitReader to enforce
maxSyncSize+1, check the written size (os.Stat or returned bytes) and return
HTTP 413 if exceeded, then open the temp file and pass an io.Reader (not a byte
slice) to copyToPod (callsite referencing copyToPod, workspaceContainerName,
maxSyncSize), and ensure the temp file is removed (defer os.Remove) after use so
EOF-stable input is provided without holding the archive in RAM.
- Around line 225-245: The code currently creates the Jumpstarter secret
(jmpClientSecret) before creating the Workspace and treats k8sClient.Create
errors with IsAlreadyExists as success; change this to first check for an
existing secret (using k8sClient.Get) and fail if one already exists for that
name, defer creating the secret until after Workspace creation succeeds (or if
you must create it early, ensure you delete it on Workspace creation failure),
and once the Workspace is created set an ownerReference on the secret to the
created Workspace and update it via k8sClient.Update; update all occurrences
that create or treat AlreadyExists as success (references: jmpClientSecret
variable, k8sClient.Create call, Workspace creation flow) so no orphaned or
stale secret can be bound to a later workspace.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a450fbe7-4fc2-4384-95e2-73d34b30dee2

📥 Commits

Reviewing files that changed from the base of the PR and between b95f1ba and 4dca1dc.

📒 Files selected for processing (5)
  • Containerfile.toolchain
  • cmd/caib/buildcmd/logs.go
  • hack/workspace-entrypoint.sh
  • internal/buildapi/client/client.go
  • internal/buildapi/workspace.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • hack/workspace-entrypoint.sh

Comment thread cmd/caib/buildcmd/logs.go
Comment thread internal/buildapi/workspace.go
bennyz added 13 commits March 24, 2026 15:31
- Operator-managed ServiceAccount and SCC binding

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>
Assisted-by: claude-opus-4.6
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>
Assisted-by: claude-opus-4.6
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>
Assisted-by: claude-opus-4.6
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>
Assisted-by: claude-opus-4.6
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>
Assisted-by: claude-opus-4.6
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>
Assisted-by: claude-opus-4.6
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>
Assisted-by: claude-opus-4.6
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>
Assisted-by: claude-opus-4.6
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>
Assisted-by: claude-opus-4.6
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>
Assisted-by: claude-opus-4.6
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>
Assisted-by: claude-opus-4.6
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>
Assisted-by: claude-opus-4.6
The operator SA calls Update on secrets (e.g. setting OwnerReferences,
rotating OAuth cookie-secret) but the ClusterRole only had get, list,
watch, create, delete.

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>
Assisted-by: claude-opus-4.6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants