initial workspace#147
Conversation
|
Caution Review failedPull request was closed or merged during review Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a Workspace CRD, controller, API endpoints, client and CLI; integrates workspaces into build flows (extra repos, persistent cache, build-cache PVCs); deploys workspace infra and RBAC/SCC updates; provides a toolchain Containerfile, container wrapper and workspace entry scripts; updates Tekton params, build scripts, tests, and CLI wiring. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI
participant APIServer
participant K8s as Kubernetes
participant Pod
rect rgba(100,150,200,0.5)
Note over User,APIServer: Create workspace
User->>CLI: caib workspace create (params)
CLI->>APIServer: POST /v1/workspaces
APIServer->>K8s: create Workspace CR (and optional Secret)
APIServer-->>CLI: 201 WorkspaceResponse
end
rect rgba(150,100,200,0.5)
Note over K8s,Pod: Controller provisions resources
K8s->>K8s: Workspace Controller observes CR
K8s->>K8s: Ensure PVC exists (create if needed)
K8s->>K8s: Create Pod mounting PVC (toolchain image)
K8s-->>APIServer: Status updates (Running/Failed)
end
rect rgba(200,150,100,0.5)
Note over User,Pod: Sync / Shell / Exec flows
User->>CLI: caib workspace sync / shell / exec
CLI->>APIServer: POST /v1/workspaces/:name/sync or WS /shell or POST /exec
APIServer->>Pod: podExec / SPDY attach / copy tar / start tunnel
Pod-->>APIServer: streamed output / success
APIServer-->>CLI: proxied stream/200 OK
end
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 10
🧹 Nitpick comments (2)
internal/buildapi/workspace_test.go (1)
61-112: These specs don't actually exercise the validation branches in their names.The workspace group applies
authMiddleware()before the handlers run, so each case returns 401 before JSON binding or field validation happens. As written, these tests only prove that auth runs first; if you want coverage for missing name / bad JSON / missing command / missing deploy fields, inject an authenticated request or stub auth out in this suite.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/workspace_test.go` around lines 61 - 112, The tests in the "Create Workspace"/"Exec Workspace"/"Deploy Workspace" contexts are failing to exercise JSON binding and validation because authMiddleware() runs first and returns 401; update these specs to authenticate the requests (or stub/bypass authMiddleware) so the handler runs and validation is executed: either add a valid auth header/token to the requests created for WorkspaceRequest, WorkspaceExecRequest, and WorkspaceDeployRequest (in the It blocks) or replace/mock authMiddleware for this suite with a no-op middleware so server.router.ServeHTTP reaches the handler and triggers JSON parsing and field validation.cmd/caib/workspace/workspace.go (1)
376-387:syncmaterializes the full archive in RAM.For larger repos this makes the CLI pay an O(repo size) memory spike before the upload even starts. A streaming tar writer plus a counting/progress reader keeps the UX the same without buffering the whole workspace locally.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/caib/workspace/workspace.go` around lines 376 - 387, The current code buffers the entire tar into bytes.Buffer via tarTrackedFiles then builds a progress reader from buf, causing an O(repo size) memory spike; instead stream the tar through an io.Pipe: create pr, pw := io.Pipe(), launch tarTrackedFiles(absDir, files, pw) in a goroutine that closes pw (and calls pw.CloseWithError on error) and capture its error, then wrap the pipe reader with newProgressReader (or a counting reader) and pass that reader directly into client.SyncWorkspace inside ExecuteWithReauth (keeping SyncWorkspace, ExecuteWithReauth, tarTrackedFiles and newProgressReader references), ensuring goroutine errors are propagated back so the upload fails if tar creation fails.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/caib/workspace/workspace.go`:
- Around line 460-466: The code currently flattens cmdParts into a single shell
string (cmdParts -> command via strings.Join) causing quotes and metacharacters
to be lost; update the code that builds WorkspaceExecRequest to preserve
argument boundaries by passing the original cmdParts slice (e.g., set a
CommandArgs/Args field to cmdParts or change WorkspaceExecRequest.Command to a
[]string) instead of joining into one string, and adjust any
serialization/consumer logic to accept the argv form so arguments are
transmitted verbatim.
In `@Containerfile.toolchain`:
- Around line 21-22: The Containerfile installs Jumpstarter into /root/.local
and sets ENV PATH to reference that, which is inaccessible to random non-root
UIDs on OpenShift; change the RUN that invokes the installer (the line using
"curl ... install.sh | bash -s -- --source main") to pass the install script's
--dir parameter (e.g., --dir /opt/jumpstarter) so the CLI is installed into a
system-accessible location, ensure the container creates and sets appropriate
permissions for /opt/jumpstarter during build, and update the ENV PATH (the
existing ENV PATH="/root/.local/jumpstarter/bin:${PATH}") to point to
/opt/jumpstarter/bin so non-root users can execute the Jumpstarter binary.
In `@internal/buildapi/workspace.go`:
- Around line 669-730: The helpers copyToPod and execInPod currently call
StreamWithContext(context.Background(), ...), ignoring request cancellation;
change both function signatures to accept a context.Context (e.g., copyToPod(ctx
context.Context, restCfg *rest.Config, ...) and execInPod(ctx context.Context,
restCfg *rest.Config, ...)), replace the context.Background() passed to
remotecommand.Executor.StreamWithContext with the incoming ctx, and update all
callers (where sync/exec/deploy handle an http request) to pass
c.Request.Context() into copyToPod and execInPod so SPDY streams honor client
disconnects and deadlines.
- Around line 357-365: The handler currently reads the entire request body into
memory via io.ReadAll (in sync) which can OOM; instead enforce the same max
upload size and stream the body directly into copyToPod: replace io.ReadAll
usage with a size-limited reader (e.g. wrap c.Request.Body with
http.MaxBytesReader or io.LimitReader using your existing maxUploadSize
constant) and pass that reader into copyToPod(restCfg, namespace, podName,
workspaceContainerName, limitedReader, "/workspace/src/"); remove buffering into
tarData and ensure errors from exceeding the limit are handled and returned to
the client.
- Around line 167-175: The code currently reuses ImageBuild.Status.LeaseID when
req.FromBuild is set without verifying ownership; update the call in the handler
to pass the caller identity (requester) into resolveLeaseFromBuild and change
resolveLeaseFromBuild to fetch the ImageBuild and validate the
automotive.sdv.cloud.redhat.com/requested-by annotation matches the requester
before returning ImageBuild.Status.LeaseID, returning an error if it doesn’t
match so unauthorized callers cannot attach another build's lease.
- Around line 177-199: Move creation of the Jumpstarter secret (jmpClientSecret
/ secret and the k8sClient.Create call that currently runs when req.ClientConfig
!= "") to after you perform the duplicate-name/uniqueness checks and after you
successfully create the Workspace CR (the k8sClient.Create(ws) path); do not
treat k8serrors.IsAlreadyExists as a silent success—ensure the name conflict is
surfaced during uniqueness checks and fail early. If you must create the secret
before persisting the Workspace, then delete the secret if k8sClient.Create(ws)
returns an error (call k8sClient.Delete with the same secret) to avoid orphaning
sensitive data; otherwise, set an OwnerReference on the secret (owner = the
created Workspace object) immediately after the Workspace creation so Kubernetes
will garbage-collect the secret with the Workspace. Ensure all references to
jmpClientSecret, secret, and k8sClient.Create are updated accordingly.
- Around line 598-632: The code copies wsConfig.Resources into res via
wsConfig.Resources.DeepCopy(), but DeepCopy can preserve nil Requests/Limits
maps and later assignments to res.Requests[...] or res.Limits[...] will panic;
after creating res (or after wsConfig.Resources.DeepCopy()), ensure res.Requests
and res.Limits are non-nil by initializing them to empty corev1.ResourceList{}
when nil (e.g., check res.Requests == nil and res.Limits == nil) before any use
in the CPU and memory handling paths (affects the res variable created from
corev1.ResourceRequirements and the use sites where
res.Requests[corev1.ResourceCPU], res.Limits[corev1.ResourceCPU],
res.Requests[corev1.ResourceMemory], res.Limits[corev1.ResourceMemory] are
assigned).
In `@internal/controller/workspace/controller.go`:
- Around line 103-127: The PVC creation uses GenerateName (pvcSuffix) which can
create duplicate PVCs if r.Status().Patch fails and ws.Status.PVCName stays
empty; modify the reconcile in this block to first search for an existing PVC
owned by the Workspace (look up PVCs in ws.Namespace with a deterministic name
pattern or by ownerReference) and reuse it instead of always creating a new one,
or switch to a deterministic name (e.g., build pvcName := ws.Name + pvcSuffix)
and use that for the PVC ObjectMeta.Name so retries address the same resource;
ensure you still call controllerutil.SetControllerReference(ws, pvc, r.Scheme)
before r.Create and only create when the lookup does not return an existing
owned PVC, then set ws.Status.PVCName and patch status as before.
- Around line 50-52: The reconcile currently swallows transient errors from
ensurePVC (and ensurePod) by only calling r.setStatus(...) and returning no
error, which prevents controller-runtime from retrying; change the flow so you
still call r.setStatus(ctx, ws, "Failed", fmt.Sprintf("PVC error: %v", err))
(and similarly for Pod errors) but then return the original error (or a wrapped
error) from Reconcile so controller-runtime will requeue and retry; update the
branches handling ensurePVC and ensurePod failures to set status first and then
return the error instead of returning only the result of setStatus.
- Around line 160-163: The current code uses user-controlled ws.Spec.Image
(falling back to automotivev1alpha1.DefaultToolchainImage) and launches pods as
privileged root; change this to validate ws.Spec.Image against an
operator-controlled allowlist (deny anything not explicitly allowed and fall
back to DefaultToolchainImage), and remove Privileged=true and RunAsUser=0 from
the pod's SecurityContext (or set a non-root uid and ensure
allowPrivilegeEscalation=false). Locate the image selection (image :=
ws.Spec.Image / DefaultToolchainImage) and the pod spec creation code (the block
around lines 213-235 that sets SecurityContext/Privileged) and implement: image
validation against the allowlist, enforce a safe non-root SecurityContext, and
ensure the pod runs under a restricted service account with no privileged
capabilities.
---
Nitpick comments:
In `@cmd/caib/workspace/workspace.go`:
- Around line 376-387: The current code buffers the entire tar into bytes.Buffer
via tarTrackedFiles then builds a progress reader from buf, causing an O(repo
size) memory spike; instead stream the tar through an io.Pipe: create pr, pw :=
io.Pipe(), launch tarTrackedFiles(absDir, files, pw) in a goroutine that closes
pw (and calls pw.CloseWithError on error) and capture its error, then wrap the
pipe reader with newProgressReader (or a counting reader) and pass that reader
directly into client.SyncWorkspace inside ExecuteWithReauth (keeping
SyncWorkspace, ExecuteWithReauth, tarTrackedFiles and newProgressReader
references), ensuring goroutine errors are propagated back so the upload fails
if tar creation fails.
In `@internal/buildapi/workspace_test.go`:
- Around line 61-112: The tests in the "Create Workspace"/"Exec
Workspace"/"Deploy Workspace" contexts are failing to exercise JSON binding and
validation because authMiddleware() runs first and returns 401; update these
specs to authenticate the requests (or stub/bypass authMiddleware) so the
handler runs and validation is executed: either add a valid auth header/token to
the requests created for WorkspaceRequest, WorkspaceExecRequest, and
WorkspaceDeployRequest (in the It blocks) or replace/mock authMiddleware for
this suite with a no-op middleware so server.router.ServeHTTP reaches the
handler and triggers JSON parsing and field validation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d373c1c1-fe1d-4985-af7b-5731e8ba75b8
⛔ Files ignored due to path filters (3)
config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yamlis excluded by!config/crd/bases/**config/crd/bases/automotive.sdv.cloud.redhat.com_workspaces.yamlis excluded by!config/crd/bases/**config/rbac/role.yamlis excluded by!config/rbac/role.yaml
📒 Files selected for processing (16)
Containerfile.toolchainapi/v1alpha1/operatorconfig_types.goapi/v1alpha1/workspace_types.goapi/v1alpha1/zz_generated.deepcopy.gocmd/caib/root.gocmd/caib/workspace/workspace.gocmd/main.goconfig/crd/kustomization.yamlinternal/buildapi/client/client.gointernal/buildapi/server.gointernal/buildapi/workspace.gointernal/buildapi/workspace_test.gointernal/controller/imagebuild/controller.gointernal/controller/operatorconfig/controller.gointernal/controller/operatorconfig/resources.gointernal/controller/workspace/controller.go
| // Everything after the workspace name (or after --) is the command | ||
| cmdParts := args[1:] | ||
| if len(cmdParts) == 0 { | ||
| handleError(fmt.Errorf("no command specified; use: caib workspace exec <name> -- <command>")) | ||
| } | ||
| command := strings.Join(cmdParts, " ") | ||
|
|
There was a problem hiding this comment.
Flattening cmdParts with spaces loses the original argv.
Commands with spaces, quotes, or shell metacharacters do not round-trip reliably once the args are rejoined. Preserve argument boundaries when building WorkspaceExecRequest.Command, or move the API to an argv form instead of a shell string.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/caib/workspace/workspace.go` around lines 460 - 466, The code currently
flattens cmdParts into a single shell string (cmdParts -> command via
strings.Join) causing quotes and metacharacters to be lost; update the code that
builds WorkspaceExecRequest to preserve argument boundaries by passing the
original cmdParts slice (e.g., set a CommandArgs/Args field to cmdParts or
change WorkspaceExecRequest.Command to a []string) instead of joining into one
string, and adjust any serialization/consumer logic to accept the argv form so
arguments are transmitted verbatim.
| // Create secret for Jumpstarter client config if provided | ||
| var jmpClientSecret string | ||
| if req.ClientConfig != "" { | ||
| clientConfigBytes, decErr := base64.StdEncoding.DecodeString(req.ClientConfig) | ||
| if decErr != nil { | ||
| c.JSON(http.StatusBadRequest, gin.H{"error": "clientConfig must be base64 encoded"}) | ||
| return | ||
| } | ||
| jmpClientSecret = req.Name + "-jumpstarter-client" | ||
| secret := &corev1.Secret{ | ||
| ObjectMeta: metav1.ObjectMeta{ | ||
| Name: jmpClientSecret, | ||
| Namespace: namespace, | ||
| }, | ||
| Type: corev1.SecretTypeOpaque, | ||
| Data: map[string][]byte{ | ||
| "client.yaml": clientConfigBytes, | ||
| }, | ||
| } | ||
| if createErr := k8sClient.Create(c.Request.Context(), secret); createErr != nil && !k8serrors.IsAlreadyExists(createErr) { | ||
| c.JSON(http.StatusInternalServerError, gin.H{"error": fmt.Sprintf("failed to create client config secret: %v", createErr)}) | ||
| return | ||
| } |
There was a problem hiding this comment.
Create the Jumpstarter secret only after the workspace request is known-good.
The secret is created before the duplicate-name check and before the Workspace CR write, and AlreadyExists is treated as success. A conflict or later create failure leaves sensitive client config orphaned, and a stale secret can be silently rebound to a new workspace with the same name. Move secret creation after the uniqueness/resource checks, delete it again if k8sClient.Create(ws) fails, and owner-reference it after a successful create.
Also applies to: 202-243
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/buildapi/workspace.go` around lines 177 - 199, Move creation of the
Jumpstarter secret (jmpClientSecret / secret and the k8sClient.Create call that
currently runs when req.ClientConfig != "") to after you perform the
duplicate-name/uniqueness checks and after you successfully create the Workspace
CR (the k8sClient.Create(ws) path); do not treat k8serrors.IsAlreadyExists as a
silent success—ensure the name conflict is surfaced during uniqueness checks and
fail early. If you must create the secret before persisting the Workspace, then
delete the secret if k8sClient.Create(ws) returns an error (call
k8sClient.Delete with the same secret) to avoid orphaning sensitive data;
otherwise, set an OwnerReference on the secret (owner = the created Workspace
object) immediately after the Workspace creation so Kubernetes will
garbage-collect the secret with the Workspace. Ensure all references to
jmpClientSecret, secret, and k8sClient.Create are updated accordingly.
| { | ||
| APIGroups: []string{"security.openshift.io"}, | ||
| Resources: []string{"securitycontextconstraints"}, | ||
| ResourceNames: []string{"privileged"}, |
There was a problem hiding this comment.
Do we really need the workspaces to be privileged?
There was a problem hiding this comment.
I didn't find another way (yet) to allow users to sudo dnf install
There was a problem hiding this comment.
I still need to think about this, as this might be a problem with custom images
There was a problem hiding this comment.
Looks like OpenShift 4.20 might have a solution for this, I'll check
There was a problem hiding this comment.
Actionable comments posted: 8
♻️ Duplicate comments (6)
Containerfile.toolchain (1)
77-78:⚠️ Potential issue | 🟠 MajorInstall Jumpstarter into a non-root-accessible path.
Line 78 points to
/root/.local/jumpstarter/bin, which is not usable for random non-root UIDs in OpenShift-style runtimes. This was already flagged previously and still appears unresolved.Suggested fix
-RUN curl -fsSL https://raw.githubusercontent.com/jumpstarter-dev/jumpstarter/main/python/install.sh | bash -s -- --source main -ENV PATH="/root/.local/jumpstarter/bin:${PATH}" +RUN curl -fsSL https://raw.githubusercontent.com/jumpstarter-dev/jumpstarter/main/python/install.sh | \ + bash -s -- --source main --dir /opt/jumpstarter +ENV PATH="/opt/jumpstarter/bin:${PATH}"#!/bin/bash set -euo pipefail # Verify installer supports custom target directory and current default path behavior curl -fsSL https://raw.githubusercontent.com/jumpstarter-dev/jumpstarter/main/python/install.sh | rg -n -- '--dir|/root/.local|jumpstarter'🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Containerfile.toolchain` around lines 77 - 78, The Dockerfile currently pipes the installer via the RUN line that executes "curl ... install.sh | bash -s -- --source main" and then sets ENV PATH to "/root/.local/jumpstarter/bin:${PATH}", which breaks non-root OpenShift UIDs; update the RUN invocation to pass the installer a non-root target (e.g. add "--dir /opt/jumpstarter" or another directory writable by arbitrary container users) and change the subsequent ENV PATH entry to point to that directory's bin (for example ENV PATH="/opt/jumpstarter/bin:${PATH}"), and ensure the RUN step creates the directory and sets permissive ownership/mode so non-root users can execute the installed binaries (adjust commands surrounding the "curl ... install.sh" RUN and the ENV PATH assignment).cmd/caib/workspace/workspace.go (2)
463-469:⚠️ Potential issue | 🟠 MajorPreserve argv boundaries for
workspace exec.
strings.Join(cmdParts, " ")still changes the command whenever an argument contains spaces, quotes, or shell metacharacters. Transport the originalcmdPartsslice through the API instead of flattening it into one shell string.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/caib/workspace/workspace.go` around lines 463 - 469, The code flattens exec arguments with strings.Join(cmdParts, " ") which loses argv boundaries; instead, keep and pass the original []string cmdParts through the API (don't create the string variable command) so each argument, quotes and metacharacters are preserved. Update any function/method signatures and payloads that currently accept the single command string (look for uses of the variable command and functions handling workspace exec) to accept []string (e.g., cmdParts or args) and adjust call sites/serialization accordingly so the exec path transmits and executes the argument slice intact.
421-433:⚠️ Potential issue | 🔴 CriticalReject or preserve symlinks during
workspace sync.
os.Openplusf.Stat()dereferences tracked symlinks. A repo containing something likefoo -> ~/.ssh/id_rsawill upload the key contents to the cluster underfoo, and even benign symlinks no longer round-trip correctly. Useos.Lstatfirst and either emit a symlink entry or fail closed.🔒 Suggested hardening
for _, relPath := range files { absPath := filepath.Join(baseDir, relPath) - f, err := os.Open(absPath) - if err != nil { - continue // file may have been deleted since ls-files - } - fi, err := f.Stat() + fi, err := os.Lstat(absPath) if err != nil { - _ = f.Close() - continue + if os.IsNotExist(err) { + continue + } + return fmt.Errorf("stat %s: %w", relPath, err) + } + if fi.Mode()&os.ModeSymlink != 0 { + return fmt.Errorf("tracked symlink %s is not supported by sync", relPath) } if !fi.Mode().IsRegular() { - _ = f.Close() continue } + f, err := os.Open(absPath) + if err != nil { + if os.IsNotExist(err) { + continue + } + return fmt.Errorf("open %s: %w", relPath, err) + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/caib/workspace/workspace.go` around lines 421 - 433, The loop currently uses os.Open + f.Stat which dereferences symlinks; change to use os.Lstat(absPath) to detect os.FileMode&os.ModeSymlink first, and if it's a symlink call os.Readlink(absPath) and either emit a symlink entry (store the link target and mark the entry as symlink) or return an error to fail closed; only for non-symlinks proceed to open the file (os.Open) and use f.Stat()/fi.Mode().IsRegular as before. Update handling around variables absPath, relPath, files and ensure closed file handles are preserved.internal/controller/workspace/controller.go (3)
50-52:⚠️ Potential issue | 🟠 MajorReturn reconcile errors after setting failure status.
Transient PVC/Pod errors are swallowed here. If status patch succeeds, the controller may stop retrying and workspace can stay stuck.
Also applies to: 58-60
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/workspace/controller.go` around lines 50 - 52, The current code calls r.setStatus(...) and returns ctrl.Result{} which swallows the original reconcile error; change both the ensurePVC block (r.ensurePVC(ctx, ws)) and the similar block at lines 58-60 to set the workspace status via r.setStatus(...) and then return the original error (or a wrapped version) so the controller requeues and retries; locate the logic around r.ensurePVC and the other ensure-* call and make the result return the error instead of an empty ctrl.Result.
103-107:⚠️ Potential issue | 🟠 MajorUse deterministic PVC naming (or lookup) to avoid duplicate claims.
GenerateNameplus status-patch retries can create multiple PVCs for the same workspace when status isn’t persisted.Also applies to: 124-127
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/workspace/controller.go` around lines 103 - 107, The PVC creation uses GenerateName which can produce duplicate PVCs on retries; change to deterministic naming and idempotent creation: compute a stable PVC name (e.g., based on ws.Name and ws.UID or a constant pvcSuffix) instead of using GenerateName, then try to Get the PVC by that name (or List by a unique label/ownerRef) and only Create if not found; update the pvc variable and any ownerReference/labels consistently (references: the pvc variable, GenerateName usage, and the workspace `ws` object) so repeated status-patch retries do not create duplicate PVCs.
160-163:⚠️ Potential issue | 🔴 CriticalDo not run user-selected workspace images as privileged root.
A caller-controlled image is launched with
RunAsUser=0andPrivileged=true. This is a critical privilege-escalation path.Also applies to: 228-248
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/workspace/controller.go` around lines 160 - 163, The code currently treats any caller-selected image the same as the default and ends up launching it as root with Privileged=true; change the pod/security context logic so that if image != automotivev1alpha1.DefaultToolchainImage (i.e., a user-supplied image) you do NOT set RunAsUser=0 and do NOT enable privileged mode — set securityContext.RunAsUser to a non-root UID (e.g., 1000) and securityContext.Privileged to false; keep the existing root/privileged behavior only for the trusted DefaultToolchainImage. Update the code paths that construct the Pod spec/securityContext (the logic that uses image := ws.Spec.Image and any function that applies the container.SecurityContext or pod.Spec.SecurityContext) accordingly.
🧹 Nitpick comments (6)
Containerfile.toolchain (1)
57-57: Clean DNF cache in the same EPEL install layer.Line 57 installs EPEL in a separate layer without cleanup, so metadata/artifacts from that layer still bloat the final image.
Suggested change
-RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm +RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm \ + && dnf clean all🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Containerfile.toolchain` at line 57, The Docker layer installing EPEL (the RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm instruction) leaves package metadata in a separate layer; modify that RUN to perform the install and then immediately clean DNF caches in the same layer (e.g., run dnf install ... && dnf clean all && rm -rf /var/cache/dnf) so metadata/artifacts are removed before the layer is committed.internal/buildapi/workspace_test.go (1)
32-44: The suite currently stops at auth middleware.The
401checks prove the routes exist, but the create/exec/deploy cases never reach JSON binding or field validation, and/v1/workspaces/:name/shellis still missing from the auth matrix. Add at least one authenticated400-path test per handler plus a shell auth test so the new API surface is actually guarded against regressions.Also applies to: 61-113
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/workspace_test.go` around lines 32 - 44, Current auth-only tests stop at middleware and don't exercise handler-level JSON binding/validation; update the test suite in workspace_test.go so each route in the Authentication matrix (e.g., POST /v1/workspaces, POST /v1/workspaces/my-app/sync, POST /v1/workspaces/my-app/exec, POST /v1/workspaces/my-app/deploy, plus add GET/DELETE entries) has one authenticated request that includes valid auth (set the same auth header/token used elsewhere in tests) but invalid or empty JSON to assert a 400 response from the handler (this ensures create/exec/deploy paths hit binding/validation logic), and add the missing /v1/workspaces/:name/shell route to the matrix with both unauthenticated 401 and an authenticated 400 test; apply the same pattern to the other test blocks referenced (lines ~61-113) so each handler has at least one authenticated 400-path test.api/v1alpha1/zz_generated.deepcopy.go (1)
1-1403: Prefer excluding generated deepcopy artifacts from this PR.This file is generated output; keep PR scope to source changes where possible and rely on generation in the standard build/deploy flow.
Based on learnings: “Generated files that are not committed should be ignored in PRs… do not include api/v1alpha1/zz_generated.deepcopy.go … ensure PRs only contain source changes.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/v1alpha1/zz_generated.deepcopy.go` around lines 1 - 1403, The diff adds the generated file api/v1alpha1/zz_generated.deepcopy.go which should not be committed; remove this file from the PR (undo the commit or git rm --cached if already committed), update repository rules so generated deepcopy files are not committed (add api/v1alpha1/zz_generated.deepcopy.go to .gitignore or adjust existing ignore patterns), and ensure the generation step (e.g., controller-gen / make generate) remains the authoritative source so CI/build regenerates the file as needed.api/v1alpha1/workspace_types.go (1)
35-45: Add CRD validation markers forownerandpvcSize.Consider
+kubebuilder:validation:MinLength=1forownerand a quantity pattern (or enum policy) forpvcSizeto reject invalid specs at admission time instead of at reconcile time.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/v1alpha1/workspace_types.go` around lines 35 - 45, Add kubebuilder validation markers to the Workspace spec fields: annotate the Owner field (Owner) with +kubebuilder:validation:MinLength=1 to prevent empty owner strings, and annotate PVCSize (PVCSize) with a validation marker that enforces allowed volume quantity formats (either a pattern regex like +kubebuilder:validation:Pattern=`^([0-9]+)(Ei|Pi|Ti|Gi|Mi|Ki|E|P|T|G|M|K|Ei|Pi|Ti|Gi|Mi|Ki|e|p|t|g|m|k)?$` or an enum of accepted sizes) so invalid PVC sizes are rejected at admission time; add these comments immediately above the corresponding struct fields in the Workspace type.api/v1alpha1/operatorconfig_types.go (2)
50-58: Consider pinning a specific version forDefaultToolchainImage.Using the
:latesttag can lead to unpredictable behavior when the image is updated upstream. The other default images in this file (e.g.,DefaultAutomotiveImageBuilderImage) use pinned versions like:1.1.11. Consider following the same pattern for consistency and reproducibility.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/v1alpha1/operatorconfig_types.go` around lines 50 - 58, The DefaultToolchainImage constant currently uses the unstable ":latest" tag; change the DefaultToolchainImage value to a specific, pinned image tag (e.g., match the project's image versioning like ":1.1.11" or the known stable release for quay.io/centos-sig-automotive/autosd-toolchain) to ensure reproducibility—edit the DefaultToolchainImage constant in operatorconfig_types.go to replace ":latest" with the chosen pinned tag.
329-331: Consider adding validation forPVCSizeformat.The
PVCSizefield accepts any string, but it should conform to Kubernetes quantity format (e.g.,10Gi,500Mi). While this is consistent withOSBuildsConfig.PVCSize, adding a kubebuilder validation pattern would catch invalid values at admission time.Example validation annotation
// PVCSize is the size for persistent volume claims created for workspace storage + // +kubebuilder:validation:Pattern=`^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$` // +optional PVCSize string `json:"pvcSize,omitempty"`🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/v1alpha1/operatorconfig_types.go` around lines 329 - 331, The PVCSize field currently accepts any string; add a kubebuilder validation annotation above the PVCSize field to enforce Kubernetes quantity format (e.g., 10Gi, 500Mi). Update the struct field for PVCSize in operatorconfig_types.go by adding a +kubebuilder:validation:Pattern annotation that matches Kubernetes resource quantity syntax and keep the existing json tag and optional comment; ensure the pattern is broad enough to accept common suffixes (Ki, Mi, Gi, Ti, Pi, Ei, or plain integer) used by other fields like OSBuildsConfig.PVCSize for consistency.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/caib/workspace/workspace.go`:
- Around line 385-390: The progress reader `pr` (created via
`newProgressReader`) is built once before calling
`caibcommon.ExecuteWithReauth`, so if `ExecuteWithReauth` retries the callback
(after re-auth or kubeconfig fallback) subsequent `client.SyncWorkspace` calls
read an already-consumed stream; move creation of the tar reader into the
callback passed to `ExecuteWithReauth` so that a fresh reader (and fresh
`newProgressReader`/buffer) is created on each attempt; update the closure that
calls `client.SyncWorkspace(context.Background(), name, pr)` to recreate the
buffer and `pr` from the original data inside the function scope using the same
inputs (`serverURL`, `authToken`, `insecureSkipTLS`, `name`) before invoking
`SyncWorkspace`.
In `@Containerfile.toolchain`:
- Around line 73-74: The Dockerfile currently appends a global NOPASSWD sudo
rule (the RUN echo 'ALL ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers) which makes
the container overly privileged; instead remove or replace that global rule with
a least-privilege sudoers entry for a named non-root user, create that user and
set ownership, and add a USER directive to switch to that non-root user (e.g.,
create user via useradd/adduser, restrict sudo rights to that username in
/etc/sudoers.d/<name> without NOPASSWD unless explicitly required, and add USER
<username> before interactive/workspace layers around where USER is currently
missing) so the image does not default to root and sudo is limited to the
specific account.
- Around line 22-35: The repo and dnf install steps currently disable RPM
signature checks; update the repository file created by the heredoc for
${DISTRO} to set gpgcheck=1 (instead of gpgcheck=0) and remove the --nogpgcheck
flag from the RUN dnf install invocation so individual RPM signatures are
verified; do not set repo_gpgcheck=1 (leave it unset or 0) since the nightly
repo does not publish signed repomd.xml.asc, and ensure the GPG key imported at
the later step remains available for validation.
In `@internal/buildapi/server.go`:
- Around line 1533-1536: The current fire-and-forget podExec call that runs
"python3 -m http.server ... &" can leak processes and cause port conflicts;
modify the logic around
podExec/repoPath/ws.Status.PodName/workspaceContainerName to (1) check for an
existing server on the target port inside the workspace container (e.g., with
netstat/ss or a curl health check) and reuse it if healthy, (2) if no server
exists, start the server in a managed way that records the server PID to a known
file in the repo (e.g., /tmp/extra_repo_http.pid) instead of launching an
unchecked background process, and (3) ensure cleanup by killing the PID and
removing the PID file from the same codepath that tears down the workspace (add
explicit cleanup in your workspace shutdown/cleanup function using podExec to
run kill and rm against the PID file); also add a simple health-check invocation
after start to verify the server is up and fail the start if it doesn't respond.
- Around line 1516-1522: After fetching the Workspace (ws) with k8sClient.Get
and checking ws.Status.Phase, enforce ownership by comparing ws.Spec.Owner to
the authenticated requester (use the existing auth principal variable from the
handler/context — e.g. authUser, requesterID, or extract from ctx) and return a
forbidden error if they differ; apply the same ownership check before any
podExec invocation so that podExec is only called when ws.Spec.Owner ==
authenticated requester. Ensure the error returned is clear (forbidden/not
owner) and do not proceed to podExec when ownership fails.
- Around line 1534-1536: The current code builds a shell command string
embedding the untrusted repoPath (cmd := []string{"/bin/sh", "-c",
fmt.Sprintf("cd %s && python3 -m http.server %d > /dev/null 2>&1 &", repoPath,
port)}) which risks shell injection; instead construct a safe argv slice and
avoid /bin/sh -c: call podExec with an exec form like
{"python3","-m","http.server", strconv.Itoa(port), "--directory", repoPath} (no
shell interpolation) so repoPath is passed as a literal argument; keep using
podExec, workspaceContainerName, restCfg, namespace and ws.Status.PodName but
remove the fmt.Sprintf into a shell and eliminate redirection/backgrounding
handled by the exec call or podExec's API.
In `@internal/controller/operatorconfig/resources.go`:
- Around line 640-644: The RBAC rule granting broad verbs on the "workspaces" CR
should be tightened: in the APIGroups/Resources block that currently lists
"workspaces" and Verbs including "create" and "delete", remove create/delete
(and other unnecessary write verbs) and restrict to read + allowed mutation for
finalizers (e.g., "get","list","watch","patch"); then add a separate rule for
the status subresource ("workspaces/status") with verbs "get","patch","update"
so controllers can modify status; update the APIGroups/Resources/Verbs entries
accordingly (refer to the block that defines APIGroups:
[]string{"automotive.sdv.cloud.redhat.com"} and Resources:
[]string{"workspaces"}).
In `@internal/controller/workspace/controller.go`:
- Around line 98-113: The PVC creation uses resource.MustParse which will panic
on invalid ws.Spec.PVCSize; replace that call with resource.ParseQuantity, check
the returned (quantity, err) and if err != nil return a reconcile error (or
requeue) from the reconcile method so the controller does not crash; update the
code around the PersistentVolumeClaim.Spec.Resources.Requests assignment (the
pvc variable creation in controller.go) to use the parsed quantity on success
and handle/propagate the parse error with an appropriate reconcile.Result and
error.
---
Duplicate comments:
In `@cmd/caib/workspace/workspace.go`:
- Around line 463-469: The code flattens exec arguments with
strings.Join(cmdParts, " ") which loses argv boundaries; instead, keep and pass
the original []string cmdParts through the API (don't create the string variable
command) so each argument, quotes and metacharacters are preserved. Update any
function/method signatures and payloads that currently accept the single command
string (look for uses of the variable command and functions handling workspace
exec) to accept []string (e.g., cmdParts or args) and adjust call
sites/serialization accordingly so the exec path transmits and executes the
argument slice intact.
- Around line 421-433: The loop currently uses os.Open + f.Stat which
dereferences symlinks; change to use os.Lstat(absPath) to detect
os.FileMode&os.ModeSymlink first, and if it's a symlink call
os.Readlink(absPath) and either emit a symlink entry (store the link target and
mark the entry as symlink) or return an error to fail closed; only for
non-symlinks proceed to open the file (os.Open) and use
f.Stat()/fi.Mode().IsRegular as before. Update handling around variables
absPath, relPath, files and ensure closed file handles are preserved.
In `@Containerfile.toolchain`:
- Around line 77-78: The Dockerfile currently pipes the installer via the RUN
line that executes "curl ... install.sh | bash -s -- --source main" and then
sets ENV PATH to "/root/.local/jumpstarter/bin:${PATH}", which breaks non-root
OpenShift UIDs; update the RUN invocation to pass the installer a non-root
target (e.g. add "--dir /opt/jumpstarter" or another directory writable by
arbitrary container users) and change the subsequent ENV PATH entry to point to
that directory's bin (for example ENV PATH="/opt/jumpstarter/bin:${PATH}"), and
ensure the RUN step creates the directory and sets permissive ownership/mode so
non-root users can execute the installed binaries (adjust commands surrounding
the "curl ... install.sh" RUN and the ENV PATH assignment).
In `@internal/controller/workspace/controller.go`:
- Around line 50-52: The current code calls r.setStatus(...) and returns
ctrl.Result{} which swallows the original reconcile error; change both the
ensurePVC block (r.ensurePVC(ctx, ws)) and the similar block at lines 58-60 to
set the workspace status via r.setStatus(...) and then return the original error
(or a wrapped version) so the controller requeues and retries; locate the logic
around r.ensurePVC and the other ensure-* call and make the result return the
error instead of an empty ctrl.Result.
- Around line 103-107: The PVC creation uses GenerateName which can produce
duplicate PVCs on retries; change to deterministic naming and idempotent
creation: compute a stable PVC name (e.g., based on ws.Name and ws.UID or a
constant pvcSuffix) instead of using GenerateName, then try to Get the PVC by
that name (or List by a unique label/ownerRef) and only Create if not found;
update the pvc variable and any ownerReference/labels consistently (references:
the pvc variable, GenerateName usage, and the workspace `ws` object) so repeated
status-patch retries do not create duplicate PVCs.
- Around line 160-163: The code currently treats any caller-selected image the
same as the default and ends up launching it as root with Privileged=true;
change the pod/security context logic so that if image !=
automotivev1alpha1.DefaultToolchainImage (i.e., a user-supplied image) you do
NOT set RunAsUser=0 and do NOT enable privileged mode — set
securityContext.RunAsUser to a non-root UID (e.g., 1000) and
securityContext.Privileged to false; keep the existing root/privileged behavior
only for the trusted DefaultToolchainImage. Update the code paths that construct
the Pod spec/securityContext (the logic that uses image := ws.Spec.Image and any
function that applies the container.SecurityContext or pod.Spec.SecurityContext)
accordingly.
---
Nitpick comments:
In `@api/v1alpha1/operatorconfig_types.go`:
- Around line 50-58: The DefaultToolchainImage constant currently uses the
unstable ":latest" tag; change the DefaultToolchainImage value to a specific,
pinned image tag (e.g., match the project's image versioning like ":1.1.11" or
the known stable release for quay.io/centos-sig-automotive/autosd-toolchain) to
ensure reproducibility—edit the DefaultToolchainImage constant in
operatorconfig_types.go to replace ":latest" with the chosen pinned tag.
- Around line 329-331: The PVCSize field currently accepts any string; add a
kubebuilder validation annotation above the PVCSize field to enforce Kubernetes
quantity format (e.g., 10Gi, 500Mi). Update the struct field for PVCSize in
operatorconfig_types.go by adding a +kubebuilder:validation:Pattern annotation
that matches Kubernetes resource quantity syntax and keep the existing json tag
and optional comment; ensure the pattern is broad enough to accept common
suffixes (Ki, Mi, Gi, Ti, Pi, Ei, or plain integer) used by other fields like
OSBuildsConfig.PVCSize for consistency.
In `@api/v1alpha1/workspace_types.go`:
- Around line 35-45: Add kubebuilder validation markers to the Workspace spec
fields: annotate the Owner field (Owner) with
+kubebuilder:validation:MinLength=1 to prevent empty owner strings, and annotate
PVCSize (PVCSize) with a validation marker that enforces allowed volume quantity
formats (either a pattern regex like
+kubebuilder:validation:Pattern=`^([0-9]+)(Ei|Pi|Ti|Gi|Mi|Ki|E|P|T|G|M|K|Ei|Pi|Ti|Gi|Mi|Ki|e|p|t|g|m|k)?$`
or an enum of accepted sizes) so invalid PVC sizes are rejected at admission
time; add these comments immediately above the corresponding struct fields in
the Workspace type.
In `@api/v1alpha1/zz_generated.deepcopy.go`:
- Around line 1-1403: The diff adds the generated file
api/v1alpha1/zz_generated.deepcopy.go which should not be committed; remove this
file from the PR (undo the commit or git rm --cached if already committed),
update repository rules so generated deepcopy files are not committed (add
api/v1alpha1/zz_generated.deepcopy.go to .gitignore or adjust existing ignore
patterns), and ensure the generation step (e.g., controller-gen / make generate)
remains the authoritative source so CI/build regenerates the file as needed.
In `@Containerfile.toolchain`:
- Line 57: The Docker layer installing EPEL (the RUN dnf install -y
https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm
instruction) leaves package metadata in a separate layer; modify that RUN to
perform the install and then immediately clean DNF caches in the same layer
(e.g., run dnf install ... && dnf clean all && rm -rf /var/cache/dnf) so
metadata/artifacts are removed before the layer is committed.
In `@internal/buildapi/workspace_test.go`:
- Around line 32-44: Current auth-only tests stop at middleware and don't
exercise handler-level JSON binding/validation; update the test suite in
workspace_test.go so each route in the Authentication matrix (e.g., POST
/v1/workspaces, POST /v1/workspaces/my-app/sync, POST
/v1/workspaces/my-app/exec, POST /v1/workspaces/my-app/deploy, plus add
GET/DELETE entries) has one authenticated request that includes valid auth (set
the same auth header/token used elsewhere in tests) but invalid or empty JSON to
assert a 400 response from the handler (this ensures create/exec/deploy paths
hit binding/validation logic), and add the missing /v1/workspaces/:name/shell
route to the matrix with both unauthenticated 401 and an authenticated 400 test;
apply the same pattern to the other test blocks referenced (lines ~61-113) so
each handler has at least one authenticated 400-path test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cfec265b-20f0-4a3e-ac03-2d36c64dc09c
⛔ Files ignored due to path filters (3)
config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yamlis excluded by!config/crd/bases/**config/crd/bases/automotive.sdv.cloud.redhat.com_workspaces.yamlis excluded by!config/crd/bases/**config/rbac/role.yamlis excluded by!config/rbac/role.yaml
📒 Files selected for processing (22)
Containerfile.toolchainapi/v1alpha1/operatorconfig_types.goapi/v1alpha1/workspace_types.goapi/v1alpha1/zz_generated.deepcopy.gobundle.Dockerfilecmd/caib/buildcmd/build.gocmd/caib/image/image.gocmd/caib/main.gocmd/caib/root.gocmd/caib/runtime_wiring.gocmd/caib/workspace/workspace.gocmd/main.goconfig/crd/kustomization.yamlinternal/buildapi/client/client.gointernal/buildapi/server.gointernal/buildapi/types.gointernal/buildapi/workspace.gointernal/buildapi/workspace_test.gointernal/controller/imagebuild/controller.gointernal/controller/operatorconfig/controller.gointernal/controller/operatorconfig/resources.gointernal/controller/workspace/controller.go
✅ Files skipped from review due to trivial changes (1)
- bundle.Dockerfile
🚧 Files skipped from review as they are similar to previous changes (4)
- internal/controller/imagebuild/controller.go
- cmd/main.go
- cmd/caib/root.go
- internal/buildapi/workspace.go
| cmd := []string{"/bin/sh", "-c", | ||
| fmt.Sprintf("cd %s && python3 -m http.server %d > /dev/null 2>&1 &", repoPath, port)} | ||
| if err := podExec(ctx, restCfg, namespace, ws.Status.PodName, workspaceContainerName, cmd, io.Discard); err != nil { |
There was a problem hiding this comment.
Avoid shell injection via untrusted repoPath.
Line 1535 interpolates user-controlled path into /bin/sh -c. This is command-injection prone.
🔧 Safer command construction
- cmd := []string{"/bin/sh", "-c",
- fmt.Sprintf("cd %s && python3 -m http.server %d > /dev/null 2>&1 &", repoPath, port)}
+ cmd := []string{
+ "/bin/sh", "-c",
+ "cd -- \"$1\" && nohup python3 -m http.server \"$2\" >/dev/null 2>&1 &",
+ "--", repoPath, fmt.Sprintf("%d", port),
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| cmd := []string{"/bin/sh", "-c", | |
| fmt.Sprintf("cd %s && python3 -m http.server %d > /dev/null 2>&1 &", repoPath, port)} | |
| if err := podExec(ctx, restCfg, namespace, ws.Status.PodName, workspaceContainerName, cmd, io.Discard); err != nil { | |
| cmd := []string{ | |
| "/bin/sh", "-c", | |
| "cd -- \"$1\" && nohup python3 -m http.server \"$2\" >/dev/null 2>&1 &", | |
| "--", repoPath, fmt.Sprintf("%d", port), | |
| } | |
| if err := podExec(ctx, restCfg, namespace, ws.Status.PodName, workspaceContainerName, cmd, io.Discard); err != nil { |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/buildapi/server.go` around lines 1534 - 1536, The current code
builds a shell command string embedding the untrusted repoPath (cmd :=
[]string{"/bin/sh", "-c", fmt.Sprintf("cd %s && python3 -m http.server %d >
/dev/null 2>&1 &", repoPath, port)}) which risks shell injection; instead
construct a safe argv slice and avoid /bin/sh -c: call podExec with an exec form
like {"python3","-m","http.server", strconv.Itoa(port), "--directory", repoPath}
(no shell interpolation) so repoPath is passed as a literal argument; keep using
podExec, workspaceContainerName, restCfg, namespace and ws.Status.PodName but
remove the fmt.Sprintf into a shell and eliminate redirection/backgrounding
handled by the exec call or podExec's API.
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
api/v1alpha1/zz_generated.deepcopy.go (1)
1-1403: 🛠️ Refactor suggestion | 🟠 MajorGenerated file should not be committed.
This file is auto-generated by
controller-gen(line 19: "Code generated by controller-gen. DO NOT EDIT.") and will be regenerated at deploy/build time viamake generate manifests. Including it in the PR adds noise and risks merge conflicts.Remove this file from the PR and add it to
.gitignoreif not already present.Based on learnings: "Generated files that are not committed should be ignored in PRs. Specifically, do not include api/v1alpha1/zz_generated.deepcopy.go (regenerated at deploy/build time via make generate manifests)."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/v1alpha1/zz_generated.deepcopy.go` around lines 1 - 1403, The committed autogenerated deepcopy file ("zz_generated.deepcopy.go" containing the header "Code generated by controller-gen. DO NOT EDIT.") should be removed from the PR; delete that file from your branch and add an entry to .gitignore to prevent committing it in future, ensuring the file is produced by the build step (make generate manifests) instead of being tracked; verify no other generated files with the same header are staged before pushing.
♻️ Duplicate comments (19)
internal/controller/operatorconfig/resources.go (2)
640-654:⚠️ Potential issue | 🟠 MajorWorkspace CR RBAC is broader than reconciliation needs.
Granting
create/deleteonworkspacesin this controller role increases blast radius; split read/watch + status/finalizer writes unless create/delete is truly required by this controller path.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/operatorconfig/resources.go` around lines 640 - 654, The role grants create/delete on the "workspaces" resource which is broader than needed; modify the PolicyRule that has APIGroups "automotive.sdv.cloud.redhat.com" and Resources "workspaces" to remove "create" and "delete" (leave only "get", "list", "watch"), and keep the separate PolicyRule entries for "workspaces/status" and "workspaces/finalizers" with their update/patch verbs unchanged; if this controller truly needs create/delete, instead add a narrowly-scoped rule or a separate role for that specific path rather than granting create/delete in the main workspaces rule.
856-874:⚠️ Potential issue | 🟠 MajorBinding workspace SA to the
privilegedSCC should be narrowed or made opt-in.This defaults all workspace pods to privileged SCC capability, which is a significant security posture expansion.
Also applies to: 877-901
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/operatorconfig/resources.go` around lines 856 - 874, The code unconditionally grants the workspace service account the "privileged" SCC by creating a ClusterRole named workspaceServiceAccountName+"-privileged" in buildWorkspaceSCCClusterRole; change this to avoid default privileged access by either narrowing the granted permissions (remove or restrict ResourceNames "privileged" and/or use a less-privileged SCC) and making the creation/binding opt-in via an OperatorConfig flag or annotation checked during reconciliation, or create a separate ClusterRole that is only bound when an explicit config field (e.g., EnablePrivilegedSCC) is true; update the reconciler that creates the ClusterRole/ClusterRoleBinding to honor that flag and ensure workspaceServiceAccountName is only given the privileged SCC when explicitly enabled.Containerfile.toolchain (3)
77-78:⚠️ Potential issue | 🟠 MajorJumpstarter install path is tied to
/root, which breaks random-UID runtime access.Installing into
/root/.local/jumpstartermakes the binary inaccessible for non-root OpenShift UIDs even if PATH is set.Suggested fix
-RUN curl -fsSL https://raw.githubusercontent.com/jumpstarter-dev/jumpstarter/main/python/install.sh | bash -s -- --source main -ENV PATH="/root/.local/jumpstarter/bin:${PATH}" +RUN curl -fsSL https://raw.githubusercontent.com/jumpstarter-dev/jumpstarter/main/python/install.sh | bash -s -- --source main --dir /opt/jumpstarter +ENV PATH="/opt/jumpstarter/bin:${PATH}"#!/bin/bash # Verify install script supports --dir (read-only check). curl -fsSL https://raw.githubusercontent.com/jumpstarter-dev/jumpstarter/main/python/install.sh | rg -n -- '--dir|install dir|destination'🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Containerfile.toolchain` around lines 77 - 78, The install puts Jumpstarter under /root/.local/jumpstarter which prevents non-root/random UIDs from executing it; modify the RUN that invokes the install script (the curl | bash -s -- command) to install into a system-wide, non-root-readable path (e.g., /usr/local or /opt) by passing the install script's directory flag (e.g., --dir or equivalent) or by moving the installed artifacts after install, and update the ENV PATH line (currently setting PATH for /root/.local/jumpstarter) to point to the new location so the binary is accessible to arbitrary container UIDs; locate the invocation using the curl install command and the ENV PATH line to make the change.
73-74:⚠️ Potential issue | 🟠 MajorRuntime privilege model is too broad (root default + global NOPASSWD sudo).
Line 74 gives blanket sudo to everyone and the image still defaults to root (no
USER), which is an avoidable escalation surface for workspace pods.Suggested least-privilege direction
-# Allow any UID to use sudo -RUN echo 'ALL ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers +RUN useradd -m -u 10001 workspace && \ + printf 'workspace ALL=(root) NOPASSWD: /usr/bin/podman, /usr/bin/mock\n' > /etc/sudoers.d/workspace && \ + chmod 0440 /etc/sudoers.d/workspace +USER 10001Also applies to: 93-93
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Containerfile.toolchain` around lines 73 - 74, Remove the global NOPASSWD sudo entry and stop defaulting to root: replace the RUN echo 'ALL ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers step with a least-privilege approach (e.g., create a specific user/group, add a constrained sudoers entry in /etc/sudoers.d for that user or group allowing only required commands without NOPASSWD, or require password). Also add a USER directive to switch from root to that non-root user at the end of the Containerfile (use the created username in the USER line) so the image does not default to root. Ensure references to /etc/sudoers are replaced by a scoped /etc/sudoers.d file and avoid blanket ALL ALL=(ALL) NOPASSWD: ALL entries.
22-35:⚠️ Potential issue | 🟠 MajorRe-enable RPM signature verification for AutoSD packages.
Line 26 (
gpgcheck=0) plus Line 34 (--nogpgcheck) disables package authenticity checks and weakens supply-chain guarantees.Suggested hardening
RUN cat <<EOF > /etc/yum.repos.d/${DISTRO}.repo [${DISTRO}] name=${DISTRO} baseurl=${BASE_URL}/repos/${DISTRO}/compose/${DISTRO}/\$basearch/os/ -gpgcheck=0 +gpgcheck=1 enabled=1 skip_if_unavailable=1 EOF RUN dnf install \ --installroot ${ROOTFS} \ --setopt=reposdir=/etc/yum.repos.d --repo ${DISTRO} --releasever 10 \ - --setopt install_weak_deps=false --nodocs --nogpgcheck -y \ + --setopt install_weak_deps=false --nodocs -y \ ${RELEASE_PKG} ${PACKAGES} \ && dnf --installroot ${ROOTFS} clean allFor AutoSD nightly repositories, what are the documented values for `gpgcheck` and `repo_gpgcheck`, and is signed repodata (`repomd.xml.asc`) currently published?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Containerfile.toolchain` around lines 22 - 35, Update the repo and dnf install invocations to enable RPM signature checks: set gpgcheck=1 in the here-doc repo file (replace gpgcheck=0), remove the --nogpgcheck flag from the dnf install command, and add --setopt=repo_gpgcheck=1 (and/or --setopt=gpgcheck=1) to the dnf args so both package and repodata signing are enforced; reference the DISTRO/BASE_URL repo block and the dnf install line that uses ROOTFS, RELEASE_PKG and PACKAGES. Also ensure the repository publishes signed repodata (repomd.xml.asc) so the enabled repo_gpgcheck succeeds.cmd/caib/workspace/workspace.go (2)
463-469:⚠️ Potential issue | 🟠 MajorKeep the original argv instead of joining it back into a shell string.
strings.Join(cmdParts, " ")loses argument boundaries and quoting, so commands containing spaces or shell metacharacters do not round-trip reliably. Send argv verbatim instead of flattening it into a singleCommandstring.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/caib/workspace/workspace.go` around lines 463 - 469, The code currently flattens the argument slice into a single shell string using strings.Join into the variable command, which loses quoting and argument boundaries; instead preserve the original argv by keeping cmdParts (the []string) and pass it verbatim to the executor (e.g., use cmdParts as the args for whatever run/exec function you call) rather than creating the command string; remove or stop using the joined command variable and ensure the existing len(cmdParts) check and handleError call remain.
385-391:⚠️ Potential issue | 🟠 MajorRecreate the sync reader inside each re-auth attempt.
ExecuteWithReauthcan call the callback more than once, butpris created once and is exhausted after the firstSyncWorkspacecall. Any retry uploads an empty or truncated tar stream instead of the original archive. Capture the tar bytes once, then build a freshbytes.Reader/progressReaderinside the callback.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/caib/workspace/workspace.go` around lines 385 - 391, The current code creates pr once from buf and passes it into ExecuteWithReauth, but ExecuteWithReauth may call the callback multiple times so pr is exhausted after the first call; fix by first capturing the archive bytes (copy buf.Bytes() into a byte slice and compute totalBytes once), then inside the ExecuteWithReauth callback recreate a fresh bytes.Reader and a fresh progress reader (use bytes.NewReader(...) and newProgressReader(...)) on each attempt and call client.SyncWorkspace with that fresh pr, ensuring pr.finish() is invoked after each attempt (e.g., defer pr.finish() inside the callback) so retries upload the full tar stream.internal/controller/workspace/controller.go (4)
50-60:⚠️ Potential issue | 🟠 MajorReturn the PVC/Pod error after updating status.
If
ensurePVCorensurePodfails andsetStatussucceeds, reconcile returnsniland controller-runtime stops retrying. Transient failures will strand the workspace inFaileduntil some unrelated event happens. Set the status, then return the original error so the request is requeued.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/workspace/controller.go` around lines 50 - 60, When ensurePVC or ensurePod fails, the code currently calls r.setStatus(...) and returns its result so the original error is dropped; change those branches to first call r.setStatus(ctx, ws, "Failed", fmt.Sprintf(...)) (ignoring its return value or logging it) and then return ctrl.Result{}, err so the controller requeues; update the failure handling around ensurePVC and ensurePod in the reconcile flow (references: ensurePVC, ensurePod, setStatus) to set status and then return the original error instead of returning the result of setStatus.
98-112:⚠️ Potential issue | 🔴 CriticalParse
spec.pvcSizewithout panicking the controller.
resource.MustParse(pvcSize)turns a badWorkspacespec into a controller crash. Useresource.ParseQuantityand surface the validation error through reconcile instead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/workspace/controller.go` around lines 98 - 112, The controller currently calls resource.MustParse(pvcSize) when building the PersistentVolumeClaim (pvc), which will panic on an invalid spec and crash the controller; replace this with resource.ParseQuantity(pvcSize) inside the reconcile path (where pvcSize is derived) and handle the returned (qty, err): if err != nil, record an event on the Workspace, return a non-nil error or a requeue request from the reconcile (so the error surfaces to reconciliation and is retryable) instead of letting the process panic; update the pvc construction to use the parsed quantity variable and ensure functions/variables referenced (pvcSize, pvc, resource.MustParse -> resource.ParseQuantity) are adjusted accordingly.
160-248:⚠️ Potential issue | 🔴 CriticalDon’t launch caller-selected images as privileged root.
ws.Spec.Imageis user-controlled, and this pod runs it withRunAsUser: 0andPrivileged: trueunder the shared workspace service account. That makes workspace creation a cluster-privileged code-execution path. Restrict the image to operator-controlled values and drop privileged/root unless there is a hard requirement.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/workspace/controller.go` around lines 160 - 248, The pod currently runs user-controlled ws.Spec.Image as root with privileged=true; validate or restrict images and drop privileged/root: reject or ignore any caller-supplied ws.Spec.Image unless it matches an operator-managed allowlist (fallback to automotivev1alpha1.DefaultToolchainImage) and remove the container.SecurityContext.Privileged=true and PodSecurityContext RunAsUser: int64Ptr(0) (use a non-root uid, e.g. int64Ptr(1000) or remove RunAsUser) so containerName under the workspaceServiceAccountName no longer runs as privileged root.
103-127:⚠️ Potential issue | 🟠 MajorUse a stable PVC identity across retries.
Creating the claim with
GenerateNamemeans a successfulCreatefollowed by a failed/conflicting status patch will allocate a brand-new PVC on the next reconcile. Use a deterministic PVC name, or first look up an owned claim before creating another one.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/workspace/controller.go` around lines 103 - 127, The PVC is created with GenerateName which can produce duplicates across retries; instead ensure a stable identity by first checking for an existing owned PVC and only creating one with a deterministic Name if none exists. Update the reconcile code around pvc, r.Create, controllerutil.SetControllerReference and r.Status().Patch to: 1) list/get PVCs in ws.Namespace filtered by Name (or by label/ownerRef) to find a PVC owned by the Workspace (ws) and if found set ws.Status.PVCName and patch status; 2) if not found, create the PVC using a deterministic Name (e.g., ws.Name + pvcSuffix) rather than GenerateName, call controllerutil.SetControllerReference(ws, pvc, r.Scheme) and r.Create(ctx, pvc), then set ws.Status.PVCName and patch. This prevents allocating a new PVC after transient failures.internal/buildapi/server.go (2)
1493-1537:⚠️ Potential issue | 🔴 CriticalAuthorize
extraReposagainst the caller before any pod exec.
resolveExtraReposfetches any namedWorkspaceand immediately execs into its pod, whilecreateBuildonly resolvesrequestedByafterwards. Any authenticated caller can therefore pointextraReposat another user's workspace. Resolve the requester first and reject whenws.Spec.Ownerdoes not match before callingpodExec.Also applies to: 1593-1605
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/server.go` around lines 1493 - 1537, The function resolveExtraRepos currently fetches Workspaces and execs into their pods before verifying the requestor; update the flow to resolve and validate the caller (the requestedBy user) before any podExec by retrieving the requestor identity (as done in createBuild) and comparing it to ws.Spec.Owner for each workspace entry, returning an unauthorized error if they differ; apply the same owner-check authorization to the other similar block handling extraRepos (the one around the later pod exec) so no podExec runs for workspaces not owned by the requester.
1533-1537:⚠️ Potential issue | 🔴 CriticalDon’t start the repo server via interpolated
/bin/sh -c.
repoPathis inserted directly into a shell command, and the backgroundedhttp.serveris neither health-checked nor cleaned up. That leaves both a command-injection path and flaky/duplicate listeners across repeated builds. Prefer an argv-based exec plus explicit reuse/cleanup of the serving process.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/server.go` around lines 1533 - 1537, Replace the shell-interpolated, backgrounded command with an argv-based exec that uses Python's --directory flag and explicit PID handling: call podExec with cmd set to []string{"python3", "-m", "http.server", strconv.Itoa(port), "--directory", repoPath} (use the existing repoPath, port, workspaceContainerName and wsName identifiers to locate code), and implement logic to check for and reuse/terminate an existing server by reading/writing a PID file (e.g. repoPath/.http_server.pid) before starting; after starting capture and persist the child PID so future builds can health-check and clean it up, and ensure errors from podExec are returned as before.internal/buildapi/workspace.go (6)
703-703:⚠️ Potential issue | 🟠 Major
copyToPodignores request cancellation.
StreamWithContext(context.Background(), ...)ignores HTTP request cancellation. A disconnected client or expired deadline won't stop the tar extraction. Accept acontext.Contextparameter and use it here.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/workspace.go` at line 703, The copyToPod function currently calls executor.StreamWithContext(context.Background(), streamOpts) which ignores request cancellation; change copyToPod to accept a context.Context parameter (e.g., ctx) and replace the use of context.Background() with that ctx when calling executor.StreamWithContext, and propagate the new ctx through any callers (and tests) that invoke copyToPod so HTTP request cancellation or deadlines will cancel the tar extraction.
168-176:⚠️ Potential issue | 🟠 Major
fromBuildstill needs an ownership check before reusing the lease.The
resolveLeaseFromBuildfunction (lines 667-676) fetches theImageBuildbut does not verify the caller owns it. A user who knows another user's build name can attach their board lease to a new workspace. PassrequesterintoresolveLeaseFromBuildand reject builds where theautomotive.sdv.cloud.redhat.com/requested-byannotation doesn't match.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/workspace.go` around lines 168 - 176, The call path reusing a lease from an ImageBuild must enforce ownership: change the caller that sets leaseID (the block using req.FromBuild) to pass the current requester into resolveLeaseFromBuild, then update resolveLeaseFromBuild to fetch the ImageBuild and verify the automative.sdv.cloud.redhat.com/requested-by annotation equals the requester; if it does not match, return an error so the caller returns a 400. Ensure the error message clearly states the ownership mismatch and keep the function signature consistent (resolveLeaseFromBuild(ctx, k8sClient, namespace, buildName, requester) or similar) and update all call sites accordingly.
178-217:⚠️ Potential issue | 🟠 MajorSecret creation still occurs before workspace uniqueness check.
The Jumpstarter client config secret is created at lines 197-200 before the workspace existence check at lines 203-217. If a workspace with the same name already exists, the secret is orphaned. If the secret already exists (IsAlreadyExists at line 197), it's silently reused—potentially binding stale credentials to a new workspace.
Move secret creation after the uniqueness check, or set an OwnerReference on the secret after successful workspace creation so Kubernetes garbage-collects it with the workspace.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/workspace.go` around lines 178 - 217, The secret creation for Jumpstarter client config (jmpClientSecret, corev1.Secret and k8sClient.Create) happens before the workspace existence check (existing := &automotivev1alpha1.Workspace{} and k8sClient.Get) which can orphan or incorrectly reuse secrets; move the secret creation block to after the uniqueness check succeeds (i.e., after verifying k8sClient.Get returns not-found) or, if you must create it earlier, attach an OwnerReference to the Secret referencing the created Workspace (use the Workspace object's UID and controller=true) immediately after creating the Workspace so Kubernetes will garbage-collect the Secret when the Workspace is deleted; ensure you still handle IsAlreadyExists from k8sClient.Create appropriately (treat it as an error or update the secret) to avoid silently reusing stale credentials.
366-373:⚠️ Potential issue | 🟠 Major
syncstill reads unbounded tarball into memory.
io.ReadAll(c.Request.Body)gives any authenticated caller an OOM lever against the API pod. Apply a size limit (e.g.,http.MaxBytesReader) and stream directly intocopyToPodinstead of buffering.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/workspace.go` around lines 366 - 373, The handler currently reads the entire request body into memory using io.ReadAll(c.Request.Body), which allows an attacker to OOM the API pod; instead wrap c.Request.Body with a size limiter (e.g., http.MaxBytesReader or an io.LimitReader) before passing it to copyToPod so the tarball is streamed rather than buffered. Replace the io.ReadAll + bytes.NewReader(tarData) flow by creating a limited reader from c.Request.Body (check and return a 413 Payload Too Large on limit exceeded) and pass that reader directly to copyToPod; ensure you keep references to copyToPod, c.Request.Body, and workspaceContainerName when making the change and close the request body appropriately.
612-614:⚠️ Potential issue | 🟠 Major
DeepCopy()can preserve nilRequests/Limitsmaps, causing panic.After line 613,
resis overwritten withwsConfig.Resources.DeepCopy(). If the source has nilRequestsorLimits, the assignments at lines 626-627 and 640-641 will panic.🐛 Proposed fix
if wsConfig != nil && wsConfig.Resources != nil { res = wsConfig.Resources.DeepCopy() } + if res.Requests == nil { + res.Requests = corev1.ResourceList{} + } + if res.Limits == nil { + res.Limits = corev1.ResourceList{} + } if cpu != "" {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/workspace.go` around lines 612 - 614, The DeepCopy of wsConfig.Resources can produce a res whose CPU/Memory Requests or Limits maps are nil and later assignments to res.Requests[...] or res.Limits[...] will panic; after doing res = wsConfig.Resources.DeepCopy() (or immediately after creating res) check each map field (res.Requests and res.Limits) for nil and initialize them to empty maps if nil before any indexed assignments, or perform the indexed assignments against a non-nil temporary map and then assign it back to res.Requests/res.Limits to ensure safety when writing into those maps.
765-766:⚠️ Potential issue | 🟠 Major
execInPodusescontext.Background(), ignoring request cancellation.The
sync,exec, anddeployhandlers won't respect client disconnects or HTTP deadlines. Threadc.Request.Context()throughpodExecinstead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/workspace.go` around lines 765 - 766, execInPod currently calls podExec with context.Background(), so request cancellation and deadlines are ignored; change execInPod to accept a context.Context (e.g., func execInPod(ctx context.Context, restCfg *rest.Config, namespace, podName, containerName string, cmd []string, w http.ResponseWriter) error) and pass that ctx to podExec instead of context.Background(); update all call sites (the sync, exec, deploy handlers) to pass the HTTP request context (c.Request.Context()) into execInPod so podExec and its I/O respect client disconnects and timeouts.
🧹 Nitpick comments (1)
internal/buildapi/workspace.go (1)
416-418: WebSocket upgrader disables origin checking.
CheckOrigin: func(_ *http.Request) bool { return true }allows any origin to establish WebSocket connections. While the endpoint is protected byauthMiddleware, this weakens defense-in-depth if the API is ever exposed to browsers.Consider validating the
Originheader against expected values if browser-based access is possible.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/workspace.go` around lines 416 - 418, The websocket upgrader currently disables origin checks via the wsUpgrader variable (CheckOrigin: func(_ *http.Request) bool { return true }), which is unsafe; change CheckOrigin to validate the request's Origin header against a whitelist (or at least ensure it matches the configured API/frontend host) and return true only for allowed origins, referencing wsUpgrader and its CheckOrigin function so the upgrade path enforces origin validation rather than unconditionally allowing all origins.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/caib/workspace/workspace.go`:
- Around line 421-450: The loop currently opens each path then calls Stat which
follows symlinks; instead use os.Lstat(absPath) to get file info without
following symlinks, detect symlinks via fi.Mode()&os.ModeSymlink, and for
symlinks call os.Readlink(absPath) and pass the link target as the second arg to
tar.FileInfoHeader (e.g., tar.FileInfoHeader(fi, linkTarget)) so the header is
written as a symlink entry (write header with tw.WriteHeader but do not open or
copy file contents). For non-symlinks, open the file as before, call
tar.FileInfoHeader(fi, "") and io.Copy(tw, f), and ensure any opened file is
closed in all paths; update references around tar.FileInfoHeader, os.Lstat,
os.Readlink, ModeSymlink, tw.WriteHeader, and io.Copy.
In `@internal/buildapi/workspace.go`:
- Line 300: The call to getClientFromRequest currently ignores its error which
can leave k8sClient nil and cause a panic when calling k8sClient.Delete; update
the caller to capture both return values (k8sClient, err :=
getClientFromRequest(c)), check err immediately, and handle it by logging and
returning an appropriate HTTP error response (or early return) before using
k8sClient; ensure subsequent code that calls k8sClient.Delete only runs when err
is nil and k8sClient is non-nil so the function is safe against nil dereference.
- Around line 566-588: The deployScript currently interpolates sshPassword,
req.ArtifactPath and req.DestPath directly into the shell string (variables:
sshPassword, req.ArtifactPath, req.DestPath, deployScript, sshOpts, cmd),
creating a shell-injection risk; fix it by removing direct interpolation and
instead pass those values as environment variables (e.g. SSH_PASS,
ARTIFACT_PATH, DEST_PATH, SSH_OPTS) and reference them inside the script with
robust quoting (e.g. "$SSH_PASS", "$ARTIFACT_PATH", "$DEST_PATH"), or better yet
build the exec.Cmd with Cmd.Env containing those variables and avoid embedding
secrets in the command line so sshpass and rsync read quoted env vars rather
than raw unescaped user input.
In `@internal/controller/workspace/controller.go`:
- Around line 98-112: The controller is using hardcoded defaults (e.g., pvcSize
fallback to DefaultWorkspacePVCSize, image fallback to DefaultToolchainImage,
resources via resourcesOrDefaults) and ignores OperatorConfig.Spec.Workspaces
policies; update the code paths that build the PVC, PodSpec, and resource
handling (look for pvcSize, ws.Spec.PVCSize, resourcesOrDefaults, image
creation, and any PodSpec mutations) to first read and merge
OperatorConfig.Spec.Workspaces for defaults and policy: use
OperatorConfig.Workspaces.PVCSize if ws.Spec.PVCSize is empty, apply
OperatorConfig.Workspaces.DefaultToolchainImage when ws image is unset, enforce
MaxResources by validating/clamping user-supplied resources in
resourcesOrDefaults, and copy operator-level NodeSelector, Tolerations, and
TmpfsBuildDir into the resulting Pod/PVC specs so admin policies are applied
everywhere the controller constructs workspace resources.
---
Outside diff comments:
In `@api/v1alpha1/zz_generated.deepcopy.go`:
- Around line 1-1403: The committed autogenerated deepcopy file
("zz_generated.deepcopy.go" containing the header "Code generated by
controller-gen. DO NOT EDIT.") should be removed from the PR; delete that file
from your branch and add an entry to .gitignore to prevent committing it in
future, ensuring the file is produced by the build step (make generate
manifests) instead of being tracked; verify no other generated files with the
same header are staged before pushing.
---
Duplicate comments:
In `@cmd/caib/workspace/workspace.go`:
- Around line 463-469: The code currently flattens the argument slice into a
single shell string using strings.Join into the variable command, which loses
quoting and argument boundaries; instead preserve the original argv by keeping
cmdParts (the []string) and pass it verbatim to the executor (e.g., use cmdParts
as the args for whatever run/exec function you call) rather than creating the
command string; remove or stop using the joined command variable and ensure the
existing len(cmdParts) check and handleError call remain.
- Around line 385-391: The current code creates pr once from buf and passes it
into ExecuteWithReauth, but ExecuteWithReauth may call the callback multiple
times so pr is exhausted after the first call; fix by first capturing the
archive bytes (copy buf.Bytes() into a byte slice and compute totalBytes once),
then inside the ExecuteWithReauth callback recreate a fresh bytes.Reader and a
fresh progress reader (use bytes.NewReader(...) and newProgressReader(...)) on
each attempt and call client.SyncWorkspace with that fresh pr, ensuring
pr.finish() is invoked after each attempt (e.g., defer pr.finish() inside the
callback) so retries upload the full tar stream.
In `@Containerfile.toolchain`:
- Around line 77-78: The install puts Jumpstarter under /root/.local/jumpstarter
which prevents non-root/random UIDs from executing it; modify the RUN that
invokes the install script (the curl | bash -s -- command) to install into a
system-wide, non-root-readable path (e.g., /usr/local or /opt) by passing the
install script's directory flag (e.g., --dir or equivalent) or by moving the
installed artifacts after install, and update the ENV PATH line (currently
setting PATH for /root/.local/jumpstarter) to point to the new location so the
binary is accessible to arbitrary container UIDs; locate the invocation using
the curl install command and the ENV PATH line to make the change.
- Around line 73-74: Remove the global NOPASSWD sudo entry and stop defaulting
to root: replace the RUN echo 'ALL ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers step
with a least-privilege approach (e.g., create a specific user/group, add a
constrained sudoers entry in /etc/sudoers.d for that user or group allowing only
required commands without NOPASSWD, or require password). Also add a USER
directive to switch from root to that non-root user at the end of the
Containerfile (use the created username in the USER line) so the image does not
default to root. Ensure references to /etc/sudoers are replaced by a scoped
/etc/sudoers.d file and avoid blanket ALL ALL=(ALL) NOPASSWD: ALL entries.
- Around line 22-35: Update the repo and dnf install invocations to enable RPM
signature checks: set gpgcheck=1 in the here-doc repo file (replace gpgcheck=0),
remove the --nogpgcheck flag from the dnf install command, and add
--setopt=repo_gpgcheck=1 (and/or --setopt=gpgcheck=1) to the dnf args so both
package and repodata signing are enforced; reference the DISTRO/BASE_URL repo
block and the dnf install line that uses ROOTFS, RELEASE_PKG and PACKAGES. Also
ensure the repository publishes signed repodata (repomd.xml.asc) so the enabled
repo_gpgcheck succeeds.
In `@internal/buildapi/server.go`:
- Around line 1493-1537: The function resolveExtraRepos currently fetches
Workspaces and execs into their pods before verifying the requestor; update the
flow to resolve and validate the caller (the requestedBy user) before any
podExec by retrieving the requestor identity (as done in createBuild) and
comparing it to ws.Spec.Owner for each workspace entry, returning an
unauthorized error if they differ; apply the same owner-check authorization to
the other similar block handling extraRepos (the one around the later pod exec)
so no podExec runs for workspaces not owned by the requester.
- Around line 1533-1537: Replace the shell-interpolated, backgrounded command
with an argv-based exec that uses Python's --directory flag and explicit PID
handling: call podExec with cmd set to []string{"python3", "-m", "http.server",
strconv.Itoa(port), "--directory", repoPath} (use the existing repoPath, port,
workspaceContainerName and wsName identifiers to locate code), and implement
logic to check for and reuse/terminate an existing server by reading/writing a
PID file (e.g. repoPath/.http_server.pid) before starting; after starting
capture and persist the child PID so future builds can health-check and clean it
up, and ensure errors from podExec are returned as before.
In `@internal/buildapi/workspace.go`:
- Line 703: The copyToPod function currently calls
executor.StreamWithContext(context.Background(), streamOpts) which ignores
request cancellation; change copyToPod to accept a context.Context parameter
(e.g., ctx) and replace the use of context.Background() with that ctx when
calling executor.StreamWithContext, and propagate the new ctx through any
callers (and tests) that invoke copyToPod so HTTP request cancellation or
deadlines will cancel the tar extraction.
- Around line 168-176: The call path reusing a lease from an ImageBuild must
enforce ownership: change the caller that sets leaseID (the block using
req.FromBuild) to pass the current requester into resolveLeaseFromBuild, then
update resolveLeaseFromBuild to fetch the ImageBuild and verify the
automative.sdv.cloud.redhat.com/requested-by annotation equals the requester; if
it does not match, return an error so the caller returns a 400. Ensure the error
message clearly states the ownership mismatch and keep the function signature
consistent (resolveLeaseFromBuild(ctx, k8sClient, namespace, buildName,
requester) or similar) and update all call sites accordingly.
- Around line 178-217: The secret creation for Jumpstarter client config
(jmpClientSecret, corev1.Secret and k8sClient.Create) happens before the
workspace existence check (existing := &automotivev1alpha1.Workspace{} and
k8sClient.Get) which can orphan or incorrectly reuse secrets; move the secret
creation block to after the uniqueness check succeeds (i.e., after verifying
k8sClient.Get returns not-found) or, if you must create it earlier, attach an
OwnerReference to the Secret referencing the created Workspace (use the
Workspace object's UID and controller=true) immediately after creating the
Workspace so Kubernetes will garbage-collect the Secret when the Workspace is
deleted; ensure you still handle IsAlreadyExists from k8sClient.Create
appropriately (treat it as an error or update the secret) to avoid silently
reusing stale credentials.
- Around line 366-373: The handler currently reads the entire request body into
memory using io.ReadAll(c.Request.Body), which allows an attacker to OOM the API
pod; instead wrap c.Request.Body with a size limiter (e.g., http.MaxBytesReader
or an io.LimitReader) before passing it to copyToPod so the tarball is streamed
rather than buffered. Replace the io.ReadAll + bytes.NewReader(tarData) flow by
creating a limited reader from c.Request.Body (check and return a 413 Payload
Too Large on limit exceeded) and pass that reader directly to copyToPod; ensure
you keep references to copyToPod, c.Request.Body, and workspaceContainerName
when making the change and close the request body appropriately.
- Around line 612-614: The DeepCopy of wsConfig.Resources can produce a res
whose CPU/Memory Requests or Limits maps are nil and later assignments to
res.Requests[...] or res.Limits[...] will panic; after doing res =
wsConfig.Resources.DeepCopy() (or immediately after creating res) check each map
field (res.Requests and res.Limits) for nil and initialize them to empty maps if
nil before any indexed assignments, or perform the indexed assignments against a
non-nil temporary map and then assign it back to res.Requests/res.Limits to
ensure safety when writing into those maps.
- Around line 765-766: execInPod currently calls podExec with
context.Background(), so request cancellation and deadlines are ignored; change
execInPod to accept a context.Context (e.g., func execInPod(ctx context.Context,
restCfg *rest.Config, namespace, podName, containerName string, cmd []string, w
http.ResponseWriter) error) and pass that ctx to podExec instead of
context.Background(); update all call sites (the sync, exec, deploy handlers) to
pass the HTTP request context (c.Request.Context()) into execInPod so podExec
and its I/O respect client disconnects and timeouts.
In `@internal/controller/operatorconfig/resources.go`:
- Around line 640-654: The role grants create/delete on the "workspaces"
resource which is broader than needed; modify the PolicyRule that has APIGroups
"automotive.sdv.cloud.redhat.com" and Resources "workspaces" to remove "create"
and "delete" (leave only "get", "list", "watch"), and keep the separate
PolicyRule entries for "workspaces/status" and "workspaces/finalizers" with
their update/patch verbs unchanged; if this controller truly needs
create/delete, instead add a narrowly-scoped rule or a separate role for that
specific path rather than granting create/delete in the main workspaces rule.
- Around line 856-874: The code unconditionally grants the workspace service
account the "privileged" SCC by creating a ClusterRole named
workspaceServiceAccountName+"-privileged" in buildWorkspaceSCCClusterRole;
change this to avoid default privileged access by either narrowing the granted
permissions (remove or restrict ResourceNames "privileged" and/or use a
less-privileged SCC) and making the creation/binding opt-in via an
OperatorConfig flag or annotation checked during reconciliation, or create a
separate ClusterRole that is only bound when an explicit config field (e.g.,
EnablePrivilegedSCC) is true; update the reconciler that creates the
ClusterRole/ClusterRoleBinding to honor that flag and ensure
workspaceServiceAccountName is only given the privileged SCC when explicitly
enabled.
In `@internal/controller/workspace/controller.go`:
- Around line 50-60: When ensurePVC or ensurePod fails, the code currently calls
r.setStatus(...) and returns its result so the original error is dropped; change
those branches to first call r.setStatus(ctx, ws, "Failed", fmt.Sprintf(...))
(ignoring its return value or logging it) and then return ctrl.Result{}, err so
the controller requeues; update the failure handling around ensurePVC and
ensurePod in the reconcile flow (references: ensurePVC, ensurePod, setStatus) to
set status and then return the original error instead of returning the result of
setStatus.
- Around line 98-112: The controller currently calls resource.MustParse(pvcSize)
when building the PersistentVolumeClaim (pvc), which will panic on an invalid
spec and crash the controller; replace this with resource.ParseQuantity(pvcSize)
inside the reconcile path (where pvcSize is derived) and handle the returned
(qty, err): if err != nil, record an event on the Workspace, return a non-nil
error or a requeue request from the reconcile (so the error surfaces to
reconciliation and is retryable) instead of letting the process panic; update
the pvc construction to use the parsed quantity variable and ensure
functions/variables referenced (pvcSize, pvc, resource.MustParse ->
resource.ParseQuantity) are adjusted accordingly.
- Around line 160-248: The pod currently runs user-controlled ws.Spec.Image as
root with privileged=true; validate or restrict images and drop privileged/root:
reject or ignore any caller-supplied ws.Spec.Image unless it matches an
operator-managed allowlist (fallback to
automotivev1alpha1.DefaultToolchainImage) and remove the
container.SecurityContext.Privileged=true and PodSecurityContext RunAsUser:
int64Ptr(0) (use a non-root uid, e.g. int64Ptr(1000) or remove RunAsUser) so
containerName under the workspaceServiceAccountName no longer runs as privileged
root.
- Around line 103-127: The PVC is created with GenerateName which can produce
duplicates across retries; instead ensure a stable identity by first checking
for an existing owned PVC and only creating one with a deterministic Name if
none exists. Update the reconcile code around pvc, r.Create,
controllerutil.SetControllerReference and r.Status().Patch to: 1) list/get PVCs
in ws.Namespace filtered by Name (or by label/ownerRef) to find a PVC owned by
the Workspace (ws) and if found set ws.Status.PVCName and patch status; 2) if
not found, create the PVC using a deterministic Name (e.g., ws.Name + pvcSuffix)
rather than GenerateName, call controllerutil.SetControllerReference(ws, pvc,
r.Scheme) and r.Create(ctx, pvc), then set ws.Status.PVCName and patch. This
prevents allocating a new PVC after transient failures.
---
Nitpick comments:
In `@internal/buildapi/workspace.go`:
- Around line 416-418: The websocket upgrader currently disables origin checks
via the wsUpgrader variable (CheckOrigin: func(_ *http.Request) bool { return
true }), which is unsafe; change CheckOrigin to validate the request's Origin
header against a whitelist (or at least ensure it matches the configured
API/frontend host) and return true only for allowed origins, referencing
wsUpgrader and its CheckOrigin function so the upgrade path enforces origin
validation rather than unconditionally allowing all origins.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 37fc93b2-d346-41eb-b875-bc74233cf57f
⛔ Files ignored due to path filters (3)
config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yamlis excluded by!config/crd/bases/**config/crd/bases/automotive.sdv.cloud.redhat.com_workspaces.yamlis excluded by!config/crd/bases/**config/rbac/role.yamlis excluded by!config/rbac/role.yaml
📒 Files selected for processing (22)
Containerfile.toolchainapi/v1alpha1/operatorconfig_types.goapi/v1alpha1/workspace_types.goapi/v1alpha1/zz_generated.deepcopy.gobundle.Dockerfilecmd/caib/buildcmd/build.gocmd/caib/image/image.gocmd/caib/main.gocmd/caib/root.gocmd/caib/runtime_wiring.gocmd/caib/workspace/workspace.gocmd/main.goconfig/crd/kustomization.yamlinternal/buildapi/client/client.gointernal/buildapi/server.gointernal/buildapi/types.gointernal/buildapi/workspace.gointernal/buildapi/workspace_test.gointernal/controller/imagebuild/controller.gointernal/controller/operatorconfig/controller.gointernal/controller/operatorconfig/resources.gointernal/controller/workspace/controller.go
🚧 Files skipped from review as they are similar to previous changes (5)
- internal/buildapi/workspace_test.go
- internal/controller/imagebuild/controller.go
- cmd/caib/runtime_wiring.go
- cmd/caib/image/image.go
- cmd/main.go
| pvcSize := ws.Spec.PVCSize | ||
| if pvcSize == "" { | ||
| pvcSize = automotivev1alpha1.DefaultWorkspacePVCSize | ||
| } | ||
|
|
||
| pvc := &corev1.PersistentVolumeClaim{ | ||
| ObjectMeta: metav1.ObjectMeta{ | ||
| GenerateName: ws.Name + pvcSuffix + "-", | ||
| Namespace: ws.Namespace, | ||
| }, | ||
| Spec: corev1.PersistentVolumeClaimSpec{ | ||
| AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce}, | ||
| Resources: corev1.VolumeResourceRequirements{ | ||
| Requests: corev1.ResourceList{ | ||
| corev1.ResourceStorage: resource.MustParse(pvcSize), |
There was a problem hiding this comment.
Wire OperatorConfig.Spec.Workspaces into this controller instead of hardcoded defaults.
The new workspace config surface is never read here: PVC size falls back to DefaultWorkspacePVCSize, the image falls back to DefaultToolchainImage, resources default in resourcesOrDefaults, user-supplied resources are never checked against MaxResources, and operator-level NodeSelector, Tolerations, and TmpfsBuildDir policy are ignored. Right now those admin settings are effectively dead config.
Also applies to: 152-276, 302-315
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/controller/workspace/controller.go` around lines 98 - 112, The
controller is using hardcoded defaults (e.g., pvcSize fallback to
DefaultWorkspacePVCSize, image fallback to DefaultToolchainImage, resources via
resourcesOrDefaults) and ignores OperatorConfig.Spec.Workspaces policies; update
the code paths that build the PVC, PodSpec, and resource handling (look for
pvcSize, ws.Spec.PVCSize, resourcesOrDefaults, image creation, and any PodSpec
mutations) to first read and merge OperatorConfig.Spec.Workspaces for defaults
and policy: use OperatorConfig.Workspaces.PVCSize if ws.Spec.PVCSize is empty,
apply OperatorConfig.Workspaces.DefaultToolchainImage when ws image is unset,
enforce MaxResources by validating/clamping user-supplied resources in
resourcesOrDefaults, and copy operator-level NodeSelector, Tolerations, and
TmpfsBuildDir into the resulting Pod/PVC specs so admin policies are applied
everywhere the controller constructs workspace resources.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (6)
internal/controller/workspace/controller.go (3)
174-176:⚠️ Potential issue | 🔴 CriticalDon't run caller-supplied workspace images as privileged root.
ws.Spec.Imagecomes straight from the CR, and this pod starts it withRunAsUser=0,Privileged=true, and theado-workspaceservice account. That's a cluster-compromise path, not just a workspace escape.Also applies to: 241-262
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/workspace/controller.go` around lines 174 - 176, The controller currently sets image := ws.Spec.Image and allows arbitrary CR-provided images to run as root/privileged; change this to deny or override caller-supplied images and harden the PodSecurityContext: only allow ws.Spec.Image when it matches an approved/whitelist (or else use automotivev1alpha1.DefaultToolchainImage), and in the Pod spec ensure securityContext.Privileged is false, set securityContext.RunAsNonRoot = true and/or securityContext.RunAsUser to a non-zero UID, and avoid using the elevated ado-workspace service account for untrusted images; update the code paths referencing image (the image variable and the Pod creation logic around RunAsUser/Privileged) to enforce these checks and overrides.
109-129:⚠️ Potential issue | 🟠 MajorApply
OperatorConfig.Spec.Workspacesin this controller.Right now PVC size, default image, tmpfs behavior, node placement, and resource policy all come from hardcoded defaults or the CR itself because this controller never reads
OperatorConfig.Spec.Workspaces. That makes the new admin config surface ineffective for sparseWorkspacespecs.Also applies to: 174-177, 206-218, 259-285
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/workspace/controller.go` around lines 109 - 129, The controller currently ignores OperatorConfig.Spec.Workspaces and uses hardcoded defaults (e.g., DefaultWorkspacePVCSize) or values directly from ws.Spec; update the reconcile logic to first read the OperatorConfig (OperatorConfig.Spec.Workspaces) and merge its workspace defaults into Workspace creation so that fields like pvcSize (ws.Spec.PVCSize), default image, tmpfs settings, node placement, and resource policy fallback to OperatorConfig values when the Workspace CR omits them; locate the PVC creation (pvcSize/DefaultWorkspacePVCSize usage), image/tmpfs/node placement/resource policy handling in the same controller (references: pvcSize, DefaultWorkspacePVCSize, ws.Spec.*, and the workspace-related create/update code paths) and apply the merged defaults before constructing the PVC, PodSpec, and resource objects so the OperatorConfig defaults are honored.
109-124:⚠️ Potential issue | 🔴 CriticalAvoid panicking on invalid
spec.pvcSize.
resource.MustParselets one bad size value crash the controller. Parse the quantity and return a reconcile error instead.Suggested fix
pvcSize := ws.Spec.PVCSize if pvcSize == "" { pvcSize = automotivev1alpha1.DefaultWorkspacePVCSize } + qty, parseErr := resource.ParseQuantity(pvcSize) + if parseErr != nil { + return fmt.Errorf("invalid workspace PVC size %q: %w", pvcSize, parseErr) + } pvc := &corev1.PersistentVolumeClaim{ ObjectMeta: metav1.ObjectMeta{ Name: pvcName, Namespace: ws.Namespace, @@ AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce}, Resources: corev1.VolumeResourceRequirements{ Requests: corev1.ResourceList{ - corev1.ResourceStorage: resource.MustParse(pvcSize), + corev1.ResourceStorage: qty, }, }, }, }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/workspace/controller.go` around lines 109 - 124, The code uses resource.MustParse(pvcSize) when building the PersistentVolumeClaim which will panic on an invalid PVC size; replace MustParse with resource.ParseQuantity(pvcSize), check the returned error, and if parsing fails return a reconcile error (or appropriate error from the current reconcile function) instead of continuing; update the PVC creation path around the pvcSize/pvc variable construction (referencing pvcSize, pvcName, and the workspace reconcile function in controller.go) to use the parsed quantity on corev1.ResourceStorage when parsing succeeds.internal/buildapi/server.go (3)
1517-1538:⚠️ Potential issue | 🔴 CriticalEnforce workspace ownership before
podExec.Because
createBuildresolves extra repos before it computesrequestedBy, this helper never verifiesws.Spec.Owner. Any authenticated caller who knows another workspace name can make the API exec inside that pod. Pass the requester intoresolveExtraReposand reject mismatched owners before the pod lookup/exec.Also applies to: 1695-1711
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/server.go` around lines 1517 - 1538, createBuild currently calls resolveExtraRepos before computing requestedBy, so the helper never enforces workspace ownership and allows anyone knowing a workspace name to call podExec; fix by changing createBuild to compute the requester (requestedBy) first and pass that requester into resolveExtraRepos, then explicitly verify ws.Spec.Owner == requestedBy (or reject) before any pod lookup/podExec calls (e.g., before using ws.Status.PodName and calling podExec); update resolveExtraRepos signature to accept the requester and enforce owner checks in the code paths referenced around podExec and the duplicate block at the later location.
1535-1537:⚠️ Potential issue | 🔴 CriticalDo not interpolate
repoPathinto/bin/sh -c.
repoPathis caller-controlled input. Embedding it in the shell command lets a crafted path execute arbitrary commands in the workspace pod.Suggested fix
- cmd := []string{"/bin/sh", "-c", - fmt.Sprintf("cd %s && python3 -m http.server %d > /dev/null 2>&1 &", repoPath, port)} + cmd := []string{ + "/bin/sh", "-c", + "cd -- \"$1\" && nohup python3 -m http.server \"$2\" >/dev/null 2>&1 &", + "--", repoPath, fmt.Sprintf("%d", port), + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/server.go` around lines 1535 - 1537, The code currently injects caller-controlled repoPath into a shell string (cmd) which permits command injection; instead, avoid using "/bin/sh -c" with interpolated paths: call the python binary directly (e.g., build cmd as []string{"python3", "-m", "http.server", strconv.Itoa(port)}) and pass repoPath to podExec via a dedicated working-directory option if podExec supports it, or—if podExec cannot set cwd—validate/sanitize repoPath (use path/filepath.Clean, ensure no shell metacharacters, enforce allowed prefix or whitelist) before using it and never concatenate it into a shell command; update the code paths referencing repoPath, podExec, workspaceContainerName and cmd accordingly.
1534-1539:⚠️ Potential issue | 🟠 MajorManage extra-repo servers explicitly.
This fire-and-forget daemonization leaks processes and silently hits port conflicts on repeated builds against the same workspace.
podExecsucceeding here only means the shell returned, not that the server actually bound the port.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/server.go` around lines 1534 - 1539, The current fire-and-forget shell command used in cmd and executed via podExec (targeting workspaceContainerName / ws.Status.PodName) daemonizes python3 -m http.server and can leak processes or silently fail to bind the port; change the exec command so the server is started with explicit lifecycle and readiness checks: before starting, kill any stale PID (e.g. /tmp/httpserver.pid) if present, start the server and write its PID to a known file, then verify the port is actually listening (using ss/netstat or a short curl loop) and return an error from podExec if the bind fails (and clean up the PID/process), so that the call to podExec only succeeds when the HTTP server truly bound port; update error handling around podExec to propagate that failure and reference cmd, port, repoPath, wsName, podExec, workspaceContainerName, and ws.Status.PodName when implementing these changes.
🧹 Nitpick comments (2)
internal/common/tasks/scripts/find_manifest.sh (1)
28-35: Prefer warning on per-file copy failures instead of fully silent ignores.The new filtering is good, but
2>/dev/null || truecan hide actionable failures. A warning per failed file keeps behavior resilient while improving diagnosability.🛠️ Suggested tweak
- cp -rv "$item" /manifest-work/ 2>/dev/null || true + if ! cp -rv "$item" /manifest-work/ 2>/dev/null; then + echo "WARN: failed to copy '$item' into /manifest-work/" >&2 + fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/common/tasks/scripts/find_manifest.sh` around lines 28 - 35, The per-file copy currently swallows all errors with `cp -rv "$item" /manifest-work/ 2>/dev/null || true`, losing diagnostics; update the loop in find_manifest.sh so each failed `cp` emits a warning to stderr (including the source path and the cp exit code or error text) instead of silencing it completely — e.g., run cp without redirecting stderr, capture its exit status for each `item`, and on non-zero exit print a concise warning message (use the same loop and the `cp -rv "$item" /manifest-work/` invocation, then check `$?` and `echo` or `printf` to stderr).api/v1alpha1/operatorconfig_types.go (1)
359-362: Add aGetBuildCacheSize()fallback to match the documented default.
BuildCacheSizedocuments a default ("20Gi"), but this file doesn’t provide a getter/default path likeGetPVCSize()andGetDefaultArchitecture(). Centralizing that fallback here reduces drift across callers.♻️ Suggested refactor
const ( @@ // DefaultWorkspacePVCSize is the default PVC size for workspace storage DefaultWorkspacePVCSize = "10Gi" + + // DefaultWorkspaceBuildCacheSize is the default PVC size for workspace build cache + DefaultWorkspaceBuildCacheSize = "20Gi" ) @@ func (c *WorkspacesConfig) GetNodeSelector() map[string]string { if c != nil { return c.NodeSelector } return nil } + +// GetBuildCacheSize returns the workspace build cache PVC size, falling back to the default +func (c *WorkspacesConfig) GetBuildCacheSize() string { + if c != nil && c.BuildCacheSize != "" { + return c.BuildCacheSize + } + return DefaultWorkspaceBuildCacheSize +}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/v1alpha1/operatorconfig_types.go` around lines 359 - 362, Add a getter that returns the documented default for BuildCacheSize when empty: implement GetBuildCacheSize() on the same type that holds BuildCacheSize (mirror the pattern used by GetPVCSize() and GetDefaultArchitecture()), so it returns the BuildCacheSize field if non-empty and otherwise returns the literal default "20Gi". Update callers to use GetBuildCacheSize() instead of reading BuildCacheSize directly to centralize the fallback logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/buildapi/server.go`:
- Around line 1615-1645: The PVC creation currently uses
resource.MustParse(cacheSize) which will panic on invalid operator input; change
this to parse the quantity with resource.ParseQuantity (or resource.Parse) when
reading wsConfig.BuildCacheSize, check the returned error, and handle it by
returning a proper API error response (or falling back to the default "20Gi")
before constructing the PVC Spec so no panic can occur; update the code paths
around wsConfig.BuildCacheSize, cacheSize, and the PVC Spec/Resources usage to
use the validated quantity variable instead of resource.MustParse.
In `@internal/controller/imagebuild/controller.go`:
- Around line 754-767: The current branch replaces the entire shared-workspace
with the BuildCachePVC (variable sharedWorkspaceBinding), which discards
per-build upload content; instead keep a per-build shared workspace and mount
the cache PVC under a separate workspace name. Change the logic in controller.go
so that when imageBuild.Spec.BuildCachePVC is set you do NOT set
sharedWorkspaceBinding to the PVC; instead preserve the existing
shared-workspace binding (e.g., from VolumeClaimTemplate or an EmptyDir) and add
a second tektonv1.WorkspaceBinding with Name "build-cache" whose
PersistentVolumeClaim.ClaimName is imageBuild.Spec.BuildCachePVC (or, if you
prefer to prohibit the unsafe combination, return an error when both
BuildCachePVC and InputFilesServer/PVCName are present). Update any Task/TaskRun
bindings to reference the new "build-cache" workspace rather than mounting the
cache at "/workspace/shared".
---
Duplicate comments:
In `@internal/buildapi/server.go`:
- Around line 1517-1538: createBuild currently calls resolveExtraRepos before
computing requestedBy, so the helper never enforces workspace ownership and
allows anyone knowing a workspace name to call podExec; fix by changing
createBuild to compute the requester (requestedBy) first and pass that requester
into resolveExtraRepos, then explicitly verify ws.Spec.Owner == requestedBy (or
reject) before any pod lookup/podExec calls (e.g., before using
ws.Status.PodName and calling podExec); update resolveExtraRepos signature to
accept the requester and enforce owner checks in the code paths referenced
around podExec and the duplicate block at the later location.
- Around line 1535-1537: The code currently injects caller-controlled repoPath
into a shell string (cmd) which permits command injection; instead, avoid using
"/bin/sh -c" with interpolated paths: call the python binary directly (e.g.,
build cmd as []string{"python3", "-m", "http.server", strconv.Itoa(port)}) and
pass repoPath to podExec via a dedicated working-directory option if podExec
supports it, or—if podExec cannot set cwd—validate/sanitize repoPath (use
path/filepath.Clean, ensure no shell metacharacters, enforce allowed prefix or
whitelist) before using it and never concatenate it into a shell command; update
the code paths referencing repoPath, podExec, workspaceContainerName and cmd
accordingly.
- Around line 1534-1539: The current fire-and-forget shell command used in cmd
and executed via podExec (targeting workspaceContainerName / ws.Status.PodName)
daemonizes python3 -m http.server and can leak processes or silently fail to
bind the port; change the exec command so the server is started with explicit
lifecycle and readiness checks: before starting, kill any stale PID (e.g.
/tmp/httpserver.pid) if present, start the server and write its PID to a known
file, then verify the port is actually listening (using ss/netstat or a short
curl loop) and return an error from podExec if the bind fails (and clean up the
PID/process), so that the call to podExec only succeeds when the HTTP server
truly bound port; update error handling around podExec to propagate that failure
and reference cmd, port, repoPath, wsName, podExec, workspaceContainerName, and
ws.Status.PodName when implementing these changes.
In `@internal/controller/workspace/controller.go`:
- Around line 174-176: The controller currently sets image := ws.Spec.Image and
allows arbitrary CR-provided images to run as root/privileged; change this to
deny or override caller-supplied images and harden the PodSecurityContext: only
allow ws.Spec.Image when it matches an approved/whitelist (or else use
automotivev1alpha1.DefaultToolchainImage), and in the Pod spec ensure
securityContext.Privileged is false, set securityContext.RunAsNonRoot = true
and/or securityContext.RunAsUser to a non-zero UID, and avoid using the elevated
ado-workspace service account for untrusted images; update the code paths
referencing image (the image variable and the Pod creation logic around
RunAsUser/Privileged) to enforce these checks and overrides.
- Around line 109-129: The controller currently ignores
OperatorConfig.Spec.Workspaces and uses hardcoded defaults (e.g.,
DefaultWorkspacePVCSize) or values directly from ws.Spec; update the reconcile
logic to first read the OperatorConfig (OperatorConfig.Spec.Workspaces) and
merge its workspace defaults into Workspace creation so that fields like pvcSize
(ws.Spec.PVCSize), default image, tmpfs settings, node placement, and resource
policy fallback to OperatorConfig values when the Workspace CR omits them;
locate the PVC creation (pvcSize/DefaultWorkspacePVCSize usage),
image/tmpfs/node placement/resource policy handling in the same controller
(references: pvcSize, DefaultWorkspacePVCSize, ws.Spec.*, and the
workspace-related create/update code paths) and apply the merged defaults before
constructing the PVC, PodSpec, and resource objects so the OperatorConfig
defaults are honored.
- Around line 109-124: The code uses resource.MustParse(pvcSize) when building
the PersistentVolumeClaim which will panic on an invalid PVC size; replace
MustParse with resource.ParseQuantity(pvcSize), check the returned error, and if
parsing fails return a reconcile error (or appropriate error from the current
reconcile function) instead of continuing; update the PVC creation path around
the pvcSize/pvc variable construction (referencing pvcSize, pvcName, and the
workspace reconcile function in controller.go) to use the parsed quantity on
corev1.ResourceStorage when parsing succeeds.
---
Nitpick comments:
In `@api/v1alpha1/operatorconfig_types.go`:
- Around line 359-362: Add a getter that returns the documented default for
BuildCacheSize when empty: implement GetBuildCacheSize() on the same type that
holds BuildCacheSize (mirror the pattern used by GetPVCSize() and
GetDefaultArchitecture()), so it returns the BuildCacheSize field if non-empty
and otherwise returns the literal default "20Gi". Update callers to use
GetBuildCacheSize() instead of reading BuildCacheSize directly to centralize the
fallback logic.
In `@internal/common/tasks/scripts/find_manifest.sh`:
- Around line 28-35: The per-file copy currently swallows all errors with `cp
-rv "$item" /manifest-work/ 2>/dev/null || true`, losing diagnostics; update the
loop in find_manifest.sh so each failed `cp` emits a warning to stderr
(including the source path and the cp exit code or error text) instead of
silencing it completely — e.g., run cp without redirecting stderr, capture its
exit status for each `item`, and on non-zero exit print a concise warning
message (use the same loop and the `cp -rv "$item" /manifest-work/` invocation,
then check `$?` and `echo` or `printf` to stderr).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 45057903-3168-40ab-9df2-0033e61847f7
⛔ Files ignored due to path filters (3)
config/crd/bases/automotive.sdv.cloud.redhat.com_imagebuilds.yamlis excluded by!config/crd/bases/**config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yamlis excluded by!config/crd/bases/**config/crd/bases/automotive.sdv.cloud.redhat.com_workspaces.yamlis excluded by!config/crd/bases/**
📒 Files selected for processing (17)
api/v1alpha1/imagebuild_types.goapi/v1alpha1/operatorconfig_types.goapi/v1alpha1/workspace_types.gocmd/caib/buildcmd/build.gocmd/caib/image/image.gocmd/caib/main.gocmd/caib/runtime_wiring.gocmd/caib/workspace/workspace.gointernal/buildapi/server.gointernal/buildapi/types.gointernal/buildapi/workspace.gointernal/common/tasks/scripts/build_image.shinternal/common/tasks/scripts/find_manifest.shinternal/common/tasks/scripts/push_artifact.shinternal/common/tasks/tasks.gointernal/controller/imagebuild/controller.gointernal/controller/workspace/controller.go
🚧 Files skipped from review as they are similar to previous changes (4)
- cmd/caib/workspace/workspace.go
- api/v1alpha1/workspace_types.go
- internal/buildapi/workspace.go
- cmd/caib/buildcmd/build.go
| // Determine the shared-workspace binding: | ||
| // - If BuildCachePVC is set, use it as the shared workspace for build cache persistence | ||
| // - If InputFilesServer is enabled and a PVC already exists (from upload phase), use it | ||
| // - Otherwise, use VolumeClaimTemplate to create a new PVC with proper zone affinity | ||
| var sharedWorkspaceBinding tektonv1.WorkspaceBinding | ||
| if imageBuild.Spec.GetInputFilesServer() && imageBuild.Status.PVCName != "" { | ||
| if imageBuild.Spec.BuildCachePVC != "" { | ||
| log.Info("Using build-cache PVC as shared workspace", "pvc", imageBuild.Spec.BuildCachePVC) | ||
| sharedWorkspaceBinding = tektonv1.WorkspaceBinding{ | ||
| Name: "shared-workspace", | ||
| PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{ | ||
| ClaimName: imageBuild.Spec.BuildCachePVC, | ||
| }, | ||
| } | ||
| } else if imageBuild.Spec.GetInputFilesServer() && imageBuild.Status.PVCName != "" { |
There was a problem hiding this comment.
Don't use the build-cache claim as the whole shared-workspace.
BuildCachePVC and InputFilesServer can both be set, but this branch replaces the upload PVC with the cache PVC. That makes uploaded source_path content disappear from the build, and same-workspace builds also end up racing over one /workspace/shared root because the build script cleans that root on every run. Keep a per-build shared workspace and mount the persistent claim only under build-cache/, or reject the combination up front.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/controller/imagebuild/controller.go` around lines 754 - 767, The
current branch replaces the entire shared-workspace with the BuildCachePVC
(variable sharedWorkspaceBinding), which discards per-build upload content;
instead keep a per-build shared workspace and mount the cache PVC under a
separate workspace name. Change the logic in controller.go so that when
imageBuild.Spec.BuildCachePVC is set you do NOT set sharedWorkspaceBinding to
the PVC; instead preserve the existing shared-workspace binding (e.g., from
VolumeClaimTemplate or an EmptyDir) and add a second tektonv1.WorkspaceBinding
with Name "build-cache" whose PersistentVolumeClaim.ClaimName is
imageBuild.Spec.BuildCachePVC (or, if you prefer to prohibit the unsafe
combination, return an error when both BuildCachePVC and
InputFilesServer/PVCName are present). Update any Task/TaskRun bindings to
reference the new "build-cache" workspace rather than mounting the cache at
"/workspace/shared".
There was a problem hiding this comment.
Actionable comments posted: 6
♻️ Duplicate comments (2)
internal/buildapi/workspace.go (2)
624-626:⚠️ Potential issue | 🟠 MajorReinitialize
RequestsandLimitsafterDeepCopy().
DeepCopy()preserves nil maps. Ifworkspaces.resourcesis{}or only partially populated, the assignments below still panic on a nil map.Suggested fix
if wsConfig != nil && wsConfig.Resources != nil { res = wsConfig.Resources.DeepCopy() } + if res.Requests == nil { + res.Requests = corev1.ResourceList{} + } + if res.Limits == nil { + res.Limits = corev1.ResourceList{} + } if cpu != "" {Also applies to: 628-653
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/workspace.go` around lines 624 - 626, The DeepCopy() call for wsConfig.Resources can leave Requests and Limits maps nil, causing panics when code later assigns into them; after res = wsConfig.Resources.DeepCopy() (and similarly in the other copy sites), check and reinitialize res.Requests and res.Limits to empty maps when nil (e.g., if res.Requests == nil { res.Requests = make(map[string]resource.Quantity) } and same for res.Limits) before any assignments; update the code paths that mutate res (look for usages of res.Requests and res.Limits in this function) to ensure they always operate on non-nil maps.
375-382:⚠️ Potential issue | 🟠 MajorKeep sync uploads size-bounded instead of
io.ReadAll-ing them.Even if buffering is needed for the SPDY EOF behavior, this still turns
/syncinto an easy OOM path.copyToPodalready accepts anio.Reader; cap the body with the existing upload limit and stream or spool it instead of holding the whole tarball in RAM.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/workspace.go` around lines 375 - 382, Replace the io.ReadAll buffering of c.Request.Body with a bounded stream: wrap the request body in an io.LimitReader using the existing upload limit (e.g. maxUploadSize/uploadLimit) and stream it directly into copyToPod instead of buffering into tarData; if you need to detect an exceeded limit, copy up to limit+1 (or copy to io.Discard after the limit) and return an HTTP 413 if more data was present. Update the call site that currently reads tarData and calls copyToPod to pass the limited reader (c.Request.Body wrapped) and adjust error responses accordingly (use 413 for too-large, 500 for other read/copy errors) so /sync no longer OOMs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/buildapi/workspace.go`:
- Around line 155-160: The code currently ignores the error returned by
loadOperatorConfigFn and treats any failure as no config; update the logic
around the call to loadOperatorConfigFn(c.Request.Context(), k8sClient,
namespace) to capture its error return, and if the error is non-nil and not a
NotFound error (use the k8s errors helper, e.g. k8serrors.IsNotFound(err)),
return an internal server error (HTTP 500) or propagate the error to fail
closed; only when err is nil proceed to set wsConfig =
operatorConfig.Spec.Workspaces, and when err is a NotFound keep wsConfig nil.
- Around line 425-427: The wsUpgrader currently allows all origins via
wsUpgrader.CheckOrigin which opens the WebSocket shell to CSRF; replace the
permissive CheckOrigin with a strict same-origin check that validates
r.Header.Get("Origin") (parse and compare its host and scheme) against r.Host
(and expected scheme) or against a configured allowlist, and return false by
default on mismatch; update the wsUpgrader variable (and any usage in the
WebSocket handler that uses a.authMiddleware()) to use this stricter CheckOrigin
so only requests from the same origin or explicit allowed origins can upgrade.
In `@internal/common/tasks/scripts/build_image.sh`:
- Around line 59-69: The persistent cache cleanup leaves "$BUILD_DIR/image.json"
intact and that stale manifest gets copied later; update build_image.sh to
explicitly remove any persistent image manifest before publishing by deleting
"$BUILD_DIR/image.json" (or "$WORKSPACE_PATH/$BUILD_DIR/image.json" as
appropriate) during the cleanup stage that uses WORKSPACE_PATH and BUILD_DIR so
no old image.json can be carried forward.
In `@internal/controller/operatorconfig/controller.go`:
- Around line 875-889: The delete of the OpenShift-only
SecurityContextConstraints in cleanupWorkspaceInfra should not fail on a
non-OpenShift cluster; update the SCC deletion to either first check
detectOpenShift(ctx) (only call r.Delete for scc when running on OpenShift) or
treat a no-match error from r.Delete as non-fatal by using isNoMatchError(err)
alongside errors.IsNotFound(err) before returning an error; specifically adjust
the block that constructs scc (workspaceSCCName) and calls r.Delete so that
detectOpenShift or isNoMatchError guards the failure path.
In `@internal/controller/operatorconfig/resources.go`:
- Around line 332-334: The Route is currently targeting the service port name
"http" (RoutePort.TargetPort set via intstr.FromString("http")), which bypasses
the oauth-proxy sidecar; change the RoutePort.TargetPort to point to the service
port used by the oauth-proxy (e.g. intstr.FromString("proxy") or the actual
proxy port name) so traffic is routed through the proxy container and preserves
the OpenShift OAuth redirect path.
In `@internal/controller/workspace/controller.go`:
- Around line 254-257: The shell redirection in the Command literal in
controller.go uses the Bash-only operator "&>/dev/null" which breaks when
/bin/sh is not Bash; update the command string used where Command:
[]string{"/bin/sh", "-c", ...} (the fragment containing "if ! id workspace
&>/dev/null; then useradd -u 1000 -d /workspace -s /bin/bash workspace; fi") to
use the POSIX-compatible redirection ">/dev/null 2>&1" instead so the existence
check works under dash/ash and other /bin/sh implementations.
---
Duplicate comments:
In `@internal/buildapi/workspace.go`:
- Around line 624-626: The DeepCopy() call for wsConfig.Resources can leave
Requests and Limits maps nil, causing panics when code later assigns into them;
after res = wsConfig.Resources.DeepCopy() (and similarly in the other copy
sites), check and reinitialize res.Requests and res.Limits to empty maps when
nil (e.g., if res.Requests == nil { res.Requests =
make(map[string]resource.Quantity) } and same for res.Limits) before any
assignments; update the code paths that mutate res (look for usages of
res.Requests and res.Limits in this function) to ensure they always operate on
non-nil maps.
- Around line 375-382: Replace the io.ReadAll buffering of c.Request.Body with a
bounded stream: wrap the request body in an io.LimitReader using the existing
upload limit (e.g. maxUploadSize/uploadLimit) and stream it directly into
copyToPod instead of buffering into tarData; if you need to detect an exceeded
limit, copy up to limit+1 (or copy to io.Discard after the limit) and return an
HTTP 413 if more data was present. Update the call site that currently reads
tarData and calls copyToPod to pass the limited reader (c.Request.Body wrapped)
and adjust error responses accordingly (use 413 for too-large, 500 for other
read/copy errors) so /sync no longer OOMs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6ef2ce0e-6d01-46b4-9b8c-4bae443890a2
⛔ Files ignored due to path filters (2)
config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yamlis excluded by!config/crd/bases/**config/rbac/role.yamlis excluded by!config/rbac/role.yaml
📒 Files selected for processing (8)
api/v1alpha1/operatorconfig_types.goconfig/rbac/scc_role.yamlconfig/rbac/scc_role_binding.yamlinternal/buildapi/workspace.gointernal/common/tasks/scripts/build_image.shinternal/controller/operatorconfig/controller.gointernal/controller/operatorconfig/resources.gointernal/controller/workspace/controller.go
| echo "Cleaning up stale artifacts from persistent workspace..." | ||
| for item in "$WORKSPACE_PATH"/*; do | ||
| [ "$(basename "$item")" = "build-cache" ] && continue | ||
| rm -rf "$item" | ||
| done | ||
|
|
||
| for orphan in "$BUILD_DIR"/image_output--*; do | ||
| [ -e "$orphan" ] || break | ||
| echo "Removing orphaned build output: $(basename "$orphan")" | ||
| rm -rf "$orphan" | ||
| done |
There was a problem hiding this comment.
Drop stale image.json from the persistent cache before publishing artifacts.
The new cleanup keeps "$BUILD_DIR/image.json" around, and Line 530 copies it for every build mode. disk builds do not obviously regenerate that file, so a persistent-cache run can publish the previous build's manifest as the current artifact metadata.
Suggested fix
for orphan in "$BUILD_DIR"/image_output--*; do
[ -e "$orphan" ] || break
echo "Removing orphaned build output: $(basename "$orphan")"
rm -rf "$orphan"
done
+ rm -f "$BUILD_DIR/image.json"Also applies to: 530-530
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/common/tasks/scripts/build_image.sh` around lines 59 - 69, The
persistent cache cleanup leaves "$BUILD_DIR/image.json" intact and that stale
manifest gets copied later; update build_image.sh to explicitly remove any
persistent image manifest before publishing by deleting "$BUILD_DIR/image.json"
(or "$WORKSPACE_PATH/$BUILD_DIR/image.json" as appropriate) during the cleanup
stage that uses WORKSPACE_PATH and BUILD_DIR so no old image.json can be carried
forward.
| Port: &routev1.RoutePort{ | ||
| TargetPort: intstr.FromString("proxy"), | ||
| TargetPort: intstr.FromString("http"), | ||
| }, |
There was a problem hiding this comment.
Keep the OpenShift Route on the proxy service port.
Pointing the Route at "http" sends traffic straight to the build-api container on 8080 and bypasses the oauth-proxy sidecar completely. That regresses the OpenShift auth path this deployment still sets up via the proxy container, service port, and OAuth redirect annotation.
Suggested fix
Port: &routev1.RoutePort{
- TargetPort: intstr.FromString("http"),
+ TargetPort: intstr.FromString("proxy"),
},📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Port: &routev1.RoutePort{ | |
| TargetPort: intstr.FromString("proxy"), | |
| TargetPort: intstr.FromString("http"), | |
| }, | |
| Port: &routev1.RoutePort{ | |
| TargetPort: intstr.FromString("proxy"), | |
| }, |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/controller/operatorconfig/resources.go` around lines 332 - 334, The
Route is currently targeting the service port name "http" (RoutePort.TargetPort
set via intstr.FromString("http")), which bypasses the oauth-proxy sidecar;
change the RoutePort.TargetPort to point to the service port used by the
oauth-proxy (e.g. intstr.FromString("proxy") or the actual proxy port name) so
traffic is routed through the proxy container and preserves the OpenShift OAuth
redirect path.
4ad1147 to
143d549
Compare
There was a problem hiding this comment.
Actionable comments posted: 6
♻️ Duplicate comments (8)
internal/common/tasks/scripts/build_image.sh (1)
59-69:⚠️ Potential issue | 🟠 MajorDrop stale
image.jsonbefore reusing the persistent cache.The cleanup keeps
"$BUILD_DIR/image.json"around, and Line 528 republishes whatever is already there.diskmode does not write a fresh manifest throughCOMMON_BUILD_ARGS, so a warm-cache run can attach stale manifest metadata to the current artifact.Suggested fix
for orphan in "$BUILD_DIR"/image_output--*; do [ -e "$orphan" ] || break echo "Removing orphaned build output: $(basename "$orphan")" rm -rf "$orphan" done + rm -f "$BUILD_DIR/image.json"Also applies to: 528-528
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/common/tasks/scripts/build_image.sh` around lines 59 - 69, The cleanup currently preserves "$BUILD_DIR/image.json" which can lead to reusing stale manifests; update the cleanup in build_image.sh to explicitly remove the persistent manifest before reusing the cache by deleting "$BUILD_DIR/image.json" (or any manifest file) prior to the reuse logic so disk-mode runs using COMMON_BUILD_ARGS don't republish old metadata—ensure the removal happens alongside the existing workspace and orphan cleanup that references WORKSPACE_PATH and BUILD_DIR.internal/controller/operatorconfig/controller.go (1)
894-898:⚠️ Potential issue | 🟠 MajorIgnore
NoMatchwhen deleting the workspace SCC.Line 896 still deletes an OpenShift-only
SecurityContextConstraintsunconditionally. On a Kubernetes cluster, that no-match error bubbles out ofcleanupOSBuilds()and can block finalizer removal. Gate this behinddetectOpenShift()or treatisNoMatchError(err)as non-fatal, like the Route cleanup path.Suggested fix
- scc := &securityv1.SecurityContextConstraints{} - scc.Name = workspaceSCCName - if err := r.Delete(ctx, scc); err != nil && !errors.IsNotFound(err) { - return fmt.Errorf("failed to delete workspace SCC: %w", err) - } + if r.detectOpenShift(ctx, config.Namespace) { + scc := &securityv1.SecurityContextConstraints{} + scc.Name = workspaceSCCName + if err := r.Delete(ctx, scc); err != nil && !errors.IsNotFound(err) && !isNoMatchError(err) { + return fmt.Errorf("failed to delete workspace SCC: %w", err) + } + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/operatorconfig/controller.go` around lines 894 - 898, The deletion of the OpenShift-only SecurityContextConstraints (securityv1.SecurityContextConstraints with name workspaceSCCName via r.Delete called from cleanupOSBuilds()) must not surface a "NoMatch" error on vanilla Kubernetes; either gate the SCC delete behind detectOpenShift() or treat isNoMatchError(err) as non-fatal (similar to the Route cleanup path) so cleanupOSBuilds() ignores NoMatch and only returns real errors.internal/controller/operatorconfig/resources.go (1)
645-660:⚠️ Potential issue | 🟠 MajorDrop the extra write verbs from the root
workspacesrule.The separate
workspaces/statusandworkspaces/finalizersrules already cover the controller's write paths. Leavingcreate/deleteonworkspacesgives this reconciler unnecessary power over user workspaces.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/operatorconfig/resources.go` around lines 645 - 660, The RBAC rule for the root "workspaces" resource grants unnecessary write permissions; remove the extra write verbs from the APIGroups entry that lists Resources: []string{"workspaces"} by deleting "create" and "delete" from its Verbs slice so that the root rule only has read/list/watch verbs (e.g., "get","list","watch"); leave the separate "workspaces/status" and "workspaces/finalizers" rules untouched (they cover controller write operations).internal/buildapi/server.go (3)
1642-1674:⚠️ Potential issue | 🟠 MajorParse
BuildCacheSizedefensively instead of usingMustParse.
Workspaces.BuildCacheSizeis operator-configurable. A bad value here drops this request onto the panic/recovery path instead of returning a normal validation error.🛡️ Suggested change
if wsConfig != nil { if wsConfig.BuildCacheSize != "" { cacheSize = wsConfig.BuildCacheSize } if sc := wsConfig.GetStorageClass(); sc != "" { storageClassName = &sc } } + qty, err := resource.ParseQuantity(cacheSize) + if err != nil { + return "", fmt.Errorf("invalid workspace build cache size %q: %w", cacheSize, err) + } pvc := &corev1.PersistentVolumeClaim{ @@ Resources: corev1.VolumeResourceRequirements{ Requests: corev1.ResourceList{ - corev1.ResourceStorage: resource.MustParse(cacheSize), + corev1.ResourceStorage: qty, }, }, }, }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/server.go` around lines 1642 - 1674, The code currently uses resource.MustParse(cacheSize) when building the PersistentVolumeClaim (pvc) which will panic on invalid operator-configured Workspaces.BuildCacheSize; replace this with resource.ParseQuantity(cacheSize) (or resource.ParseQuantity on wsConfig.BuildCacheSize) and handle the returned (quantity, error) before constructing the PersistentVolumeClaimSpec so you can return a validation error (or reconcile error) instead of panicking; update the pvc creation to use the parsed Quantity variable and ensure error handling returns a clear error (or logs and requeues) referencing BuildCacheSize, pvc/PersistentVolumeClaimSpec, and resource.MustParse to locate the change.
1723-1739:⚠️ Potential issue | 🔴 CriticalResolve the requester before expanding
ExtraRepos.This path does workspace lookup and pod access before
requestedByis available, soresolveExtraReposhas no way to enforcews.Spec.Owner. Any authenticated caller can currently point--extra-repoat another user's running workspace by name.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/server.go` around lines 1723 - 1739, Resolve the requestor identity before expanding ExtraRepos: move or add the requester resolution step (the code that sets req.RequestedBy / calls whatever resolves the requester from the request context) to run before the block that calls resolveExtraRepos, or change resolveExtraRepos(signature) to accept the resolved requester and use it to enforce ws.Spec.Owner; then call getClientFromRequest/getRESTConfigFromRequest and a.resolveExtraRepos only after req.RequestedBy is set so extra-repo expansion cannot access another user's workspace. Ensure you reference resolveExtraRepos and req.ExtraRepos when making the change and enforce the owner check inside resolveExtraRepos if you add the requester parameter.
1534-1540:⚠️ Potential issue | 🔴 CriticalDon't launch the repo server through an interpolated shell command.
repoPathis inserted directly into/bin/sh -c, which is command-injection prone. The fire-and-forget background process is also never health-checked or cleaned up, so repeated builds can return dead or conflicting repo URLs even whenpodExecitself succeeds.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/server.go` around lines 1534 - 1540, The current code constructs a shell command with repoPath interpolated into "/bin/sh -c" (cmd variable) which is command-injection prone and leaves a fire-and-forget process unmanaged; fix by invoking the HTTP server without a shell (e.g. call python3 via exec form: cmd := []string{"python3","-m","http.server", fmt.Sprintf("%d", port), "--directory", repoPath}) so repoPath is passed as an argument, then update podExec usage to use that cmd (and workspaceContainerName/wsName references) and implement lifecycle/health management: before starting check/terminate any existing repo server (e.g. stored PID file or probe the port), start the server in a way you can monitor (write its PID to a known file inside repoPath), and perform a quick health-check (connect to localhost:port) before returning so repeated builds don't return stale/conflicting URLs.cmd/caib/workspace/workspace.go (2)
603-613:⚠️ Potential issue | 🟡 MinorKeep the exec command as argv instead of flattening it.
Joining
cmdPartsback into one string loses quoting and argument boundaries, so commands with spaces or shell metacharacters do not round-trip reliably. Send argv verbatim inWorkspaceExecRequestinstead of rebuilding a shell string here.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/caib/workspace/workspace.go` around lines 603 - 613, The code flattens cmdParts into a single string (variable command) which loses argument quoting; instead stop using strings.Join and pass the argv slice verbatim into the request: remove creation of the string variable command and set the buildapitypes.WorkspaceExecRequest to use the original cmdParts slice (e.g., WorkspaceExecRequest.Command = cmdParts or the appropriate slice field name) so exec receives argv unchanged; update any downstream usage that expected a string accordingly.
502-508:⚠️ Potential issue | 🟠 MajorCreate a fresh upload reader for each re-auth attempt.
ExecuteWithReauthcan rerun the callback, butpris built once frombuf. Any retry after the first read uploads a truncated or empty tar stream.♻️ Suggested change
totalBytes := int64(buf.Len()) - pr := newProgressReader(&buf, totalBytes) + archive := append([]byte(nil), buf.Bytes()...) err := caibcommon.ExecuteWithReauth(serverURL, &authToken, insecureSkipTLS, func(client *buildapiclient.Client) error { - return client.SyncWorkspace(context.Background(), name, pr) + pr := newProgressReader(bytes.NewReader(archive), totalBytes) + err := client.SyncWorkspace(context.Background(), name, pr) + pr.finish() + return err }) - pr.finish() if err != nil { handleError(fmt.Errorf("failed to sync workspace: %w", err)) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/caib/workspace/workspace.go` around lines 502 - 508, The progress reader `pr` is created once from `buf` but `ExecuteWithReauth` may retry the callback, so subsequent attempts read an already-consumed stream; fix by constructing a fresh progress reader for each attempt inside the callback passed to `ExecuteWithReauth` (use the original buffer bytes and `totalBytes` to create a new reader via `newProgressReader` on each call) and ensure `pr.finish()` is invoked for that attempt (e.g., defer `pr.finish()` inside the callback). Update the callback that calls `client.SyncWorkspace` to create and finish its own `pr` rather than using the `pr` created once outside.
🧹 Nitpick comments (2)
Containerfile.toolchain (2)
103-103: Consider tightening workspace directory permissions.Line 103 sets
chmod 777, granting world-writable permissions. While this accommodates OpenShift's arbitrary UIDs, it allows any process in the container to modify workspace content. If the workspace Pod's security context sets a predictablefsGroup, you can use775or770and rely on group ownership instead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Containerfile.toolchain` at line 103, The RUN command using "chmod 777 /workspace /workspace/src /workspace/cache" makes the workspace world-writable; change the RUN step that creates the directories (the line with "RUN mkdir -p /workspace/src /workspace/cache && chmod 777 ...") to tighten permissions (e.g., use 775 or 770) and, if relying on a Pod fsGroup, set appropriate group ownership with chown so group access works for arbitrary UIDs; ensure the mkdir/chmod/chown sequence preserves OpenShift compatibility while removing global write permissions.
57-57: Clean DNF cache after EPEL installation.The EPEL installation leaves package metadata cached, increasing the image size unnecessarily.
🧹 Proposed fix to reduce image size
-RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm +RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm && dnf clean all🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Containerfile.toolchain` at line 57, After installing EPEL in the RUN that currently executes "dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm", chain DNF cache cleanup to avoid leaving package metadata in the image; update the RUN that contains this command to run dnf clean all (and remove /var/cache/dnf) immediately after the install so the image doesn’t keep cached package metadata.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Containerfile.toolchain`:
- Line 10: Update the GPG key ARG by replacing the incorrect value used in the
Containerfile: change the build arg named ARG GPG_KEY (currently
"RPM-GPG-KEY-centosofficial") to the Automotive SIG key
"RPM-GPG-KEY-CentOS-SIG-Automotive" so the centos-release-autosd/AutoSD
repository uses the correct key.
In `@internal/buildapi/client/client_test.go`:
- Around line 233-249: The test is checking URL escaping using the decoded
r.URL.Path; update the assertion to inspect the raw/escaped path instead by
replacing r.URL.Path with r.URL.EscapedPath() (or r.RequestURI) inside the
handler in the Context("workspaceAction with URL-unsafe names") test so the
StartWorkspace client call (created by New) is validated against the
percent-encoded path "/v1/workspaces/my%20app/start" rather than the decoded
"/v1/workspaces/my app/start".
In `@internal/buildapi/client/client.go`:
- Around line 923-926: The current code constructs a zero-value websocket.Dialer
(dialer := websocket.Dialer{}) which drops Gorilla's default Proxy and
HandshakeTimeout; instead clone websocket.DefaultDialer first and then overlay
TLS and proxy settings from the HTTP transport: get t :=
c.httpClient.Transport.(*http.Transport) (if ok), copy dialer :=
*websocket.DefaultDialer, set dialer.TLSClientConfig = t.TLSClientConfig if
non-nil and ensure dialer.Proxy is set from t.Proxy (or
http.ProxyFromEnvironment) so proxy resolution and HandshakeTimeout are
preserved for shell sessions.
- Around line 738-915: The code double-escapes workspace names by using
url.PathEscape + path.Join before calling c.resolve; replace those constructions
with url.JoinPath (or URL.JoinPath) to compose the path segments without
pre-escaping so the workspace name is encoded exactly once. Update endpoint
construction in GetWorkspace, StartWorkspace, StopWorkspace, workspaceAction,
DeleteWorkspace, SyncPlan, SyncWorkspace, workspaceStreamRequest, and
ShellWorkspace to call url.JoinPath("/v1/workspaces", name, ...) (or the
appropriate variants) and pass that result to c.resolve, removing manual
url.PathEscape usage.
In `@internal/buildapi/server.go`:
- Around line 1624-1680: The current list-then-GenerateName flow
(buildCacheLabels, pvc with GenerateName, k8sClient.Create) is TOCTOU; replace
it with a deterministic PVC name (e.g. wsName+"-build-cache") and create-or-get
semantics: construct the PVC Spec but set a fixed ObjectMeta.Name instead of
GenerateName, attempt k8sClient.Create(ctx, pvc), and if Create returns
AlreadyExists call k8sClient.Get to retrieve and return the existing PVC name
(or otherwise reconcile/update the existing PVC’s spec/storageClass/size if
needed); ensure you still set the same OwnerReferences and labels so lookups by
workspace remain deterministic.
---
Duplicate comments:
In `@cmd/caib/workspace/workspace.go`:
- Around line 603-613: The code flattens cmdParts into a single string (variable
command) which loses argument quoting; instead stop using strings.Join and pass
the argv slice verbatim into the request: remove creation of the string variable
command and set the buildapitypes.WorkspaceExecRequest to use the original
cmdParts slice (e.g., WorkspaceExecRequest.Command = cmdParts or the appropriate
slice field name) so exec receives argv unchanged; update any downstream usage
that expected a string accordingly.
- Around line 502-508: The progress reader `pr` is created once from `buf` but
`ExecuteWithReauth` may retry the callback, so subsequent attempts read an
already-consumed stream; fix by constructing a fresh progress reader for each
attempt inside the callback passed to `ExecuteWithReauth` (use the original
buffer bytes and `totalBytes` to create a new reader via `newProgressReader` on
each call) and ensure `pr.finish()` is invoked for that attempt (e.g., defer
`pr.finish()` inside the callback). Update the callback that calls
`client.SyncWorkspace` to create and finish its own `pr` rather than using the
`pr` created once outside.
In `@internal/buildapi/server.go`:
- Around line 1642-1674: The code currently uses resource.MustParse(cacheSize)
when building the PersistentVolumeClaim (pvc) which will panic on invalid
operator-configured Workspaces.BuildCacheSize; replace this with
resource.ParseQuantity(cacheSize) (or resource.ParseQuantity on
wsConfig.BuildCacheSize) and handle the returned (quantity, error) before
constructing the PersistentVolumeClaimSpec so you can return a validation error
(or reconcile error) instead of panicking; update the pvc creation to use the
parsed Quantity variable and ensure error handling returns a clear error (or
logs and requeues) referencing BuildCacheSize, pvc/PersistentVolumeClaimSpec,
and resource.MustParse to locate the change.
- Around line 1723-1739: Resolve the requestor identity before expanding
ExtraRepos: move or add the requester resolution step (the code that sets
req.RequestedBy / calls whatever resolves the requester from the request
context) to run before the block that calls resolveExtraRepos, or change
resolveExtraRepos(signature) to accept the resolved requester and use it to
enforce ws.Spec.Owner; then call getClientFromRequest/getRESTConfigFromRequest
and a.resolveExtraRepos only after req.RequestedBy is set so extra-repo
expansion cannot access another user's workspace. Ensure you reference
resolveExtraRepos and req.ExtraRepos when making the change and enforce the
owner check inside resolveExtraRepos if you add the requester parameter.
- Around line 1534-1540: The current code constructs a shell command with
repoPath interpolated into "/bin/sh -c" (cmd variable) which is
command-injection prone and leaves a fire-and-forget process unmanaged; fix by
invoking the HTTP server without a shell (e.g. call python3 via exec form: cmd
:= []string{"python3","-m","http.server", fmt.Sprintf("%d", port),
"--directory", repoPath}) so repoPath is passed as an argument, then update
podExec usage to use that cmd (and workspaceContainerName/wsName references) and
implement lifecycle/health management: before starting check/terminate any
existing repo server (e.g. stored PID file or probe the port), start the server
in a way you can monitor (write its PID to a known file inside repoPath), and
perform a quick health-check (connect to localhost:port) before returning so
repeated builds don't return stale/conflicting URLs.
In `@internal/common/tasks/scripts/build_image.sh`:
- Around line 59-69: The cleanup currently preserves "$BUILD_DIR/image.json"
which can lead to reusing stale manifests; update the cleanup in build_image.sh
to explicitly remove the persistent manifest before reusing the cache by
deleting "$BUILD_DIR/image.json" (or any manifest file) prior to the reuse logic
so disk-mode runs using COMMON_BUILD_ARGS don't republish old metadata—ensure
the removal happens alongside the existing workspace and orphan cleanup that
references WORKSPACE_PATH and BUILD_DIR.
In `@internal/controller/operatorconfig/controller.go`:
- Around line 894-898: The deletion of the OpenShift-only
SecurityContextConstraints (securityv1.SecurityContextConstraints with name
workspaceSCCName via r.Delete called from cleanupOSBuilds()) must not surface a
"NoMatch" error on vanilla Kubernetes; either gate the SCC delete behind
detectOpenShift() or treat isNoMatchError(err) as non-fatal (similar to the
Route cleanup path) so cleanupOSBuilds() ignores NoMatch and only returns real
errors.
In `@internal/controller/operatorconfig/resources.go`:
- Around line 645-660: The RBAC rule for the root "workspaces" resource grants
unnecessary write permissions; remove the extra write verbs from the APIGroups
entry that lists Resources: []string{"workspaces"} by deleting "create" and
"delete" from its Verbs slice so that the root rule only has read/list/watch
verbs (e.g., "get","list","watch"); leave the separate "workspaces/status" and
"workspaces/finalizers" rules untouched (they cover controller write
operations).
---
Nitpick comments:
In `@Containerfile.toolchain`:
- Line 103: The RUN command using "chmod 777 /workspace /workspace/src
/workspace/cache" makes the workspace world-writable; change the RUN step that
creates the directories (the line with "RUN mkdir -p /workspace/src
/workspace/cache && chmod 777 ...") to tighten permissions (e.g., use 775 or
770) and, if relying on a Pod fsGroup, set appropriate group ownership with
chown so group access works for arbitrary UIDs; ensure the mkdir/chmod/chown
sequence preserves OpenShift compatibility while removing global write
permissions.
- Line 57: After installing EPEL in the RUN that currently executes "dnf install
-y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm",
chain DNF cache cleanup to avoid leaving package metadata in the image; update
the RUN that contains this command to run dnf clean all (and remove
/var/cache/dnf) immediately after the install so the image doesn’t keep cached
package metadata.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ebcd16a8-e975-4b57-99be-3273409dae00
⛔ Files ignored due to path filters (4)
config/crd/bases/automotive.sdv.cloud.redhat.com_imagebuilds.yamlis excluded by!config/crd/bases/**config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yamlis excluded by!config/crd/bases/**config/crd/bases/automotive.sdv.cloud.redhat.com_workspaces.yamlis excluded by!config/crd/bases/**config/rbac/role.yamlis excluded by!config/rbac/role.yaml
📒 Files selected for processing (32)
Containerfile.toolchainapi/v1alpha1/imagebuild_types.goapi/v1alpha1/operatorconfig_types.goapi/v1alpha1/workspace_types.goapi/v1alpha1/zz_generated.deepcopy.gobundle.Dockerfilecmd/caib/buildcmd/build.gocmd/caib/image/image.gocmd/caib/main.gocmd/caib/root.gocmd/caib/runtime_wiring.gocmd/caib/workspace/workspace.gocmd/main.goconfig/crd/kustomization.yamlconfig/rbac/scc_role.yamlconfig/rbac/scc_role_binding.yamlhack/deploy-catalog.shinternal/buildapi/client/client.gointernal/buildapi/client/client_test.gointernal/buildapi/server.gointernal/buildapi/types.gointernal/buildapi/workspace.gointernal/buildapi/workspace_test.gointernal/common/tasks/scripts/build_image.shinternal/common/tasks/scripts/find_manifest.shinternal/common/tasks/scripts/push_artifact.shinternal/common/tasks/tasks.gointernal/controller/imagebuild/controller.gointernal/controller/operatorconfig/controller.gointernal/controller/operatorconfig/resources.gointernal/controller/workspace/controller.gointernal/controller/workspace/controller_test.go
✅ Files skipped from review due to trivial changes (6)
- bundle.Dockerfile
- cmd/caib/main.go
- config/crd/kustomization.yaml
- config/rbac/scc_role.yaml
- internal/buildapi/workspace_test.go
- api/v1alpha1/zz_generated.deepcopy.go
🚧 Files skipped from review as they are similar to previous changes (12)
- config/rbac/scc_role_binding.yaml
- cmd/main.go
- internal/common/tasks/scripts/find_manifest.sh
- api/v1alpha1/imagebuild_types.go
- cmd/caib/buildcmd/build.go
- internal/common/tasks/tasks.go
- internal/common/tasks/scripts/push_artifact.sh
- api/v1alpha1/workspace_types.go
- api/v1alpha1/operatorconfig_types.go
- internal/controller/imagebuild/controller.go
- internal/controller/workspace/controller.go
- internal/buildapi/workspace.go
| Context("workspaceAction with URL-unsafe names", func() { | ||
| It("should properly escape workspace names in the URL path", func() { | ||
| mockServer = httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { | ||
| // The name "my app" should be escaped to "my%20app" | ||
| Expect(r.URL.Path).To(Equal("/v1/workspaces/my%20app/start")) | ||
|
|
||
| w.Header().Set("Content-Type", "application/json") | ||
| _ = json.NewEncoder(w).Encode(buildapi.WorkspaceResponse{Name: "my app"}) | ||
| })) | ||
|
|
||
| var err error | ||
| apiClient, err = New(mockServer.URL) | ||
| Expect(err).NotTo(HaveOccurred()) | ||
|
|
||
| resp, err := apiClient.StartWorkspace(context.Background(), "my app") | ||
| Expect(err).NotTo(HaveOccurred()) | ||
| Expect(resp.Name).To(Equal("my app")) |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In Go net/http server handlers, is Request.URL.Path decoded while RequestURI or URL.EscapedPath() preserves the original percent-encoding?
💡 Result:
Yes, in Go net/http server handlers, r.Request.URL.Path is percent-decoded, while r.RequestURI preserves the original percent-encoded request-target from the HTTP request line, and r.Request.URL.EscapedPath preserves (or reconstructs a valid escaping of) the original percent-encoding.
Citations:
- 1: https://pkg.go.dev/net/url
- 2: https://pkg.go.dev/net/url@go1.25.6
- 3: https://go.dev/src/net/url/url.go?s=6331:6364
- 4: net/http: add Request.RequestURI golang/go#2782
- 5: https://go.dev/src/net/http/request.go
- 6: https://groups.google.com/g/golang-nuts/c/2pM0NbhTLSM
- 7: https://go.dev/src/net/http/request.go?h=Reader
Use r.URL.EscapedPath() to verify URL escaping in the test.
The test currently uses r.URL.Path, which Go's net/http decodes before the handler sees it. This means the test would fail for properly-escaped URLs but incorrectly pass for double-escaped ones. Use r.URL.EscapedPath() or r.RequestURI instead to capture the original percent-encoding and properly validate that the client is escaping correctly.
Suggested fix
- Expect(r.URL.Path).To(Equal("/v1/workspaces/my%20app/start"))
+ Expect(r.URL.EscapedPath()).To(Equal("/v1/workspaces/my%20app/start"))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/buildapi/client/client_test.go` around lines 233 - 249, The test is
checking URL escaping using the decoded r.URL.Path; update the assertion to
inspect the raw/escaped path instead by replacing r.URL.Path with
r.URL.EscapedPath() (or r.RequestURI) inside the handler in the
Context("workspaceAction with URL-unsafe names") test so the StartWorkspace
client call (created by New) is validated against the percent-encoded path
"/v1/workspaces/my%20app/start" rather than the decoded "/v1/workspaces/my
app/start".
| endpoint := c.resolve(path.Join("/v1/workspaces", url.PathEscape(name))) | ||
| req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| if c.authToken != "" { | ||
| req.Header.Set("Authorization", "Bearer "+c.authToken) | ||
| } | ||
| resp, err := c.httpClient.Do(req) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| defer func() { _ = resp.Body.Close() }() | ||
| if resp.StatusCode != http.StatusOK { | ||
| b, _ := io.ReadAll(io.LimitReader(resp.Body, 1024)) | ||
| return nil, fmt.Errorf("get workspace failed: %s: %s", resp.Status, string(b)) | ||
| } | ||
| var ws buildapi.WorkspaceResponse | ||
| if err := json.NewDecoder(resp.Body).Decode(&ws); err != nil { | ||
| return nil, fmt.Errorf("decoding response: %w", err) | ||
| } | ||
| return &ws, nil | ||
| } | ||
|
|
||
| // StartWorkspace starts a stopped workspace. | ||
| func (c *Client) StartWorkspace(ctx context.Context, name string) (*buildapi.WorkspaceResponse, error) { | ||
| return c.workspaceAction(ctx, name, "start") | ||
| } | ||
|
|
||
| // StopWorkspace stops a running workspace, preserving its storage. | ||
| func (c *Client) StopWorkspace(ctx context.Context, name string) (*buildapi.WorkspaceResponse, error) { | ||
| return c.workspaceAction(ctx, name, "stop") | ||
| } | ||
|
|
||
| // workspaceAction performs a POST to /v1/workspaces/:name/:action and decodes a WorkspaceResponse. | ||
| func (c *Client) workspaceAction(ctx context.Context, name, action string) (*buildapi.WorkspaceResponse, error) { | ||
| endpoint := c.resolve(path.Join("/v1/workspaces", url.PathEscape(name), action)) | ||
| req, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, nil) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| if c.authToken != "" { | ||
| req.Header.Set("Authorization", "Bearer "+c.authToken) | ||
| } | ||
| resp, err := c.httpClient.Do(req) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| defer func() { _ = resp.Body.Close() }() | ||
| if resp.StatusCode != http.StatusOK { | ||
| b, _ := io.ReadAll(io.LimitReader(resp.Body, 1024)) | ||
| return nil, fmt.Errorf("%s workspace failed: %s: %s", action, resp.Status, string(b)) | ||
| } | ||
| var ws buildapi.WorkspaceResponse | ||
| if err := json.NewDecoder(resp.Body).Decode(&ws); err != nil { | ||
| return nil, fmt.Errorf("decoding response: %w", err) | ||
| } | ||
| return &ws, nil | ||
| } | ||
|
|
||
| // DeleteWorkspace deletes a workspace by name. | ||
| func (c *Client) DeleteWorkspace(ctx context.Context, name string) error { | ||
| endpoint := c.resolve(path.Join("/v1/workspaces", url.PathEscape(name))) | ||
| req, err := http.NewRequestWithContext(ctx, http.MethodDelete, endpoint, nil) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| if c.authToken != "" { | ||
| req.Header.Set("Authorization", "Bearer "+c.authToken) | ||
| } | ||
| resp, err := c.httpClient.Do(req) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| defer func() { _ = resp.Body.Close() }() | ||
| if resp.StatusCode != http.StatusOK { | ||
| b, _ := io.ReadAll(io.LimitReader(resp.Body, 1024)) | ||
| return fmt.Errorf("delete workspace failed: %s: %s", resp.Status, string(b)) | ||
| } | ||
| return nil | ||
| } | ||
|
|
||
| // SyncPlan sends a file manifest and returns which files need uploading. | ||
| func (c *Client) SyncPlan(ctx context.Context, name string, req buildapi.SyncPlanRequest) (*buildapi.SyncPlanResponse, error) { | ||
| body, err := json.Marshal(req) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| endpoint := c.resolve(path.Join("/v1/workspaces", url.PathEscape(name), "sync", "plan")) | ||
| httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, bytes.NewReader(body)) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| httpReq.Header.Set("Content-Type", "application/json") | ||
| if c.authToken != "" { | ||
| httpReq.Header.Set("Authorization", "Bearer "+c.authToken) | ||
| } | ||
| resp, err := c.httpClient.Do(httpReq) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| defer func() { _ = resp.Body.Close() }() | ||
| if resp.StatusCode != http.StatusOK { | ||
| b, _ := io.ReadAll(io.LimitReader(resp.Body, 1024)) | ||
| return nil, fmt.Errorf("sync plan failed: %s: %s", resp.Status, string(b)) | ||
| } | ||
| var plan buildapi.SyncPlanResponse | ||
| if err := json.NewDecoder(resp.Body).Decode(&plan); err != nil { | ||
| return nil, fmt.Errorf("decoding response: %w", err) | ||
| } | ||
| return &plan, nil | ||
| } | ||
|
|
||
| // SyncWorkspace uploads a tar stream to a workspace. | ||
| func (c *Client) SyncWorkspace(ctx context.Context, name string, body io.Reader) error { | ||
| endpoint := c.resolve(path.Join("/v1/workspaces", url.PathEscape(name), "sync")) | ||
| req, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, body) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| req.Header.Set("Content-Type", "application/octet-stream") | ||
| if c.authToken != "" { | ||
| req.Header.Set("Authorization", "Bearer "+c.authToken) | ||
| } | ||
| resp, err := c.httpClient.Do(req) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| defer func() { _ = resp.Body.Close() }() | ||
| if resp.StatusCode != http.StatusOK { | ||
| b, _ := io.ReadAll(io.LimitReader(resp.Body, 1024)) | ||
| return fmt.Errorf("sync workspace failed: %s: %s", resp.Status, string(b)) | ||
| } | ||
| return nil | ||
| } | ||
|
|
||
| // workspaceStreamRequest performs a POST with a JSON body and returns the streaming response body. | ||
| func (c *Client) workspaceStreamRequest(ctx context.Context, name, action string, reqBody interface{}) (io.ReadCloser, error) { | ||
| body, err := json.Marshal(reqBody) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| endpoint := c.resolve(path.Join("/v1/workspaces", url.PathEscape(name), action)) | ||
| httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, bytes.NewReader(body)) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| httpReq.Header.Set("Content-Type", "application/json") | ||
| if c.authToken != "" { | ||
| httpReq.Header.Set("Authorization", "Bearer "+c.authToken) | ||
| } | ||
| resp, err := c.httpClient.Do(httpReq) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| if resp.StatusCode != http.StatusOK { | ||
| b, _ := io.ReadAll(io.LimitReader(resp.Body, 1024)) | ||
| _ = resp.Body.Close() | ||
| return nil, fmt.Errorf("%s workspace failed: %s: %s", action, resp.Status, string(b)) | ||
| } | ||
| return resp.Body, nil | ||
| } | ||
|
|
||
| // ExecWorkspace executes a command in a workspace and returns the streaming output. | ||
| func (c *Client) ExecWorkspace(ctx context.Context, name string, req buildapi.WorkspaceExecRequest) (io.ReadCloser, error) { | ||
| return c.workspaceStreamRequest(ctx, name, "exec", req) | ||
| } | ||
|
|
||
| // DeployWorkspace deploys an artifact from a workspace to a board and returns the streaming output. | ||
| func (c *Client) DeployWorkspace(ctx context.Context, name string, req buildapi.WorkspaceDeployRequest) (io.ReadCloser, error) { | ||
| return c.workspaceStreamRequest(ctx, name, "deploy", req) | ||
| } | ||
|
|
||
| // ShellWorkspace opens an interactive shell via WebSocket. | ||
| // Returns the raw WebSocket connection for the caller to read/write. | ||
| func (c *Client) ShellWorkspace(ctx context.Context, name string) (*websocket.Conn, error) { | ||
| endpoint := c.resolve(path.Join("/v1/workspaces", url.PathEscape(name), "shell")) | ||
| // Convert http(s) to ws(s) |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In Go net/url, if url.URL.Path already contains a percent-escaped segment like "my%20app" and RawPath is empty, does URL.String() emit "my%2520app"? Is url.JoinPath the correct way to assemble escaped path segments?
💡 Result:
Yes, if u.Path contains a percent-encoded segment like "my%20app" (which decodes to "my app") and u.RawPath is empty, then u.String emits "my%2520app". This is because Path is always stored decoded, String calls EscapedPath, and EscapedPath re-escapes the decoded Path using the path segment rules (encodePath), which escapes space as %20, but since Path has the literal '%' followed by non-hex '2' and '0', the '%' is escaped to %25, resulting in %2520. To preserve an already-escaped path like "/my%20app" without double-escaping, set both u.Path = url.PathUnescape("/my%20app") ("/my app") and u.RawPath = "/my%20app". Then EscapedPath validates RawPath as matching the escaped form of Path and uses it directly, avoiding re-escaping. For assembling escaped path segments correctly (treating each as a single segment), url.JoinPath (Go 1.19+) is the correct way: it properly escapes each elem with PathEscape and joins them, handling the segment boundaries properly. Manual concatenation into Path leads to incorrect splitting/escaping as shown in issues.
Citations:
- 1: https://pkg.go.dev/net/url@go1.25.6
- 2: net/url: Path and RawPath are hard to use golang/go#17340
- 3: https://stackoverflow.com/questions/39270633/why-does-url-parse-not-populate-url-rawpath
- 4: https://pkg.go.dev/net/url@go1.20.4
- 5: net/http: double escape of URL golang/go#25208
- 6: https://go.dev/src/net/url/url.go?s=6331:6364
🏁 Script executed:
cd /tmp && find . -name "client.go" -path "*/buildapi/client/*" 2>/dev/null | head -5Repository: centos-automotive-suite/automotive-dev-operator
Length of output: 73
🏁 Script executed:
rg -A 10 "func \(c \*Client\) resolve" --type goRepository: centos-automotive-suite/automotive-dev-operator
Length of output: 722
🏁 Script executed:
rg -B 5 -A 20 "type Client struct" internal/buildapi/client/client.goRepository: centos-automotive-suite/automotive-dev-operator
Length of output: 867
🏁 Script executed:
rg -C 5 "url.PathEscape" internal/buildapi/client/client.go | head -100Repository: centos-automotive-suite/automotive-dev-operator
Length of output: 3629
🏁 Script executed:
cat -n internal/buildapi/client/client.go | sed -n '1,150p' | tail -50Repository: centos-automotive-suite/automotive-dev-operator
Length of output: 2127
🏁 Script executed:
# Check if there are tests for the client to understand the expected behavior
fd -e "test.go" -e "_test.go" internal/buildapi/client/Repository: centos-automotive-suite/automotive-dev-operator
Length of output: 496
🏁 Script executed:
rg -B 3 -A 3 "New(" internal/buildapi/client/client.go | head -30Repository: centos-automotive-suite/automotive-dev-operator
Length of output: 149
🏁 Script executed:
# Check how the Client is instantiated elsewhere to understand the baseURL format
rg "client\.New\(" --type go -A 2 -B 2 | head -50Repository: centos-automotive-suite/automotive-dev-operator
Length of output: 3927
🏁 Script executed:
# Also check the resolve() implementation one more time with more context
rg -B 2 -A 15 "func \(c \*Client\) resolve" internal/buildapi/client/client.goRepository: centos-automotive-suite/automotive-dev-operator
Length of output: 646
Fix double-escaping of workspace names in URL path segments.
These call sites build a percent-escaped segment and then pass it to resolve(), which stores it in url.URL.Path. When URL.String() is called, the percent sign in the literal string my%20app gets escaped again, producing .../my%2520app/... on the wire. Since this PR adds URL-unsafe-name coverage, switch to url.JoinPath() for all affected workspace endpoints to avoid pre-escaping:
Suggested pattern
- endpoint := c.resolve(path.Join("/v1/workspaces", url.PathEscape(name)))
+ endpoint, err := url.JoinPath(c.baseURL.String(), "/v1/workspaces", name)
+ if err != nil {
+ return nil, err
+ }Apply to all workspace methods that combine PathEscape with resolve(): GetWorkspace, StartWorkspace, StopWorkspace, workspaceAction, DeleteWorkspace, SyncPlan, SyncWorkspace, workspaceStreamRequest, and ShellWorkspace.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/buildapi/client/client.go` around lines 738 - 915, The code
double-escapes workspace names by using url.PathEscape + path.Join before
calling c.resolve; replace those constructions with url.JoinPath (or
URL.JoinPath) to compose the path segments without pre-escaping so the workspace
name is encoded exactly once. Update endpoint construction in GetWorkspace,
StartWorkspace, StopWorkspace, workspaceAction, DeleteWorkspace, SyncPlan,
SyncWorkspace, workspaceStreamRequest, and ShellWorkspace to call
url.JoinPath("/v1/workspaces", name, ...) (or the appropriate variants) and pass
that result to c.resolve, removing manual url.PathEscape usage.
| dialer := websocket.Dialer{} | ||
| if t, ok := c.httpClient.Transport.(*http.Transport); ok && t.TLSClientConfig != nil { | ||
| dialer.TLSClientConfig = t.TLSClientConfig | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
What defaults does github.com/gorilla/websocket.DefaultDialer set compared with a zero-value websocket.Dialer? Does the zero value omit ProxyFromEnvironment and the default handshake timeout?
💡 Result:
The github.com/gorilla/websocket.DefaultDialer sets Proxy to http.ProxyFromEnvironment and HandshakeTimeout to 45 * time.Second. All other fields use their zero values. A zero-value websocket.Dialer has nil for all pointer/func fields (NetDial, NetDialContext, Proxy, etc. are nil; TLSClientConfig is nil), zero for time.Duration (HandshakeTimeout is 0), empty slices/arrays, and zero for ints. Yes, the zero-value omits ProxyFromEnvironment (Proxy is nil) and has no handshake timeout (HandshakeTimeout is 0). DefaultDialer explicitly sets these two fields to non-zero values while leaving others at zero.
Citations:
- 1: https://chromium.googlesource.com/external/github.com/gorilla/websocket/+/refs/tags/v1.3.0/client.go
- 2: https://github.com/gorilla/websocket/blob/main/client.go
🏁 Script executed:
# First, let's verify the exact code in the file at the specified lines
cat -n internal/buildapi/client/client.go | sed -n '920,930p'Repository: centos-automotive-suite/automotive-dev-operator
Length of output: 486
🏁 Script executed:
# Also check the surrounding context to understand what this code does
cat -n internal/buildapi/client/client.go | sed -n '910,950p'Repository: centos-automotive-suite/automotive-dev-operator
Length of output: 1351
🏁 Script executed:
# Let's also check if websocket is imported and how
head -50 internal/buildapi/client/client.go | grep -E "import|websocket"Repository: centos-automotive-suite/automotive-dev-operator
Length of output: 131
Use websocket.DefaultDialer to preserve proxy resolution and handshake timeout for shell sessions.
A zero-value websocket.Dialer drops Gorilla's default Proxy (set to http.ProxyFromEnvironment) and HandshakeTimeout (set to 45 seconds), causing shell connections to hang when proxies are configured and lose reliable timeout protection. Clone websocket.DefaultDialer first, then overlay the HTTP transport's TLS and proxy settings.
Suggested fix
- dialer := websocket.Dialer{}
- if t, ok := c.httpClient.Transport.(*http.Transport); ok && t.TLSClientConfig != nil {
- dialer.TLSClientConfig = t.TLSClientConfig
- }
+ dialer := *websocket.DefaultDialer
+ if t, ok := c.httpClient.Transport.(*http.Transport); ok {
+ dialer.TLSClientConfig = t.TLSClientConfig
+ if t.Proxy != nil {
+ dialer.Proxy = t.Proxy
+ }
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/buildapi/client/client.go` around lines 923 - 926, The current code
constructs a zero-value websocket.Dialer (dialer := websocket.Dialer{}) which
drops Gorilla's default Proxy and HandshakeTimeout; instead clone
websocket.DefaultDialer first and then overlay TLS and proxy settings from the
HTTP transport: get t := c.httpClient.Transport.(*http.Transport) (if ok), copy
dialer := *websocket.DefaultDialer, set dialer.TLSClientConfig =
t.TLSClientConfig if non-nil and ensure dialer.Proxy is set from t.Proxy (or
http.ProxyFromEnvironment) so proxy resolution and HandshakeTimeout are
preserved for shell sessions.
| // Find existing build-cache PVC via labels, or create a new one | ||
| buildCacheLabels := map[string]string{ | ||
| "automotive.sdv.cloud.redhat.com/workspace": wsName, | ||
| "app.kubernetes.io/component": "build-cache", | ||
| } | ||
| pvcList := &corev1.PersistentVolumeClaimList{} | ||
| if err := k8sClient.List(ctx, pvcList, | ||
| client.InNamespace(namespace), | ||
| client.MatchingLabels(buildCacheLabels), | ||
| ); err != nil { | ||
| return "", fmt.Errorf("listing build-cache PVCs: %w", err) | ||
| } | ||
| for i := range pvcList.Items { | ||
| if pvcList.Items[i].DeletionTimestamp == nil { | ||
| return pvcList.Items[i].Name, nil | ||
| } | ||
| } | ||
|
|
||
| // Determine cache PVC size and storage class from OperatorConfig | ||
| cacheSize := "20Gi" | ||
| var storageClassName *string | ||
| if wsConfig != nil { | ||
| if wsConfig.BuildCacheSize != "" { | ||
| cacheSize = wsConfig.BuildCacheSize | ||
| } | ||
| if sc := wsConfig.GetStorageClass(); sc != "" { | ||
| storageClassName = &sc | ||
| } | ||
| } | ||
|
|
||
| pvc := &corev1.PersistentVolumeClaim{ | ||
| ObjectMeta: metav1.ObjectMeta{ | ||
| GenerateName: wsName + "-build-cache-", | ||
| Namespace: namespace, | ||
| Labels: buildCacheLabels, | ||
| OwnerReferences: []metav1.OwnerReference{ | ||
| { | ||
| APIVersion: automotivev1alpha1.GroupVersion.String(), | ||
| Kind: "Workspace", | ||
| Name: ws.Name, | ||
| UID: ws.UID, | ||
| }, | ||
| }, | ||
| }, | ||
| Spec: corev1.PersistentVolumeClaimSpec{ | ||
| AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce}, | ||
| StorageClassName: storageClassName, | ||
| Resources: corev1.VolumeResourceRequirements{ | ||
| Requests: corev1.ResourceList{ | ||
| corev1.ResourceStorage: resource.MustParse(cacheSize), | ||
| }, | ||
| }, | ||
| }, | ||
| } | ||
| if err := k8sClient.Create(ctx, pvc); err != nil { | ||
| return "", fmt.Errorf("creating build-cache PVC: %w", err) | ||
| } |
There was a problem hiding this comment.
Use one deterministic build-cache PVC per workspace.
The list-then-GenerateName flow is TOCTOU: two concurrent builds for the same workspace can both miss the existing claim and create separate caches. That fragments warm-cache state and leaves extra PVCs behind.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/buildapi/server.go` around lines 1624 - 1680, The current
list-then-GenerateName flow (buildCacheLabels, pvc with GenerateName,
k8sClient.Create) is TOCTOU; replace it with a deterministic PVC name (e.g.
wsName+"-build-cache") and create-or-get semantics: construct the PVC Spec but
set a fixed ObjectMeta.Name instead of GenerateName, attempt
k8sClient.Create(ctx, pvc), and if Create returns AlreadyExists call
k8sClient.Get to retrieve and return the existing PVC name (or otherwise
reconcile/update the existing PVC’s spec/storageClass/size if needed); ensure
you still set the same OwnerReferences and labels so lookups by workspace remain
deterministic.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (3)
internal/buildapi/client/client_test.go (1)
235-237:⚠️ Potential issue | 🟡 MinorUse
r.URL.EscapedPath()to verify URL escaping.
r.URL.Pathis decoded bynet/http, so the assertion will see/v1/workspaces/my app/startrather than the escaped form. This means the test passes regardless of whether the client actually escapes the URL correctly.,
Suggested fix
- Expect(r.URL.Path).To(Equal("/v1/workspaces/my%20app/start")) + Expect(r.URL.EscapedPath()).To(Equal("/v1/workspaces/my%20app/start"))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/client/client_test.go` around lines 235 - 237, The test's mock server handler is asserting on r.URL.Path which is decoded by net/http; change the assertion to use r.URL.EscapedPath() so the test verifies the client is sending the escaped path (update the Expect call inside the httptest.NewServer handler where r is inspected, e.g., the handler that currently checks "/v1/workspaces/my%20app/start"). Ensure any other path assertions in the same test use EscapedPath() as well.internal/controller/imagebuild/controller.go (1)
765-772:⚠️ Potential issue | 🟠 MajorBuildCachePVC still replaces entire shared-workspace — past concern unaddressed.
The logic binds
BuildCachePVCdirectly asshared-workspace, which overwrites any per-build upload content (fromInputFilesServer). If bothBuildCachePVCandInputFilesServerare set, uploadedsource_pathfiles will not be visible to the build. Consider either:
- Rejecting the combination up-front with a validation error, or
- Mounting
BuildCachePVCas a separate workspace (e.g.,build-cache) while keeping the per-build shared workspace intact.,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/imagebuild/controller.go` around lines 765 - 772, The current logic binds imageBuild.Spec.BuildCachePVC directly to the "shared-workspace" (sharedWorkspaceBinding), which overwrites per-build uploads from InputFilesServer; change this so that when both imageBuild.Spec.BuildCachePVC and InputFilesServer are set you do not replace "shared-workspace" but instead add a separate workspace binding (e.g., Name: "build-cache") using the PVC (PersistentVolumeClaimVolumeSource with ClaimName: imageBuild.Spec.BuildCachePVC) and leave the existing "shared-workspace" binding for per-build uploads; update the code paths that construct sharedWorkspaceBinding and any workspace list to append the new build-cache binding instead of assigning it to "shared-workspace", or alternatively add validation to reject the combination up-front if you prefer explicit failure.internal/controller/operatorconfig/controller.go (1)
894-898:⚠️ Potential issue | 🟠 MajorSCC cleanup may fail on non-OpenShift clusters.
cleanupWorkspaceInfradeletes theSecurityContextConstraintsobject unconditionally. On non-OpenShift clusters, this will return a "no matches for kind" error since the SCC API doesn't exist. AddisNoMatchError(err)to the error check to avoid blocking cleanup.Suggested fix
scc := &securityv1.SecurityContextConstraints{} scc.Name = workspaceSCCName - if err := r.Delete(ctx, scc); err != nil && !errors.IsNotFound(err) { + if err := r.Delete(ctx, scc); err != nil && !errors.IsNotFound(err) && !isNoMatchError(err) { return fmt.Errorf("failed to delete workspace SCC: %w", err) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/operatorconfig/controller.go` around lines 894 - 898, The SCC deletion in cleanupWorkspaceInfra unconditionally calls r.Delete on a securityv1.SecurityContextConstraints (named by workspaceSCCName) and treats any error other than IsNotFound as fatal; update the error check to also ignore "no matches for kind" errors by adding isNoMatchError(err) to the conditional (i.e., treat both errors.IsNotFound(err) and isNoMatchError(err) as non-fatal) so cleanup proceeds on non-OpenShift clusters.
🧹 Nitpick comments (4)
internal/buildapi/workspace_test.go (1)
63-87: Consider adding authenticated happy-path tests.The current tests validate that routes exist and require authentication, but don't test actual endpoint behavior with valid credentials. Consider adding tests that mock the Kubernetes client to verify successful create/exec/deploy operations.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/workspace_test.go` around lines 63 - 87, Add authenticated happy-path tests for the Create Workspace flow: create a test in the same Describe/Context ("Create Workspace") that builds a valid WorkspaceRequest JSON, sets a valid auth header (matching how server expects authentication on server.router), and calls server.router.ServeHTTP to exercise the handler instead of only unauthenticated checks; mock the Kubernetes client used by the workspace handlers (the mock implementation your codebase uses for k8s interactions) to return successful responses for create/exec/deploy operations and assert the handler returns the expected success status (e.g., HTTP 201/200) and response body. Ensure the test references WorkspaceRequest, server.router.ServeHTTP, and the route "/v1/workspaces" so it targets the same handler, and injects the mocked k8s client into the server/test setup before invoking the request.internal/controller/operatorconfig/resources.go (1)
956-1003: Workspace SCC grants extensive privileges — verify necessity.The SCC allows
AllowPrivilegedContainer: true,AllowedCapabilities: ALL, andRunAsUserStrategyRunAsAny. Based on the context snippet showing workspace pods run withPrivileged: trueandRunAsUser: 0forsudo dnf installsupport, this appears intentional but represents a significant security surface.Consider documenting these requirements and exploring alternatives like
UserNamespaceLevel: NamespaceLevelRequirePod(which is set) with rootless podman for future improvements.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/operatorconfig/resources.go` around lines 956 - 1003, The workspace SCC created by buildWorkspaceSCC currently grants broad privileges (AllowPrivilegedContainer: true, AllowedCapabilities: "ALL", RunAsUserStrategyRunAsAny) which increases risk; add an inline comment in buildWorkspaceSCC documenting why each broad setting is required for workspace pods (e.g., privileged true and UID 0 for sudo dnf installs), enumerate the exact capabilities needed instead of "ALL" if possible, and add a TODO/issue pointer to tighten RunAsUserStrategyRunAsAny to a restricted range or use rootless alternatives later; keep UserNamespaceLevel: NamespaceLevelRequirePod but note it as a mitigation in the comment and also add the same rationale to the operator's user-facing docs/config so reviewers know these are intentional choices.internal/controller/workspace/controller.go (1)
378-388:resourcesOrDefaultsusesMustParsewhich can panic.While the values are hardcoded constants, using
MustParsein runtime code is generally discouraged. Since these are compile-time constants, this is acceptable but could be refactored to use package-level variables initialized at startup.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/workspace/controller.go` around lines 378 - 388, resourcesOrDefaults currently calls resource.MustParse which can panic at runtime; refactor to use package-level variables (e.g., defaultCPUQty, defaultMemoryQty, defaultResources) initialized once at startup using resource.ParseQuantity (handle parse errors in init() by logging/failing early) and then return those prebuilt defaults from resourcesOrDefaults when r is nil; update the function to reference the package-level defaultResources and keep the existing behavior for non-nil r.internal/controller/workspace/controller_test.go (1)
359-409: VerifybuildPodhandles niloperatorConfigin test.Line 377 passes
nilforoperatorConfig, which exercises the fallback path inbuildPod. This is good coverage. Consider adding a test case with a non-niloperatorConfigto verify the config merging logic.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/workspace/controller_test.go` around lines 359 - 409, Test currently only calls Reconciler.buildPod with a nil operatorConfig and doesn't exercise the config-merge path; add a parallel subtest that constructs a non-nil operatorConfig (set fields that would interact with Workspace, e.g., Architecture override, Image, TmpfsBuildDir or extra env/volumes) and call r.buildPod(ws, operatorConfig) to assert merged behavior (node affinity uses the resulting architecture, JMP_LEASE env still present, tmpfs volume preserved/overridden as expected, image resolved correctly). Use the same TestBuildPod_PreservesWorkspaceConfig test to add this case (or a new TestBuildPod_WithOperatorConfig) and reference Reconciler.buildPod and operatorConfig merge logic when adding assertions. Ensure the new assertions verify both preserved Workspace fields and operatorConfig overrides.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Containerfile.toolchain`:
- Line 57: The RUN that installs EPEL in Containerfile.toolchain currently
leaves cached metadata in the image; update the RUN that contains "dnf install
-y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm" to
run dnf clean all as part of the same RUN (e.g., append && dnf clean all) so the
cache is removed in the same layer and image size is reduced.
In `@internal/controller/workspace/controller.go`:
- Around line 306-310: Add a concise explanatory comment immediately above the
SecurityContext block (the struct setting Privileged: ptr.To(true),
AllowPrivilegeEscalation: ptr.To(true), ProcMount:
ptr.To(corev1.UnmaskedProcMount)) describing why privileged mode is required:
enable podman-in-container with rootless user (uid 1000), allow overlayfs mounts
for persistent package installation, and permit user namespace setup; reference
the existing container command justification so readers know this mirrors that
rationale.
---
Duplicate comments:
In `@internal/buildapi/client/client_test.go`:
- Around line 235-237: The test's mock server handler is asserting on r.URL.Path
which is decoded by net/http; change the assertion to use r.URL.EscapedPath() so
the test verifies the client is sending the escaped path (update the Expect call
inside the httptest.NewServer handler where r is inspected, e.g., the handler
that currently checks "/v1/workspaces/my%20app/start"). Ensure any other path
assertions in the same test use EscapedPath() as well.
In `@internal/controller/imagebuild/controller.go`:
- Around line 765-772: The current logic binds imageBuild.Spec.BuildCachePVC
directly to the "shared-workspace" (sharedWorkspaceBinding), which overwrites
per-build uploads from InputFilesServer; change this so that when both
imageBuild.Spec.BuildCachePVC and InputFilesServer are set you do not replace
"shared-workspace" but instead add a separate workspace binding (e.g., Name:
"build-cache") using the PVC (PersistentVolumeClaimVolumeSource with ClaimName:
imageBuild.Spec.BuildCachePVC) and leave the existing "shared-workspace" binding
for per-build uploads; update the code paths that construct
sharedWorkspaceBinding and any workspace list to append the new build-cache
binding instead of assigning it to "shared-workspace", or alternatively add
validation to reject the combination up-front if you prefer explicit failure.
In `@internal/controller/operatorconfig/controller.go`:
- Around line 894-898: The SCC deletion in cleanupWorkspaceInfra unconditionally
calls r.Delete on a securityv1.SecurityContextConstraints (named by
workspaceSCCName) and treats any error other than IsNotFound as fatal; update
the error check to also ignore "no matches for kind" errors by adding
isNoMatchError(err) to the conditional (i.e., treat both errors.IsNotFound(err)
and isNoMatchError(err) as non-fatal) so cleanup proceeds on non-OpenShift
clusters.
---
Nitpick comments:
In `@internal/buildapi/workspace_test.go`:
- Around line 63-87: Add authenticated happy-path tests for the Create Workspace
flow: create a test in the same Describe/Context ("Create Workspace") that
builds a valid WorkspaceRequest JSON, sets a valid auth header (matching how
server expects authentication on server.router), and calls
server.router.ServeHTTP to exercise the handler instead of only unauthenticated
checks; mock the Kubernetes client used by the workspace handlers (the mock
implementation your codebase uses for k8s interactions) to return successful
responses for create/exec/deploy operations and assert the handler returns the
expected success status (e.g., HTTP 201/200) and response body. Ensure the test
references WorkspaceRequest, server.router.ServeHTTP, and the route
"/v1/workspaces" so it targets the same handler, and injects the mocked k8s
client into the server/test setup before invoking the request.
In `@internal/controller/operatorconfig/resources.go`:
- Around line 956-1003: The workspace SCC created by buildWorkspaceSCC currently
grants broad privileges (AllowPrivilegedContainer: true, AllowedCapabilities:
"ALL", RunAsUserStrategyRunAsAny) which increases risk; add an inline comment in
buildWorkspaceSCC documenting why each broad setting is required for workspace
pods (e.g., privileged true and UID 0 for sudo dnf installs), enumerate the
exact capabilities needed instead of "ALL" if possible, and add a TODO/issue
pointer to tighten RunAsUserStrategyRunAsAny to a restricted range or use
rootless alternatives later; keep UserNamespaceLevel: NamespaceLevelRequirePod
but note it as a mitigation in the comment and also add the same rationale to
the operator's user-facing docs/config so reviewers know these are intentional
choices.
In `@internal/controller/workspace/controller_test.go`:
- Around line 359-409: Test currently only calls Reconciler.buildPod with a nil
operatorConfig and doesn't exercise the config-merge path; add a parallel
subtest that constructs a non-nil operatorConfig (set fields that would interact
with Workspace, e.g., Architecture override, Image, TmpfsBuildDir or extra
env/volumes) and call r.buildPod(ws, operatorConfig) to assert merged behavior
(node affinity uses the resulting architecture, JMP_LEASE env still present,
tmpfs volume preserved/overridden as expected, image resolved correctly). Use
the same TestBuildPod_PreservesWorkspaceConfig test to add this case (or a new
TestBuildPod_WithOperatorConfig) and reference Reconciler.buildPod and
operatorConfig merge logic when adding assertions. Ensure the new assertions
verify both preserved Workspace fields and operatorConfig overrides.
In `@internal/controller/workspace/controller.go`:
- Around line 378-388: resourcesOrDefaults currently calls resource.MustParse
which can panic at runtime; refactor to use package-level variables (e.g.,
defaultCPUQty, defaultMemoryQty, defaultResources) initialized once at startup
using resource.ParseQuantity (handle parse errors in init() by logging/failing
early) and then return those prebuilt defaults from resourcesOrDefaults when r
is nil; update the function to reference the package-level defaultResources and
keep the existing behavior for non-nil r.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ee38f7a9-10e3-41ad-b1db-cef2e537002c
⛔ Files ignored due to path filters (4)
config/crd/bases/automotive.sdv.cloud.redhat.com_imagebuilds.yamlis excluded by!config/crd/bases/**config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yamlis excluded by!config/crd/bases/**config/crd/bases/automotive.sdv.cloud.redhat.com_workspaces.yamlis excluded by!config/crd/bases/**config/rbac/role.yamlis excluded by!config/rbac/role.yaml
📒 Files selected for processing (32)
Containerfile.toolchainapi/v1alpha1/imagebuild_types.goapi/v1alpha1/operatorconfig_types.goapi/v1alpha1/workspace_types.goapi/v1alpha1/zz_generated.deepcopy.gobundle.Dockerfilecmd/caib/buildcmd/build.gocmd/caib/image/image.gocmd/caib/main.gocmd/caib/root.gocmd/caib/runtime_wiring.gocmd/caib/workspace/workspace.gocmd/main.goconfig/crd/kustomization.yamlconfig/rbac/scc_role.yamlconfig/rbac/scc_role_binding.yamlhack/deploy-catalog.shinternal/buildapi/client/client.gointernal/buildapi/client/client_test.gointernal/buildapi/server.gointernal/buildapi/types.gointernal/buildapi/workspace.gointernal/buildapi/workspace_test.gointernal/common/tasks/scripts/build_image.shinternal/common/tasks/scripts/find_manifest.shinternal/common/tasks/scripts/push_artifact.shinternal/common/tasks/tasks.gointernal/controller/imagebuild/controller.gointernal/controller/operatorconfig/controller.gointernal/controller/operatorconfig/resources.gointernal/controller/workspace/controller.gointernal/controller/workspace/controller_test.go
✅ Files skipped from review due to trivial changes (8)
- cmd/caib/main.go
- config/rbac/scc_role_binding.yaml
- bundle.Dockerfile
- config/crd/kustomization.yaml
- hack/deploy-catalog.sh
- config/rbac/scc_role.yaml
- internal/buildapi/types.go
- internal/common/tasks/tasks.go
🚧 Files skipped from review as they are similar to previous changes (10)
- cmd/main.go
- internal/common/tasks/scripts/push_artifact.sh
- api/v1alpha1/imagebuild_types.go
- cmd/caib/image/image.go
- api/v1alpha1/workspace_types.go
- api/v1alpha1/operatorconfig_types.go
- internal/common/tasks/scripts/find_manifest.sh
- internal/buildapi/server.go
- api/v1alpha1/zz_generated.deepcopy.go
- internal/common/tasks/scripts/build_image.sh
| FROM autosd-base | ||
|
|
||
| # Enable EPEL for mock | ||
| RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm |
There was a problem hiding this comment.
Add dnf clean all after EPEL installation to reduce image size.
Static analysis flagged this line. The EPEL install doesn't clean up afterwards, leaving cached metadata in the layer.
Proposed fix
# Enable EPEL for mock
-RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm
+RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm && dnf clean all📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm | |
| # Enable EPEL for mock | |
| RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm && dnf clean all |
🧰 Tools
🪛 Trivy (0.69.3)
[error] 57-57: 'dnf clean all' missing
'dnf clean all' is missed: dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm
Rule: DS-0019
(IaC/Dockerfile)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Containerfile.toolchain` at line 57, The RUN that installs EPEL in
Containerfile.toolchain currently leaves cached metadata in the image; update
the RUN that contains "dnf install -y
https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm" to run
dnf clean all as part of the same RUN (e.g., append && dnf clean all) so the
cache is removed in the same layer and image size is reduced.
| SecurityContext: &corev1.SecurityContext{ | ||
| Privileged: ptr.To(true), | ||
| AllowPrivilegeEscalation: ptr.To(true), | ||
| ProcMount: ptr.To(corev1.UnmaskedProcMount), | ||
| }, |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify the workspace SCC and service account are properly configured
rg -n "workspaceSCCName\|ado-workspace" internal/controller/operatorconfig/Repository: centos-automotive-suite/automotive-dev-operator
Length of output: 73
🏁 Script executed:
# Examine the workspace controller file around the mentioned lines
head -n 350 internal/controller/workspace/controller.go | tail -n 100Repository: centos-automotive-suite/automotive-dev-operator
Length of output: 4657
🏁 Script executed:
# Search for SCC (SecurityContextConstraints) definitions and ado-workspace references
rg -i "scc|securitycontextconstraints|ado-workspace|privileged" --type go -n | head -100Repository: centos-automotive-suite/automotive-dev-operator
Length of output: 15039
🏁 Script executed:
# Look for YAML manifests or configs that might define the SCC
find . -type f \( -name "*.yaml" -o -name "*.yml" \) -exec grep -l "ado-workspace-scc\|SecurityContextConstraints" {} \;Repository: centos-automotive-suite/automotive-dev-operator
Length of output: 209
🏁 Script executed:
# Examine the SCC role file
cat ./config/rbac/scc_role.yamlRepository: centos-automotive-suite/automotive-dev-operator
Length of output: 347
🏁 Script executed:
# Search for any workspace-related SCC or service account configuration
find ./config -type f \( -name "*.yaml" -o -name "*.yml" \) -exec grep -l "workspace\|ado-workspace" {} \;Repository: centos-automotive-suite/automotive-dev-operator
Length of output: 501
🏁 Script executed:
# Look for documentation or comments about privileged pod requirements
grep -r "podman\|overlayfs\|privilege" config/ --include="*.yaml" --include="*.yml" -B 2 -A 2Repository: centos-automotive-suite/automotive-dev-operator
Length of output: 1478
🏁 Script executed:
# View the SCC role binding to see which service accounts it applies to
cat ./config/rbac/scc_role_binding.yamlRepository: centos-automotive-suite/automotive-dev-operator
Length of output: 363
🏁 Script executed:
# Search for SecurityContextConstraints resource definitions in the entire repo
find . -type f \( -name "*.yaml" -o -name "*.yml" \) -exec grep -l "SecurityContextConstraints\|kind.*SCC" {} \; | grep -v vendorRepository: centos-automotive-suite/automotive-dev-operator
Length of output: 73
🏁 Script executed:
# Search for any code that creates or configures the ado-workspace-scc
rg "ado-workspace-scc" --type go -B 3 -A 3Repository: centos-automotive-suite/automotive-dev-operator
Length of output: 733
🏁 Script executed:
# Find where workspaceSCCName is used
rg "workspaceSCCName" --type go -B 5 -A 5Repository: centos-automotive-suite/automotive-dev-operator
Length of output: 3819
🏁 Script executed:
# Search for SCC creation or configuration code
rg "SecurityContextConstraints\|CreateSecurityContextConstraints" --type go -B 3 -A 10Repository: centos-automotive-suite/automotive-dev-operator
Length of output: 73
🏁 Script executed:
# Look for any helm charts or kustomize overlays that might define the SCC
find . -type f \( -name "*.yaml" -o -name "*.yml" \) -path "*/kustomize/*" -o -path "*/helm/*" | head -20Repository: centos-automotive-suite/automotive-dev-operator
Length of output: 73
🏁 Script executed:
# Get the complete buildWorkspaceSCC function
rg "func.*buildWorkspaceSCC" --type go -A 50Repository: centos-automotive-suite/automotive-dev-operator
Length of output: 9926
🏁 Script executed:
# Also check what imports are needed for securityv1 to understand the SCC type
head -n 50 internal/controller/operatorconfig/resources.go | grep -E "^import|securityv1"Repository: centos-automotive-suite/automotive-dev-operator
Length of output: 150
Add explanatory comment to SecurityContext documenting privileged mode requirement.
The SCC (ado-workspace-scc) is correctly configured with AllowPrivilegedContainer=true and RunAsUserStrategyRunAsAny. However, add a comment to the SecurityContext block (lines 306-310) explaining the purpose: privileged mode is required for podman-in-container with rootless user support (uid 1000), overlayfs mounts for persistent package installation, and user namespace setup. The justification is documented in the container command but should also be noted at the SecurityContext definition for clarity.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/controller/workspace/controller.go` around lines 306 - 310, Add a
concise explanatory comment immediately above the SecurityContext block (the
struct setting Privileged: ptr.To(true), AllowPrivilegeEscalation: ptr.To(true),
ProcMount: ptr.To(corev1.UnmaskedProcMount)) describing why privileged mode is
required: enable podman-in-container with rootless user (uid 1000), allow
overlayfs mounts for persistent package installation, and permit user namespace
setup; reference the existing container command justification so readers know
this mirrors that rationale.
There was a problem hiding this comment.
♻️ Duplicate comments (5)
internal/buildapi/workspace.go (3)
220-243:⚠️ Potential issue | 🟠 MajorCreate Jumpstarter secret only after workspace uniqueness check succeeds.
The secret is created (lines 229-242) before the workspace existence check (lines 246-259). If a workspace with the same name already exists, the secret is orphaned. Additionally, treating
AlreadyExistsas success (line 239) could silently bind a stale secret to a new workspace.Move secret creation after the uniqueness check passes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/workspace.go` around lines 220 - 243, Move the Jumpstarter secret creation (the block that decodes req.ClientConfig, sets jmpClientSecret, builds the corev1.Secret and calls k8sClient.Create) so it runs only after the workspace uniqueness check completes successfully; locate the current secret logic referencing jmpClientSecret, req.ClientConfig, corev1.Secret and k8sClient.Create and relocate it to after the existing workspace existence check (the code that queries for an existing workspace/name). Also stop treating k8serrors.IsAlreadyExists as success for this secret — if Create returns AlreadyExists, return an error response instead of proceeding, to avoid binding a stale secret to a new workspace.
193-198:⚠️ Potential issue | 🟠 MajorFail closed if
OperatorConfigcannot be loaded.Line 194 ignores the error from
loadOperatorConfigFn. A transient read failure (not NotFound) silently skips workspace policy enforcement (maxResources, tmpfsBuildDir, storage class). OnlyNotFoundshould result in nil config; other errors should return 500.🔧 Suggested fix
// Load workspace configuration from OperatorConfig - operatorConfig, _ := loadOperatorConfigFn(c.Request.Context(), k8sClient, namespace) + operatorConfig, err := loadOperatorConfigFn(c.Request.Context(), k8sClient, namespace) + if err != nil && !k8serrors.IsNotFound(err) { + c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to load operator configuration"}) + return + } var wsConfig *automotivev1alpha1.WorkspacesConfig if operatorConfig != nil { wsConfig = operatorConfig.Spec.Workspaces }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/workspace.go` around lines 193 - 198, The code ignores the error returned by loadOperatorConfigFn causing silent failure; update the call to loadOperatorConfigFn (used with c.Request.Context(), k8sClient, namespace) to capture and inspect the error, treat a NotFound error as a nil operatorConfig (and continue), but for any other error return an HTTP 500 response (or surface the error upstream) so workspace policy enforcement (wsConfig / automotivev1alpha1.WorkspacesConfig) is not skipped; use the Kubernetes helper (errors.IsNotFound) to distinguish NotFound from real errors and ensure operatorConfig is only nil when NotFound.
812-819:⚠️ Potential issue | 🟠 MajorInitialize
RequestsandLimitsmaps after DeepCopy, not before.Lines 813-816 create
reswith initialized maps, but line 818 replacesresentirely withwsConfig.Resources.DeepCopy(). If the OperatorConfig hasresources: {}or only partial configuration, the DeepCopy can return a struct with nilRequestsorLimitsmaps, causing panics on lines 831-832 and 845-846.🔧 Suggested fix
// Start from OperatorConfig defaults, then overlay user values - res := &corev1.ResourceRequirements{ - Requests: corev1.ResourceList{}, - Limits: corev1.ResourceList{}, - } + res := &corev1.ResourceRequirements{} if wsConfig != nil && wsConfig.Resources != nil { res = wsConfig.Resources.DeepCopy() } + if res.Requests == nil { + res.Requests = corev1.ResourceList{} + } + if res.Limits == nil { + res.Limits = corev1.ResourceList{} + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/workspace.go` around lines 812 - 819, The current code initializes res with empty Requests/Limits then overwrites it with wsConfig.Resources.DeepCopy(), which can leave Requests or Limits nil and cause panics when later accessed; change the logic in the workspace resource-building code so that when wsConfig != nil && wsConfig.Resources != nil you assign res = wsConfig.Resources.DeepCopy() and then ensure res.Requests and res.Limits are non-nil (allocate empty corev1.ResourceList maps if either is nil) before returning/using res; reference the res variable, wsConfig.Resources.DeepCopy(), and the later accesses to res.Requests/res.Limits to locate where to add the nil checks and map initializations.cmd/caib/workspace/workspace.go (2)
501-518:⚠️ Potential issue | 🟠 MajorSync body not retryable across re-auth attempts.
The
progressReaderis created once outsideExecuteWithReauth, wrapping abytes.Buffer. If the helper retries the callback (after re-authentication), subsequentSyncWorkspacecalls read from an already-consumed buffer and upload empty/truncated archives.🔧 Suggested fix
func uploadFiles(name, absDir string, files []string) { var buf bytes.Buffer if err := tarTrackedFiles(absDir, files, &buf); err != nil { handleError(fmt.Errorf("failed to create tar archive: %w", err)) } totalBytes := int64(buf.Len()) - pr := newProgressReader(&buf, totalBytes) + archive := buf.Bytes() err := caibcommon.ExecuteWithReauth(serverURL, &authToken, insecureSkipTLS, func(client *buildapiclient.Client) error { - return client.SyncWorkspace(context.Background(), name, pr) + pr := newProgressReader(bytes.NewReader(archive), totalBytes) + err := client.SyncWorkspace(context.Background(), name, pr) + pr.finish() + return err }) - pr.finish() if err != nil { handleError(fmt.Errorf("failed to sync workspace: %w", err)) } fmt.Println("Files synced") }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/caib/workspace/workspace.go` around lines 501 - 518, The current uploadFiles creates a single progressReader over a bytes.Buffer before ExecuteWithReauth, so retries will re-read an already-consumed reader; instead stash the tar bytes (e.g., tarData := buf.Bytes(), totalBytes := int64(len(tarData))) and create a fresh reader+progressReader inside the ExecuteWithReauth callback on each attempt (use bytes.NewReader(tarData) and newProgressReader), assign that progressReader to an outer variable (e.g., finalPR) so you can call finalPR.finish() after ExecuteWithReauth returns, and pass the freshly created pr into client.SyncWorkspace; update references to newProgressReader and SyncWorkspace accordingly.
608-614:⚠️ Potential issue | 🟡 MinorFlattening command arguments loses argument boundaries.
Joining
cmdPartswith spaces (line 613) breaks commands that have arguments containing spaces, quotes, or shell metacharacters. For example,exec my-app -- echo "hello world"would sendecho hello worldas the command string.Consider preserving the argument array in the API request or using proper shell escaping.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/caib/workspace/workspace.go` around lines 608 - 614, The current code flattens cmdParts into a single string (variable command) which loses argument boundaries; instead keep cmdParts as a []string and pass that slice to the API/executor rather than using strings.Join. Update the logic around cmdParts/command in workspace.go (the block that builds command and uses handleError) to stop creating the single command string and to propagate the []string (cmdParts) through the call path (or, if the backend requires a shell string, perform proper shell-quoting/escaping there rather than naive joining) so quoted arguments and spaces are preserved.
🧹 Nitpick comments (2)
internal/common/tasks/scripts/push_artifact.sh (1)
237-238: Comment is misleading:aib-manifest.ymlis not added separately below.The comment states the manifest is "added separately below," but there's no code in this script that adds
aib-manifest.ymlas a separate layer or includes it elsewhere. This aligns withfind_manifest.sh(lines 31-32), which also excludesaib-manifest.ymlfrom the shared workspace copy, treating it as a build artifact that shouldn't be pushed to the registry.If the intent is simply to exclude it (consistent with
find_manifest.sh), consider updating the comment:📝 Suggested comment clarification
- # Skip .size sidecar files and aib-manifest.yml (added separately below) + # Skip .size sidecar files and aib-manifest.yml (build artifact, not a partition layer) case "$part_file" in *.size|aib-manifest.yml) continue ;; esac🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/common/tasks/scripts/push_artifact.sh` around lines 237 - 238, Update the misleading comment on the case that skips files: it currently claims "aib-manifest.yml (added separately below)" even though nothing in this script adds that manifest later; change the comment to state we are skipping .size sidecar files and excluding aib-manifest.yml (treating it as a build artifact that should not be pushed) so it matches the behavior in find_manifest.sh; locate the case statement that checks "$part_file" (case "$part_file" in *.size|aib-manifest.yml) and edit the comment text accordingly.api/v1alpha1/operatorconfig_types.go (1)
422-425: Remember to regenerate CRD manifests.After modifying types in
api/v1alpha1/, runmake generate manifeststo regenerate DeepCopy methods and CRDs. As per coding guidelines: "After modifying types in api/v1alpha1/, runmake generate manifeststo regenerate DeepCopy methods and CRDs".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/v1alpha1/operatorconfig_types.go` around lines 422 - 425, You added a new/changed field Workspaces *WorkspacesConfig in the OperatorConfig types, so you must regenerate the autogenerated DeepCopy methods and CRD manifests; run the project codegen (e.g., run the repository's codegen command: make generate manifests) to update DeepCopy, client, listers and the CRD YAMLs so the changes to WorkspacesConfig and OperatorConfig are reflected in the generated artifacts.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@cmd/caib/workspace/workspace.go`:
- Around line 501-518: The current uploadFiles creates a single progressReader
over a bytes.Buffer before ExecuteWithReauth, so retries will re-read an
already-consumed reader; instead stash the tar bytes (e.g., tarData :=
buf.Bytes(), totalBytes := int64(len(tarData))) and create a fresh
reader+progressReader inside the ExecuteWithReauth callback on each attempt (use
bytes.NewReader(tarData) and newProgressReader), assign that progressReader to
an outer variable (e.g., finalPR) so you can call finalPR.finish() after
ExecuteWithReauth returns, and pass the freshly created pr into
client.SyncWorkspace; update references to newProgressReader and SyncWorkspace
accordingly.
- Around line 608-614: The current code flattens cmdParts into a single string
(variable command) which loses argument boundaries; instead keep cmdParts as a
[]string and pass that slice to the API/executor rather than using strings.Join.
Update the logic around cmdParts/command in workspace.go (the block that builds
command and uses handleError) to stop creating the single command string and to
propagate the []string (cmdParts) through the call path (or, if the backend
requires a shell string, perform proper shell-quoting/escaping there rather than
naive joining) so quoted arguments and spaces are preserved.
In `@internal/buildapi/workspace.go`:
- Around line 220-243: Move the Jumpstarter secret creation (the block that
decodes req.ClientConfig, sets jmpClientSecret, builds the corev1.Secret and
calls k8sClient.Create) so it runs only after the workspace uniqueness check
completes successfully; locate the current secret logic referencing
jmpClientSecret, req.ClientConfig, corev1.Secret and k8sClient.Create and
relocate it to after the existing workspace existence check (the code that
queries for an existing workspace/name). Also stop treating
k8serrors.IsAlreadyExists as success for this secret — if Create returns
AlreadyExists, return an error response instead of proceeding, to avoid binding
a stale secret to a new workspace.
- Around line 193-198: The code ignores the error returned by
loadOperatorConfigFn causing silent failure; update the call to
loadOperatorConfigFn (used with c.Request.Context(), k8sClient, namespace) to
capture and inspect the error, treat a NotFound error as a nil operatorConfig
(and continue), but for any other error return an HTTP 500 response (or surface
the error upstream) so workspace policy enforcement (wsConfig /
automotivev1alpha1.WorkspacesConfig) is not skipped; use the Kubernetes helper
(errors.IsNotFound) to distinguish NotFound from real errors and ensure
operatorConfig is only nil when NotFound.
- Around line 812-819: The current code initializes res with empty
Requests/Limits then overwrites it with wsConfig.Resources.DeepCopy(), which can
leave Requests or Limits nil and cause panics when later accessed; change the
logic in the workspace resource-building code so that when wsConfig != nil &&
wsConfig.Resources != nil you assign res = wsConfig.Resources.DeepCopy() and
then ensure res.Requests and res.Limits are non-nil (allocate empty
corev1.ResourceList maps if either is nil) before returning/using res; reference
the res variable, wsConfig.Resources.DeepCopy(), and the later accesses to
res.Requests/res.Limits to locate where to add the nil checks and map
initializations.
---
Nitpick comments:
In `@api/v1alpha1/operatorconfig_types.go`:
- Around line 422-425: You added a new/changed field Workspaces
*WorkspacesConfig in the OperatorConfig types, so you must regenerate the
autogenerated DeepCopy methods and CRD manifests; run the project codegen (e.g.,
run the repository's codegen command: make generate manifests) to update
DeepCopy, client, listers and the CRD YAMLs so the changes to WorkspacesConfig
and OperatorConfig are reflected in the generated artifacts.
In `@internal/common/tasks/scripts/push_artifact.sh`:
- Around line 237-238: Update the misleading comment on the case that skips
files: it currently claims "aib-manifest.yml (added separately below)" even
though nothing in this script adds that manifest later; change the comment to
state we are skipping .size sidecar files and excluding aib-manifest.yml
(treating it as a build artifact that should not be pushed) so it matches the
behavior in find_manifest.sh; locate the case statement that checks "$part_file"
(case "$part_file" in *.size|aib-manifest.yml) and edit the comment text
accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9058dd81-66c1-4541-a139-0e81b6b207d5
⛔ Files ignored due to path filters (4)
config/crd/bases/automotive.sdv.cloud.redhat.com_imagebuilds.yamlis excluded by!config/crd/bases/**config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yamlis excluded by!config/crd/bases/**config/crd/bases/automotive.sdv.cloud.redhat.com_workspaces.yamlis excluded by!config/crd/bases/**config/rbac/role.yamlis excluded by!config/rbac/role.yaml
📒 Files selected for processing (34)
Containerfile.toolchainapi/v1alpha1/imagebuild_types.goapi/v1alpha1/operatorconfig_types.goapi/v1alpha1/workspace_types.goapi/v1alpha1/zz_generated.deepcopy.gobundle.Dockerfilecmd/caib/buildcmd/build.gocmd/caib/image/image.gocmd/caib/main.gocmd/caib/root.gocmd/caib/runtime_wiring.gocmd/caib/workspace/workspace.gocmd/main.goconfig/crd/kustomization.yamlconfig/rbac/scc_role.yamlconfig/rbac/scc_role_binding.yamlhack/container-wrapper.shhack/deploy-catalog.shhack/workspace-entrypoint.shinternal/buildapi/client/client.gointernal/buildapi/client/client_test.gointernal/buildapi/server.gointernal/buildapi/types.gointernal/buildapi/workspace.gointernal/buildapi/workspace_test.gointernal/common/tasks/scripts/build_image.shinternal/common/tasks/scripts/find_manifest.shinternal/common/tasks/scripts/push_artifact.shinternal/common/tasks/tasks.gointernal/controller/imagebuild/controller.gointernal/controller/operatorconfig/controller.gointernal/controller/operatorconfig/resources.gointernal/controller/workspace/controller.gointernal/controller/workspace/controller_test.go
✅ Files skipped from review due to trivial changes (11)
- cmd/caib/main.go
- bundle.Dockerfile
- config/crd/kustomization.yaml
- hack/deploy-catalog.sh
- hack/container-wrapper.sh
- cmd/caib/root.go
- config/rbac/scc_role.yaml
- internal/buildapi/client/client_test.go
- internal/common/tasks/tasks.go
- internal/controller/workspace/controller_test.go
- api/v1alpha1/workspace_types.go
🚧 Files skipped from review as they are similar to previous changes (11)
- config/rbac/scc_role_binding.yaml
- api/v1alpha1/imagebuild_types.go
- internal/buildapi/workspace_test.go
- cmd/caib/runtime_wiring.go
- cmd/caib/image/image.go
- internal/controller/operatorconfig/controller.go
- internal/buildapi/server.go
- internal/common/tasks/scripts/find_manifest.sh
- internal/controller/operatorconfig/resources.go
- internal/buildapi/client/client.go
- api/v1alpha1/zz_generated.deepcopy.go
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (11)
internal/common/tasks/scripts/build_image.sh (1)
59-69:⚠️ Potential issue | 🟠 MajorDelete cached
image.jsonbefore reusingBUILD_DIR.The persistent-cache cleanup removes orphaned output directories but leaves
$BUILD_DIR/image.json. Line 528 then republishes that file for every build, so a cacheddiskrun can expose the previous build's manifest.🔧 Proposed fix
for orphan in "$BUILD_DIR"/image_output--*; do [ -e "$orphan" ] || break echo "Removing orphaned build output: $(basename "$orphan")" rm -rf "$orphan" done + rm -f "$BUILD_DIR/image.json"Also applies to: 528-528
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/common/tasks/scripts/build_image.sh` around lines 59 - 69, The cleanup loop currently leaves a cached manifest at $BUILD_DIR/image.json which can be reused across builds; update the script to explicitly remove "$BUILD_DIR/image.json" before reusing BUILD_DIR: detect if the file exists (reference BUILD_DIR and image.json) and rm -f it (or equivalent) before the republish step so stale manifests aren't carried forward; ensure this removal happens alongside the existing orphan removal logic that iterates over image_output--*.internal/controller/imagebuild/controller.go (1)
760-773:⚠️ Potential issue | 🟠 MajorDon't swap out uploaded inputs for the cache PVC.
When
BuildCachePVCis set, this branch stops mountingimageBuild.Status.PVCName, so uploadedsource_pathcontent disappears from the build. It also points different builds at the same/workspace/sharedroot even thoughinternal/common/tasks/scripts/build_image.shnow cleans that root on each run.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/imagebuild/controller.go` around lines 760 - 773, The current branch selects BuildCachePVC before checking uploaded inputs, causing uploaded content (imageBuild.Status.PVCName) to be replaced by the cache PVC; change the selection order so that when imageBuild.Spec.GetInputFilesServer() is true and imageBuild.Status.PVCName is non-empty you bind that PVC to sharedWorkspaceBinding first (use the PersistentVolumeClaim with ClaimName = imageBuild.Status.PVCName), otherwise fall back to imageBuild.Spec.BuildCachePVC, and only then to the VolumeClaimTemplate path — update the conditional logic around sharedWorkspaceBinding in the controller.go block that references imageBuild.Spec.BuildCachePVC, imageBuild.Spec.GetInputFilesServer(), and imageBuild.Status.PVCName.Containerfile.toolchain (2)
22-38:⚠️ Potential issue | 🟠 MajorDon't build the rootfs with signature checks disabled.
Line 26 and Line 34 bypass RPM verification, so the later key import never protects this installroot. That turns the nightly repo fetch into an unsigned supply-chain hop for the toolchain image.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Containerfile.toolchain` around lines 22 - 38, The install currently disables signature verification (repo file gpgcheck and the dnf --nogpgcheck flag) and imports the GPG key only after install, so import the key before installing and enable verification: write the repo with gpgcheck=1, remove the --nogpgcheck flag from the RUN dnf install invocation, move the rpm --root=${ROOTFS} --import ${ROOTFS}/etc/pki/rpm-gpg/${GPG_KEY} call to run before the dnf install, and fail the build if the import fails (remove the "|| true") so the installroot is protected by the imported key.
90-91:⚠️ Potential issue | 🟠 MajorExpose
jmpfrom a non-root path.Line 91 only adds
/root/.local/jumpstarter/bintoPATH, but Line 37 inhack/workspace-entrypoint.shrunsjmpas UID/GID 1000. That user cannot traverse/root, so workspace startup silently skips client activation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Containerfile.toolchain` around lines 90 - 91, The image exposes jmp only under /root/.local/jumpstarter/bin which UID/GID 1000 in hack/workspace-entrypoint.sh cannot access; change the install or PATH so the jmp binary is placed in or visible from a non-root location. Update the RUN/curl install invocation or set an install prefix (so jumpstarter is installed to a shared location such as /usr/local/share/jumpstarter or /home/jumpstarter/.local) and update the ENV PATH (replace or add to the existing ENV PATH "/usr/local/bin" or the chosen non-root directory) so the jmp executable is accessible to UID 1000; ensure references to jmp and the ENV PATH change correspond to the install command and the workspace-entrypoint.sh usage.internal/controller/operatorconfig/controller.go (1)
908-912:⚠️ Potential issue | 🟠 MajorIgnore SCC
NoMatchduring workspace cleanup.
cleanupOSBuilds()now reaches this path unconditionally, but Line 910 still treats the OpenShift-onlySecurityContextConstraintsdelete as fatal. On non-OpenShift clusters that can wedge cleanup/finalizer removal even though the SCC API was never available.🔧 Proposed fix
- if err := r.Delete(ctx, scc); err != nil && !errors.IsNotFound(err) { + if err := r.Delete(ctx, scc); err != nil && !errors.IsNotFound(err) && !isNoMatchError(err) { return fmt.Errorf("failed to delete workspace SCC: %w", err) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/operatorconfig/controller.go` around lines 908 - 912, The delete of SecurityContextConstraints in cleanupOSBuilds currently treats any error other than NotFound as fatal; update the error handling around r.Delete(ctx, scc) (the SecurityContextConstraints delete for workspaceSCCName) to also ignore API "NoMatch" errors on clusters without the SCC resource. Specifically, change the if condition to treat both errors.IsNotFound(err) and the API "no match" case (use the helper apierrors.IsNoMatchError(err) / errors.IsNoMatchError equivalent from k8s apimachinery) as non-fatal so cleanup can proceed on non-OpenShift clusters.internal/buildapi/workspace.go (3)
220-243:⚠️ Potential issue | 🟠 MajorJumpstarter secret created before workspace validation can be orphaned.
The client config secret is created (lines 229-242) before checking if the workspace already exists (lines 246-259). If workspace creation fails or conflicts, the secret remains orphaned with no owner reference for garbage collection.
Move secret creation after workspace CR creation succeeds, then set an owner reference.
🔧 Suggested approach
- Move secret creation to after successful
k8sClient.Create(ctx, ws)- Set
OwnerReferenceson the secret pointing to the workspace- If secret creation fails, delete the workspace or add it to the cleanup path
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/workspace.go` around lines 220 - 243, The secret creation using req.ClientConfig (jmpClientSecret and Secret object) must be moved to after the workspace CR is successfully created (after the call that creates ws via k8sClient.Create(ctx, ws)); set the Secret.ObjectMeta.OwnerReferences to point to the created workspace (use the workspace UID and appropriate APIVersion/Kind) so Kubernetes garbage-collects it, and if the secret creation fails, rollback by deleting the workspace (k8sClient.Delete(ctx, ws)) or include the workspace in the existing cleanup/error path; ensure you still decode req.ClientConfig and only attempt Secret creation when ws creation returned no error.
812-819:⚠️ Potential issue | 🟠 MajorNil map panic after
DeepCopy.Lines 813-816 initialize empty maps, but line 818 reassigns
resto aDeepCopy()which can have nilRequests/Limitsmaps (e.g., if OperatorConfig setsresources: {}). The subsequent assignments at lines 831-832 and 845-846 will panic.🐛 Proposed fix
// Start from OperatorConfig defaults, then overlay user values res := &corev1.ResourceRequirements{ Requests: corev1.ResourceList{}, Limits: corev1.ResourceList{}, } if wsConfig != nil && wsConfig.Resources != nil { res = wsConfig.Resources.DeepCopy() + if res.Requests == nil { + res.Requests = corev1.ResourceList{} + } + if res.Limits == nil { + res.Limits = corev1.ResourceList{} + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/workspace.go` around lines 812 - 819, The code assigns res = wsConfig.Resources.DeepCopy() which may produce a ResourceRequirements whose Requests or Limits maps are nil, causing panics later when writing to them; after the DeepCopy call (when wsConfig != nil && wsConfig.Resources != nil) ensure res.Requests and res.Limits are non-nil maps (initialize empty corev1.ResourceList{} if nil) before any subsequent assignments. Update the handling around res, wsConfig, and Resources.DeepCopy() so you never write into a nil Requests or Limits map.
193-198:⚠️ Potential issue | 🟠 MajorSilently ignoring
OperatorConfigload errors allows policy bypass.Same issue as in
server.go: transient API errors fall back tonilconfig, skipping workspace policy enforcement (maxResources,tmpfsBuildDir, storage class defaults).🛡️ Proposed fix
// Load workspace configuration from OperatorConfig - operatorConfig, _ := loadOperatorConfigFn(c.Request.Context(), k8sClient, namespace) + operatorConfig, err := loadOperatorConfigFn(c.Request.Context(), k8sClient, namespace) + if err != nil && !k8serrors.IsNotFound(err) { + c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to load operator config"}) + return + } var wsConfig *automotivev1alpha1.WorkspacesConfig🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/workspace.go` around lines 193 - 198, The code currently ignores the error returned by loadOperatorConfigFn which lets transient API failures silently disable workspace policies; update the logic that calls loadOperatorConfigFn to check the returned error (from loadOperatorConfigFn), and if non-nil return or propagate a wrapped error (instead of proceeding with operatorConfig == nil), so caller can fail fast and enforce policies; ensure the subsequent wsConfig assignment (using operatorConfig.Spec.Workspaces) only runs when operatorConfig is non-nil and no error occurred, and add a clear error message referencing loadOperatorConfigFn/operatorConfig/wsConfig to aid debugging.internal/buildapi/server.go (2)
1626-1686:⚠️ Potential issue | 🟠 MajorTOCTOU:
GenerateNamecan leak duplicate build-cache PVCs.The list-then-create flow with
GenerateName(lines 1631-1642, 1660-1686) is not atomic. If two concurrent builds for the same workspace both miss the existing PVC in the list, each creates a separate cache. This fragments warm-cache state and orphans extra PVCs.Use a deterministic PVC name (e.g.,
<workspace>-build-cache) with create-or-get semantics.🔧 Suggested approach
- pvc := &corev1.PersistentVolumeClaim{ - ObjectMeta: metav1.ObjectMeta{ - GenerateName: wsName + "-build-cache-", + buildCachePVCName := wsName + "-build-cache" + pvc := &corev1.PersistentVolumeClaim{ + ObjectMeta: metav1.ObjectMeta{ + Name: buildCachePVCName, Namespace: namespace, Labels: buildCacheLabels, ... if err := k8sClient.Create(ctx, pvc); err != nil { + if k8serrors.IsAlreadyExists(err) { + return buildCachePVCName, nil + } return "", fmt.Errorf("creating build-cache PVC: %w", err) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/server.go` around lines 1626 - 1686, The current list-then-create flow with GenerateName can create duplicate PVCs under race; change to a deterministic PVC name (e.g., wsName + "-build-cache") instead of using GenerateName in the PVC object and implement create-or-get semantics: construct the PVC with Name set (not GenerateName), call k8sClient.Create(ctx, pvc) and if it returns an AlreadyExists error (errors.IsAlreadyExists), fetch the existing PVC via k8sClient.Get and return its Name (ensuring DeletionTimestamp == nil); where Create succeeds return the new PVC name; handle other errors normally. Refer to the PVC object creation and the GenerateName usage and the surrounding create/List logic to locate the code to modify.
1536-1542:⚠️ Potential issue | 🔴 CriticalShell injection via untrusted
repoPath.The
repoPathfrom user input is interpolated directly into a shell command without sanitization. A malicious path likefoo; rm -rf /would execute arbitrary commands in the workspace pod.The
safeFilenamefunction exists (lines 441-464) for similar validation. Apply it torepoPath, or use theshellQuotehelper fromworkspace.go.🛡️ Proposed fix using shellQuote
+// shellQuote returns s wrapped in POSIX single quotes with embedded single +// quotes escaped, safe for interpolation into a /bin/sh -c script. +func shellQuote(s string) string { + return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'" +} + // Start HTTP server in the workspace (background, fire-and-forget). // Redirect shell's own FDs first so runc exec doesn't block waiting for SPDY pipes. cmd := []string{"/bin/sh", "-c", - fmt.Sprintf("exec 0</dev/null 1>/dev/null 2>/dev/null; cd %s && python3 -m http.server %d &", repoPath, port)} + fmt.Sprintf("exec 0</dev/null 1>/dev/null 2>/dev/null; cd %s && python3 -m http.server %d &", shellQuote(repoPath), port)}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/server.go` around lines 1536 - 1542, The repoPath is being interpolated into a shell -c command unsanitized, allowing shell injection; before building cmd in the HTTP server start block, sanitize or escape repoPath using the existing helpers (call shellQuote(repoPath) from workspace.go or validate with safeFilename(repoPath)) and use that sanitized value in the fmt.Sprintf that constructs cmd (instead of raw repoPath); keep the existing podExec call and error wrapping (fmt.Errorf("starting HTTP server in workspace %q: %w", wsName, err)) but ensure repoPath is sanitized so the spawned shell cannot execute injected commands.internal/controller/workspace/controller.go (1)
195-202:⚠️ Potential issue | 🟠 MajorUser-supplied images run with elevated privileges.
The pod runs with
Privileged: trueand accepts arbitrary images fromws.Spec.Image. WhileHostUsers: falseprovides some isolation via user namespaces, a malicious image can still escape to the node if kernel vulnerabilities exist.Consider restricting
ws.Spec.Imageto an operator-controlled allowlist, or at minimum validate the image against a configurable prefix (e.g.,registry.redhat.io/automotive/).Also applies to: 282-286
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/workspace/controller.go` around lines 195 - 202, The code currently accepts ws.Spec.Image directly (see ws.Spec.Image usage and fallback to WorkspacesConfig.GetToolchainImage) which allows arbitrary images; change the logic to validate any user-supplied ws.Spec.Image against an operator-controlled allowlist or configurable prefix (e.g., operatorConfig.Spec.Workspaces.ImagePrefix or AllowedImages) before accepting it: if ws.Spec.Image is non-empty, call a validation helper (e.g., IsImageAllowed(image, operatorConfig.Spec.Workspaces)) and only use the image when validation passes; otherwise fall back to WorkspacesConfig.GetToolchainImage or return an error/deny the request. Apply the same validation wherever ws.Spec.Image is read (also at the other occurrence around lines 282-286).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/buildapi/server.go`:
- Around line 1567-1571: The code currently ignores errors from
loadOperatorConfigFn and treats any error as a nil operatorConfig; change the
logic in the handler that calls loadOperatorConfigFn(ctx, k8sClient, namespace)
to inspect the returned error: if err == nil set operatorConfig and wsConfig as
before; if the error is a NotFound (use the appropriate k8s API check e.g.
apierrors.IsNotFound) treat operatorConfig as nil and continue; for any other
error return a 500/internal server error response so policy enforcement is not
bypassed. Keep the same symbols (loadOperatorConfigFn, operatorConfig, wsConfig)
so the fix is applied in the same call site.
In `@internal/controller/imagebuild/controller.go`:
- Around line 789-792: The storage-class fallback added for the
VolumeClaimTemplate branch wasn't applied to the upload-backed PVC code path;
update getOrCreateWorkspacePVC to mirror the same logic used where
storageClassName is set (check imageBuild.Spec.StorageClass first, else fall
back to operatorConfig.Spec.OSBuilds.StorageClass) and use that storageClassName
pointer when constructing the PVC (and when setting PVC.Spec.StorageClassName).
Locate getOrCreateWorkspacePVC and the PVC construction/creation logic and
ensure it references the same storageClassName resolution as the
VolumeClaimTemplate branch (using the same variable names like storageClassName
and imageBuild.Spec.StorageClass / operatorConfig.Spec.OSBuilds.StorageClass).
- Line 105: The RBAC annotation currently grants only "get;update" for
workspaces but updateWorkspaceLease() uses r.Patch(ctx, ws, patch) which
requires the "patch" verb; update the kubebuilder RBAC comment line that
references workspaces (the line containing
"+kubebuilder:rbac:groups=automotive.sdv.cloud.redhat.com,resources=workspaces,verbs=...")
to include "patch" in the verbs list so PATCH requests from r.Patch are
authorized.
---
Duplicate comments:
In `@Containerfile.toolchain`:
- Around line 22-38: The install currently disables signature verification (repo
file gpgcheck and the dnf --nogpgcheck flag) and imports the GPG key only after
install, so import the key before installing and enable verification: write the
repo with gpgcheck=1, remove the --nogpgcheck flag from the RUN dnf install
invocation, move the rpm --root=${ROOTFS} --import
${ROOTFS}/etc/pki/rpm-gpg/${GPG_KEY} call to run before the dnf install, and
fail the build if the import fails (remove the "|| true") so the installroot is
protected by the imported key.
- Around line 90-91: The image exposes jmp only under
/root/.local/jumpstarter/bin which UID/GID 1000 in hack/workspace-entrypoint.sh
cannot access; change the install or PATH so the jmp binary is placed in or
visible from a non-root location. Update the RUN/curl install invocation or set
an install prefix (so jumpstarter is installed to a shared location such as
/usr/local/share/jumpstarter or /home/jumpstarter/.local) and update the ENV
PATH (replace or add to the existing ENV PATH "/usr/local/bin" or the chosen
non-root directory) so the jmp executable is accessible to UID 1000; ensure
references to jmp and the ENV PATH change correspond to the install command and
the workspace-entrypoint.sh usage.
In `@internal/buildapi/server.go`:
- Around line 1626-1686: The current list-then-create flow with GenerateName can
create duplicate PVCs under race; change to a deterministic PVC name (e.g.,
wsName + "-build-cache") instead of using GenerateName in the PVC object and
implement create-or-get semantics: construct the PVC with Name set (not
GenerateName), call k8sClient.Create(ctx, pvc) and if it returns an
AlreadyExists error (errors.IsAlreadyExists), fetch the existing PVC via
k8sClient.Get and return its Name (ensuring DeletionTimestamp == nil); where
Create succeeds return the new PVC name; handle other errors normally. Refer to
the PVC object creation and the GenerateName usage and the surrounding
create/List logic to locate the code to modify.
- Around line 1536-1542: The repoPath is being interpolated into a shell -c
command unsanitized, allowing shell injection; before building cmd in the HTTP
server start block, sanitize or escape repoPath using the existing helpers (call
shellQuote(repoPath) from workspace.go or validate with safeFilename(repoPath))
and use that sanitized value in the fmt.Sprintf that constructs cmd (instead of
raw repoPath); keep the existing podExec call and error wrapping
(fmt.Errorf("starting HTTP server in workspace %q: %w", wsName, err)) but ensure
repoPath is sanitized so the spawned shell cannot execute injected commands.
In `@internal/buildapi/workspace.go`:
- Around line 220-243: The secret creation using req.ClientConfig
(jmpClientSecret and Secret object) must be moved to after the workspace CR is
successfully created (after the call that creates ws via k8sClient.Create(ctx,
ws)); set the Secret.ObjectMeta.OwnerReferences to point to the created
workspace (use the workspace UID and appropriate APIVersion/Kind) so Kubernetes
garbage-collects it, and if the secret creation fails, rollback by deleting the
workspace (k8sClient.Delete(ctx, ws)) or include the workspace in the existing
cleanup/error path; ensure you still decode req.ClientConfig and only attempt
Secret creation when ws creation returned no error.
- Around line 812-819: The code assigns res = wsConfig.Resources.DeepCopy()
which may produce a ResourceRequirements whose Requests or Limits maps are nil,
causing panics later when writing to them; after the DeepCopy call (when
wsConfig != nil && wsConfig.Resources != nil) ensure res.Requests and res.Limits
are non-nil maps (initialize empty corev1.ResourceList{} if nil) before any
subsequent assignments. Update the handling around res, wsConfig, and
Resources.DeepCopy() so you never write into a nil Requests or Limits map.
- Around line 193-198: The code currently ignores the error returned by
loadOperatorConfigFn which lets transient API failures silently disable
workspace policies; update the logic that calls loadOperatorConfigFn to check
the returned error (from loadOperatorConfigFn), and if non-nil return or
propagate a wrapped error (instead of proceeding with operatorConfig == nil), so
caller can fail fast and enforce policies; ensure the subsequent wsConfig
assignment (using operatorConfig.Spec.Workspaces) only runs when operatorConfig
is non-nil and no error occurred, and add a clear error message referencing
loadOperatorConfigFn/operatorConfig/wsConfig to aid debugging.
In `@internal/common/tasks/scripts/build_image.sh`:
- Around line 59-69: The cleanup loop currently leaves a cached manifest at
$BUILD_DIR/image.json which can be reused across builds; update the script to
explicitly remove "$BUILD_DIR/image.json" before reusing BUILD_DIR: detect if
the file exists (reference BUILD_DIR and image.json) and rm -f it (or
equivalent) before the republish step so stale manifests aren't carried forward;
ensure this removal happens alongside the existing orphan removal logic that
iterates over image_output--*.
In `@internal/controller/imagebuild/controller.go`:
- Around line 760-773: The current branch selects BuildCachePVC before checking
uploaded inputs, causing uploaded content (imageBuild.Status.PVCName) to be
replaced by the cache PVC; change the selection order so that when
imageBuild.Spec.GetInputFilesServer() is true and imageBuild.Status.PVCName is
non-empty you bind that PVC to sharedWorkspaceBinding first (use the
PersistentVolumeClaim with ClaimName = imageBuild.Status.PVCName), otherwise
fall back to imageBuild.Spec.BuildCachePVC, and only then to the
VolumeClaimTemplate path — update the conditional logic around
sharedWorkspaceBinding in the controller.go block that references
imageBuild.Spec.BuildCachePVC, imageBuild.Spec.GetInputFilesServer(), and
imageBuild.Status.PVCName.
In `@internal/controller/operatorconfig/controller.go`:
- Around line 908-912: The delete of SecurityContextConstraints in
cleanupOSBuilds currently treats any error other than NotFound as fatal; update
the error handling around r.Delete(ctx, scc) (the SecurityContextConstraints
delete for workspaceSCCName) to also ignore API "NoMatch" errors on clusters
without the SCC resource. Specifically, change the if condition to treat both
errors.IsNotFound(err) and the API "no match" case (use the helper
apierrors.IsNoMatchError(err) / errors.IsNoMatchError equivalent from k8s
apimachinery) as non-fatal so cleanup can proceed on non-OpenShift clusters.
In `@internal/controller/workspace/controller.go`:
- Around line 195-202: The code currently accepts ws.Spec.Image directly (see
ws.Spec.Image usage and fallback to WorkspacesConfig.GetToolchainImage) which
allows arbitrary images; change the logic to validate any user-supplied
ws.Spec.Image against an operator-controlled allowlist or configurable prefix
(e.g., operatorConfig.Spec.Workspaces.ImagePrefix or AllowedImages) before
accepting it: if ws.Spec.Image is non-empty, call a validation helper (e.g.,
IsImageAllowed(image, operatorConfig.Spec.Workspaces)) and only use the image
when validation passes; otherwise fall back to
WorkspacesConfig.GetToolchainImage or return an error/deny the request. Apply
the same validation wherever ws.Spec.Image is read (also at the other occurrence
around lines 282-286).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 290b544a-3d3c-456f-a157-93c8cdf4d46e
⛔ Files ignored due to path filters (4)
config/crd/bases/automotive.sdv.cloud.redhat.com_imagebuilds.yamlis excluded by!config/crd/bases/**config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yamlis excluded by!config/crd/bases/**config/crd/bases/automotive.sdv.cloud.redhat.com_workspaces.yamlis excluded by!config/crd/bases/**config/rbac/role.yamlis excluded by!config/rbac/role.yaml
📒 Files selected for processing (34)
Containerfile.toolchainapi/v1alpha1/imagebuild_types.goapi/v1alpha1/operatorconfig_types.goapi/v1alpha1/workspace_types.goapi/v1alpha1/zz_generated.deepcopy.gobundle.Dockerfilecmd/caib/buildcmd/build.gocmd/caib/image/image.gocmd/caib/main.gocmd/caib/root.gocmd/caib/runtime_wiring.gocmd/caib/workspace/workspace.gocmd/main.goconfig/crd/kustomization.yamlconfig/rbac/scc_role.yamlconfig/rbac/scc_role_binding.yamlhack/container-wrapper.shhack/deploy-catalog.shhack/workspace-entrypoint.shinternal/buildapi/client/client.gointernal/buildapi/client/client_test.gointernal/buildapi/server.gointernal/buildapi/types.gointernal/buildapi/workspace.gointernal/buildapi/workspace_test.gointernal/common/tasks/scripts/build_image.shinternal/common/tasks/scripts/find_manifest.shinternal/common/tasks/scripts/push_artifact.shinternal/common/tasks/tasks.gointernal/controller/imagebuild/controller.gointernal/controller/operatorconfig/controller.gointernal/controller/operatorconfig/resources.gointernal/controller/workspace/controller.gointernal/controller/workspace/controller_test.go
✅ Files skipped from review due to trivial changes (12)
- bundle.Dockerfile
- config/crd/kustomization.yaml
- internal/common/tasks/scripts/push_artifact.sh
- config/rbac/scc_role.yaml
- cmd/caib/runtime_wiring.go
- cmd/caib/root.go
- hack/container-wrapper.sh
- internal/buildapi/workspace_test.go
- api/v1alpha1/zz_generated.deepcopy.go
- internal/common/tasks/tasks.go
- api/v1alpha1/workspace_types.go
- internal/buildapi/client/client.go
🚧 Files skipped from review as they are similar to previous changes (11)
- cmd/caib/main.go
- config/rbac/scc_role_binding.yaml
- cmd/main.go
- hack/deploy-catalog.sh
- api/v1alpha1/imagebuild_types.go
- cmd/caib/buildcmd/build.go
- internal/buildapi/client/client_test.go
- internal/common/tasks/scripts/find_manifest.sh
- api/v1alpha1/operatorconfig_types.go
- cmd/caib/workspace/workspace.go
- internal/controller/operatorconfig/resources.go
| operatorConfig, _ := loadOperatorConfigFn(ctx, k8sClient, namespace) | ||
| var wsConfig *automotivev1alpha1.WorkspacesConfig | ||
| if operatorConfig != nil { | ||
| wsConfig = operatorConfig.Spec.Workspaces | ||
| } |
There was a problem hiding this comment.
Silently ignoring OperatorConfig load errors allows policy bypass.
If the API server returns a transient error (timeout, connection reset), the code falls back to nil config instead of failing the request. This skips workspace policy enforcement (storage class, build cache size limits, etc.).
Only ignore NotFound; return 500 on other errors.
🛡️ Proposed fix
- operatorConfig, _ := loadOperatorConfigFn(ctx, k8sClient, namespace)
+ operatorConfig, err := loadOperatorConfigFn(ctx, k8sClient, namespace)
+ if err != nil && !k8serrors.IsNotFound(err) {
+ return "", fmt.Errorf("loading operator config: %w", err)
+ }
var wsConfig *automotivev1alpha1.WorkspacesConfig
if operatorConfig != nil {
wsConfig = operatorConfig.Spec.Workspaces
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/buildapi/server.go` around lines 1567 - 1571, The code currently
ignores errors from loadOperatorConfigFn and treats any error as a nil
operatorConfig; change the logic in the handler that calls
loadOperatorConfigFn(ctx, k8sClient, namespace) to inspect the returned error:
if err == nil set operatorConfig and wsConfig as before; if the error is a
NotFound (use the appropriate k8s API check e.g. apierrors.IsNotFound) treat
operatorConfig as nil and continue; for any other error return a 500/internal
server error response so policy enforcement is not bypassed. Keep the same
symbols (loadOperatorConfigFn, operatorConfig, wsConfig) so the fix is applied
in the same call site.
| RestConfig *rest.Config | ||
| } | ||
|
|
||
| // +kubebuilder:rbac:groups=automotive.sdv.cloud.redhat.com,resources=workspaces,verbs=get;update |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Expected: the Workspace RBAC rule includes "patch" because updateWorkspaceLease() calls Patch().
rg -n 'resources=workspaces,verbs=' internal/controller/imagebuild/controller.go
rg -n 'Patch\(ctx, ws, patch\)' internal/controller/imagebuild/controller.goRepository: centos-automotive-suite/automotive-dev-operator
Length of output: 246
🏁 Script executed:
#!/bin/bash
# Verify the function context at line 1457
sed -n '1440,1465p' internal/controller/imagebuild/controller.go
# Check all RBAC annotations in the file
rg -n '\+kubebuilder:rbac' internal/controller/imagebuild/controller.goRepository: centos-automotive-suite/automotive-dev-operator
Length of output: 3004
🌐 Web query:
Kubernetes RBAC update vs patch verb difference
💡 Result:
In Kubernetes RBAC, the 'update' and 'patch' verbs authorize different HTTP methods for modifying resources. 'update' authorizes PUT requests, which replace the entire resource object (atomic full replacement, requiring the client to provide the complete current state with updated resourceVersion for optimistic concurrency). 'patch' authorizes PATCH requests, which apply partial changes to specific fields without needing the full object (supports strategic merge patch, JSON patch, etc., often used by kubectl apply, edit, set image, scale, etc.). Most kubectl update commands translate to PATCH under the hood, so 'patch' is commonly needed even for operations that conceptually "update". Pure PUT updates (rare in practice) require 'update'. For subresources like /scale, even PUT may require 'patch' permission on the subresource. To allow full modification flexibility, grant both 'update' and 'patch'.
Citations:
- 1: https://kubernetes.io/docs/reference/access-authn-authz/authorization/
- 2: https://deploy-preview-44540--kubernetes-io-main-staging.netlify.app/docs/reference/access-authn-authz/authorization/
- 3: https://kubernetes.io/docs/reference/using-api/api-concepts
- 4: https://stackoverflow.com/questions/64754284/what-does-the-update-kubernetes-rbac-permission-do
Grant patch on workspaces RBAC rule.
updateWorkspaceLease() calls r.Patch(ctx, ws, patch) at line 1457, but the kubebuilder RBAC annotation at line 105 only grants get;update. In Kubernetes RBAC, patch is a separate verb from update and authorizes PATCH HTTP requests. Without it, every lease write-back will be denied.
🔧 Proposed fix
-// +kubebuilder:rbac:groups=automotive.sdv.cloud.redhat.com,resources=workspaces,verbs=get;update
+// +kubebuilder:rbac:groups=automotive.sdv.cloud.redhat.com,resources=workspaces,verbs=get;update;patchAlso applies to: 1455–1457
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/controller/imagebuild/controller.go` at line 105, The RBAC
annotation currently grants only "get;update" for workspaces but
updateWorkspaceLease() uses r.Patch(ctx, ws, patch) which requires the "patch"
verb; update the kubebuilder RBAC comment line that references workspaces (the
line containing
"+kubebuilder:rbac:groups=automotive.sdv.cloud.redhat.com,resources=workspaces,verbs=...")
to include "patch" in the verbs list so PATCH requests from r.Patch are
authorized.
e21638c to
b95f1ba
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (10)
internal/buildapi/client/client.go (2)
957-960:⚠️ Potential issue | 🟠 MajorClone
websocket.DefaultDialerhere.A zero-value
websocket.Dialerdrops Gorilla's default proxy handling and handshake timeout. That makes shell sessions ignore proxy settings and removes the normal timeout protection.What defaults does github.com/gorilla/websocket.DefaultDialer set compared with a zero-value websocket.Dialer?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/client/client.go` around lines 957 - 960, The code currently creates a zero-value websocket.Dialer which loses Gorilla's defaults (proxy handling and handshake timeout); instead initialize dialer by cloning websocket.DefaultDialer (e.g., dialer := *websocket.DefaultDialer) and then, if c.httpClient.Transport is an *http.Transport with non-nil TLSClientConfig, assign that TLSClientConfig into the cloned dialer (dialer.TLSClientConfig = t.TLSClientConfig) so you preserve default proxy/timeout behavior while applying the custom TLS config.
770-915:⚠️ Potential issue | 🟠 MajorStop pre-escaping workspace names before
resolve().These methods pass
url.PathEscape(name)intoresolve(), which writes the value intourl.URL.Path.URL.String()then escapes the percent sign again, so a workspace likemy appgoes out asmy%2520app. Useurl.JoinPath(or a helper that joins raw segments) for the workspace endpoints instead.In Go net/url, if URL.Path already contains a percent-escaped segment like "my%20app" and RawPath is empty, does URL.String() emit "%2520"? Is url.JoinPath the correct way to compose path segments?Also applies to: 947-950
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/client/client.go` around lines 770 - 915, The code pre-escapes workspace names with url.PathEscape before calling c.resolve (seen in GetWorkspace, workspaceAction, DeleteWorkspace, SyncPlan, SyncWorkspace, workspaceStreamRequest), which causes double-escaping (e.g., "my app" -> "my%2520app"); remove url.PathEscape(name) calls and instead build the path using url.JoinPath (or a small helper that joins raw path segments) to compose "/v1/workspaces", the unescaped name, and optional action/segments, then pass that joined path into c.resolve so names are escaped exactly once.internal/controller/workspace/controller.go (2)
195-202:⚠️ Potential issue | 🔴 CriticalDon't run caller-selected images as privileged root.
ws.Spec.Imageis taken verbatim and then launched withRunAsUser=0,Privileged=true,AllowPrivilegeEscalation=true, and theado-workspaceservice account. That turns workspace creation into an arbitrary privileged-pod launcher. Restrict image selection to an operator-controlled allowlist/default before pod creation, or drop the privileged/root requirement.Also applies to: 269-286
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/workspace/controller.go` around lines 195 - 202, The code currently uses ws.Spec.Image directly (falling back to wsConfig.GetToolchainImage()) and then creates pods with RunAsUser=0, Privileged=true, AllowPrivilegeEscalation=true and the ado-workspace service account, which allows arbitrary privileged images; fix by enforcing an operator-controlled image allowlist or always using the operator default image and never honoring arbitrary ws.Spec.Image (validate ws.Spec.Image against operatorConfig.Spec.Workspaces allowlist before assignment to image), and remove privileged/root execution by setting a non-zero runAsUser and disabling Privileged and AllowPrivilegeEscalation in the pod securityContext (also audit any use of the ado-workspace service account to ensure it has no elevated permissions), and apply the same changes to the other similar block around the code handling lines 269-286.
169-176:⚠️ Potential issue | 🟠 MajorApply the operator workspace policy when building the Pod.
This controller only uses
OperatorConfig.Spec.WorkspacesforGetToolchainImage().resourcesOrDefaults(ws.Spec.Resources)still hardcodes resource defaults,NodeSelectorcomes only fromws.Spec, andTolerationsnever get set, so most of the new workspace policy surface never affects the pods created here.Also applies to: 195-202, 281-309, 354-363
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/workspace/controller.go` around lines 169 - 176, The Pod construction in r.buildPod (called as r.buildPod(ws, operatorConfig)) ignores OperatorConfig.Spec.Workspaces policy: update buildPod (and other places that call it: the other blocks around the mentioned ranges) to merge workspace policy from operatorConfig into the Pod spec — specifically use OperatorConfig.Spec.Workspaces when resolving resources instead of resourcesOrDefaults(ws.Spec.Resources), apply NodeSelector from the workspace policy (falling back to ws.Spec.NodeSelector only if policy absent), and set Tolerations from the workspace policy into the Pod (not leaving Tolerations unset); also ensure GetToolchainImage() still reads from OperatorConfig.Spec.Workspaces. Locate and change buildPod, any direct resource/toleration/nodeSelector handling, and callers at the other indicated blocks so the operator workspace policy is applied consistently.Containerfile.toolchain (3)
57-57:⚠️ Potential issue | 🟡 MinorClean EPEL metadata in the same layer.
This leaves DNF cache behind in the final image.
🧹 Proposed fix
-RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm +RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm && dnf clean all🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Containerfile.toolchain` at line 57, The RUN step that installs the EPEL RPM (the line running "dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm") leaves DNF caches in the layer; modify that RUN to clean DNF metadata/cache in the same layer by chaining cleanup commands (e.g., run dnf install ... && dnf clean all && rm -rf /var/cache/dnf or equivalent) so caches are removed before the layer is committed.
90-91:⚠️ Potential issue | 🟠 MajorInstall
jmpsomewhere the workspace user can traverse.
hack/workspace-entrypoint.shswitches to UID 1000 before runningjmp config ..., but the PATH here points into/root/.local/jumpstarter/bin. That makes Jumpstarter setup and non-root workspace sessions lose the CLI.What is the default install directory for jumpstarter's python/install.sh, and does it support --dir for installing into a system-wide path?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Containerfile.toolchain` around lines 90 - 91, The Containerfile currently installs Jumpstarter into /root/.local/jumpstarter/bin and sets PATH there, which breaks non-root workspace runs because hack/workspace-entrypoint.sh switches to UID 1000 and then jmp config cannot be found; change the Dockerfile steps so the installer is invoked with a system-writable install location (or use the install script's --dir option) and set ENV PATH to that location (e.g., a system-wide bin under /usr/local) so jmp is executable by UID 1000, or if the installer lacks --dir, move/copy the installed jmp binary into a system path (e.g., /usr/local/bin) after installation and update ENV PATH accordingly so hack/workspace-entrypoint.sh can run jmp config as the workspace user.
10-38:⚠️ Potential issue | 🟠 MajorVerify AutoSD RPMs instead of bypassing signatures.
These lines disable package verification entirely, and the only key imported afterward is the CentOS official key rather than the Automotive SIG key used for
centos-release-autosd. As written, none of the AutoSD packages installed here are authenticated, and the import on Line 38 happens only after the unchecked transaction.For centos-release-autosd / AutoSD nightly, which RPM GPG key signs packages, and should package installs use gpgcheck while leaving repo_gpgcheck disabled?internal/common/tasks/scripts/build_image.sh (1)
59-69:⚠️ Potential issue | 🟠 MajorDrop stale
image.jsonfrom the persistent cache cleanup.
diskmode does not rewrite$BUILD_DIR/image.json, so with persistent cache enabled the copy on Line 528 can publish the previous build's manifest as the current one. Remove the cached manifest alongside the orphanedimage_output--*directories.🧹 Proposed fix
for orphan in "$BUILD_DIR"/image_output--*; do [ -e "$orphan" ] || break echo "Removing orphaned build output: $(basename "$orphan")" rm -rf "$orphan" done + rm -f "$BUILD_DIR/image.json"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/common/tasks/scripts/build_image.sh` around lines 59 - 69, The persistent cache cleanup currently removes stale items in WORKSPACE_PATH and orphaned image_output--* dirs but leaves a stale manifest; update the orphan cleanup in build_image.sh to also remove the cached manifest file by deleting $BUILD_DIR/image.json (alongside the existing loop that removes image_output--*), ensuring the deletion runs only when the file exists so the previous build's image.json cannot be published as the current manifest.internal/controller/operatorconfig/controller.go (1)
908-912:⚠️ Potential issue | 🟠 MajorIgnore
NoMatchwhen deleting the workspace SCC.
cleanupWorkspaceInfra()still returns an error when thesecurity.openshift.ioAPI is absent. On a plain Kubernetes cluster that blocks OSBuilds cleanup and finalizer removal even though the SCC was never creatable there. Gate this delete behinddetectOpenShift(...)or treat the no-match case as non-fatal.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/operatorconfig/controller.go` around lines 908 - 912, cleanupWorkspaceInfra() is attempting to delete a securityv1.SecurityContextConstraints named by workspaceSCCName but still errors when the security.openshift.io API is absent; update the SCC delete so it either first checks detectOpenShift(ctx) and skips the delete on non-OpenShift clusters, or treat a NoMatch API discovery error as non-fatal by catching metav1.IsNoMatchError (or the client discovery NoMatch) and returning nil instead of an error after the Delete call; locate the delete logic in cleanupWorkspaceInfra(), which constructs scc := &securityv1.SecurityContextConstraints{} and calls r.Delete(ctx, scc), and adjust the error handling to ignore NoMatch (or gate the whole block with detectOpenShift()).internal/buildapi/workspace.go (1)
224-263:⚠️ Potential issue | 🟡 MinorSecret created before workspace existence check can leave orphaned secrets.
The Jumpstarter client config secret is created at line 243 before checking if a workspace with that name already exists (line 251). If the workspace already exists or creation fails later, the secret remains orphaned.
Consider reordering to:
- Check workspace existence first
- Create the Workspace CR
- Create the secret only on successful workspace creation, or set an OwnerReference on the secret pointing to the workspace so it gets garbage-collected automatically.
♻️ Suggested approach
+ // Check if workspace CR already exists FIRST + existing := &automotivev1alpha1.Workspace{} + err = k8sClient.Get(c.Request.Context(), client.ObjectKey{Namespace: namespace, Name: req.Name}, existing) + if err == nil { + if existing.DeletionTimestamp != nil { + c.JSON(http.StatusConflict, gin.H{"error": fmt.Sprintf("workspace %q is still terminating, try again shortly", req.Name)}) + } else { + c.JSON(http.StatusConflict, gin.H{"error": fmt.Sprintf("workspace %q already exists", req.Name)}) + } + return + } + if !k8serrors.IsNotFound(err) { + c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to check existing workspace"}) + return + } + // Create secret for Jumpstarter client config if provided var jmpClientSecret string if req.ClientConfig != "" { - // ... secret creation ... + // ... secret creation (now after existence check) ... } - - // Check if workspace CR already exists - existing := &automotivev1alpha1.Workspace{} - // ... moved above ...🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/workspace.go` around lines 224 - 263, The code creates the Jumpstarter client Secret (jmpClientSecret) before checking for an existing Workspace CR (existing / k8sClient.Get) which can leave orphaned Secrets; reorder and/or attach ownership: first perform the workspace existence check using k8sClient.Get on automotivev1alpha1.Workspace (existing) and return early if it exists, then create the Workspace CR and only after successful creation decode req.ClientConfig and create the Secret, or if you must create the Secret earlier, set an OwnerReference on the Secret pointing to the created Workspace (so the Secret is garbage-collected) and handle create errors consistently (use k8serrors.IsAlreadyExists when appropriate).
🧹 Nitpick comments (1)
cmd/caib/workspace/workspace.go (1)
784-786: Consider logging or limiting silent retries on API errors.Transient errors during polling are silently continued (line 785). While this is reasonable for brief network hiccups, persistent failures could cause the 5-minute timeout to be reached without any feedback to the user.
Consider adding a counter or logging after multiple consecutive failures.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/caib/workspace/workspace.go` around lines 784 - 786, The loop currently swallows transient API errors at the "if err != nil { continue // transient error, retry }" site with no visibility; add a consecutive-failure counter (e.g., consecutiveErrs) incremented when err != nil and reset on success, emit a warning via the existing logger after a threshold (e.g., 3) and either apply a short backoff or return a descriptive error if the count exceeds a larger threshold (e.g., for the 5-minute poll duration) so persistent failures are surfaced; update the branch around that "continue" to increment/reset the counter and call the logger with context (including err) when thresholds are reached.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@Containerfile.toolchain`:
- Line 57: The RUN step that installs the EPEL RPM (the line running "dnf
install -y
https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm") leaves
DNF caches in the layer; modify that RUN to clean DNF metadata/cache in the same
layer by chaining cleanup commands (e.g., run dnf install ... && dnf clean all
&& rm -rf /var/cache/dnf or equivalent) so caches are removed before the layer
is committed.
- Around line 90-91: The Containerfile currently installs Jumpstarter into
/root/.local/jumpstarter/bin and sets PATH there, which breaks non-root
workspace runs because hack/workspace-entrypoint.sh switches to UID 1000 and
then jmp config cannot be found; change the Dockerfile steps so the installer is
invoked with a system-writable install location (or use the install script's
--dir option) and set ENV PATH to that location (e.g., a system-wide bin under
/usr/local) so jmp is executable by UID 1000, or if the installer lacks --dir,
move/copy the installed jmp binary into a system path (e.g., /usr/local/bin)
after installation and update ENV PATH accordingly so
hack/workspace-entrypoint.sh can run jmp config as the workspace user.
In `@internal/buildapi/client/client.go`:
- Around line 957-960: The code currently creates a zero-value websocket.Dialer
which loses Gorilla's defaults (proxy handling and handshake timeout); instead
initialize dialer by cloning websocket.DefaultDialer (e.g., dialer :=
*websocket.DefaultDialer) and then, if c.httpClient.Transport is an
*http.Transport with non-nil TLSClientConfig, assign that TLSClientConfig into
the cloned dialer (dialer.TLSClientConfig = t.TLSClientConfig) so you preserve
default proxy/timeout behavior while applying the custom TLS config.
- Around line 770-915: The code pre-escapes workspace names with url.PathEscape
before calling c.resolve (seen in GetWorkspace, workspaceAction,
DeleteWorkspace, SyncPlan, SyncWorkspace, workspaceStreamRequest), which causes
double-escaping (e.g., "my app" -> "my%2520app"); remove url.PathEscape(name)
calls and instead build the path using url.JoinPath (or a small helper that
joins raw path segments) to compose "/v1/workspaces", the unescaped name, and
optional action/segments, then pass that joined path into c.resolve so names are
escaped exactly once.
In `@internal/buildapi/workspace.go`:
- Around line 224-263: The code creates the Jumpstarter client Secret
(jmpClientSecret) before checking for an existing Workspace CR (existing /
k8sClient.Get) which can leave orphaned Secrets; reorder and/or attach
ownership: first perform the workspace existence check using k8sClient.Get on
automotivev1alpha1.Workspace (existing) and return early if it exists, then
create the Workspace CR and only after successful creation decode
req.ClientConfig and create the Secret, or if you must create the Secret
earlier, set an OwnerReference on the Secret pointing to the created Workspace
(so the Secret is garbage-collected) and handle create errors consistently (use
k8serrors.IsAlreadyExists when appropriate).
In `@internal/common/tasks/scripts/build_image.sh`:
- Around line 59-69: The persistent cache cleanup currently removes stale items
in WORKSPACE_PATH and orphaned image_output--* dirs but leaves a stale manifest;
update the orphan cleanup in build_image.sh to also remove the cached manifest
file by deleting $BUILD_DIR/image.json (alongside the existing loop that removes
image_output--*), ensuring the deletion runs only when the file exists so the
previous build's image.json cannot be published as the current manifest.
In `@internal/controller/operatorconfig/controller.go`:
- Around line 908-912: cleanupWorkspaceInfra() is attempting to delete a
securityv1.SecurityContextConstraints named by workspaceSCCName but still errors
when the security.openshift.io API is absent; update the SCC delete so it either
first checks detectOpenShift(ctx) and skips the delete on non-OpenShift
clusters, or treat a NoMatch API discovery error as non-fatal by catching
metav1.IsNoMatchError (or the client discovery NoMatch) and returning nil
instead of an error after the Delete call; locate the delete logic in
cleanupWorkspaceInfra(), which constructs scc :=
&securityv1.SecurityContextConstraints{} and calls r.Delete(ctx, scc), and
adjust the error handling to ignore NoMatch (or gate the whole block with
detectOpenShift()).
In `@internal/controller/workspace/controller.go`:
- Around line 195-202: The code currently uses ws.Spec.Image directly (falling
back to wsConfig.GetToolchainImage()) and then creates pods with RunAsUser=0,
Privileged=true, AllowPrivilegeEscalation=true and the ado-workspace service
account, which allows arbitrary privileged images; fix by enforcing an
operator-controlled image allowlist or always using the operator default image
and never honoring arbitrary ws.Spec.Image (validate ws.Spec.Image against
operatorConfig.Spec.Workspaces allowlist before assignment to image), and remove
privileged/root execution by setting a non-zero runAsUser and disabling
Privileged and AllowPrivilegeEscalation in the pod securityContext (also audit
any use of the ado-workspace service account to ensure it has no elevated
permissions), and apply the same changes to the other similar block around the
code handling lines 269-286.
- Around line 169-176: The Pod construction in r.buildPod (called as
r.buildPod(ws, operatorConfig)) ignores OperatorConfig.Spec.Workspaces policy:
update buildPod (and other places that call it: the other blocks around the
mentioned ranges) to merge workspace policy from operatorConfig into the Pod
spec — specifically use OperatorConfig.Spec.Workspaces when resolving resources
instead of resourcesOrDefaults(ws.Spec.Resources), apply NodeSelector from the
workspace policy (falling back to ws.Spec.NodeSelector only if policy absent),
and set Tolerations from the workspace policy into the Pod (not leaving
Tolerations unset); also ensure GetToolchainImage() still reads from
OperatorConfig.Spec.Workspaces. Locate and change buildPod, any direct
resource/toleration/nodeSelector handling, and callers at the other indicated
blocks so the operator workspace policy is applied consistently.
---
Nitpick comments:
In `@cmd/caib/workspace/workspace.go`:
- Around line 784-786: The loop currently swallows transient API errors at the
"if err != nil { continue // transient error, retry }" site with no visibility;
add a consecutive-failure counter (e.g., consecutiveErrs) incremented when err
!= nil and reset on success, emit a warning via the existing logger after a
threshold (e.g., 3) and either apply a short backoff or return a descriptive
error if the count exceeds a larger threshold (e.g., for the 5-minute poll
duration) so persistent failures are surfaced; update the branch around that
"continue" to increment/reset the counter and call the logger with context
(including err) when thresholds are reached.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 37605479-ca63-4cb6-831b-4f22f363b0eb
⛔ Files ignored due to path filters (4)
config/crd/bases/automotive.sdv.cloud.redhat.com_imagebuilds.yamlis excluded by!config/crd/bases/**config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yamlis excluded by!config/crd/bases/**config/crd/bases/automotive.sdv.cloud.redhat.com_workspaces.yamlis excluded by!config/crd/bases/**config/rbac/role.yamlis excluded by!config/rbac/role.yaml
📒 Files selected for processing (34)
Containerfile.toolchainapi/v1alpha1/imagebuild_types.goapi/v1alpha1/operatorconfig_types.goapi/v1alpha1/workspace_types.goapi/v1alpha1/zz_generated.deepcopy.gobundle.Dockerfilecmd/caib/buildcmd/build.gocmd/caib/image/image.gocmd/caib/main.gocmd/caib/root.gocmd/caib/runtime_wiring.gocmd/caib/workspace/workspace.gocmd/main.goconfig/crd/kustomization.yamlconfig/rbac/scc_role.yamlconfig/rbac/scc_role_binding.yamlhack/container-wrapper.shhack/deploy-catalog.shhack/workspace-entrypoint.shinternal/buildapi/client/client.gointernal/buildapi/client/client_test.gointernal/buildapi/server.gointernal/buildapi/types.gointernal/buildapi/workspace.gointernal/buildapi/workspace_test.gointernal/common/tasks/scripts/build_image.shinternal/common/tasks/scripts/find_manifest.shinternal/common/tasks/scripts/push_artifact.shinternal/common/tasks/tasks.gointernal/controller/imagebuild/controller.gointernal/controller/operatorconfig/controller.gointernal/controller/operatorconfig/resources.gointernal/controller/workspace/controller.gointernal/controller/workspace/controller_test.go
✅ Files skipped from review due to trivial changes (11)
- config/crd/kustomization.yaml
- cmd/caib/main.go
- bundle.Dockerfile
- internal/common/tasks/scripts/push_artifact.sh
- hack/container-wrapper.sh
- cmd/caib/root.go
- internal/buildapi/types.go
- config/rbac/scc_role.yaml
- internal/common/tasks/tasks.go
- internal/buildapi/client/client_test.go
- api/v1alpha1/workspace_types.go
🚧 Files skipped from review as they are similar to previous changes (12)
- config/rbac/scc_role_binding.yaml
- api/v1alpha1/imagebuild_types.go
- internal/common/tasks/scripts/find_manifest.sh
- cmd/main.go
- hack/deploy-catalog.sh
- cmd/caib/image/image.go
- cmd/caib/buildcmd/build.go
- hack/workspace-entrypoint.sh
- internal/controller/operatorconfig/resources.go
- api/v1alpha1/operatorconfig_types.go
- internal/buildapi/server.go
- internal/controller/imagebuild/controller.go
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (12)
internal/controller/operatorconfig/controller.go (1)
908-912:⚠️ Potential issue | 🟡 MinorHandle
isNoMatchErrorwhen deleting the workspace SCC on Kubernetes.The SCC deletion at line 910 checks for
errors.IsNotFound(err)but notisNoMatchError(err). On a non-OpenShift cluster, thesecuritycontextconstraintsresource type doesn't exist, and the delete will fail with a "no matches for kind" error rather than NotFound. This will cause cleanup to fail on Kubernetes.Suggested fix
scc := &securityv1.SecurityContextConstraints{} scc.Name = workspaceSCCName - if err := r.Delete(ctx, scc); err != nil && !errors.IsNotFound(err) { + if err := r.Delete(ctx, scc); err != nil && !errors.IsNotFound(err) && !isNoMatchError(err) { return fmt.Errorf("failed to delete workspace SCC: %w", err) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/operatorconfig/controller.go` around lines 908 - 912, The deletion of the SecurityContextConstraints object (scc / workspaceSCCName) uses r.Delete and only treats errors.IsNotFound(err) as ignorable; update the error handling to also ignore "no matches for kind" errors by checking isNoMatchError(err) (or apierrors.IsNoMatchError(err)) in the same conditional so that a non-OpenShift Kubernetes cluster (where the SCC resource kind doesn't exist) does not cause cleanup to fail.cmd/caib/workspace/workspace.go (1)
611-617:⚠️ Potential issue | 🟠 MajorPreserve argv boundaries in
workspace exec.
strings.Join(cmdParts, " ")loses the original quoting and argument split. Commands with spaces or shell metacharacters do not round-trip correctly throughWorkspaceExecRequest.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/caib/workspace/workspace.go` around lines 611 - 617, The code currently flattens command arguments with strings.Join(cmdParts, " ") losing quoting and argv boundaries; instead preserve the original argument slice (cmdParts) and pass that slice through WorkspaceExecRequest (do not build a single string named command) so the executor receives the exact argv array; update places that reference the joined "command" to use the []string (e.g., cmdParts / commandArgs) and ensure WorkspaceExecRequest and any downstream handling consume an argument slice rather than a single combined string.internal/controller/imagebuild/controller.go (3)
105-105:⚠️ Potential issue | 🟠 MajorGrant
patchonworkspaces.
updateWorkspaceLease()usesr.Patch(...)on Lines 1455-1457, but this RBAC rule only grantsget;update. Successful builds will still fail the lease write-back with a forbidden error.🔧 Suggested change
-// +kubebuilder:rbac:groups=automotive.sdv.cloud.redhat.com,resources=workspaces,verbs=get;update +// +kubebuilder:rbac:groups=automotive.sdv.cloud.redhat.com,resources=workspaces,verbs=get;update;patchRun this to confirm the local mismatch between the RBAC annotation and the
r.Patchcall:#!/bin/bash rg -n '\+kubebuilder:rbac:groups=automotive\.sdv\.cloud\.redhat\.com,resources=workspaces,verbs=' internal/controller/imagebuild/controller.go sed -n '1452,1458p' internal/controller/imagebuild/controller.go🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/imagebuild/controller.go` at line 105, The RBAC annotation for workspaces is missing the "patch" verb while updateWorkspaceLease() calls r.Patch(...); update the kubebuilder RBAC marker that references group automotive.sdv.cloud.redhat.com and resource workspaces (the "+kubebuilder:rbac:...resources=workspaces,verbs=...") to include "patch" alongside existing verbs so the controller can perform r.Patch in updateWorkspaceLease without Forbidden errors.
789-792:⚠️ Potential issue | 🟠 MajorMirror this fallback in
getOrCreateWorkspacePVC().This fixes the
VolumeClaimTemplatebranch, but upload-backed builds still go throughgetOrCreateWorkspacePVC(), which only honorsimageBuild.Spec.StorageClass. Storage behavior will still differ whenever uploads are enabled.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/imagebuild/controller.go` around lines 789 - 792, Update getOrCreateWorkspacePVC to mirror the same storageClass fallback as the VolumeClaimTemplate branch: when imageBuild.Spec.StorageClass is non-empty use that, otherwise fall back to operatorConfig.Spec.OSBuilds.StorageClass if present; adjust the storageClassName/pointer logic in getOrCreateWorkspacePVC (the same variable used in the VolumeClaimTemplate code path) so upload-backed builds honor the operatorConfig default storage class as well.
760-772:⚠️ Potential issue | 🟠 MajorKeep upload-backed builds on the upload PVC.
When
BuildCachePVCis set, this branch replacesshared-workspaceoutright, so the PVC populated bycreateUploadPod()is never mounted into the build. Workspace builds that still usesource_pathuploads will miss their uploaded content unless the cache is mounted separately or this combination is rejected.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/imagebuild/controller.go` around lines 760 - 772, The current branch unconditionally replaces the "shared-workspace" binding when imageBuild.Spec.BuildCachePVC is set, which drops any upload PVC produced by createUploadPod() and loses uploaded source files; change the logic in the workspace-binding section so that if an upload-backed build (InputFilesServer/upload PVC created earlier by createUploadPod()) exists you do not overwrite that binding — instead either (A) keep the existing "shared-workspace" binding and add the cache PVC as a separate workspace binding (e.g., name it "build-cache") using imageBuild.Spec.BuildCachePVC, or (B) only use imageBuild.Spec.BuildCachePVC as "shared-workspace" when no upload PVC/UploadFiles binding exists; update the code around shared-workspace, createUploadPod(), and VolumeClaimTemplate handling accordingly.internal/buildapi/server.go (4)
1572-1590:⚠️ Potential issue | 🔴 CriticalAuthorize
--extra-repoagainstws.Spec.OwnerbeforepodExec.This path checks existence and phase, but never verifies workspace ownership. Any authenticated caller can point
workspace-name:/pathat someone else’s running workspace and have the API expose that content over HTTP. Thread the requester intoresolveExtraRepos()and reject non-owners before starting the server.Also applies to: 1595-1617
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/server.go` around lines 1572 - 1590, resolveExtraRepos currently validates ExtraRepos format and workspace phase but does not verify the requester is the workspace owner; update resolveExtraRepos to accept the requester identity (e.g., caller user ID or principal) and for each workspace name parsed from req.ExtraRepos load the Workspace object (using the existing k8s client), compare workspace.Spec.Owner to the requester, and return an authorization error if they differ before proceeding to start the pod/HTTP server (podExec). Apply the same ownership check in the second occurrence handling the other ExtraRepos block (the duplicate at 1595-1617), and reference BuildRequest.ExtraRepos, resolveExtraRepos, Workspace.Spec.Owner and the podExec startup point when making the change.
1643-1647:⚠️ Potential issue | 🟠 MajorDon't swallow
OperatorConfigread failures here.Falling back to
nilon everyloadOperatorConfigFnerror bypasses workspace defaults/policy on transient API failures. OnlyNotFoundshould be ignored; other errors should fail the request.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/server.go` around lines 1643 - 1647, The code currently ignores all errors from loadOperatorConfigFn by using the blank identifier; change it to capture the error (e.g., operatorConfig, err := loadOperatorConfigFn(ctx, k8sClient, namespace)) and only suppress NotFound errors using the Kubernetes API machinery check (apierrors.IsNotFound(err)); for any other err return or propagate the error from the handler, and only set wsConfig = operatorConfig.Spec.Workspaces when operatorConfig != nil. Reference: loadOperatorConfigFn, operatorConfig, wsConfig.
1612-1617:⚠️ Potential issue | 🟠 MajorDon't treat background
http.serverlaunches as success.Both branches fire
python3 -m http.server ... &on fixed ports and immediately inject URLs ifpodExecitself succeeds. Port conflicts or short-lived server exits still look like success here, which can leaveextra_repos/workspace_urlpointing at stale or dead content.Also applies to: 1689-1692
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/server.go` around lines 1612 - 1617, The current code shells out with "python3 -m http.server %d &" via podExec (see variables cmd, podExec, wsName, workspaceContainerName, port) which treats a successful podExec as success even if the server immediately fails; change the command to start the server and verify readiness before returning (for example: start the server redirecting logs, wait briefly and probe localhost:%d with curl or wget inside the same pod, and fail the podExec if the probe fails, printing the server log), and apply the same change to the other launch site referenced (the other podExec invocation around the second block).
1707-1716:⚠️ Potential issue | 🟠 MajorUse a deterministic build-cache PVC name.
This list-then-
GenerateNameflow is TOCTOU. Two concurrent builds for the same workspace can both miss the existing claim and create separate caches, fragmenting warm state and leaving extra PVCs behind.Also applies to: 1736-1762
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/server.go` around lines 1707 - 1716, The current list-then-GenerateName flow (using pvcList, k8sClient.List, buildCacheLabels and GenerateName) is TOCTOU and can create duplicate build-cache PVCs; instead derive a deterministic PVC name from the workspace/labels (e.g., hash of workspace ID + buildCacheLabels), then try k8sClient.Get for a corev1.PersistentVolumeClaim with that Name and namespace; if Get returns NotFound, create a PVC with that exact Name (not GenerateName) and if Create returns AlreadyExists, re-Get and return the existing PVC name—remove the list-and-scan logic and the GenerateName usage (also fix the similar code around the 1736-1762 block) so callers always use the deterministic name and handle create-race by retrying Get on AlreadyExists.internal/buildapi/workspace.go (1)
224-263:⚠️ Potential issue | 🟠 MajorSecret created before workspace uniqueness check can leave orphaned secrets.
The Jumpstarter client config secret is created (lines 226-247) before checking if the workspace CR already exists (lines 249-263). If the workspace already exists, the handler returns a conflict error but the secret remains. Move secret creation after the uniqueness check, or set an OwnerReference on the secret after successful workspace creation.
Suggested restructuring
+ // Check if workspace CR already exists first + existing := &automotivev1alpha1.Workspace{} + err = k8sClient.Get(c.Request.Context(), client.ObjectKey{Namespace: namespace, Name: req.Name}, existing) + if err == nil { + if existing.DeletionTimestamp != nil { + c.JSON(http.StatusConflict, gin.H{"error": fmt.Sprintf("workspace %q is still terminating, try again shortly", req.Name)}) + } else { + c.JSON(http.StatusConflict, gin.H{"error": fmt.Sprintf("workspace %q already exists", req.Name)}) + } + return + } + if !k8serrors.IsNotFound(err) { + c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to check existing workspace"}) + return + } + // Create secret for Jumpstarter client config if provided var jmpClientSecret string if req.ClientConfig != "" { // ... secret creation ... } - - // Check if workspace CR already exists - existing := &automotivev1alpha1.Workspace{} - // ...,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/workspace.go` around lines 224 - 263, The secret for Jumpstarter client config (jmpClientSecret / variable set in the req.ClientConfig block that creates a corev1.Secret) is created before the uniqueness check of the Workspace CR (existing := &automotivev1alpha1.Workspace{} and k8sClient.Get), which can leave orphaned secrets on conflict; move the entire secret creation logic (the req.ClientConfig base64 decode and k8sClient.Create of the corev1.Secret) so it runs only after the workspace existence check succeeds (after verifying !k8serrors.IsNotFound(err)), or alternatively ensure the created secret gets an OwnerReference to the Workspace CR right after creating the Workspace so it is garbage-collected on failure—update error handling (k8serrors.IsAlreadyExists) accordingly.internal/buildapi/client/client.go (2)
957-960:⚠️ Potential issue | 🟠 MajorUse
websocket.DefaultDialerto preserve proxy and timeout settings.A zero-value
websocket.Dialer{}drops Gorilla's defaults:Proxy(set tohttp.ProxyFromEnvironment) andHandshakeTimeout(45 seconds). This can cause shell connections to hang when proxies are configured.Suggested fix
- dialer := websocket.Dialer{} - if t, ok := c.httpClient.Transport.(*http.Transport); ok && t.TLSClientConfig != nil { - dialer.TLSClientConfig = t.TLSClientConfig - } + dialer := *websocket.DefaultDialer + if t, ok := c.httpClient.Transport.(*http.Transport); ok { + dialer.TLSClientConfig = t.TLSClientConfig + if t.Proxy != nil { + dialer.Proxy = t.Proxy + } + },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/client/client.go` around lines 957 - 960, Replace the zero-value websocket.Dialer with a copy of gorilla's defaults so Proxy and HandshakeTimeout are preserved: initialize the dialer from websocket.DefaultDialer (e.g., copy *websocket.DefaultDialer into your local dialer variable), then if c.httpClient.Transport.(*http.Transport) has a non-nil TLSClientConfig, assign that value to dialer.TLSClientConfig (leave other DefaultDialer fields intact). Ensure you still reference the same checks around c.httpClient.Transport and only overwrite TLSClientConfig on the copied dialer.
771-794:⚠️ Potential issue | 🟠 MajorDouble-escaping of workspace names in URL paths.
All workspace methods that handle names use
url.PathEscape(name)before passing toresolve(), which assigns tourl.URL.Path. WhenURL.String()is called, percent signs get re-encoded (e.g.,my%20app→my%2520app). Useurl.JoinPath()instead to properly compose path segments:-endpoint := c.resolve(path.Join("/v1/workspaces", url.PathEscape(name))) +endpoint, err := url.JoinPath(c.baseURL.String(), "/v1/workspaces", name) +if err != nil { + return nil, err +}Apply this pattern to:
GetWorkspace,workspaceAction,DeleteWorkspace,SyncPlan,SyncWorkspace,workspaceStreamRequest, andShellWorkspace.,
Also applies to: 806-830, 832-852, 854-883, 885-906, 908-933, 947-948
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/client/client.go` around lines 771 - 794, The code double-escapes workspace names by calling url.PathEscape(name) before composing the request path; replace composing logic to use url.JoinPath(...) (or url.URL.JoinPath) to join "/v1/workspaces" and the raw name so percent-encoding is applied once. Concretely, in GetWorkspace (and similarly in workspaceAction, DeleteWorkspace, SyncPlan, SyncWorkspace, workspaceStreamRequest, ShellWorkspace) remove url.PathEscape(name) and instead build the endpoint with url.JoinPath("/v1/workspaces", name) (or use the URL receiver variant), handle any returned error, then pass that joined path to c.resolve(...) so names like "my app" become "%20" only once and are not double-encoded.
🧹 Nitpick comments (2)
api/v1alpha1/zz_generated.deepcopy.go (1)
1159-1163: Generated file should not be committed per repository conventions.Per retrieved learnings,
api/v1alpha1/zz_generated.deepcopy.gois regenerated at deploy/build time viamake generate manifestsand should not be included in PRs. Consider removing this file from the commit and adding it to.gitignoreif not already present.The generated code itself appears correct - it properly deep-copies the new
Workspace,WorkspaceList,WorkspaceSpec,WorkspaceStatus, andWorkspacesConfigtypes with appropriate handling for reference types (pointers, maps, slices).Based on learnings: "Generated files that are not committed should be ignored in PRs. Specifically, do not include api/v1alpha1/zz_generated.deepcopy.go..."
Also applies to: 1264-1403
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/v1alpha1/zz_generated.deepcopy.go` around lines 1159 - 1163, The generated deepcopy file api/v1alpha1/zz_generated.deepcopy.go should not be committed; remove this file from the PR commit and update repository ignore rules so it isn’t tracked (e.g., add api/v1alpha1/zz_generated.deepcopy.go or a pattern for generated files to .gitignore), then regenerate as part of CI/build with make generate manifests; you can leave the generated DeepCopyInto/DeepCopy functions in your working tree locally but ensure only source/type definitions are committed and the generated file is excluded from version control.internal/buildapi/client/client.go (1)
732-732: Inconsistent response body close error handling.The new workspace methods silently discard
resp.Body.Close()errors with_ = resp.Body.Close(), while existing methods (e.g.,CreateBuild,GetBuild) log warnings. Consider aligning with the existing pattern for consistency.Also applies to: 758-758, 784-784, 820-820, 846-846, 873-873, 900-900
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/client/client.go` at line 732, The new workspace methods currently discard resp.Body.Close() errors (defer func() { _ = resp.Body.Close() }()), causing inconsistency with existing methods like CreateBuild and GetBuild; update each occurrence (the defer at lines shown) to mirror the existing pattern: replace the discard with a defer that checks the error and logs a warning (e.g., defer func() { if err := resp.Body.Close(); err != nil { c.logger.Warnf("failed to close response body: %v", err) } }()), and apply this change to all listed occurrences so all response body closes are logged consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/caib/workspace/workspace.go`:
- Around line 275-280: If a user supplied an explicit --client file, fail fast
when ResolveJumpstarterClient cannot resolve it: after calling
caibcommon.ResolveJumpstarterClient(strings.TrimSpace(clientConfigFile)) check
for err and if clientConfigFile (trimmed) is non-empty, print a clear error
(including err) and exit with non-zero (or return the error) instead of silently
continuing; keep the existing successful-path behavior that sets clientConfigB64
from clientInfo.Data when err == nil. Reference ResolveJumpstarterClient,
clientConfigFile, clientInfo, clientConfigB64.
- Around line 523-539: computeManifest currently opens every path as a regular
file which drops dangling symlinks and hashes symlink targets instead of link
values; change computeManifest to call os.Lstat(absPath) and check
fileInfo.Mode()&os.ModeSymlink: if it's a symlink, call os.Readlink(absPath) and
hash the returned link target string (include dangling links), otherwise open
and io.Copy the file contents as before; keep the same manifest[relPath] =
hex.EncodeToString(h.Sum(nil)) behavior and preserve error
handling/continutation semantics so it stays consistent with tarTrackedFiles'
symlink-preserving behavior.
In `@internal/buildapi/workspace.go`:
- Around line 491-498: The loop that builds the hashing shell script iterates
req.Files and writes unchecked paths into scriptBuf (via scriptBuf.WriteString
and shellQuote), allowing path traversal like ../../etc/passwd; fix by
validating each path before using it: ensure it is not absolute (reject if
filepath.IsAbs), canonicalize with filepath.Clean and reject if it begins with
".." or otherwise escapes the workspace root, and return a 400 error for invalid
paths so only safe relative paths are included in the sha256sum lines.
- Around line 764-768: The echo line that writes "Deploying %s -> %s" uses
unquoted a.Src and a.Dest and can break the generated script if those paths
contain quotes or special chars; update the loop that builds rsyncCmds (the for
_, a := range req.Artifacts block) to pass quoted/escaped values (reuse
shellQuote(a.Src) and shellQuote(a.Dest) or an equivalent escaping helper) into
the fmt.Fprintf that emits the echo so the printed message is safe, while
leaving the existing rsync fmt.Fprintf as-is.
---
Duplicate comments:
In `@cmd/caib/workspace/workspace.go`:
- Around line 611-617: The code currently flattens command arguments with
strings.Join(cmdParts, " ") losing quoting and argv boundaries; instead preserve
the original argument slice (cmdParts) and pass that slice through
WorkspaceExecRequest (do not build a single string named command) so the
executor receives the exact argv array; update places that reference the joined
"command" to use the []string (e.g., cmdParts / commandArgs) and ensure
WorkspaceExecRequest and any downstream handling consume an argument slice
rather than a single combined string.
In `@internal/buildapi/client/client.go`:
- Around line 957-960: Replace the zero-value websocket.Dialer with a copy of
gorilla's defaults so Proxy and HandshakeTimeout are preserved: initialize the
dialer from websocket.DefaultDialer (e.g., copy *websocket.DefaultDialer into
your local dialer variable), then if c.httpClient.Transport.(*http.Transport)
has a non-nil TLSClientConfig, assign that value to dialer.TLSClientConfig
(leave other DefaultDialer fields intact). Ensure you still reference the same
checks around c.httpClient.Transport and only overwrite TLSClientConfig on the
copied dialer.
- Around line 771-794: The code double-escapes workspace names by calling
url.PathEscape(name) before composing the request path; replace composing logic
to use url.JoinPath(...) (or url.URL.JoinPath) to join "/v1/workspaces" and the
raw name so percent-encoding is applied once. Concretely, in GetWorkspace (and
similarly in workspaceAction, DeleteWorkspace, SyncPlan, SyncWorkspace,
workspaceStreamRequest, ShellWorkspace) remove url.PathEscape(name) and instead
build the endpoint with url.JoinPath("/v1/workspaces", name) (or use the URL
receiver variant), handle any returned error, then pass that joined path to
c.resolve(...) so names like "my app" become "%20" only once and are not
double-encoded.
In `@internal/buildapi/server.go`:
- Around line 1572-1590: resolveExtraRepos currently validates ExtraRepos format
and workspace phase but does not verify the requester is the workspace owner;
update resolveExtraRepos to accept the requester identity (e.g., caller user ID
or principal) and for each workspace name parsed from req.ExtraRepos load the
Workspace object (using the existing k8s client), compare workspace.Spec.Owner
to the requester, and return an authorization error if they differ before
proceeding to start the pod/HTTP server (podExec). Apply the same ownership
check in the second occurrence handling the other ExtraRepos block (the
duplicate at 1595-1617), and reference BuildRequest.ExtraRepos,
resolveExtraRepos, Workspace.Spec.Owner and the podExec startup point when
making the change.
- Around line 1643-1647: The code currently ignores all errors from
loadOperatorConfigFn by using the blank identifier; change it to capture the
error (e.g., operatorConfig, err := loadOperatorConfigFn(ctx, k8sClient,
namespace)) and only suppress NotFound errors using the Kubernetes API machinery
check (apierrors.IsNotFound(err)); for any other err return or propagate the
error from the handler, and only set wsConfig = operatorConfig.Spec.Workspaces
when operatorConfig != nil. Reference: loadOperatorConfigFn, operatorConfig,
wsConfig.
- Around line 1612-1617: The current code shells out with "python3 -m
http.server %d &" via podExec (see variables cmd, podExec, wsName,
workspaceContainerName, port) which treats a successful podExec as success even
if the server immediately fails; change the command to start the server and
verify readiness before returning (for example: start the server redirecting
logs, wait briefly and probe localhost:%d with curl or wget inside the same pod,
and fail the podExec if the probe fails, printing the server log), and apply the
same change to the other launch site referenced (the other podExec invocation
around the second block).
- Around line 1707-1716: The current list-then-GenerateName flow (using pvcList,
k8sClient.List, buildCacheLabels and GenerateName) is TOCTOU and can create
duplicate build-cache PVCs; instead derive a deterministic PVC name from the
workspace/labels (e.g., hash of workspace ID + buildCacheLabels), then try
k8sClient.Get for a corev1.PersistentVolumeClaim with that Name and namespace;
if Get returns NotFound, create a PVC with that exact Name (not GenerateName)
and if Create returns AlreadyExists, re-Get and return the existing PVC
name—remove the list-and-scan logic and the GenerateName usage (also fix the
similar code around the 1736-1762 block) so callers always use the deterministic
name and handle create-race by retrying Get on AlreadyExists.
In `@internal/buildapi/workspace.go`:
- Around line 224-263: The secret for Jumpstarter client config (jmpClientSecret
/ variable set in the req.ClientConfig block that creates a corev1.Secret) is
created before the uniqueness check of the Workspace CR (existing :=
&automotivev1alpha1.Workspace{} and k8sClient.Get), which can leave orphaned
secrets on conflict; move the entire secret creation logic (the req.ClientConfig
base64 decode and k8sClient.Create of the corev1.Secret) so it runs only after
the workspace existence check succeeds (after verifying
!k8serrors.IsNotFound(err)), or alternatively ensure the created secret gets an
OwnerReference to the Workspace CR right after creating the Workspace so it is
garbage-collected on failure—update error handling (k8serrors.IsAlreadyExists)
accordingly.
In `@internal/controller/imagebuild/controller.go`:
- Line 105: The RBAC annotation for workspaces is missing the "patch" verb while
updateWorkspaceLease() calls r.Patch(...); update the kubebuilder RBAC marker
that references group automotive.sdv.cloud.redhat.com and resource workspaces
(the "+kubebuilder:rbac:...resources=workspaces,verbs=...") to include "patch"
alongside existing verbs so the controller can perform r.Patch in
updateWorkspaceLease without Forbidden errors.
- Around line 789-792: Update getOrCreateWorkspacePVC to mirror the same
storageClass fallback as the VolumeClaimTemplate branch: when
imageBuild.Spec.StorageClass is non-empty use that, otherwise fall back to
operatorConfig.Spec.OSBuilds.StorageClass if present; adjust the
storageClassName/pointer logic in getOrCreateWorkspacePVC (the same variable
used in the VolumeClaimTemplate code path) so upload-backed builds honor the
operatorConfig default storage class as well.
- Around line 760-772: The current branch unconditionally replaces the
"shared-workspace" binding when imageBuild.Spec.BuildCachePVC is set, which
drops any upload PVC produced by createUploadPod() and loses uploaded source
files; change the logic in the workspace-binding section so that if an
upload-backed build (InputFilesServer/upload PVC created earlier by
createUploadPod()) exists you do not overwrite that binding — instead either (A)
keep the existing "shared-workspace" binding and add the cache PVC as a separate
workspace binding (e.g., name it "build-cache") using
imageBuild.Spec.BuildCachePVC, or (B) only use imageBuild.Spec.BuildCachePVC as
"shared-workspace" when no upload PVC/UploadFiles binding exists; update the
code around shared-workspace, createUploadPod(), and VolumeClaimTemplate
handling accordingly.
In `@internal/controller/operatorconfig/controller.go`:
- Around line 908-912: The deletion of the SecurityContextConstraints object
(scc / workspaceSCCName) uses r.Delete and only treats errors.IsNotFound(err) as
ignorable; update the error handling to also ignore "no matches for kind" errors
by checking isNoMatchError(err) (or apierrors.IsNoMatchError(err)) in the same
conditional so that a non-OpenShift Kubernetes cluster (where the SCC resource
kind doesn't exist) does not cause cleanup to fail.
---
Nitpick comments:
In `@api/v1alpha1/zz_generated.deepcopy.go`:
- Around line 1159-1163: The generated deepcopy file
api/v1alpha1/zz_generated.deepcopy.go should not be committed; remove this file
from the PR commit and update repository ignore rules so it isn’t tracked (e.g.,
add api/v1alpha1/zz_generated.deepcopy.go or a pattern for generated files to
.gitignore), then regenerate as part of CI/build with make generate manifests;
you can leave the generated DeepCopyInto/DeepCopy functions in your working tree
locally but ensure only source/type definitions are committed and the generated
file is excluded from version control.
In `@internal/buildapi/client/client.go`:
- Line 732: The new workspace methods currently discard resp.Body.Close() errors
(defer func() { _ = resp.Body.Close() }()), causing inconsistency with existing
methods like CreateBuild and GetBuild; update each occurrence (the defer at
lines shown) to mirror the existing pattern: replace the discard with a defer
that checks the error and logs a warning (e.g., defer func() { if err :=
resp.Body.Close(); err != nil { c.logger.Warnf("failed to close response body:
%v", err) } }()), and apply this change to all listed occurrences so all
response body closes are logged consistently.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c40d4201-0728-4063-8522-fbfa3a7bbe36
⛔ Files ignored due to path filters (4)
config/crd/bases/automotive.sdv.cloud.redhat.com_imagebuilds.yamlis excluded by!config/crd/bases/**config/crd/bases/automotive.sdv.cloud.redhat.com_operatorconfigs.yamlis excluded by!config/crd/bases/**config/crd/bases/automotive.sdv.cloud.redhat.com_workspaces.yamlis excluded by!config/crd/bases/**config/rbac/role.yamlis excluded by!config/rbac/role.yaml
📒 Files selected for processing (34)
Containerfile.toolchainapi/v1alpha1/imagebuild_types.goapi/v1alpha1/operatorconfig_types.goapi/v1alpha1/workspace_types.goapi/v1alpha1/zz_generated.deepcopy.gobundle.Dockerfilecmd/caib/buildcmd/build.gocmd/caib/image/image.gocmd/caib/main.gocmd/caib/root.gocmd/caib/runtime_wiring.gocmd/caib/workspace/workspace.gocmd/main.goconfig/crd/kustomization.yamlconfig/rbac/scc_role.yamlconfig/rbac/scc_role_binding.yamlhack/container-wrapper.shhack/deploy-catalog.shhack/workspace-entrypoint.shinternal/buildapi/client/client.gointernal/buildapi/client/client_test.gointernal/buildapi/server.gointernal/buildapi/types.gointernal/buildapi/workspace.gointernal/buildapi/workspace_test.gointernal/common/tasks/scripts/build_image.shinternal/common/tasks/scripts/find_manifest.shinternal/common/tasks/scripts/push_artifact.shinternal/common/tasks/tasks.gointernal/controller/imagebuild/controller.gointernal/controller/operatorconfig/controller.gointernal/controller/operatorconfig/resources.gointernal/controller/workspace/controller.gointernal/controller/workspace/controller_test.go
✅ Files skipped from review due to trivial changes (10)
- bundle.Dockerfile
- cmd/caib/main.go
- config/crd/kustomization.yaml
- internal/common/tasks/scripts/push_artifact.sh
- cmd/caib/root.go
- hack/container-wrapper.sh
- config/rbac/scc_role.yaml
- internal/buildapi/workspace_test.go
- internal/controller/workspace/controller.go
- api/v1alpha1/workspace_types.go
🚧 Files skipped from review as they are similar to previous changes (11)
- cmd/main.go
- api/v1alpha1/imagebuild_types.go
- hack/deploy-catalog.sh
- config/rbac/scc_role_binding.yaml
- cmd/caib/image/image.go
- cmd/caib/buildcmd/build.go
- internal/controller/workspace/controller_test.go
- internal/common/tasks/tasks.go
- internal/common/tasks/scripts/find_manifest.sh
- api/v1alpha1/operatorconfig_types.go
- internal/controller/operatorconfig/resources.go
| var clientConfigB64 string | ||
| clientInfo, err := caibcommon.ResolveJumpstarterClient(strings.TrimSpace(clientConfigFile)) | ||
| if err == nil { | ||
| fmt.Printf("Using Jumpstarter client %q (endpoint: %s)\n", clientInfo.Name, clientInfo.Endpoint) | ||
| clientConfigB64 = base64.StdEncoding.EncodeToString(clientInfo.Data) | ||
| } |
There was a problem hiding this comment.
Fail fast when an explicit --client file can't be resolved.
If the user passes --client and ResolveJumpstarterClient fails, this silently creates the workspace without any Jumpstarter config. That pushes the real error into later deploy/lease operations instead of failing at the flag that caused it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/caib/workspace/workspace.go` around lines 275 - 280, If a user supplied
an explicit --client file, fail fast when ResolveJumpstarterClient cannot
resolve it: after calling
caibcommon.ResolveJumpstarterClient(strings.TrimSpace(clientConfigFile)) check
for err and if clientConfigFile (trimmed) is non-empty, print a clear error
(including err) and exit with non-zero (or return the error) instead of silently
continuing; keep the existing successful-path behavior that sets clientConfigB64
from clientInfo.Data when err == nil. Reference ResolveJumpstarterClient,
clientConfigFile, clientInfo, clientConfigB64.
| func computeManifest(baseDir string, files []string) map[string]string { | ||
| manifest := make(map[string]string, len(files)) | ||
| for _, relPath := range files { | ||
| absPath := filepath.Join(baseDir, relPath) | ||
| f, err := os.Open(absPath) | ||
| if err != nil { | ||
| continue // file may have been deleted since ls-files | ||
| } | ||
| h := sha256.New() | ||
| _, err = io.Copy(h, f) | ||
| _ = f.Close() | ||
| if err != nil { | ||
| continue | ||
| } | ||
| manifest[relPath] = hex.EncodeToString(h.Sum(nil)) | ||
| } | ||
| return manifest |
There was a problem hiding this comment.
Keep the delta-sync manifest consistent with symlink uploads.
tarTrackedFiles() now preserves symlinks, but computeManifest() still opens paths as regular files. Dangling symlinks get dropped from the manifest, and non-dangling ones hash the target contents instead of the link target, so incremental sync can miss symlink-only changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/caib/workspace/workspace.go` around lines 523 - 539, computeManifest
currently opens every path as a regular file which drops dangling symlinks and
hashes symlink targets instead of link values; change computeManifest to call
os.Lstat(absPath) and check fileInfo.Mode()&os.ModeSymlink: if it's a symlink,
call os.Readlink(absPath) and hash the returned link target string (include
dangling links), otherwise open and io.Copy the file contents as before; keep
the same manifest[relPath] = hex.EncodeToString(h.Sum(nil)) behavior and
preserve error handling/continutation semantics so it stays consistent with
tarTrackedFiles' symlink-preserving behavior.
| // Build a shell script that hashes only the files the client cares about. | ||
| // This avoids scanning build artifacts or other untracked content. | ||
| var scriptBuf strings.Builder | ||
| scriptBuf.WriteString("cd /workspace/src\n") | ||
| for path := range req.Files { | ||
| // Only hash regular files that exist; skip missing ones silently | ||
| scriptBuf.WriteString(fmt.Sprintf("[ -f %s ] && sha256sum %s\n", shellQuote(path), shellQuote(path))) | ||
| } |
There was a problem hiding this comment.
Path traversal allows probing files outside /workspace/src.
The sync plan handler hashes files based on client-provided paths without sanitization. A client can send paths like ../../etc/passwd to probe for file existence outside the workspace directory. Consider validating that paths don't escape the workspace root:
for path := range req.Files {
if strings.Contains(path, "..") || strings.HasPrefix(path, "/") {
c.JSON(http.StatusBadRequest, gin.H{"error": fmt.Sprintf("invalid path %q: must be relative without ..", path)})
return
}
// ...
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/buildapi/workspace.go` around lines 491 - 498, The loop that builds
the hashing shell script iterates req.Files and writes unchecked paths into
scriptBuf (via scriptBuf.WriteString and shellQuote), allowing path traversal
like ../../etc/passwd; fix by validating each path before using it: ensure it is
not absolute (reject if filepath.IsAbs), canonicalize with filepath.Clean and
reject if it begins with ".." or otherwise escapes the workspace root, and
return a 400 error for invalid paths so only safe relative paths are included in
the sha256sum lines.
| for _, a := range req.Artifacts { | ||
| fmt.Fprintf(&rsyncCmds, "echo \"Deploying %s -> %s\"\n", a.Src, a.Dest) | ||
| fmt.Fprintf(&rsyncCmds, "rsync -avz --chmod=+x -e \"ssh -p $LOCAL_PORT $SSH_OPTS $SSH_KEY\" %s root@127.0.0.1:%s\n", | ||
| shellQuote(a.Src), shellQuote(a.Dest)) | ||
| } |
There was a problem hiding this comment.
Unquoted variables in echo can break on special characters.
The echo statement interpolates a.Src and a.Dest without quoting, which could break script syntax if paths contain ", $, or backticks. While the actual rsync commands correctly use shellQuote, the display output is vulnerable.
Suggested fix
for _, a := range req.Artifacts {
- fmt.Fprintf(&rsyncCmds, "echo \"Deploying %s -> %s\"\n", a.Src, a.Dest)
+ fmt.Fprintf(&rsyncCmds, "echo 'Deploying' %s '->' %s\n", shellQuote(a.Src), shellQuote(a.Dest))
fmt.Fprintf(&rsyncCmds, "rsync -avz --chmod=+x -e \"ssh -p $LOCAL_PORT $SSH_OPTS $SSH_KEY\" %s root@127.0.0.1:%s\n",
shellQuote(a.Src), shellQuote(a.Dest))
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/buildapi/workspace.go` around lines 764 - 768, The echo line that
writes "Deploying %s -> %s" uses unquoted a.Src and a.Dest and can break the
generated script if those paths contain quotes or special chars; update the loop
that builds rsyncCmds (the for _, a := range req.Artifacts block) to pass
quoted/escaped values (reuse shellQuote(a.Src) and shellQuote(a.Dest) or an
equivalent escaping helper) into the fmt.Fprintf that emits the echo so the
printed message is safe, while leaving the existing rsync fmt.Fprintf as-is.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (6)
internal/buildapi/workspace.go (3)
801-805:⚠️ Potential issue | 🟡 MinorQuote the status
echoline too.The
rsynccommand already usesshellQuote, but the progressechostill interpolates rawa.Srcanda.Dest. A path containing",$, or backticks can break the generated script before the copy starts.Proposed fix
- fmt.Fprintf(&rsyncCmds, "echo \"Deploying %s -> %s\"\n", a.Src, a.Dest) + fmt.Fprintf(&rsyncCmds, "echo 'Deploying' %s '->' %s\n", shellQuote(a.Src), shellQuote(a.Dest))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/workspace.go` around lines 801 - 805, The echo status line in the loop that builds rsync commands uses unquoted a.Src and a.Dest and can break scripts for paths containing quotes or special chars; update the loop that writes into rsyncCmds (the for _, a := range req.Artifacts block) to pass the same shell-quoted values used for the rsync call (i.e., call shellQuote(a.Src) and shellQuote(a.Dest) when constructing the echo line) so both the human-readable echo and the rsync arguments are safely quoted.
481-495:⚠️ Potential issue | 🟠 MajorAvoid holding the whole sync archive in RAM.
The 512 MiB cap prevents unbounded growth, but
io.ReadAllstill allocates up to 512 MiB per request and defeats streaming. A couple of large syncs can still OOM the API pod; if the SPDY path needs EOF-stable input, spool to a temp file or another bounded stream instead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/workspace.go` around lines 481 - 495, The current code reads the entire upload into memory with io.ReadAll up to maxSyncSize, which can still OOM; change to stream the request to a disk-backed temporary file instead: create a temp file (os.CreateTemp), copy from c.Request.Body into it using io.CopyN or io.LimitReader to enforce maxSyncSize+1, check the written size (os.Stat or returned bytes) and return HTTP 413 if exceeded, then open the temp file and pass an io.Reader (not a byte slice) to copyToPod (callsite referencing copyToPod, workspaceContainerName, maxSyncSize), and ensure the temp file is removed (defer os.Remove) after use so EOF-stable input is provided without holding the archive in RAM.
225-245:⚠️ Potential issue | 🟠 MajorDon’t persist the Jumpstarter secret before the workspace create is known-good.
A duplicate-name conflict or a later
k8sClient.Create(ws)failure leaves sensitive client config orphaned, and treatingAlreadyExistsas success can silently bind stale secret data to a future workspace with the same name. Do the existence checks first, delete the secret again if workspace creation fails, and owner-reference it once theWorkspaceexists.Also applies to: 250-264, 298-299
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/workspace.go` around lines 225 - 245, The code currently creates the Jumpstarter secret (jmpClientSecret) before creating the Workspace and treats k8sClient.Create errors with IsAlreadyExists as success; change this to first check for an existing secret (using k8sClient.Get) and fail if one already exists for that name, defer creating the secret until after Workspace creation succeeds (or if you must create it early, ensure you delete it on Workspace creation failure), and once the Workspace is created set an ownerReference on the secret to the created Workspace and update it via k8sClient.Update; update all occurrences that create or treat AlreadyExists as success (references: jmpClientSecret variable, k8sClient.Create call, Workspace creation flow) so no orphaned or stale secret can be bound to a later workspace.internal/buildapi/client/client.go (1)
980-983:⚠️ Potential issue | 🟠 MajorStart from
websocket.DefaultDialerhere.A zero-value dialer drops Gorilla's default proxy resolution and handshake timeout, so shell sessions can hang indefinitely or ignore
HTTP_PROXY/HTTPS_PROXY. Clonewebsocket.DefaultDialerfirst and then overlay the transport settings you want to preserve.Proposed fix
- dialer := websocket.Dialer{} - if t, ok := c.httpClient.Transport.(*http.Transport); ok && t.TLSClientConfig != nil { - dialer.TLSClientConfig = t.TLSClientConfig - } + dialer := *websocket.DefaultDialer + if t, ok := c.httpClient.Transport.(*http.Transport); ok { + dialer.TLSClientConfig = t.TLSClientConfig + dialer.Proxy = t.Proxy + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/buildapi/client/client.go` around lines 980 - 983, Replace the zero-value websocket.Dialer initialization with a clone of websocket.DefaultDialer so Gorilla's default proxy resolution and handshake timeout are preserved: start by assigning dialer := *websocket.DefaultDialer, then if c.httpClient.Transport is an *http.Transport and t.TLSClientConfig != nil set dialer.TLSClientConfig = t.TLSClientConfig (leave other DefaultDialer fields untouched). This keeps existing behavior for proxy and handshake timeout while applying the transport TLSClientConfig used in the current code path.Containerfile.toolchain (2)
57-57:⚠️ Potential issue | 🟡 MinorClean DNF metadata in the EPEL layer.
This is the only install step that leaves cache behind, so the final image keeps an avoidable layer of RPM metadata.
Proposed fix
-RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm +RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm && dnf clean all🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Containerfile.toolchain` at line 57, The RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm layer leaves RPM metadata in the image; modify that RUN to clean DNF caches in the same instruction (e.g., append && dnf clean all && rm -rf /var/cache/dnf /var/cache/yum) so metadata is removed before the layer is committed, ensuring no leftover cache remains from the epel install.
10-10:⚠️ Potential issue | 🟠 MajorEnable RPM verification in the bootstrap stage.
gpgcheck=0,--nogpgcheck, and the best-effort key import after the install mean this stage never authenticates the RPMs it pulls. Import the repo key beforednf install, turngpgcheckback on, and drop the verification bypasses.Also applies to: 22-38
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Containerfile.toolchain` at line 10, The bootstrap stage currently disables RPM verification (ARG GPG_KEY="RPM-GPG-KEY-centosofficial", gpgcheck=0, and --nogpgcheck) and imports the key only after installs; change the Dockerfile so you import the repo GPG key referenced by ARG GPG_KEY before any dnf install, remove gpgcheck=0 and all --nogpgcheck flags, and ensure gpgcheck is enabled (gpgcheck=1) for repo definitions and yum/dnf calls; update the key import step to use rpm --import or dnf config-manager --import pointing at ${GPG_KEY} so installs verify RPM signatures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/caib/buildcmd/logs.go`:
- Around line 118-123: The lease writeback currently calls
api.SetWorkspaceLease(ctx, *h.opts.Workspace, leaseID) using the parent ctx
which can hang; wrap that call in a short timeout context (e.g., ctxWithTimeout,
cancel := context.WithTimeout(ctx, <shortDuration>); defer cancel()) and pass
ctxWithTimeout to api.SetWorkspaceLease so the PUT cannot block indefinitely;
reference h.opts.Workspace, leaseID and api.SetWorkspaceLease when applying the
change.
In `@internal/buildapi/workspace.go`:
- Around line 683-715: The websocket writes are performed concurrently by the
stdout pump goroutine (reading from stdoutR and calling wsConn.WriteMessage) and
the main goroutine that sends the close frame after executor.StreamWithContext
returns; protect and serialize writes to avoid gorilla/websocket panics by
funneling all writes through a single writer path or a mutex: either create a
dedicated writer goroutine that reads write requests (binary frames and the
final close message) from a channel and calls wsConn.WriteMessage, or add a
sync.Mutex (e.g., writeMu) and lock around every wsConn.WriteMessage call in the
stdout pump and the close-send logic; also ensure the stdout pump goroutine
exits (or is signaled) before sending the close frame (use a done channel or
waitgroup) so the final close is not raced with remaining binary writes
(references: stdoutR, wsConn.WriteMessage, executor.StreamWithContext,
stdoutW.Close, websocket.FormatCloseMessage).
---
Duplicate comments:
In `@Containerfile.toolchain`:
- Line 57: The RUN dnf install -y
https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm layer
leaves RPM metadata in the image; modify that RUN to clean DNF caches in the
same instruction (e.g., append && dnf clean all && rm -rf /var/cache/dnf
/var/cache/yum) so metadata is removed before the layer is committed, ensuring
no leftover cache remains from the epel install.
- Line 10: The bootstrap stage currently disables RPM verification (ARG
GPG_KEY="RPM-GPG-KEY-centosofficial", gpgcheck=0, and --nogpgcheck) and imports
the key only after installs; change the Dockerfile so you import the repo GPG
key referenced by ARG GPG_KEY before any dnf install, remove gpgcheck=0 and all
--nogpgcheck flags, and ensure gpgcheck is enabled (gpgcheck=1) for repo
definitions and yum/dnf calls; update the key import step to use rpm --import or
dnf config-manager --import pointing at ${GPG_KEY} so installs verify RPM
signatures.
In `@internal/buildapi/client/client.go`:
- Around line 980-983: Replace the zero-value websocket.Dialer initialization
with a clone of websocket.DefaultDialer so Gorilla's default proxy resolution
and handshake timeout are preserved: start by assigning dialer :=
*websocket.DefaultDialer, then if c.httpClient.Transport is an *http.Transport
and t.TLSClientConfig != nil set dialer.TLSClientConfig = t.TLSClientConfig
(leave other DefaultDialer fields untouched). This keeps existing behavior for
proxy and handshake timeout while applying the transport TLSClientConfig used in
the current code path.
In `@internal/buildapi/workspace.go`:
- Around line 801-805: The echo status line in the loop that builds rsync
commands uses unquoted a.Src and a.Dest and can break scripts for paths
containing quotes or special chars; update the loop that writes into rsyncCmds
(the for _, a := range req.Artifacts block) to pass the same shell-quoted values
used for the rsync call (i.e., call shellQuote(a.Src) and shellQuote(a.Dest)
when constructing the echo line) so both the human-readable echo and the rsync
arguments are safely quoted.
- Around line 481-495: The current code reads the entire upload into memory with
io.ReadAll up to maxSyncSize, which can still OOM; change to stream the request
to a disk-backed temporary file instead: create a temp file (os.CreateTemp),
copy from c.Request.Body into it using io.CopyN or io.LimitReader to enforce
maxSyncSize+1, check the written size (os.Stat or returned bytes) and return
HTTP 413 if exceeded, then open the temp file and pass an io.Reader (not a byte
slice) to copyToPod (callsite referencing copyToPod, workspaceContainerName,
maxSyncSize), and ensure the temp file is removed (defer os.Remove) after use so
EOF-stable input is provided without holding the archive in RAM.
- Around line 225-245: The code currently creates the Jumpstarter secret
(jmpClientSecret) before creating the Workspace and treats k8sClient.Create
errors with IsAlreadyExists as success; change this to first check for an
existing secret (using k8sClient.Get) and fail if one already exists for that
name, defer creating the secret until after Workspace creation succeeds (or if
you must create it early, ensure you delete it on Workspace creation failure),
and once the Workspace is created set an ownerReference on the secret to the
created Workspace and update it via k8sClient.Update; update all occurrences
that create or treat AlreadyExists as success (references: jmpClientSecret
variable, k8sClient.Create call, Workspace creation flow) so no orphaned or
stale secret can be bound to a later workspace.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a450fbe7-4fc2-4384-95e2-73d34b30dee2
📒 Files selected for processing (5)
Containerfile.toolchaincmd/caib/buildcmd/logs.gohack/workspace-entrypoint.shinternal/buildapi/client/client.gointernal/buildapi/workspace.go
🚧 Files skipped from review as they are similar to previous changes (1)
- hack/workspace-entrypoint.sh
- Operator-managed ServiceAccount and SCC binding Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com> Assisted-by: claude-opus-4.6
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com> Assisted-by: claude-opus-4.6
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com> Assisted-by: claude-opus-4.6
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com> Assisted-by: claude-opus-4.6
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com> Assisted-by: claude-opus-4.6
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com> Assisted-by: claude-opus-4.6
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com> Assisted-by: claude-opus-4.6
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com> Assisted-by: claude-opus-4.6
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com> Assisted-by: claude-opus-4.6
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com> Assisted-by: claude-opus-4.6
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com> Assisted-by: claude-opus-4.6
Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com> Assisted-by: claude-opus-4.6
The operator SA calls Update on secrets (e.g. setting OwnerReferences, rotating OAuth cookie-secret) but the ClusterRole only had get, list, watch, create, delete. Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com> Assisted-by: claude-opus-4.6
Summary by CodeRabbit
New Features
CLI
Build Integration
Workspace Infrastructure
Tooling
Security
Tests