USHIFT-6797: C2CC Probe remote clusters by pmtk · Pull Request #6729 · openshift/microshift

pmtk · 2026-05-25T09:33:58Z

Summary by CodeRabbit

New Features
- Cluster-to-cluster health-check system with a probe deployment and per-remote health CRs reporting Healthy/Unhealthy, last probe time, and errors.
Configuration
- New probeInterval setting (1s–5m, default 10s) to control probe frequency; probe service ClusterIP deterministically computed.
CLI
- Added hidden probe subcommand to run the probe server.
Installation / Runtime
- Manifests for namespace, RBAC, service account, service, and probe deployment included; CRD for RemoteCluster added.
Tooling
- Make target and script to generate CRD/artifacts.
Tests & QA
- End-to-end and unit tests validating probe behavior, CR lifecycle, and cluster IP computation.

openshift-ci · 2026-05-25T09:34:02Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci-robot · 2026-05-25T09:34:02Z

@pmtk: This pull request references USHIFT-6797 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2026-05-25T09:34:11Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pmtk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [pmtk]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-05-25T09:34:19Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

Implements C2CC probe support: RemoteCluster API/CRD, probe config and deterministic probe IPs, probe runtime and CLI, controller reconciliation and probe deployment, asset/codegen additions, and unit/acceptance tests.

Changes

C2CC Probe Feature

Layer / File(s)	Summary
RemoteCluster CRD and API Definition `pkg/apis/microshift/v1alpha1/*`, `assets/crd/microshift.io_remoteclusters.yaml`, `cmd/generate-config/config/config-openapi-spec.json`	Adds v1alpha1 API types, scheme wiring, and CRD manifest for `RemoteCluster` with `spec.probeTarget`, `spec.probeInterval` (default `10s`), and `status` fields including `state`, `lastProbeTime`, `lastSuccessfulProbe`, and `errors`.
Configuration, defaults & validation `pkg/config/c2cc.go`, `pkg/config/c2cc_test.go`, `pkg/config/config.go`, `packaging/microshift/config.yaml`, `docs/user/howto_config.md`	Adds `probeInterval` config (1s–5m, default `10s`), resolves to `ResolvedProbeInterval`, computes deterministic `ProbeIP` from service CIDR's 11th host, and adds tests and docs.
Kubernetes assets & codegen `assets/components/c2cc/*`, `scripts/generate-crds.sh`, `Makefile`, `scripts/auto-rebase/assets.yaml`	Adds probe namespace, ServiceAccount, ClusterRole/Binding, Deployment, Service (templated clusterIP), codegen script, Make target `generate-crds`, and updates asset listing.
Controller: deploy probe assets & CRD lifecycle `pkg/controllers/c2cc/controller.go`, `pkg/controllers/c2cc/deploy_probe.go`, `pkg/assets/crd.go`	Controller applies healthcheck CRD at startup, waits for it to be established, computes probe ClusterIP, renders and applies probe assets, implements CRD cleanup, and adds `DeleteCRDs` helper with aggregated error handling.
Healthcheck reconciliation manager `pkg/controllers/c2cc/healthcheck.go`, `pkg/controllers/c2cc/healthcheck_test.go`	Builds desired `RemoteCluster` objects from config, reconciles by creating/updating/deleting managed CRs, deterministic CR naming from next-hop IP, and unit tests for reconcile behaviors.
Probe runtime & CLI `pkg/controllers/c2cc/probe.go`, `pkg/cmd/c2cc_probe.go`, `cmd/microshift/main.go`	`RunProbe` starts HTTP probe server on :8080, manages per-CR probe goroutines that probe targets and update CR status; adds hidden `c2cc-probe` command and registers it in main.
Acceptance tests (Robot Framework) `test/suites/c2cc/*`, `test/resources/c2cc.resource`	Robot tests verify namespace/deployment/service existence, service ClusterIP correctness (11th CIDR host), RemoteCluster health/status fields, and controller self-healing.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

eslutsky
kasturinarra
pacevedom

🚥 Pre-merge checks | ✅ 9 | ❌ 3

❌ Failed checks (3 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 2.86% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality	⚠️ Warning	Tests lack descriptive assertion messages. Multiple plain assertions without failure messages found in c2cc_test.go, healthcheck_test.go, and Robot Framework tests.	Add assertion messages to all tests for diagnostic clarity. Go: use second parameter with context. Robot: include msg arguments in Should keywords.
Topology-Aware Scheduling Compatibility	⚠️ Warning	Deployment uses nodeSelector for node-role.kubernetes.io/master without topology-awareness. On HyperShift, this causes the probe pod to remain Pending indefinitely.	Detect topology via infrastructure.Status.ControlPlaneTopology and conditionally apply nodeSelector, or use preferredDuringSchedulingIgnoredDuringExecution affinity instead.

✅ Passed checks (9 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main objective of the PR: implementing C2CC probe functionality for remote clusters, as evidenced by the comprehensive changes to add RemoteCluster CRD, probe interval configuration, health status tracking, and probe deployment logic.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	Go tests use static names ("IPv4", "valid single remote IPv4"). Robot Framework tests use static titles. No dynamic identifiers, timestamps, UUIDs, IPs, or node names appear in test titles.
Microshift Test Compatibility	✅ Passed	Check applies to Ginkgo e2e tests only. PR adds no Ginkgo e2e tests—only Robot Framework suites and Go unit tests, which are outside this check's scope.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	No Ginkgo e2e tests added in this PR. Only Robot Framework tests (probe.robot, healthcheck.robot, c2cc.resource) were added, which fall outside the scope of this Ginkgo-specific check.
Ote Binary Stdout Contract	✅ Passed	No process-level stdout writes found. New code uses klog in RunProbe subcommand, which writes to stderr by default. No main/init/suite-level logging that would corrupt OTE JSON stdout.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	No Ginkgo e2e tests were added. The PR adds Robot Framework test suites (probe.robot, healthcheck.robot) which use different syntax; the custom check applies only to Ginkgo tests.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (1)

test/suites/c2cc/probe.robot (1)
123-130: ⚡ Quick win

Consider extracting Compute 11th IP to a shared resource.

This keyword is duplicated identically in 07-healthcheck.robot (lines 97-104). Since both test files already import c2cc.resource, extracting this to the shared resource file would eliminate duplication and provide a single point of maintenance.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/suites/c2cc/probe.robot` around lines 123 - 130, The keyword Compute
11th IP is duplicated; extract it into the shared resource by moving the entire
keyword block (the Robot Framework keyword named "Compute 11th IP" that uses
Process.Run Process to call python3 with the ipaddress snippet, checks rc and
returns the stripped stdout) into the c2cc.resource file, remove the duplicate
definitions from test/suites/c2cc/probe.robot and 07-healthcheck.robot, and
ensure both tests continue to import c2cc.resource so they call the single
shared Compute 11th IP keyword.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@assets/components/c2cc/deployment.yaml`:
- Around line 54-60: The deployment currently hard-pins c2cc-probe to
master-labeled nodes via nodeSelector node-role.kubernetes.io/master and a
NoSchedule toleration; change this to a topology-aware preference instead:
remove the strict nodeSelector and NoSchedule toleration and replace with a
preferred node affinity (preferredDuringSchedulingIgnoredDuringExecution) that
targets control-plane labels (e.g., node-role.kubernetes.io/master or
node-role.kubernetes.io/control-plane) so pods prefer control-plane nodes but
can run elsewhere for single-node/arbiter variants, and if a toleration is still
required use a non-blocking preference (e.g., avoid NoSchedule) or match the
exact taint/key semantics; update the c2cc deployment
(nodeSelector/tolerations/priorityClassName usage) and add topology-specific CI
to validate placement across single-node and multi-node topologies.

In `@cmd/generate-config/config/config-openapi-spec.json`:
- Around line 206-209: The OpenAPI schema for probeInterval lacks a default and
should match runtime default used in fillDefaults(); update the "probeInterval"
property in the OpenAPI schema (clusterToCluster.probeInterval) to include a
default value of "10s" and keep the existing description/type, so generated docs
and clients reflect the actual default; locate the "probeInterval" entry in the
config-openapi-spec JSON and add the "default": "10s" field.

In `@pkg/config/c2cc_test.go`:
- Around line 864-865: fillDefaults() is an error-returning initializer and the
tests currently ignore its error; update each call (e.g., the cfg.fillDefaults()
invocations around the assertions at the noted locations) to capture the
returned error and assert success (for example assign err := cfg.fillDefaults()
and call require.NoError(t, err) or assert.NoError(t, err)) before continuing to
the assert.Equal checks so tests fail fast on initialization errors.

In `@pkg/controllers/c2cc/probe.go`:
- Around line 85-87: The call to factory.WaitForCacheSync currently ignores its
result; update the startup in pkg/controllers/c2cc/probe.go to check the
returned map[reflect.Type]bool (the result of factory.WaitForCacheSync) and fail
fast if any informer failed to sync: after factory.Start(ctx.Done()) capture the
sync map from factory.WaitForCacheSync(ctx.Done()), inspect its values, and if
any value is false log an error via klog.Errorf (including what failed) and
return/exit the function/manager startup instead of continuing to
klog.Infof("Probe manager running, watching RemoteCluster CRs").
- Around line 67-70: The UpdateFunc is restarting probes on any RemoteCluster
update causing status-update-driven restart loops; change UpdateFunc to accept
both oldObj and newObj, cast them to *microshiftv1alpha1.RemoteCluster, compare
the relevant spec fields (e.g., Spec.ProbeTarget and Spec.ProbeInterval) between
old and new, and only call pm.restartProbe(ctx, rc) when those spec fields
differ (ignore updates that only change Status); this preserves runProbeLoop
behavior and prevents resetting consecutiveFailures to 0 on status-only updates.

---

Nitpick comments:
In `@test/suites/c2cc/probe.robot`:
- Around line 123-130: The keyword Compute 11th IP is duplicated; extract it
into the shared resource by moving the entire keyword block (the Robot Framework
keyword named "Compute 11th IP" that uses Process.Run Process to call python3
with the ipaddress snippet, checks rc and returns the stripped stdout) into the
c2cc.resource file, remove the duplicate definitions from
test/suites/c2cc/probe.robot and 07-healthcheck.robot, and ensure both tests
continue to import c2cc.resource so they call the single shared Compute 11th IP
keyword.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 3633745b-f762-4fbe-9134-34031e2ec1d5

📥 Commits

Reviewing files that changed from the base of the PR and between ca5e605 and 2bb9c14.

⛔ Files ignored due to path filters (22)

pkg/apis/microshift/v1alpha1/zz_generated.deepcopy.go is excluded by !**/zz_generated*
pkg/generated/clientset/versioned/clientset.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/fake/clientset_generated.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/fake/doc.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/fake/register.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/scheme/doc.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/scheme/register.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/typed/microshift/v1alpha1/doc.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/typed/microshift/v1alpha1/fake/doc.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/typed/microshift/v1alpha1/fake/fake_microshift_client.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/typed/microshift/v1alpha1/fake/fake_remotecluster.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/typed/microshift/v1alpha1/generated_expansion.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/typed/microshift/v1alpha1/microshift_client.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/typed/microshift/v1alpha1/remotecluster.go is excluded by !**/generated/**
pkg/generated/informers/externalversions/factory.go is excluded by !**/generated/**
pkg/generated/informers/externalversions/generic.go is excluded by !**/generated/**
pkg/generated/informers/externalversions/internalinterfaces/factory_interfaces.go is excluded by !**/generated/**
pkg/generated/informers/externalversions/microshift/interface.go is excluded by !**/generated/**
pkg/generated/informers/externalversions/microshift/v1alpha1/interface.go is excluded by !**/generated/**
pkg/generated/informers/externalversions/microshift/v1alpha1/remotecluster.go is excluded by !**/generated/**
pkg/generated/listers/microshift/v1alpha1/expansion_generated.go is excluded by !**/generated/**
pkg/generated/listers/microshift/v1alpha1/remotecluster.go is excluded by !**/generated/**

📒 Files selected for processing (30)

Makefile
assets/components/c2cc/clusterrole.yaml
assets/components/c2cc/clusterrolebinding.yaml
assets/components/c2cc/deployment.yaml
assets/components/c2cc/namespace.yaml
assets/components/c2cc/service.yaml
assets/components/c2cc/serviceaccount.yaml
assets/crd/microshift.io_remoteclusters.yaml
cmd/generate-config/config/config-openapi-spec.json
cmd/microshift/main.go
docs/user/howto_config.md
packaging/microshift/config.yaml
pkg/apis/microshift/v1alpha1/doc.go
pkg/apis/microshift/v1alpha1/groupversion_info.go
pkg/apis/microshift/v1alpha1/types.go
pkg/assets/crd.go
pkg/cmd/c2cc_probe.go
pkg/config/c2cc.go
pkg/config/c2cc_test.go
pkg/config/config.go
pkg/controllers/c2cc/controller.go
pkg/controllers/c2cc/deploy_probe.go
pkg/controllers/c2cc/healthcheck.go
pkg/controllers/c2cc/healthcheck_test.go
pkg/controllers/c2cc/probe.go
scripts/auto-rebase/assets.yaml
scripts/boilerplate.go.txt
scripts/generate-crds.sh
test/suites/c2cc/07-healthcheck.robot
test/suites/c2cc/probe.robot

coderabbitai · 2026-05-25T09:46:07Z

+      nodeSelector:
+        node-role.kubernetes.io/master: ""
+      priorityClassName: system-cluster-critical
+      tolerations:
+      - key: node-role.kubernetes.io/master
+        operator: Exists
+        effect: NoSchedule


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE="assets/components/c2cc/deployment.yaml" echo "=== File exists? ===" ls -l "$FILE" || exit 1 echo echo "=== Lines 1-120 (context) ===" nl -ba "$FILE" | sed -n '1,120p' echo echo "=== Lines 45-75 (scheduling snippet focus) ===" nl -ba "$FILE" | sed -n '45,75p' echo echo "=== Search for scheduling/topology fields in this file ===" rg -n --hidden --no-ignore-vcs "(nodeSelector|affinity|podAntiAffinity|topologySpreadConstraints|topologyKey|replicas|podDisruptionBudget|PDB|node-role.kubernetes.io/master|control-plane|master|kubernetes.io/hostname)" "$FILE" || true echo echo "=== Repo-wide usage of node-role.kubernetes.io/master ===" rg -n --hidden --no-ignore-vcs "node-role\.kubernetes\.io/master" . || true

Repository: openshift/microshift

Length of output: 266

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE="assets/components/c2cc/deployment.yaml" echo "=== File exists? ===" ls -l "$FILE" || exit 1 echo echo "=== Lines 1-120 (context) ===" awk 'NR>=1 && NR<=120 {printf "%6d %s\n", NR, $0}' "$FILE" echo echo "=== Lines 45-75 (scheduling snippet focus) ===" awk 'NR>=45 && NR<=75 {printf "%6d %s\n", NR, $0}' "$FILE" echo echo "=== Search for scheduling/topology fields in this file ===" rg -n --hidden --no-ignore-vcs "(nodeSelector|affinity|podAntiAffinity|topologySpreadConstraints|topologyKey|replicas|podDisruptionBudget|PDB|node-role.kubernetes.io/master|control-plane|master|kubernetes.io/hostname)" "$FILE" || true echo echo "=== Repo-wide usage of node-role.kubernetes.io/master ===" rg -n --hidden --no-ignore-vcs "node-role\.kubernetes\.io/master" . || true

Repository: openshift/microshift

Length of output: 15624

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE="assets/components/c2cc/deployment.yaml" echo "=== File exists? ===" ls -l "$FILE" || exit 1 echo echo "=== Lines 1-120 (context) ===" awk 'NR>=1 && NR<=120 {printf "%6d %s\n", NR, $0}' "$FILE" echo echo "=== Lines 45-75 (scheduling snippet focus) ===" awk 'NR>=45 && NR<=75 {printf "%6d %s\n", NR, $0}' "$FILE" echo echo "=== Search for scheduling/topology fields in this file ===" rg -n --hidden --no-ignore-vcs "(nodeSelector|affinity|podAntiAffinity|topologySpreadConstraints|topologyKey|replicas|podDisruptionBudget|PDB|node-role.kubernetes.io/master|control-plane|master|kubernetes.io/hostname)" "$FILE" || true echo echo "=== Repo-wide usage of node-role.kubernetes.io/master ===" rg -n --hidden --no-ignore-vcs "node-role\.kubernetes\.io/master" . || true

Repository: openshift/microshift

Length of output: 15624

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "=== Repo-wide: control-plane label usage ===" rg -n --hidden --no-ignore-vcs "node-role\.kubernetes\.io/control-plane" . || true echo echo "=== Repo-wide: arbiter mentions ===" rg -n --hidden --no-ignore-vcs "arbiter" . || true echo echo "=== Repo-wide: SNO mention ===" rg -n --hidden --no-ignore-vcs "\bSNO\b|single[- ]node|single[- ]node" . || true echo echo "=== CI/topology job hints ===" rg -n --hidden --no-ignore-vcs "topology|topological|spread|DoNotSchedule|hostname|podAntiAffinity" .github . || true

Repository: openshift/microshift

Length of output: 50378

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "=== Repo-wide: control-plane label usage ===" rg -n --hidden --no-ignore-vcs "node-role\.kubernetes\.io/control-plane" . || true echo echo "=== Repo-wide: arbiter mentions ===" rg -n --hidden --no-ignore-vcs "arbiter" . || true echo echo "=== Repo-wide: SNO mention ===" rg -n --hidden --no-ignore-vcs "\bSNO\b|single[- ]node|single[- ]node" . || true echo echo "=== CI/topology job hints ===" rg -n --hidden --no-ignore-vcs "topology|topological|spread|DoNotSchedule|hostname|podAntiAffinity" .github . || true

Repository: openshift/microshift

Length of output: 50378

Avoid hard pinning c2cc-probe to master-labeled control-plane nodes
In assets/components/c2cc/deployment.yaml (lines 54-60), the workload uses nodeSelector: node-role.kubernetes.io/master: "" plus a NoSchedule toleration for the same taint, forcing placement onto master-labeled nodes only. This bakes in a specific control-plane topology and isn’t topology-aware for single-node/arbiter variants—make placement topology-aware and cover it with topology-specific CI.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@assets/components/c2cc/deployment.yaml` around lines 54 - 60, The deployment currently hard-pins c2cc-probe to master-labeled nodes via nodeSelector node-role.kubernetes.io/master and a NoSchedule toleration; change this to a topology-aware preference instead: remove the strict nodeSelector and NoSchedule toleration and replace with a preferred node affinity (preferredDuringSchedulingIgnoredDuringExecution) that targets control-plane labels (e.g., node-role.kubernetes.io/master or node-role.kubernetes.io/control-plane) so pods prefer control-plane nodes but can run elsewhere for single-node/arbiter variants, and if a toleration is still required use a non-blocking preference (e.g., avoid NoSchedule) or match the exact taint/key semantics; update the c2cc deployment (nodeSelector/tolerations/priorityClassName usage) and add topology-specific CI to validate placement across single-node and multi-node topologies.

Won't fix — MicroShift is single-node. Every other deployment (service-ca, OVN, CSI, DNS) uses the identical nodeSelector: node-role.kubernetes.io/master + NoSchedule toleration pattern. This is the standard convention, not a topology concern.

@pmtk, understood — that's a valid convention for MicroShift's single-node topology. I'll defer to the established pattern.

✏️ Learnings added

Learnt from: pmtk Repo: openshift/microshift PR: 6729 File: assets/components/c2cc/deployment.yaml:54-60 Timestamp: 2026-05-25T11:24:44.503Z Learning: In MicroShift (openshift/microshift), all system-level deployments (service-ca, OVN, CSI, DNS, c2cc-probe, etc.) use the standard pattern of `nodeSelector: node-role.kubernetes.io/master: ""` plus a `NoSchedule` toleration for `node-role.kubernetes.io/master`. This is intentional and correct — MicroShift is a single-node distribution, so topology-aware scheduling is not applicable. Do not flag this pattern as a topology concern.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

coderabbitai · 2026-05-25T09:46:07Z

+	factory.Start(ctx.Done())
+	factory.WaitForCacheSync(ctx.Done())
+	klog.Infof("Probe manager running, watching RemoteCluster CRs")


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Verify cache sync result is currently ignored. rg -n -C2 'factory.Start|WaitForCacheSync|HasSynced' pkg/controllers/c2cc/probe.go

Repository: openshift/microshift

Length of output: 218

Fail fast if informer cache does not sync. In pkg/controllers/c2cc/probe.go (lines 85-87), the factory.WaitForCacheSync(ctx.Done()) return value is ignored; if the RemoteCluster informer cache never syncs, the manager can start without a valid initial state—handle the sync result (log and exit/return an error) when it fails.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controllers/c2cc/probe.go` around lines 85 - 87, The call to factory.WaitForCacheSync currently ignores its result; update the startup in pkg/controllers/c2cc/probe.go to check the returned map[reflect.Type]bool (the result of factory.WaitForCacheSync) and fail fast if any informer failed to sync: after factory.Start(ctx.Done()) capture the sync map from factory.WaitForCacheSync(ctx.Done()), inspect its values, and if any value is false log an error via klog.Errorf (including what failed) and return/exit the function/manager startup instead of continuing to klog.Infof("Probe manager running, watching RemoteCluster CRs").

Won't fix — WaitForCacheSync(ctx.Done()) blocks until sync or context cancellation. If ctx is cancelled, the next line <-ctx.Done() exits immediately anyway — both paths converge on the same shutdown behavior.

@pmtk, fair point — if the context is already cancelled when WaitForCacheSync returns false, the <-ctx.Done() path shuts things down cleanly regardless. Acknowledged.

pmtk · 2026-05-25T11:35:12Z

Depends on #6727

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (2)

assets/components/c2cc/deployment.yaml (1)

35-40: ⚡ Quick win

Consider adding a readinessProbe.

The Deployment has a livenessProbe but no readinessProbe. Without it, the Service will route traffic immediately on pod start, potentially before the HTTP server is ready.

♻️ Proposed addition

         livenessProbe:
           httpGet:
             path: /
             port: 8080
           initialDelaySeconds: 10
           periodSeconds: 10
+        readinessProbe:
+          httpGet:
+            path: /
+            port: 8080
+          initialDelaySeconds: 5
+          periodSeconds: 5

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@assets/components/c2cc/deployment.yaml` around lines 35 - 40, Add a
readinessProbe alongside the existing livenessProbe so the Service only routes
traffic once the pod's HTTP server is ready; mirror the livenessProbe's httpGet
(path: /, port: 8080) and set sensible readiness-specific timing (for example
readinessProbe with initialDelaySeconds, periodSeconds and optionally
successThreshold/failureThreshold) in the same Pod spec where livenessProbe is
defined so Kubernetes checks readiness separately from liveness.

pkg/controllers/c2cc/deploy_probe.go (1)

26-27: ⚡ Quick win

Add a defensive guard before indexing ServiceNetwork.

Line 26 assumes at least one entry and can panic on empty config; return a clear error instead.

Suggested patch

 func (c *C2CCRouteManager) deployProbe(ctx context.Context) error {
+	if len(c.cfg.Network.ServiceNetwork) == 0 {
+		return fmt.Errorf("service network is empty")
+	}
 	_, svcNet, err := net.ParseCIDR(c.cfg.Network.ServiceNetwork[0])
 	if err != nil {
 		return fmt.Errorf("failed to parse local service network: %w", err)
 	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controllers/c2cc/deploy_probe.go` around lines 26 - 27, Add a defensive
guard before indexing c.cfg.Network.ServiceNetwork: check that
len(c.cfg.Network.ServiceNetwork) > 0 and if not, return a clear error (e.g.,
fmt.Errorf or wrapped error) indicating the ServiceNetwork is empty; only then
call net.ParseCIDR(c.cfg.Network.ServiceNetwork[0]) and handle its error as
before. This protects the code around net.ParseCIDR and the surrounding function
(where c.cfg.Network.ServiceNetwork is used) from panics due to an empty slice.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/controllers/c2cc/healthcheck.go`:
- Around line 65-67: The reconcile path in healthcheckCRManager.reconcile()
writes got.Spec = want.Spec then calls h.client.RemoteClusters().Update(...)
which can fail with 409s if the probe pod updates status between List and
Update; wrap the Update call in a retry-on-conflict loop (use
retry.RetryOnConflict) that re-gets the latest RemoteCluster
(h.client.RemoteClusters().Get), applies the desired Spec (assign want.Spec to
the fetched object's Spec), and retries the Update until success or a
non-conflict error, preserving the existing metav1.UpdateOptions usage.

In `@pkg/controllers/c2cc/probe.go`:
- Around line 177-181: The current probe logic unconditionally sets status.State
to "Healthy" whenever consecutiveFailures < unhealthyThreshold, which hides
errors; change the logic in the probe result handler that updates status.State
(the block using consecutiveFailures, unhealthyThreshold and status.State) to
avoid overwriting the prior state when there are Errors—either preserve the
existing status.State if Errors is non-empty or use an intermediate state such
as "Degraded"/"UnhealthyPending" instead of "Healthy"; ensure you handle the
initial/zero-value state (set to "Healthy" only when there are no Errors and
consecutiveFailures == 0) so the code that writes status.Errors and status.State
remains consistent.
- Line 44: The http.Server instantiation (server := &http.Server{Addr: ":8080",
Handler: mux}) lacks a ReadHeaderTimeout, making it vulnerable to Slowloris;
update the server struct to include a sensible ReadHeaderTimeout (e.g., a few
seconds) alongside Addr and Handler so the server closes connections that stall
while sending headers (modify the &http.Server literal where server is created).

---

Nitpick comments:
In `@assets/components/c2cc/deployment.yaml`:
- Around line 35-40: Add a readinessProbe alongside the existing livenessProbe
so the Service only routes traffic once the pod's HTTP server is ready; mirror
the livenessProbe's httpGet (path: /, port: 8080) and set sensible
readiness-specific timing (for example readinessProbe with initialDelaySeconds,
periodSeconds and optionally successThreshold/failureThreshold) in the same Pod
spec where livenessProbe is defined so Kubernetes checks readiness separately
from liveness.

In `@pkg/controllers/c2cc/deploy_probe.go`:
- Around line 26-27: Add a defensive guard before indexing
c.cfg.Network.ServiceNetwork: check that len(c.cfg.Network.ServiceNetwork) > 0
and if not, return a clear error (e.g., fmt.Errorf or wrapped error) indicating
the ServiceNetwork is empty; only then call
net.ParseCIDR(c.cfg.Network.ServiceNetwork[0]) and handle its error as before.
This protects the code around net.ParseCIDR and the surrounding function (where
c.cfg.Network.ServiceNetwork is used) from panics due to an empty slice.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: ba6f532c-76d3-473b-8672-a7e040dd7d51

📥 Commits

Reviewing files that changed from the base of the PR and between 7ec41e8 and 06c24c1.

⛔ Files ignored due to path filters (22)

pkg/apis/microshift/v1alpha1/zz_generated.deepcopy.go is excluded by !**/zz_generated*
pkg/generated/clientset/versioned/clientset.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/fake/clientset_generated.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/fake/doc.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/fake/register.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/scheme/doc.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/scheme/register.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/typed/microshift/v1alpha1/doc.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/typed/microshift/v1alpha1/fake/doc.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/typed/microshift/v1alpha1/fake/fake_microshift_client.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/typed/microshift/v1alpha1/fake/fake_remotecluster.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/typed/microshift/v1alpha1/generated_expansion.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/typed/microshift/v1alpha1/microshift_client.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/typed/microshift/v1alpha1/remotecluster.go is excluded by !**/generated/**
pkg/generated/informers/externalversions/factory.go is excluded by !**/generated/**
pkg/generated/informers/externalversions/generic.go is excluded by !**/generated/**
pkg/generated/informers/externalversions/internalinterfaces/factory_interfaces.go is excluded by !**/generated/**
pkg/generated/informers/externalversions/microshift/interface.go is excluded by !**/generated/**
pkg/generated/informers/externalversions/microshift/v1alpha1/interface.go is excluded by !**/generated/**
pkg/generated/informers/externalversions/microshift/v1alpha1/remotecluster.go is excluded by !**/generated/**
pkg/generated/listers/microshift/v1alpha1/expansion_generated.go is excluded by !**/generated/**
pkg/generated/listers/microshift/v1alpha1/remotecluster.go is excluded by !**/generated/**

📒 Files selected for processing (31)

Makefile
assets/components/c2cc/clusterrole.yaml
assets/components/c2cc/clusterrolebinding.yaml
assets/components/c2cc/deployment.yaml
assets/components/c2cc/namespace.yaml
assets/components/c2cc/service.yaml
assets/components/c2cc/serviceaccount.yaml
assets/crd/microshift.io_remoteclusters.yaml
cmd/generate-config/config/config-openapi-spec.json
cmd/microshift/main.go
docs/user/howto_config.md
packaging/microshift/config.yaml
pkg/apis/microshift/v1alpha1/doc.go
pkg/apis/microshift/v1alpha1/groupversion_info.go
pkg/apis/microshift/v1alpha1/types.go
pkg/assets/crd.go
pkg/cmd/c2cc_probe.go
pkg/config/c2cc.go
pkg/config/c2cc_test.go
pkg/config/config.go
pkg/controllers/c2cc/controller.go
pkg/controllers/c2cc/deploy_probe.go
pkg/controllers/c2cc/healthcheck.go
pkg/controllers/c2cc/healthcheck_test.go
pkg/controllers/c2cc/probe.go
scripts/auto-rebase/assets.yaml
scripts/boilerplate.go.txt
scripts/generate-crds.sh
test/resources/c2cc.resource
test/suites/c2cc/healthcheck.robot
test/suites/c2cc/probe.robot

✅ Files skipped from review due to trivial changes (4)

pkg/apis/microshift/v1alpha1/doc.go
assets/components/c2cc/namespace.yaml
cmd/generate-config/config/config-openapi-spec.json
docs/user/howto_config.md

coderabbitai · 2026-05-25T11:48:34Z

+		got.Spec = want.Spec
+		if _, err := h.client.RemoteClusters().Update(ctx, got, metav1.UpdateOptions{}); err != nil {
+			return fmt.Errorf("updating RemoteCluster %q: %w", name, err)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Verify concurrent status writers and current lack of retry-on-conflict in update path. rg -nP --type=go 'RemoteClusters\.UpdateStatus$' rg -nP --type=go 'healthcheckCRManager$\s+reconcile|RemoteClusters\.Update$|RetryOnConflict'

Repository: openshift/microshift

Length of output: 28029

🏁 Script executed:

#!/bin/bash set -euo pipefail # 1) Inspect the exact code around the update call sed -n '1,140p' pkg/controllers/c2cc/healthcheck.go # 2) Find all remotecluster update call sites in this repo (spec or status) rg -n --type=go "RemoteClusters\($\.(Update|UpdateStatus|Patch)" pkg | head -n 200 # 3) Find any other controller that updates RemoteCluster status/fields rg -n --type=go "RemoteCluster" pkg/controllers/c2cc | head -n 200 rg -n --type=go "RemoteCluster" pkg | head -n 200

Repository: openshift/microshift

Length of output: 29154

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '130,280p' pkg/controllers/c2cc/probe.go # Also inspect imports to see if retry is already used in probe.go sed -n '1,120p' pkg/controllers/c2cc/probe.go

Repository: openshift/microshift

Length of output: 6156

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n --type=go "RetryOnConflict|util/retry|Update$ctx,.*RemoteClusters" pkg/controllers/c2cc/healthcheck.go rg -n --type=go "RemoteClusters\($\.Update\(" pkg/controllers/c2cc rg -n --type=go "UpdateStatus\(" pkg/controllers/c2cc

Repository: openshift/microshift

Length of output: 46

🏁 Script executed:

#!/bin/bash set -euo pipefail # Show how healthcheck manager + probe are wired sed -n '1,220p' pkg/controllers/c2cc/controller.go ls -1 pkg/controllers/c2cc # Find where healthcheckCRManager is instantiated / reconcile loop is started rg -n --type=go "newHealthcheckCRManager|healthcheckCRManager|\\.reconcile\\(" pkg/controllers/c2cc rg -n --type=go "RunProbe\\(" pkg/controllers/c2cc rg -n --type=go "RunProbe" pkg/controllers | head -n 50

Repository: openshift/microshift

Length of output: 8925

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '140,260p' pkg/controllers/c2cc/controller.go sed -n '1,220p' pkg/controllers/c2cc/deploy_probe.go rg -n --type=go "RunProbe|deploy_probe|Probe" pkg/controllers/c2cc | head -n 200 # Show full healthcheck.go to confirm Update uses entire object (status included) sed -n '1,120p' pkg/controllers/c2cc/healthcheck.go

Repository: openshift/microshift

Length of output: 13925

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n "deployment.yaml|c2cc-probe|probe" components/c2cc -S || true ls -R components/c2cc || true # Find the deployment template file referenced by deployProbe fd -a "deployment.yaml" . | head -n 20 # Print the deployment.yaml relevant bits (replicas, container args/env) without dumping entire file FILE="$(fd -a "deployment.yaml" . | head -n 1)" echo "Using $FILE" rg -n "replicas:|c2cc.*probe|RunProbe|healthcheck-probe|container|args:" "$FILE" -n || true sed -n '1,220p' "$FILE"

Repository: openshift/microshift

Length of output: 6913

🏁 Script executed:

#!/bin/bash set -euo pipefail # 1) Probe deployment template ls -la assets/components/c2cc || true sed -n '1,240p' assets/components/c2cc/deployment.yaml # 2) RemoteCluster CRD to see if status subresource is enabled CRD="$(fd -a "microshift.io_remoteclusters.yaml" . | head -n 1)" echo "CRD path: $CRD" sed -n '1,220p' "$CRD" rg -n "subresources:|status:" "$CRD" -n # 3) Retry constants used in suggested patch sed -n '1,140p' vendor/k8s.io/client-go/util/retry/util.go rg -n "DefaultRetry|DefaultBackoff" vendor/k8s.io/client-go/util/retry/util.go

Repository: openshift/microshift

Length of output: 9233

Add retry-on-conflict when updating RemoteCluster Spec (avoid 409s from concurrent probe status writes).

healthcheckCRManager.reconcile() does a List then full-object Update() after modifying got.Spec. The probe pod writes RemoteCluster health via UpdateStatus, and the CRD enables the status subresource—so probe status updates can bump resourceVersion between List and Update, causing reconcile to fail early on conflicts.

Suggested patch

import ( "context" "fmt" "net" "strings" microshiftv1alpha1 "github.com/openshift/microshift/pkg/apis/microshift/v1alpha1" "github.com/openshift/microshift/pkg/config" microshiftclient "github.com/openshift/microshift/pkg/generated/clientset/versioned/typed/microshift/v1alpha1" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" + "k8s.io/client-go/util/retry" "k8s.io/klog/v2" ) @@ - got.Spec = want.Spec - if _, err := h.client.RemoteClusters().Update(ctx, got, metav1.UpdateOptions{}); err != nil { + if err := retry.RetryOnConflict(retry.DefaultRetry, func() error { + latest, err := h.client.RemoteClusters().Get(ctx, name, metav1.GetOptions{}) + if err != nil { + return err + } + latest.Spec = want.Spec + _, err = h.client.RemoteClusters().Update(ctx, latest, metav1.UpdateOptions{}) + return err + }); err != nil { return fmt.Errorf("updating RemoteCluster %q: %w", name, err) }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

got.Spec = want.Spec

if _, err := h.client.RemoteClusters().Update(ctx, got, metav1.UpdateOptions{}); err != nil {

return fmt.Errorf("updating RemoteCluster %q: %w", name, err)

if err := retry.RetryOnConflict(retry.DefaultRetry, func() error {

latest, err := h.client.RemoteClusters().Get(ctx, name, metav1.GetOptions{})

if err != nil {

return err

}

latest.Spec = want.Spec

_, err = h.client.RemoteClusters().Update(ctx, latest, metav1.UpdateOptions{})

return err

}); err != nil {

return fmt.Errorf("updating RemoteCluster %q: %w", name, err)

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controllers/c2cc/healthcheck.go` around lines 65 - 67, The reconcile path in healthcheckCRManager.reconcile() writes got.Spec = want.Spec then calls h.client.RemoteClusters().Update(...) which can fail with 409s if the probe pod updates status between List and Update; wrap the Update call in a retry-on-conflict loop (use retry.RetryOnConflict) that re-gets the latest RemoteCluster (h.client.RemoteClusters().Get), applies the desired Spec (assign want.Spec to the fetched object's Spec), and retries the Update until success or a non-conflict error, preserving the existing metav1.UpdateOptions usage.

coderabbitai · 2026-05-25T11:48:35Z

+		w.WriteHeader(http.StatusOK)
+		fmt.Fprint(w, "ok")
+	})
+	server := &http.Server{Addr: ":8080", Handler: mux}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add ReadHeaderTimeout to prevent Slowloris attacks.

The HTTP server lacks ReadHeaderTimeout, leaving it vulnerable to slow header attacks that can exhaust connections.

🔒 Proposed fix

- server := &http.Server{Addr: ":8080", Handler: mux} + server := &http.Server{ + Addr: ":8080", + Handler: mux, + ReadHeaderTimeout: 10 * time.Second, + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

server := &http.Server{Addr: ":8080", Handler: mux}

server := &http.Server{

Addr: ":8080",

Handler: mux,

ReadHeaderTimeout: 10 * time.Second,

}

🧰 Tools

🪛 golangci-lint (2.12.2)

[medium] 44-44: G112: Potential Slowloris Attack because ReadHeaderTimeout is not configured in the http.Server

(gosec)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controllers/c2cc/probe.go` at line 44, The http.Server instantiation (server := &http.Server{Addr: ":8080", Handler: mux}) lacks a ReadHeaderTimeout, making it vulnerable to Slowloris; update the server struct to include a sensible ReadHeaderTimeout (e.g., a few seconds) alongside Addr and Handler so the server closes connections that stall while sending headers (modify the &http.Server literal where server is created).

coderabbitai · 2026-05-25T11:48:35Z

+				if consecutiveFailures >= unhealthyThreshold {
+					status.State = "Unhealthy"
+				} else {
+					status.State = "Healthy"
+				}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

State shows "Healthy" with errors until threshold is reached.

When a probe fails but consecutiveFailures < unhealthyThreshold, the status is set to "Healthy" with Errors populated. This creates a confusing state where the CR appears healthy but contains error messages. Consider preserving the previous state or introducing an intermediate state.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controllers/c2cc/probe.go` around lines 177 - 181, The current probe logic unconditionally sets status.State to "Healthy" whenever consecutiveFailures < unhealthyThreshold, which hides errors; change the logic in the probe result handler that updates status.State (the block using consecutiveFailures, unhealthyThreshold and status.State) to avoid overwriting the prior state when there are Errors—either preserve the existing status.State if Errors is non-empty or use an intermediate state such as "Degraded"/"UnhealthyPending" instead of "Healthy"; ensure you handle the initial/zero-value state (set to "Healthy" only when there are no Errors and consecutiveFailures == 0) so the code that writes status.Errors and status.State remains consistent.

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (2)

pkg/controllers/c2cc/probe.go (1)

44-44: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Set ReadHeaderTimeout on the probe HTTP server.

This keeps the endpoint from being vulnerable to slow-header connection exhaustion.

Suggested patch

-	server := &http.Server{Addr: ":8080", Handler: mux}
+	server := &http.Server{
+		Addr:              ":8080",
+		Handler:           mux,
+		ReadHeaderTimeout: 10 * time.Second,
+	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controllers/c2cc/probe.go` at line 44, The probe HTTP server
instantiation (the http.Server assigned to variable server in probe.go:
http.Server{Addr: ":8080", Handler: mux}) lacks a ReadHeaderTimeout which makes
it vulnerable to slow-header attacks; set the ReadHeaderTimeout field (e.g.,
time.Second * 5 or a configurable value) on that http.Server struct so the
server closes connections that take too long to send request headers.

pkg/controllers/c2cc/healthcheck.go (1)

65-67: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add retry-on-conflict around RemoteCluster updates.

Update() can hit 409 conflicts when probe status writers update the same object between list and update.

Suggested patch

 import (
 	"context"
 	"fmt"
 	"net"
 	"strings"

 	microshiftv1alpha1 "github.com/openshift/microshift/pkg/apis/microshift/v1alpha1"
 	"github.com/openshift/microshift/pkg/config"
 	microshiftclient "github.com/openshift/microshift/pkg/generated/clientset/versioned/typed/microshift/v1alpha1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+	"k8s.io/client-go/util/retry"
 	"k8s.io/klog/v2"
 )
@@
-		got.Spec = want.Spec
-		if _, err := h.client.RemoteClusters().Update(ctx, got, metav1.UpdateOptions{}); err != nil {
+		if err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
+			latest, err := h.client.RemoteClusters().Get(ctx, name, metav1.GetOptions{})
+			if err != nil {
+				return err
+			}
+			latest.Spec = want.Spec
+			_, err = h.client.RemoteClusters().Update(ctx, latest, metav1.UpdateOptions{})
+			return err
+		}); err != nil {
 			return fmt.Errorf("updating RemoteCluster %q: %w", name, err)
 		}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controllers/c2cc/healthcheck.go` around lines 65 - 67, The RemoteCluster
update can fail with 409 conflicts; wrap the update in a retry-on-conflict loop
(use k8s.io/client-go/util/retry.RetryOnConflict) around
h.client.RemoteClusters().Update, re-get the latest object inside the retry, set
got.Spec = want.Spec each attempt, then call Update(ctx, got,
metav1.UpdateOptions{}), and return the final error if retries exhaust;
reference the variables/functions got, want, and
h.client.RemoteClusters().Update to locate where to apply the loop.

🧹 Nitpick comments (1)

pkg/controllers/c2cc/deploy_probe.go (1)

59-59: ⚡ Quick win

Lower repetitive success logging to verbose level.

This path runs every reconcile, so Infof will generate noisy logs over time.

Suggested change

-	klog.Infof("C2CC probe assets deployed (probe ClusterIP=%s)", probeIP)
+	klog.V(4).Infof("C2CC probe assets deployed (probe ClusterIP=%s)", probeIP)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controllers/c2cc/deploy_probe.go` at line 59, The success log in
deploy_probe.go currently uses klog.Infof("C2CC probe assets deployed (probe
ClusterIP=%s)", probeIP) which runs every reconcile and is noisy; change this to
a verbose-level log (e.g., klog.V(2).Infof or appropriate verbosity) so normal
Info logs are not flooded—locate the klog.Infof call in the deploy probe
reconciliation (symbol: the logging call in deploy_probe.go that prints probe
ClusterIP) and replace it with a klog.V(x).Infof call using an appropriate
verbosity level.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@assets/components/c2cc/deployment.yaml`:
- Around line 35-40: Add a readinessProbe alongside the existing livenessProbe
so the Service only routes traffic after the app is ready: add a readinessProbe
entry (mirroring the livenessProbe's httpGet path: / and port: 8080) and
configure sensible timings (e.g., initialDelaySeconds, periodSeconds,
successThreshold/failureThreshold) to gate traffic until the probe server is
listening; update the same Kubernetes manifest section containing livenessProbe
to include readinessProbe.

In `@pkg/controllers/c2cc/probe.go`:
- Around line 61-86: The call to informer.AddEventHandler currently ignores its
returned error, so update the code where informer.AddEventHandler(...) is
invoked to capture the returned error and fail fast on error (e.g., return the
error from the surrounding function or call log.Fatalf/process exit), rather
than silently proceeding; keep the existing handlers (AddFunc calling
pm.startProbe, UpdateFunc calling pm.restartProbe, DeleteFunc calling
pm.stopProbe) intact and reference this change around the
informer.AddEventHandler invocation to ensure event-handler registration errors
are handled and propagated.

In `@test/resources/c2cc.resource`:
- Around line 286-294: The current inline Python command uses n[11] which raises
IndexError for networks with fewer than 12 addresses; modify the command built
in the ${cmd} variable to first compute n = ipaddress.ip_network('${cidr}',
strict=False) and check n.num_addresses > 11 (or >=12) before accessing n[11];
if the check fails, print a clear error message and exit with a non-zero status
so the Process.Run call (python3 -c ${cmd}) returns a failing rc and the Robot
keyword surfaces a readable validation error instead of an IndexError traceback.

---

Duplicate comments:
In `@pkg/controllers/c2cc/healthcheck.go`:
- Around line 65-67: The RemoteCluster update can fail with 409 conflicts; wrap
the update in a retry-on-conflict loop (use
k8s.io/client-go/util/retry.RetryOnConflict) around
h.client.RemoteClusters().Update, re-get the latest object inside the retry, set
got.Spec = want.Spec each attempt, then call Update(ctx, got,
metav1.UpdateOptions{}), and return the final error if retries exhaust;
reference the variables/functions got, want, and
h.client.RemoteClusters().Update to locate where to apply the loop.

In `@pkg/controllers/c2cc/probe.go`:
- Line 44: The probe HTTP server instantiation (the http.Server assigned to
variable server in probe.go: http.Server{Addr: ":8080", Handler: mux}) lacks a
ReadHeaderTimeout which makes it vulnerable to slow-header attacks; set the
ReadHeaderTimeout field (e.g., time.Second * 5 or a configurable value) on that
http.Server struct so the server closes connections that take too long to send
request headers.

---

Nitpick comments:
In `@pkg/controllers/c2cc/deploy_probe.go`:
- Line 59: The success log in deploy_probe.go currently uses klog.Infof("C2CC
probe assets deployed (probe ClusterIP=%s)", probeIP) which runs every reconcile
and is noisy; change this to a verbose-level log (e.g., klog.V(2).Infof or
appropriate verbosity) so normal Info logs are not flooded—locate the klog.Infof
call in the deploy probe reconciliation (symbol: the logging call in
deploy_probe.go that prints probe ClusterIP) and replace it with a
klog.V(x).Infof call using an appropriate verbosity level.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 03ed3a56-44f9-4fcb-accc-ec1e9217620d

📥 Commits

Reviewing files that changed from the base of the PR and between 06c24c1 and 89f74e6.

⛔ Files ignored due to path filters (1)

pkg/apis/microshift/v1alpha1/zz_generated.deepcopy.go is excluded by !**/zz_generated*

📒 Files selected for processing (26)

assets/components/c2cc/clusterrole.yaml
assets/components/c2cc/clusterrolebinding.yaml
assets/components/c2cc/deployment.yaml
assets/components/c2cc/namespace.yaml
assets/components/c2cc/service.yaml
assets/components/c2cc/serviceaccount.yaml
assets/crd/microshift.io_remoteclusters.yaml
cmd/generate-config/config/config-openapi-spec.json
cmd/microshift/main.go
docs/user/howto_config.md
packaging/microshift/config.yaml
pkg/apis/microshift/v1alpha1/types.go
pkg/assets/crd.go
pkg/cmd/c2cc_probe.go
pkg/config/c2cc.go
pkg/config/c2cc_test.go
pkg/config/config.go
pkg/controllers/c2cc/controller.go
pkg/controllers/c2cc/deploy_probe.go
pkg/controllers/c2cc/healthcheck.go
pkg/controllers/c2cc/healthcheck_test.go
pkg/controllers/c2cc/probe.go
scripts/auto-rebase/assets.yaml
test/resources/c2cc.resource
test/suites/c2cc/healthcheck.robot
test/suites/c2cc/probe.robot

✅ Files skipped from review due to trivial changes (2)

docs/user/howto_config.md
cmd/generate-config/config/config-openapi-spec.json

coderabbitai · 2026-05-25T13:50:02Z

+        livenessProbe:
+          httpGet:
+            path: /
+            port: 8080
+          initialDelaySeconds: 10
+          periodSeconds: 10


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a readiness probe for endpoint gating.

Without readiness, the Service can route traffic before the probe server is actually listening, which can cause avoidable startup failures.

Suggested change

livenessProbe: httpGet: path: / port: 8080 initialDelaySeconds: 10 periodSeconds: 10 + readinessProbe: + httpGet: + path: / + port: 8080 + initialDelaySeconds: 2 + periodSeconds: 5

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@assets/components/c2cc/deployment.yaml` around lines 35 - 40, Add a readinessProbe alongside the existing livenessProbe so the Service only routes traffic after the app is ready: add a readinessProbe entry (mirroring the livenessProbe's httpGet path: / and port: 8080) and configure sensible timings (e.g., initialDelaySeconds, periodSeconds, successThreshold/failureThreshold) to gate traffic until the probe server is listening; update the same Kubernetes manifest section containing livenessProbe to include readinessProbe.

coderabbitai · 2026-05-25T13:50:02Z

+	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
+		AddFunc: func(obj interface{}) {
+			if rc, ok := obj.(*microshiftv1alpha1.RemoteCluster); ok {
+				pm.startProbe(ctx, rc)
+			}
+		},
+		UpdateFunc: func(oldObj, newObj interface{}) {
+			oldRC, ok1 := oldObj.(*microshiftv1alpha1.RemoteCluster)
+			newRC, ok2 := newObj.(*microshiftv1alpha1.RemoteCluster)
+			if ok1 && ok2 && (oldRC.Spec.ProbeTarget != newRC.Spec.ProbeTarget ||
+				oldRC.Spec.ProbeInterval != newRC.Spec.ProbeInterval) {
+				pm.restartProbe(ctx, newRC)
+			}
+		},
+		DeleteFunc: func(obj interface{}) {
+			rc, ok := obj.(*microshiftv1alpha1.RemoteCluster)
+			if !ok {
+				if tombstone, ok := obj.(cache.DeletedFinalStateUnknown); ok {
+					rc, _ = tombstone.Obj.(*microshiftv1alpha1.RemoteCluster)
+				}
+			}
+			if rc != nil {
+				pm.stopProbe(rc.Name)
+			}
+		},
+	})


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail # Show relevant section with line numbers sed -n '1,140p' pkg/controllers/c2cc/probe.go | nl -ba | sed -n '40,120p' # Confirm signature/usage elsewhere in repo rg -n "AddEventHandler\(" pkg/controllers/c2cc/probe.go pkg/controllers/c2cc -S || true rg -n "failed to add RemoteCluster informer handlers" -S . || true # Show imports to see if fmt is already available sed -n '1,80p' pkg/controllers/c2cc/probe.go | nl -ba

Repository: openshift/microshift

Length of output: 107

🏁 Script executed:

#!/bin/bash set -euo pipefail # Print relevant portion with line numbers (using awk since nl isn't available) awk 'NR>=40 && NR<=120 {printf "%d\t%s\n", NR, $0}' pkg/controllers/c2cc/probe.go echo "----" # Search for AddEventHandler usage in this package rg -n "AddEventHandler\$" pkg/controllers/c2cc -S || true echo "----" # Search for fmt usage in this file to see if fmt is already imported/used rg -n "\\bfmt\\." pkg/controllers/c2cc/probe.go -S || true echo "----" # Print import block to see if fmt is present awk 'NR>=1 && NR<=80 {print}' pkg/controllers/c2cc/probe.go

Repository: openshift/microshift

Length of output: 5881

🏁 Script executed:

#!/bin/bash set -euo pipefail # Locate the generated informer file for RemoteClusters fd -a "remoteclusters" pkg/generated/informers/externalversions -S || true fd -a "RemoteClusters" pkg/generated/informers/externalversions -S || true # Search for the Informer() method definition returning the informer type rg -n "RemoteClusters\\(\$\\.?Informer\$" pkg/generated/informers/externalversions -S # Show the specific Informer() implementation + returned type rg -n "func \\(.*\$ Informer\$\$ cache\\.SharedIndexInformer" pkg/generated/informers/externalversions -S || true rg -n "func \$.*\$ Informer\$\$" pkg/generated/informers/externalversions -S | rg -n "RemoteClusters" || true # Find the AddEventHandler method signature in the client-go source if present in module cache # (best-effort; ignore failures if go env isn't available) if command -v go >/dev/null 2>&1; then GOPATH="$(go env GOPATH 2>/dev/null || true)" if [ -n "$GOPATH" ] && [ -d "$GOPATH/pkg/mod" ]; then rg -n "func \$.*\$ AddEventHandler\\(" "$GOPATH/pkg/mod" -S --hidden --no-ignore-vcs 2>/dev/null \ | rg -n "ResourceEventHandler" | head -n 20 fi fi

Repository: openshift/microshift

Length of output: 761

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== In-repo: find SharedIndexInformer interface definition ==" rg -n "type +SharedIndexInformer +interface" -S . || true echo "== In-repo: find AddEventHandler signature on SharedIndexInformer ==" rg -n "SharedIndexInformer.*AddEventHandler|AddEventHandler\\(handler" -S . || true echo "== In GOPATH pkg/mod: find SharedIndexInformer interface ==" if command -v go >/dev/null 2>&1; then GOPATH="$(go env GOPATH 2>/dev/null || true)" if [ -n "${GOPATH:-}" ] && [ -d "$GOPATH/pkg/mod" ]; then rg -n "type +SharedIndexInformer +interface" "$GOPATH/pkg/mod" -S --no-mmap 2>/dev/null | head -n 20 || true else echo "GOPATH pkg/mod not found" fi else echo "go not available" fi echo "== In GOPATH pkg/mod: find AddEventHandler signature ==" if command -v go >/dev/null 2>&1; then GOPATH="$(go env GOPATH 2>/dev/null || true)" if [ -n "${GOPATH:-}" ] && [ -d "$GOPATH/pkg/mod" ]; then rg -n "AddEventHandler\\(.*ResourceEventHandler" "$GOPATH/pkg/mod" -S --no-mmap 2>/dev/null \ | rg -n "SharedIndexInformer|cache\\.SharedIndexInformer|type .*SharedIndexInformer" \ | head -n 50 || true fi fi echo "== Show generated Informer() return type for RemoteClusters ==" sed -n '60,110p' pkg/generated/informers/externalversions/microshift/v1alpha1/remotecluster.go

Repository: openshift/microshift

Length of output: 9804

Handle RemoteCluster informer event-handler registration errors

cache.SharedIndexInformer.AddEventHandler(...) returns an (…, error) but pkg/controllers/c2cc/probe.go ignores the error, so probe event handling could be silently disabled. Fail fast if registration fails.

Suggested patch

- informer.AddEventHandler(cache.ResourceEventHandlerFuncs{ + if _, err := informer.AddEventHandler(cache.ResourceEventHandlerFuncs{ AddFunc: func(obj interface{}) { if rc, ok := obj.(*microshiftv1alpha1.RemoteCluster); ok { pm.startProbe(ctx, rc) } }, UpdateFunc: func(oldObj, newObj interface{}) { oldRC, ok1 := oldObj.(*microshiftv1alpha1.RemoteCluster) newRC, ok2 := newObj.(*microshiftv1alpha1.RemoteCluster) if ok1 && ok2 && (oldRC.Spec.ProbeTarget != newRC.Spec.ProbeTarget || oldRC.Spec.ProbeInterval != newRC.Spec.ProbeInterval) { pm.restartProbe(ctx, newRC) } }, DeleteFunc: func(obj interface{}) { rc, ok := obj.(*microshiftv1alpha1.RemoteCluster) if !ok { if tombstone, ok := obj.(cache.DeletedFinalStateUnknown); ok { rc, _ = tombstone.Obj.(*microshiftv1alpha1.RemoteCluster) } } if rc != nil { pm.stopProbe(rc.Name) } }, - }) + }); err != nil { + return fmt.Errorf("failed to add RemoteCluster informer handlers: %w", err) + }

🧰 Tools

🪛 golangci-lint (2.12.2)

[error] 61-61: Error return value of informer.AddEventHandler is not checked

(errcheck)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controllers/c2cc/probe.go` around lines 61 - 86, The call to informer.AddEventHandler currently ignores its returned error, so update the code where informer.AddEventHandler(...) is invoked to capture the returned error and fail fast on error (e.g., return the error from the surrounding function or call log.Fatalf/process exit), rather than silently proceeding; keep the existing handlers (AddFunc calling pm.startProbe, UpdateFunc calling pm.restartProbe, DeleteFunc calling pm.stopProbe) intact and reference this change around the informer.AddEventHandler invocation to ensure event-handler registration errors are handled and propagated.

coderabbitai · 2026-05-25T13:50:02Z

+Compute 11th IP
+    [Documentation]    Return the 11th host address in a CIDR (e.g. 10.43.0.0/16 -> 10.43.0.11).
+    [Arguments]    ${cidr}
+    VAR    ${cmd}=    import ipaddress; n=ipaddress.ip_network('${cidr}', strict=False); print(n[11])
+    ${result}=    Process.Run Process    python3    -c    ${cmd}
+    Should Be Equal As Integers    ${result.rc}    0
+    ${ip}=    Strip String    ${result.stdout}
+    RETURN    ${ip}
+


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add bounds check for small CIDRs.

Accessing n[11] will raise IndexError if the network has fewer than 12 addresses (e.g., a /29 CIDR has only 8). While service CIDRs are typically large enough, adding a validation ensures clear error messages if a small CIDR is passed.

🛡️ Proposed fix to validate network size

Compute 11th IP [Documentation] Return the 11th host address in a CIDR (e.g. 10.43.0.0/16 -> 10.43.0.11). [Arguments] ${cidr} - VAR ${cmd}= import ipaddress; n=ipaddress.ip_network('${cidr}', strict=False); print(n[11]) + VAR ${cmd}= import ipaddress; n=ipaddress.ip_network('${cidr}', strict=False); assert n.num_addresses > 11, f"CIDR {n} too small"; print(n[11]) ${result}= Process.Run Process python3 -c ${cmd} Should Be Equal As Integers ${result.rc} 0

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/resources/c2cc.resource` around lines 286 - 294, The current inline Python command uses n[11] which raises IndexError for networks with fewer than 12 addresses; modify the command built in the ${cmd} variable to first compute n = ipaddress.ip_network('${cidr}', strict=False) and check n.num_addresses > 11 (or >=12) before accessing n[11]; if the check fails, print a clear error message and exit with a non-zero status so the Process.Run call (python3 -c ${cmd}) returns a failing rc and the Robot keyword surfaces a readable validation error instead of an IndexError traceback.

The probe pod needs a place to report health state per remote cluster. Add State (enum: NeverProbed/Healthy/Unhealthy with NeverProbed as default), LastSuccessfulProbe, LastProbeTime, and Errors fields to RemoteClusterStatus. Latency statistics are deferred to a later ticket. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The probe pod has a dual role: it listens on :8080 as a probe target for remote clusters, and watches RemoteCluster CRs to actively probe remote probe services. Per-CR goroutines perform HTTP GET at the configured interval and update CR status with state transitions (NeverProbed -> Healthy -> Unhealthy after 3 consecutive failures). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Hidden subcommand invoked by the probe pod deployment as 'microshift c2cc-probe'. Delegates to c2cc.RunProbe() which serves as a probe target and actively probes remote clusters. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Namespace, ServiceAccount, ClusterRole, ClusterRoleBinding, Deployment, and Service for the C2CC probe pod. The Deployment mounts the host microshift binary and runs 'microshift c2cc-probe'. The Service gets a deterministic ClusterIP (11th IP of local service CIDR) so remote clusters can target it as a probe endpoint. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

After the RemoteCluster CRD is established, the controller applies the probe pod manifests (namespace, SA, RBAC, deployment, service). The probe service gets the 11th IP of the local service CIDR as its deterministic ClusterIP so remote clusters can target it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When C2CC config is removed, cleanupAll() now deletes the probe pod namespace (cascading to SA, Deployment, Service), ClusterRole, and ClusterRoleBinding before removing the RemoteCluster CRD. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Move deployProbe() into fullReconcile() so the probe deployment is re-applied every 10s. If the deployment is deleted or scaled down at runtime, the next reconcile cycle recreates it. ApplyDeployments is idempotent so this is a no-op when the deployment already matches. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Tests verify probe namespace, deployment readiness, service ClusterIP assignment, RemoteCluster status transitions, and self-healing after deployment deletion or scale-down. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

UpdateFunc unconditionally restarted probes on every RemoteCluster update, including status updates from the probe loop itself. This created a feedback loop that reset consecutiveFailures to 0 on every tick, preventing clusters from ever reaching the unhealthy threshold. Only restart probes when spec fields actually change. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fillDefaults() returns an error but the tests were ignoring it. Use require.NoError so tests fail fast with a clear message if default initialization breaks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The keyword was duplicated in both probe.robot and 07-healthcheck.robot. Move it to c2cc.resource which both files already import. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (3)

pkg/controllers/c2cc/healthcheck.go (1)

65-67: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Retry on conflict for spec updates.

The update path can fail on transient resourceVersion conflicts when status is updated concurrently; retrying avoids reconcile aborts.

Suggested patch

 import (
 	"context"
 	"fmt"
 	"net"
 	"strings"

 	microshiftv1alpha1 "github.com/openshift/microshift/pkg/apis/microshift/v1alpha1"
 	"github.com/openshift/microshift/pkg/config"
 	microshiftclient "github.com/openshift/microshift/pkg/generated/clientset/versioned/typed/microshift/v1alpha1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+	"k8s.io/client-go/util/retry"
 	"k8s.io/klog/v2"
 )
@@
-		got.Spec = want.Spec
-		if _, err := h.client.RemoteClusters().Update(ctx, got, metav1.UpdateOptions{}); err != nil {
+		if err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
+			latest, err := h.client.RemoteClusters().Get(ctx, name, metav1.GetOptions{})
+			if err != nil {
+				return err
+			}
+			latest.Spec = want.Spec
+			_, err = h.client.RemoteClusters().Update(ctx, latest, metav1.UpdateOptions{})
+			return err
+		}); err != nil {
 			return fmt.Errorf("updating RemoteCluster %q: %w", name, err)
 		}

#!/bin/bash
set -euo pipefail

echo "Verify current reconcile update path (no retry loop):"
awk 'NR>=60 && NR<=72 {printf "%d:%s\n", NR, $0}' pkg/controllers/c2cc/healthcheck.go

echo "Verify concurrent status writer exists:"
rg -n --type=go 'RemoteClusters\(\)\.UpdateStatus\(' pkg/controllers/c2cc/probe.go

echo "Verify RetryOnConflict is not currently used in healthcheck manager:"
rg -n --type=go 'RetryOnConflict' pkg/controllers/c2cc/healthcheck.go || true

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controllers/c2cc/healthcheck.go` around lines 65 - 67, The update
currently sets got.Spec = want.Spec and calls
h.client.RemoteClusters().Update(...) which can fail with resourceVersion
conflicts; wrap the update in a retry loop using
k8s.io/apimachinery/pkg/util/wait.RetryOnConflict: inside the loop re-get the
latest RemoteCluster (using h.client.RemoteClusters().Get), set its Spec to
want.Spec, then call Update and return on success, propagating non-conflict
errors; reference the existing symbols got.Spec = want.Spec,
h.client.RemoteClusters().Update, and use RetryOnConflict to implement the retry
semantics.

pkg/controllers/c2cc/probe.go (2)

61-86: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle AddEventHandler registration errors.

If handler registration fails, probe management silently never starts correctly.

Suggested patch

-	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
+	if _, err := informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
 		AddFunc: func(obj interface{}) {
 			if rc, ok := obj.(*microshiftv1alpha1.RemoteCluster); ok {
 				pm.startProbe(ctx, rc)
 			}
 		},
@@
 		DeleteFunc: func(obj interface{}) {
@@
 		},
-	})
+	}); err != nil {
+		return fmt.Errorf("failed to add RemoteCluster informer handlers: %w", err)
+	}

#!/bin/bash
set -euo pipefail

echo "Inspect AddEventHandler call site:"
awk 'NR>=58 && NR<=90 {printf "%d:%s\n", NR, $0}' pkg/controllers/c2cc/probe.go

echo "Confirm return value is not captured:"
rg -n --type=go '^\s*informer\.AddEventHandler\(' pkg/controllers/c2cc/probe.go

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controllers/c2cc/probe.go` around lines 61 - 86, The AddEventHandler call
in probe.go is currently un-checked so failures to register handlers are
ignored; modify the code that registers the informer handlers (the
informer.AddEventHandler call in the same function that calls
pm.startProbe/pm.restartProbe/pm.stopProbe) to capture and handle any
registration error—i.e., call the registration variant that returns an error (or
wrap the registration in a helper that validates success), and on error log the
failure via the controller logger and return the error (or propagate it) so
probe management fails fast instead of silently never starting.

44-44: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Set ReadHeaderTimeout on the probe HTTP server.

This leaves the listener exposed to slow-header connection abuse.

Suggested patch

-	server := &http.Server{Addr: ":8080", Handler: mux}
+	server := &http.Server{
+		Addr:              ":8080",
+		Handler:           mux,
+		ReadHeaderTimeout: 10 * time.Second,
+	}

#!/bin/bash
set -euo pipefail

echo "Inspect server construction:"
awk 'NR>=39 && NR<=48 {printf "%d:%s\n", NR, $0}' pkg/controllers/c2cc/probe.go

echo "Check whether ReadHeaderTimeout is present:"
rg -n --type=go 'ReadHeaderTimeout' pkg/controllers/c2cc/probe.go || true

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controllers/c2cc/probe.go` at line 44, The probe HTTP server is created
with server := &http.Server{Addr: ":8080", Handler: mux} and lacks a
ReadHeaderTimeout, leaving it vulnerable to slow-header attacks; update the
http.Server construction for the probe (the variable named server) to include a
ReadHeaderTimeout (e.g., 5s or another suitable duration defined in a
constant/config) so the server rejects connections that take too long to send
headers, and ensure the chosen timeout is applied where the probe server is
started.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@assets/crd/microshift.io_remoteclusters.yaml`:
- Around line 43-47: Add schema-level admission validation for
spec.probeInterval: update the CRD's openAPIV3Schema for the spec.probeInterval
property to include an x-kubernetes-validations rule that enforces a duration
between 1s and 5m (e.g. a CEL expression that parses the duration and asserts
seconds >= 1 && seconds <= 300) and a pattern fallback to only accept valid
duration strings (digits + unit like s, m). Target the spec.probeInterval
property in the CRD schema and include a clear message for rejection.

In `@Makefile`:
- Around line 349-352: The generate-crds Makefile target is currently leaf-only
and not invoked by the broader generate/verify flows; update the top-level
targets (e.g., the generate target and verify/verify-config targets used by make
and make verify) to depend on generate-crds so CRD and client generation runs
automatically—modify the prerequisites for the primary generate target and the
verify/verify-config targets to include generate-crds (references: Makefile
targets "generate-crds", "generate", "verify", "verify-config") so stale
CRDs/clients are caught by the normal make/verify pipelines.

In `@pkg/controllers/c2cc/deploy_probe.go`:
- Around line 26-29: The code accesses c.cfg.Network.ServiceNetwork[0] without
checking length which will panic if the slice is empty; before calling
net.ParseCIDR, add a guard that validates len(c.cfg.Network.ServiceNetwork) > 0
and return a clear error (e.g. "service network not configured") if empty, then
proceed to call net.ParseCIDR using the first element and preserve the existing
error wrapping for ParseCIDR failures (variables involved:
c.cfg.Network.ServiceNetwork, net.ParseCIDR, svcNet, err).

---

Duplicate comments:
In `@pkg/controllers/c2cc/healthcheck.go`:
- Around line 65-67: The update currently sets got.Spec = want.Spec and calls
h.client.RemoteClusters().Update(...) which can fail with resourceVersion
conflicts; wrap the update in a retry loop using
k8s.io/apimachinery/pkg/util/wait.RetryOnConflict: inside the loop re-get the
latest RemoteCluster (using h.client.RemoteClusters().Get), set its Spec to
want.Spec, then call Update and return on success, propagating non-conflict
errors; reference the existing symbols got.Spec = want.Spec,
h.client.RemoteClusters().Update, and use RetryOnConflict to implement the retry
semantics.

In `@pkg/controllers/c2cc/probe.go`:
- Around line 61-86: The AddEventHandler call in probe.go is currently
un-checked so failures to register handlers are ignored; modify the code that
registers the informer handlers (the informer.AddEventHandler call in the same
function that calls pm.startProbe/pm.restartProbe/pm.stopProbe) to capture and
handle any registration error—i.e., call the registration variant that returns
an error (or wrap the registration in a helper that validates success), and on
error log the failure via the controller logger and return the error (or
propagate it) so probe management fails fast instead of silently never starting.
- Line 44: The probe HTTP server is created with server := &http.Server{Addr:
":8080", Handler: mux} and lacks a ReadHeaderTimeout, leaving it vulnerable to
slow-header attacks; update the http.Server construction for the probe (the
variable named server) to include a ReadHeaderTimeout (e.g., 5s or another
suitable duration defined in a constant/config) so the server rejects
connections that take too long to send headers, and ensure the chosen timeout is
applied where the probe server is started.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 928d58d0-49a8-4e1a-b7d7-a07340e7adee

📥 Commits

Reviewing files that changed from the base of the PR and between 89f74e6 and 00bf8b1.

⛔ Files ignored due to path filters (22)

pkg/apis/microshift/v1alpha1/zz_generated.deepcopy.go is excluded by !**/zz_generated*
pkg/generated/clientset/versioned/clientset.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/fake/clientset_generated.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/fake/doc.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/fake/register.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/scheme/doc.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/scheme/register.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/typed/microshift/v1alpha1/doc.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/typed/microshift/v1alpha1/fake/doc.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/typed/microshift/v1alpha1/fake/fake_microshift_client.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/typed/microshift/v1alpha1/fake/fake_remotecluster.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/typed/microshift/v1alpha1/generated_expansion.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/typed/microshift/v1alpha1/microshift_client.go is excluded by !**/generated/**
pkg/generated/clientset/versioned/typed/microshift/v1alpha1/remotecluster.go is excluded by !**/generated/**
pkg/generated/informers/externalversions/factory.go is excluded by !**/generated/**
pkg/generated/informers/externalversions/generic.go is excluded by !**/generated/**
pkg/generated/informers/externalversions/internalinterfaces/factory_interfaces.go is excluded by !**/generated/**
pkg/generated/informers/externalversions/microshift/interface.go is excluded by !**/generated/**
pkg/generated/informers/externalversions/microshift/v1alpha1/interface.go is excluded by !**/generated/**
pkg/generated/informers/externalversions/microshift/v1alpha1/remotecluster.go is excluded by !**/generated/**
pkg/generated/listers/microshift/v1alpha1/expansion_generated.go is excluded by !**/generated/**
pkg/generated/listers/microshift/v1alpha1/remotecluster.go is excluded by !**/generated/**

📒 Files selected for processing (31)

Makefile
assets/components/c2cc/clusterrole.yaml
assets/components/c2cc/clusterrolebinding.yaml
assets/components/c2cc/deployment.yaml
assets/components/c2cc/namespace.yaml
assets/components/c2cc/service.yaml
assets/components/c2cc/serviceaccount.yaml
assets/crd/microshift.io_remoteclusters.yaml
cmd/generate-config/config/config-openapi-spec.json
cmd/microshift/main.go
docs/user/howto_config.md
packaging/microshift/config.yaml
pkg/apis/microshift/v1alpha1/doc.go
pkg/apis/microshift/v1alpha1/groupversion_info.go
pkg/apis/microshift/v1alpha1/types.go
pkg/assets/crd.go
pkg/cmd/c2cc_probe.go
pkg/config/c2cc.go
pkg/config/c2cc_test.go
pkg/config/config.go
pkg/controllers/c2cc/controller.go
pkg/controllers/c2cc/deploy_probe.go
pkg/controllers/c2cc/healthcheck.go
pkg/controllers/c2cc/healthcheck_test.go
pkg/controllers/c2cc/probe.go
scripts/auto-rebase/assets.yaml
scripts/boilerplate.go.txt
scripts/generate-crds.sh
test/resources/c2cc.resource
test/suites/c2cc/healthcheck.robot
test/suites/c2cc/probe.robot

✅ Files skipped from review due to trivial changes (4)

pkg/apis/microshift/v1alpha1/doc.go
scripts/auto-rebase/assets.yaml
docs/user/howto_config.md
assets/components/c2cc/namespace.yaml

coderabbitai · 2026-05-25T14:50:41Z

+              probeInterval:
+                default: 10s
+                description: Interval between probe attempts (e.g. "10s", "1m").
+                type: string
+              probeTarget:


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Enforce spec.probeInterval constraints at admission.

probeInterval is documented as 1s–5m, but the CRD schema currently accepts any string. Add schema-level validation so invalid values are rejected early.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@assets/crd/microshift.io_remoteclusters.yaml` around lines 43 - 47, Add schema-level admission validation for spec.probeInterval: update the CRD's openAPIV3Schema for the spec.probeInterval property to include an x-kubernetes-validations rule that enforces a duration between 1s and 5m (e.g. a CEL expression that parses the duration and asserts seconds >= 1 && seconds <= 300) and a pattern fallback to only accept valid duration strings (digits + unit like s, m). Target the spec.probeInterval property in the CRD schema and include a clear message for rejection.

coderabbitai · 2026-05-25T14:50:41Z

+.PHONY: generate-crds
+generate-crds:
+	./scripts/fetch_tools.sh controller-gen && \
+	./scripts/generate-crds.sh


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Wire generate-crds into the default generate/verify flow.

Right now this target is leaf-only, so the visible make, make verify, and make verify-config paths won't catch stale CRDs or generated clients after API changes.

Suggested change

-all: generate-config microshift etcd +all: generate-config generate-crds microshift etcd ... -verify-config: generate-config +verify-config: generate-config generate-crds

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Makefile` around lines 349 - 352, The generate-crds Makefile target is currently leaf-only and not invoked by the broader generate/verify flows; update the top-level targets (e.g., the generate target and verify/verify-config targets used by make and make verify) to depend on generate-crds so CRD and client generation runs automatically—modify the prerequisites for the primary generate target and the verify/verify-config targets to include generate-crds (references: Makefile targets "generate-crds", "generate", "verify", "verify-config") so stale CRDs/clients are caught by the normal make/verify pipelines.

coderabbitai · 2026-05-25T14:50:41Z

+	_, svcNet, err := net.ParseCIDR(c.cfg.Network.ServiceNetwork[0])
+	if err != nil {
+		return fmt.Errorf("failed to parse local service network: %w", err)
+	}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard empty ServiceNetwork before indexing.

Line 26 accesses c.cfg.Network.ServiceNetwork[0] without a length check; an empty list will panic and crash reconciliation.

Suggested fix

func (c *C2CCRouteManager) deployProbe(ctx context.Context) error { + if len(c.cfg.Network.ServiceNetwork) == 0 { + return fmt.Errorf("service network is empty") + } _, svcNet, err := net.ParseCIDR(c.cfg.Network.ServiceNetwork[0]) if err != nil { return fmt.Errorf("failed to parse local service network: %w", err) }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

_, svcNet, err := net.ParseCIDR(c.cfg.Network.ServiceNetwork[0])

if err != nil {

return fmt.Errorf("failed to parse local service network: %w", err)

}

if len(c.cfg.Network.ServiceNetwork) == 0 {

return fmt.Errorf("service network is empty")

}

_, svcNet, err := net.ParseCIDR(c.cfg.Network.ServiceNetwork[0])

if err != nil {

return fmt.Errorf("failed to parse local service network: %w", err)

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controllers/c2cc/deploy_probe.go` around lines 26 - 29, The code accesses c.cfg.Network.ServiceNetwork[0] without checking length which will panic if the slice is empty; before calling net.ParseCIDR, add a guard that validates len(c.cfg.Network.ServiceNetwork) > 0 and return a clear error (e.g. "service network not configured") if empty, then proceed to call net.ParseCIDR using the first element and preserve the existing error wrapping for ParseCIDR failures (variables involved: c.cfg.Network.ServiceNetwork, net.ParseCIDR, svcNet, err).

openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 25, 2026

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 25, 2026

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 25, 2026

coderabbitai Bot requested changes May 25, 2026

View reviewed changes

pmtk force-pushed the c2cc/probe branch from 7ec41e8 to bcd4630 Compare May 25, 2026 11:32

pmtk force-pushed the c2cc/probe branch from bcd4630 to 06c24c1 Compare May 25, 2026 11:37

coderabbitai Bot requested changes May 25, 2026

View reviewed changes

pmtk force-pushed the c2cc/probe branch from 06c24c1 to 89f74e6 Compare May 25, 2026 13:41

coderabbitai Bot requested changes May 25, 2026

View reviewed changes

pmtk and others added 14 commits May 25, 2026 15:56

CRD: remoteclusters.microshift.io/v1alpha1

d2ed223

Config: ProbeInterval

1fd4d5e

Create RemoteCluster CRs

f6e1153

Register c2cc-probe subcommand

8158bb2

Hidden subcommand invoked by the probe pod deployment as 'microshift c2cc-probe'. Delegates to c2cc.RunProbe() which serves as a probe target and actively probes remote clusters. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add Robot Framework tests for C2CC probe deployment

b395683

Tests verify probe namespace, deployment readiness, service ClusterIP assignment, RemoteCluster status transitions, and self-healing after deployment deletion or scale-down. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Assert fillDefaults() success in probe config tests

12d1d4a

fillDefaults() returns an error but the tests were ignoring it. Use require.NoError so tests fail fast with a clear message if default initialization breaks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Extract Compute 11th IP keyword to shared c2cc resource

00bf8b1

The keyword was duplicated in both probe.robot and 07-healthcheck.robot. Move it to c2cc.resource which both files already import. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

pmtk force-pushed the c2cc/probe branch from 89f74e6 to 00bf8b1 Compare May 25, 2026 14:41

coderabbitai Bot requested changes May 25, 2026

View reviewed changes

Conversation

pmtk commented May 25, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

openshift-ci Bot commented May 25, 2026

Uh oh!

openshift-ci-robot commented May 25, 2026 • edited by openshift-ci Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

openshift-ci Bot commented May 25, 2026

Uh oh!

coderabbitai Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

❌ Failed checks (3 warnings)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pmtk May 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pmtk May 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

pmtk commented May 25, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 25, 2026

pmtk commented May 25, 2026 •

edited by coderabbitai Bot

Loading

openshift-ci-robot commented May 25, 2026 •

edited by openshift-ci Bot

Loading

coderabbitai Bot commented May 25, 2026 •

edited

Loading

coderabbitai Bot May 25, 2026 •

edited

Loading

coderabbitai Bot May 25, 2026 •

edited

Loading