fix(rc9): GA blockers + monetize buy-side fixes from the v0.10.0-rc9 report #583
Merged
…et create-only
The pinned serviceoffer-controller image (f5d94fc) was a side-branch build that predated the change making Secret create-only in the reconciler. The tightened ClusterRole grants no secrets update/patch verb, so the deployed binary 403s when it Updates the per-agent hermes-api-server / remote-signer-keystore Secrets on re-reconcile, and per-agent provisioning never converges. Repin to 503016b@sha256:bec62ea0 (rc9 commit 503016b, image 0.10.0-rc9), whose reconciler treats Secret as create-only and matches the shipped RBAC. Add a tripwire test mirroring the x402-verifier one so a future downgrade can't silently re-ship the bug. The short-SHA tag keeps the dev-mode :latest rewrite and production pin invariants intact.
HoldSign popped s.auths[0] with no deadline check. A pre-signed Permit2 (OBOL) batch shares one ~5-min deadline, so once expired the buyer served the whole batch auth-by-auth, each returning 503 invalid_payment_expired from the verifier before reaching a fresh auth. Add authDeadlineUnix (covering the Permit2 deadline, nested ERC-3009 validBefore, and legacy flat field) and skip expired auths at pick time. USDC vouchers use a year-2106 validBefore and are never dropped.
… is gone
reconcileDeletingPurchase routed the 'not found in sidecar status' error into the Configured&&Remaining>0 branch, which kept Remaining>0 and requeued every 5s forever, stranding the PurchaseRequest in Terminating until its finalizer was force-removed. That signal means the sidecar has nothing left to drain. Add a case (via isSidecarUpstreamGone) that collapses Remaining to 0 so cleanup and finalizer removal proceed, consistent with the terminal not-found check already present later in the function. Transient errors still requeue.
buy.py status/list showed the raw sidecar 'remaining' count, so an all-expired Permit2 auth pool read as ready to spend. Add _auth_deadline / _count_valid_auths and surface expired auths in both commands so an operator or agent tops up instead of burning expired vouchers into 503s.
… path
HandleProxy (and the standalone inference gateway) rebuild the ForwardAuth middleware per request with VerifyOnly=false by design: they proxy to the real upstream and settle only after a <400 response. As a result, the verifyOnly=false warning fired on every paid request, telling operators to 'fix' correct config. Add a SettlesInProcess flag that suppresses the warning on those paths while leaving the genuinely-dangerous Traefik ForwardAuth path loud.
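As a rough sketch of the warning gate, assuming a config struct that carries both flags: VerifyOnly and SettlesInProcess are named in the commit message, while Config and shouldWarnVerifyOnly are hypothetical names used only here.

```go
package main

import "fmt"

// Config is a hypothetical middleware config holding the two flags the
// commit message mentions.
type Config struct {
	VerifyOnly       bool
	SettlesInProcess bool
}

// shouldWarnVerifyOnly reports whether the verifyOnly=false warning should
// be logged: only when settlement is enabled and the caller does not settle
// in-process after proxying to the upstream.
func shouldWarnVerifyOnly(c Config) bool {
	return !c.VerifyOnly && !c.SettlesInProcess
}

func main() {
	proxy := Config{VerifyOnly: false, SettlesInProcess: true}    // HandleProxy-style path
	traefik := Config{VerifyOnly: false, SettlesInProcess: false} // ForwardAuth path
	fmt.Println(shouldWarnVerifyOnly(proxy), shouldWarnVerifyOnly(traefik))
}
```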
obol agent new --model X provisioned cleanly for an unknown model, then every chat call failed with 'no healthy deployments for this model'. Add a preflight in createCRDAgent that checks a non-empty --model against the LiteLLM registry and fails fast with the available models. A transient list error warns and continues; an empty model still lets the controller auto-pin.
…races
Docker Desktop on macOS intermittently fails to create the gRPC-FUSE mount
source for a k3d node's workspace data dir under sustained cluster-churn
("error while creating mount source path ...: no such file or directory"),
so the k3s node never reports ready and k3d rolls the cluster back. The host
dir exists; it's a daemon-side file-sharing race. The dual-stack stack-up loop
already retries port-bind and image/Helm transients — extend it to retry this
mount race (a fresh cluster on retry clears it) so the release smoke isn't
flaked by an environment-side Docker hiccup.
OisinKyne
approved these changes
Jun 3, 2026
Summary
What changed: Fixes the v0.10.0-rc9 upgrade-report issues. Each was validated against rc9
source (adversarially re-checked) before fixing.
- Repin serviceoffer-controller from the f5d94fc side-branch build (which predated the Secret-create-only reconciler change) to 503016b@sha256:bec62ea0 (rc9 commit 503016bf, image 0.10.0-rc9). The old pin Updates per-agent Secrets, which the tightened RBAC (no secrets:update/patch) rejects with a 403, so per-agent provisioning never converges. Added a tripwire test.
- x402-buyer HoldSign now drops expired pre-signed auths before signing (Permit2 deadline / ERC-3009 validBefore), ending the 503 invalid_payment_expired cascade.
- buy.py status/list count is expiry-aware (valid vs expired auths).
- #5: reconcileDeletingPurchase finalizes the delete-drain on the 'not found in sidecar status' signal instead of requeueing forever (stranded Terminating).
- Suppress the verifyOnly=false warning on the in-process settle path (HandleProxy / obol sell inference); the Traefik ForwardAuth path still warns.
- obol agent new --model X validates X against the LiteLLM registry; fails fast.

Issue #2 (master-Hermes PVC ownership on k3d local-path) is already fixed on main (eb985bd / 671c8ac root-chown init container); this branch inherits it. Per-agent Hermes was confirmed not a residual (it seeds via Secret, not host PVC writes).
Why it matters: #1 and #2 are GA blockers from the rc9 report; the rest are buy-side
correctness / UX fixes on the monetize path.
Risk level: low — the controller fix is a pin bump to an already-published rc9 image; the rest
are narrow, regression-tested behaviour changes. No RBAC widened, no security surface added.
Commit under test: 8cfd0ca1 (live-chain evidence captured at b90118fd; 8cfd0ca1 adds only the dual-stack flows retry, no runtime behaviour change)
Base branch: main
Scope
Validation
CI checks:
Unit tests:
Integration tests:
Flow tests (best result per flow across 3 full smoke runs on local darwin/arm64, k3d):
Release smoke:
Live Chain Evidence
Network: Base Sepolia (eip155:84532)
RPC/provider: paid drpc load-balancer (redacted)
Facilitator: https://x402.gcp.obol.tech (prometheus-overlay)
Contracts and tokens:
Wallet roles:
Balances:
Transaction receipts:
Runtime Evidence
QA environment:
Images:
Kubernetes / stack:
Model and routing:
Artifacts and logs:
Demo readiness:
Review Notes
Known gaps:
- #5 (controller) source fixes reach release installs only once the buyer/controller images are rebuilt at this branch's commit and their pins bumped (the normal release-image step). They take effect immediately in dev mode (validated by the smoke). The #1 repin already points at a published rc9 image, so it fixes release installs as-is.
- … build poisoned serviceoffer-controller:latest; forcing/rebuilding from this branch resolves it.
Follow-ups:
- … #3/#5 to release installs.
Reviewer focus:
- internal/embed/infrastructure/base/templates/x402.yaml controller pin + embed_crd_test.go tripwire (no RBAC widened).
- internal/x402/buyer/signer.go expiry filter (USDC validBefore=2106 never dropped).
- internal/serviceoffercontroller/purchase.go not-found drain case (transient errors still requeue).