Skip to content

feat(core): implement validatorapi beacon/sync committee selections handlers#459

Open
varex83agent wants to merge 8 commits into
mainfrom
bohdan/validatorapi-selections
Open

feat(core): implement validatorapi beacon/sync committee selections handlers#459
varex83agent wants to merge 8 commits into
mainfrom
bohdan/validatorapi-selections

Conversation

@varex83agent
Copy link
Copy Markdown
Collaborator

Summary

  • Ports BeaconCommitteeSelections (core/validatorapi/validatorapi.go:798-864) and SyncCommitteeSelections (validatorapi.go:1072-1138) into crates/core/src/validatorapi/component.rs, replacing both unimplemented!() arms.
  • Adds a register_active_validators hook on Component so the handlers can translate per-selection validator_index values to DV root pubkeys, mirroring Go's c.eth2Cl.ActiveValidators(ctx) lookup.
  • Replaces the BeaconCommitteeSelection {} / SyncCommitteeSelection {} placeholder structs in validatorapi/types.rs with type aliases for pluto_eth2api::v1::BeaconCommitteeSelection / v1::SyncCommitteeSelection, matching the established AttestationData = phase0::AttestationData pattern.
  • Bounds each await_agg_sig_db lookup with a new SELECTIONS_AGG_SIG_DB_TIMEOUT (~two slots) so a stalled AggSigDB cannot pin a selections request forever; bounds the active_validators hook with UPSTREAM_REQUEST_TIMEOUT.
  • No secrets/upstream-body leakage into client-visible messages: verification failures, unknown validator indices, subscriber failures, and AggSigDB lookup failures all use generic strings; underlying errors are attached on ApiError::source for debug logs only.

Go reference

Endpoint Go (Charon v1.7.1) Rust
beacon_committee_selections core/validatorapi/validatorapi.go:798-864 Component::beacon_committee_selections in validatorapi/component.rs
sync_committee_selections core/validatorapi/validatorapi.go:1072-1138 Component::sync_committee_selections in validatorapi/component.rs
ActiveValidators lookup c.eth2Cl.ActiveValidators(ctx) (called at the top of both handlers) new register_active_validators hook on Component returning HashMap<ValidatorIndex, BLSPubKey>
verifyPartialSig validatorapi.go:1352-1368 Component::verify_partial_sig (PR-1 plumbing)
NewPartialSignedBeaconCommitteeSelection core/signeddata.go:1241 signeddata::BeaconCommitteeSelection::new_partial
NewPartialSignedSyncCommitteeSelection core/signeddata.go:1303 signeddata::SyncCommitteeSelection::new_partial
NewPrepareAggregatorDuty(slot) duty fanout key Duty::new_prepare_aggregator_duty(SlotNumber::new(slot))
NewPrepareSyncContributionDuty(slot) duty fanout key Duty::new_prepare_sync_contribution_duty(SlotNumber::new(slot))
awaitAggSigDBFunc(ctx, duty, pk) validatorapi.go:849 / :1123 await_agg_sig_db_fn hook (PR-1 plumbing) bounded by SELECTIONS_AGG_SIG_DB_TIMEOUT

Test plan

  • cargo +nightly fmt --all --check
  • cargo clippy -p pluto-core --all-targets --all-features -- -D warnings
  • cargo test -p pluto-core --all-features - 384/384 passing (10 new tests covering: happy path / multi-selection fanout+stitching / unknown validator-index 400 / verification-failure short-circuit / await timeout - for each of the two handlers)

@varex83agent varex83agent force-pushed the bohdan/validatorapi-plumbing branch from f5c3b49 to 8eedb3f Compare June 2, 2026 14:09
Base automatically changed from bohdan/validatorapi-plumbing to main June 3, 2026 20:02
@varex83agent
Copy link
Copy Markdown
Collaborator Author

/loop-review-pr summary

Ran 3 review-and-fix iteration(s) against this PR. Terminated by: completion_promise (PR_IDEAL).

Quality gates (final)

  • cargo fmt — pass
  • cargo clippy — pass
  • cargo test — pass (1387 / 0 across the workspace)

Resolved during the loop

Major (8)

  • Router beacon_committee_selections / sync_committee_selections were todo!() stubs — endpoint reachable but panicked. Wired into state.handler with response wrappers matching Charon's {data: [...]} wire shape — crates/core/src/validatorapi/router.rs — fix in b1a8a1d
  • Per-selection SELECTIONS_AGG_SIG_DB_TIMEOUT instead of per-request deadline — worst case was N*24s. Switched to a single shared tokio::time::Instant deadline anchored at the start of the fan-out phase, via tokio::time::timeout_atcrates/core/src/validatorapi/component.rs — fix in e7179d2
  • Subscriber fan-out had no timeout — a stalled subscriber could pin the request indefinitely. Each sub(&duty, set.clone()) call now shares the same per-request deadline — crates/core/src/validatorapi/component.rs — fix in e7179d2
  • Body-size limit not applied to selection routes (would have inherited axum's 2 MiB default). Introduced SELECTIONS_BODY_LIMIT (64 KiB) — crates/core/src/validatorapi/router.rs — fix in b1a8a1d
  • No structured logging on the new error paths. Every fan-out / AggSigDB-await error branch now emits a tracing::warn! (timeouts) or tracing::error! (wiring failures) with slot and source error — crates/core/src/validatorapi/component.rs — fix in e7179d2 + b5e7b44
  • No test for empty selections == []. Added {beacon,sync}_committee_selections_empty_input_returns_empty_data — fix in e7179d2
  • No test for downcast_*_selection 500 path. Added {beacon,sync}_committee_selections_rejects_aggsigdb_type_mismatch — fix in e7179d2
  • Malformed JSON on POST routes returned axum's plain-text 400/422/415 instead of the router's {code, message} envelope (mirrors the b5eb0ca fix already on main that this branch predates). Introduced json_rejection_to_api_error; rewired all four POST handlers via Result<Json<...>, JsonRejection>. Added envelope tests for both selection routes and the attester duties route — crates/core/src/validatorapi/router.rs — fix in cef07e9 + 554ce69

Minor (7)

  • Instant::checked_add(...).expect(...) hardcoded "24s" in the panic message and was duplicated across both handlers — extracted selections_deadline() helper — fix in b5e7b44
  • beacon_committee_selections inlined selection.slot.tree_hash_root().0 while the sync handler used selection.message_root() — switched to the helper for consistency; removed the now-unused tree_hash::TreeHash import — fix in b5e7b44
  • Hook-missing ok_or_else branches returned 500 without logging — added tracing::error! to fetch_active_validators and both handlers' await_agg_sig_db_fn checks — fix in b5e7b44
  • Subscriber Err and AggSigDB Err (non-timeout) were logged at warn — bumped to error (timeouts stay at warn as transient peer-side issues) — fix in b5e7b44
  • selections_post / duties_post were mechanical duplicates — collapsed into a single generic bounded_post(handler, body_limit) helper — fix in cef07e9
  • const declarations sat between two use blocks in router.rs — reordered so all imports come before constants — fix in cef07e9
  • component.rs import grouping inverted vs the workspace std → external → super/crate convention — reordered — fix in 554ce69

Nits (3)

  • tracing::warn!(slot = *slot, ...) deref noise — destructured for (&slot, set) in &psigs_by_slot so the eight log fields + four SlotNumber::new(*slot) calls collapse to plain slot — fix in b5e7b44
  • TestHandler selection methods eager-cloned the Option before unwrapping — switched all six Option-based stubs to match self.<field>.as_ref() with unimplemented! on the None path — fix in b5e7b44 + 554ce69
  • SELECTIONS_BODY_LIMIT doc-comment per-entry size was inflated (~300 B claimed vs ~210-250 B actual) — tightened — fix in cef07e9

Outstanding (intentional deferrals)

These were flagged during the loop but deferred because they sit outside this PR's stated scope or trade off poorly against the size of the change required:

  • epoch_from_slot per-selection upstream-spec amplification — Each selection currently triggers a fresh GET /eth/v1/config/spec because EthBeaconNodeApiClient has no client-side cache. With the body cap at 64 KiB this is bounded at ~260 spec fetches per request rather than unbounded, but a follow-up should either cache the slots-per-epoch / signing-domain on the client or hoist the lookup to once per request. Affects security/perf only; correctness is unchanged.
  • 400 vs 500 status divergence with Charon — Pluto surfaces validator not found and verification failures as 400 with a generic message + attached source; Charon returns 500 via writeError's apiError fallback. Intentional Pluto policy choice (clearer client errors); see in-line comments at the call sites.
  • Empty selections == [] returns {"data":[]} — Charon returns {"data":null} for the same case. The Pluto shape is more beacon-API-spec-compliant; documented in the empty-input tests.
  • Style refactors deferred — Two selection handlers share ~90% of their code; the two downcast_*_selection helpers are mechanical duplicates; test fixture builders make_selections_component_{insecure,secure} are ~95% duplicates; endpoint: &'static str discriminator could be a SelectionEndpoint enum. None of these affect correctness.
  • Defensive MAX_SELECTIONS_PER_REQUEST cap — The 64 KiB body cap already bounds the maximum element count to ~260; an explicit element-count cap would mirror VAL_INDEXES_MAX_LEN but adds little extra defence-in-depth.
  • ApiJson<T> extractor wrapper — Would remove the repeated let Json(x) = x.map_err(json_rejection_to_api_error)?; lines at the top of each POST handler. Deferred for parity with the existing query_rejection_to_api_error pattern.
  • Multi-N shared-deadline coverage test — All existing *_timeout tests use N=1, so a refactor that accidentally restored per-await timeouts would not be caught. The shared-deadline contract is encoded in selections_deadline() and exercised indirectly; an explicit N=3 test would lock the contract.

Verdict

PR is ideal — all bug/major findings resolved, gates green.

State log: .claude/loop-review-state/pr-459.md

varex83agent and others added 8 commits June 5, 2026 12:18
…andlers

Ports `BeaconCommitteeSelections` and `SyncCommitteeSelections` from
`core/validatorapi/validatorapi.go` (lines 798-864 and 1072-1138). Each
handler resolves every selection's validator-index to a DV root pubkey
via a new `register_active_validators` hook, builds the matching
`signeddata::*::new_partial` wrapper, verifies the per-share selection
proof, groups the partial-signed data by slot, fans the per-slot
`ParSignedDataSet` out to every subscriber under the corresponding
`PrepareAggregator` / `PrepareSyncContribution` duty, then pulls each
aggregated reply back out of the AggSigDB via `await_agg_sig_db` and
stitches it into the response.

The `BeaconCommitteeSelection` / `SyncCommitteeSelection` placeholder
structs in `validatorapi/types.rs` are now aliases for the consensus
`v1::BeaconCommitteeSelection` / `v1::SyncCommitteeSelection`, matching
the established pattern (`AttestationData = phase0::AttestationData`)
so the handler signatures carry the real validator-index / slot /
selection-proof tuple Go uses.

`fetch_active_validators` is bounded by `UPSTREAM_REQUEST_TIMEOUT`; each
`await_agg_sig_db` lookup is bounded by `SELECTIONS_AGG_SIG_DB_TIMEOUT`
(~two slots). Upstream stalls surface as 504/408; unknown validator
indices surface as 400 with a generic "validator not found" message;
verification failures surface as 400; subscriber / lookup errors
surface as 500 with the original error attached as `source` for debug
logging only.

Tests (10 new):
- happy path for each handler (one selection in, one aggregated out)
- multi-selection input — per-slot subscriber fanout + response
  stitching
- unknown validator-index short-circuits with 400 before the AggSigDB
  is touched
- tampered (zero) selection proof short-circuits with 400 before the
  AggSigDB is touched
- stalled `await_agg_sig_db` trips `SELECTIONS_AGG_SIG_DB_TIMEOUT` and
  returns 408

Co-Authored-By: Bohdan Ohorodnii <35969035+varex83@users.noreply.github.com>
The Handler trait methods Component::{beacon,sync}_committee_selections
were reachable only from the Handler trait — the router still had
todo!() stubs at the corresponding axum entry points, so any real
request to POST /eth/v1/validator/{beacon,sync}_committee_selections
would panic the request task.

Wires both endpoints into AppState::handler, mirroring Charon's
beaconCommitteeSelectionsJSON / syncCommitteeSelectionsJSON wire shape
(${data: [...]}) via two small response wrappers in types.rs.

Adds SELECTIONS_BODY_LIMIT (64 KiB) and a selections_post helper so
the new POST routes are bounded before deserialisation, defending
against the request-amplification surface (one BLS verification per
selection) once the routes are reachable.

Extends TestHandler with selection response stubs and adds router
tests for: response shape, sync-variant wrapping, and the body-limit
layer rejecting oversized POSTs.
The previous SELECTIONS_AGG_SIG_DB_TIMEOUT was applied per AggSigDB
await, so a batch of N selections could legitimately spend N*24s in
the lookup phase before failing; the subscriber fan-out had no
deadline at all, letting a stalled subscriber pin the handler.

Both selection handlers now anchor a single Instant deadline at the
start of the fan-out phase and use tokio::time::timeout_at for every
subscriber call and every await_agg_sig_db_fn call. The renamed
SELECTIONS_PHASE_TIMEOUT is the total budget for that phase rather
than a per-await bound.

Each error path in the fan-out and AggSigDB-await loops now emits a
tracing::warn with the slot and the underlying source error so an
operator can correlate a 408/500 response with a debug-pipeline log
without leaking either into the client-visible message.

Adds six tests:
  * subscriber timeout (beacon, sync)
  * empty selections input is a 200 no-op (beacon, sync)
  * AggSigDB returning the wrong concrete SignedData type produces 500
    via the downcast helper (beacon, sync)
Cleanup after the selection-route wiring change — rustfmt collapses
the BeaconCommitteeSelectionsResponse import block onto one line and
trims the let bindings in the new tests to match the file's existing
formatting.
clippy::arithmetic_side_effects (workspace-wide deny) flags
`Instant::now() + Duration` even though overflow is unreachable in
practice. Switches to `checked_add` with an .expect() carrying the
invariant, matching the pattern already used elsewhere in the crate
(qbft/consensus timer code).
The selection endpoints (and the duties endpoints on this branch)
extracted the body as bare `Json<...>`, so malformed JSON, wrong
element types, and missing content-type produced axum's default
plain-text 400 / 422 / 415 — out of line with the router's other
4xx responses which already use the `{code, message}` envelope via
`query_rejection_to_api_error`.

Mirrors the fix already on main for the duties endpoints
(b5eb0ca) which this branch predates. Adds a
`json_rejection_to_api_error` helper that normalises every parse
failure to a uniform 400 (matching Charon's `unmarshal`) while
preserving the 413 path so the body-size cap still surfaces
correctly. Both selection routes and both duties routes route their
`Json<...>` rejection through it.

Also:
  * Reorders router.rs so all `use` items come before the
    `const` declarations (style — the SELECTIONS_BODY_LIMIT
    addition had split the `use` blocks).
  * Collapses the duplicate `duties_post` / `selections_post`
    helpers into a single generic `bounded_post(handler, limit)`.
  * Tightens the `SELECTIONS_BODY_LIMIT` doc-comment per-entry
    size estimate (~210-250 B, ~250-300 entries per request).
  * Adds two router tests asserting the envelope shape for a
    malformed body — one per selection endpoint.
* Extract a `selections_deadline()` helper so the
  `Instant::checked_add(...).expect(...)` pattern lives in one place
  and the panic message tracks the constant rather than embedding a
  hard-coded duration string.

* Switch `beacon_committee_selections` to call
  `selection.message_root()` instead of inlining
  `selection.slot.tree_hash_root().0` — matches the sync handler
  and centralises the SSZ selection-proof root in
  `crates/eth2api/src/v1.rs`. The `tree_hash::TreeHash` import is
  no longer needed.

* Add `tracing::error!` to the two "hook not registered" branches
  in `fetch_active_validators` and the new
  `await_agg_sig_db_fn` check inside each selection handler.
  A handler running with a half-wired `Component` is exactly the
  kind of operational red flag the surrounding logs are designed
  to surface.

* Bump the subscriber-`Err` and aggsigdb-`Err` log calls in the
  selection fan-out / await loops from `tracing::warn!` to
  `tracing::error!`. Timeouts stay at `warn` because they are
  transient peer-side issues; non-transient wiring failures
  belong at `error` so production alerting picks them up.

* Destructure-by-copy in both per-slot loops
  (`for (&slot, set) in &psigs_by_slot`) so the `*slot` deref
  noise in eight `tracing` field assignments and four
  `SlotNumber::new(*slot)` calls collapses to plain `slot`.

* `TestHandler::{beacon,sync}_committee_selections` switch from
  `.clone().expect(...)` to `match .as_ref()` so unstubbed
  call-sites panic with `unimplemented!` (matching every other
  stubbed method in the file) and the clone only happens on the
  `Some` path.
* Reorder `component.rs` imports back to the workspace
  `std` → external → `super`/`crate` grouping. Iter2's helper
  changes had left the `super::` block sitting between `std`
  and the external crates, drifting from every other file in
  `validatorapi/` and the sibling core modules.

* Apply the iter2 `match self.X.as_ref() { Some(r) => Ok(r.clone()),
  None => unimplemented!(...) }` stub shape to the remaining four
  `Option`-based `TestHandler` methods (attester_duties,
  proposer_duties, sync_committee_duties, attestation_data) so the
  whole file uses one style instead of two.

* Add `attester_duties_returns_api_error_shape_on_malformed_body`
  — the duties endpoints share the same `json_rejection_to_api_error`
  plumbing as the selection routes, and the iter2 envelope tests
  only locked the selection side. This test would now fail if a
  future refactor accidentally re-introduced bare `Json<ValIndexes>`
  extraction on a duties handler. The selection envelope test
  docstring is updated to point at the new duties counterpart.
@varex83agent varex83agent force-pushed the bohdan/validatorapi-selections branch from 554ce69 to bc5f269 Compare June 5, 2026 10:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant