Skip to content

fix: don't bound source-target distance#431

Merged
pablodeymo merged 2 commits into
mainfrom
fix/remove-step-cap-in-vote-validation
Jun 11, 2026
Merged

fix: don't bound source-target distance#431
pablodeymo merged 2 commits into
mainfrom
fix/remove-step-cap-in-vote-validation

Conversation

@MegaRedHand

Copy link
Copy Markdown
Collaborator

🗒️ Description / Motivation

This PR fixes an issue we found during some devnet runs, that a step-limit we added to bound the ancestor-check loop inadvertently also capped the finality period to 128 slots before the devnet stopped including attestations.

@github-actions

Copy link
Copy Markdown

🤖 Kimi Code Review

Critical Security Issue: Unbounded computation DoS vector

crates/blockchain/src/store.rs (lines 119-128, 145-155)

The removal of MAX_ANCESTRY_WALK_SLOTS introduces an unbounded loop on the gossip path. An attacker can craft an attestation with a source checkpoint thousands of slots in the past, forcing the node to iterate through the entire chain history (potentially millions of slots) performing database lookups for every malicious attestation received.

  1. Restore the cap or implement an alternative bound. The slot difference between descendant_header.slot and ancestor.slot should be checked before entering the loop:

    if descendant_header.slot.saturating_sub(ancestor.slot) > MAX_ANCESTRY_WALK_SLOTS {
        return false;
    }
  2. Performance: Even for legitimate traffic, during non-finality periods, justified checkpoints can lag by hundreds of slots. The bounded walk prevents latency spikes.

  3. Consensus safety: If this change is intended to support legitimate attestations with distant sources (>128 slots), ensure this aligns with the spec's attestation inclusion window (typically current + previous epoch). Otherwise, you risk accepting invalid votes.

Question: What is the rationale for removing this protection? If it's to support long-range sync or historical attestation processing, consider making the limit configurable or context-dependent (unlimited during sync, capped during gossip).

Minor: The warning log removal reduces observability. If you keep a limit, retain the log to detect potential attacks.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions

Copy link
Copy Markdown

🤖 Claude Code Review

Now I have enough context for the review.


PR 431 Review: fix: don't bound source-target distance

Overview

This PR removes the MAX_ANCESTRY_WALK_SLOTS = 128 hard cap from checkpoint_is_ancestor, which was causing the devnet to reject valid attestations once the source-to-target distance exceeded 128 slots, effectively preventing finalization in long-running devnets.


Correctness

The fix is correct. The cap was placed inside the ancestor-walk loop and counted parent hops, not slots, so it also consumed steps on empty slots (no block produced). In a long-running devnet where the source checkpoint is naturally many slots behind the target, the walk was cut short and the attestation was discarded as if the ancestry relationship didn't hold. Removing the cap restores correct behaviour.


Stale Documentation (must fix)

The docstring for checkpoint_is_ancestor was not updated — it still references the removed constant in two places:

store.rs:122-126
/// starts from its parent without re-reading it. Walks parent links down to the
/// ancestor's slot, up to [`MAX_ANCESTRY_WALK_SLOTS`] steps. Empty (skipped)
/// slots on the path are traversed transparently since they carry no block.
/// Conservative: a block missing from the store, or exceeding the walk cap,
/// yields `false`.
  • [`MAX_ANCESTRY_WALK_SLOTS`] is now a dangling intra-doc link that will emit a compiler warning under rustdoc::broken_intra_doc_links.
  • "or exceeding the walk cap" is now incorrect — the only termination condition for a missing block is that the store returns None, which still yields false and that semantics should be preserved in the doc.

Suggested replacement for the stale lines:

/// starts from its parent without re-reading it. Walks parent links down to the
/// ancestor's slot. Empty (skipped) slots on the path are traversed transparently
/// since they carry no block. Conservative: a block missing from the store yields `false`.

DoS Consideration (worth discussing, not a blocker)

The original cap was presumably added to bound per-attestation CPU work on the gossip path. With the cap removed, the worst-case work per attestation becomes O(target.slot − source.slot) block-header lookups. The practical upper bound is the length of the live chain between the two checkpoints, which is bounded by blocks actually present in the store.

Mitigating factors that make this acceptable:

  • All three referenced blocks (source, target, head) must already be in the local store before the walk starts (availability check).
  • head.slot is bounded by the timing check (attestation_start_interval > store.time() + GOSSIP_DISPARITY_INTERVALS), which limits how far in the future the head can be.
  • Honest attestations span at most a few epochs.

Residual risk: a peer could craft attestations whose source and head are the oldest and newest blocks in the store respectively, forcing a full-chain walk on every such gossip message. This is low severity for a devnet but worth keeping in mind as the chain matures. A natural alternative would be capping at something proportional to the actual justification window (e.g., finalized_slot + WEAK_SUBJECTIVITY_PERIOD), but that is a follow-up concern rather than a blocker here.


Summary

  • Fix is correct — the 128-slot cap was too tight and was wrongly rejecting valid attestations.
  • One must-fix: update the docstring to remove the reference to MAX_ANCESTRY_WALK_SLOTS (dangling doc-link + stale prose).
  • No logic changes needed — the walk termination conditions (slot match, slot underflow, missing block) remain sound without the cap.

Automated review by Claude (Anthropic) · sonnet · custom prompt

@github-actions

Copy link
Copy Markdown

🤖 Codex Code Review

  1. High: crates/blockchain/src/store.rs:139-148 removes the only bound on checkpoint_is_ancestor, but this function is called on the gossip validation path for both single and aggregated attestations before later rejection points like missing target state (crates/blockchain/src/store.rs:314-323, 401-406). Each hop does a Store::get_block_header lookup and SSZ decode (crates/storage/src/store.rs:1005-1009), and the store keeps up to 21,600 block headers (crates/storage/src/store.rs:86-87). That turns one old-but-locally-known attestation into thousands of DB reads/decodes, even when it will fail immediately afterward because only 3,000 states are retained (crates/storage/src/store.rs:89-90). For a public gossip path in a consensus client, that is a real CPU/IO amplification vector. I would keep a bound or add a cheaper age/slot-distance precheck before walking ancestry.

  2. Low: crates/blockchain/src/store.rs:121-126 still documents MAX_ANCESTRY_WALK_SLOTS and says “exceeding the walk cap” returns false, but the constant and behavior were removed in this PR. That is now misleading, and the intra-doc link can become a rustdoc failure if broken_intra_doc_links is enabled.

Assumptions/testing gap: I could not run cargo test or cargo doc in this environment because rustup tried to write under a read-only home directory. If the intent is to accept deep historical attestations for correctness, I’d also add a regression test covering a >128-slot valid ancestor chain so the tradeoff is explicit.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@greptile-apps

greptile-apps Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

Removes the MAX_ANCESTRY_WALK_SLOTS = 128 step-limit from checkpoint_is_ancestor that was inadvertently capping the finality period to 128 slots and causing attestations to be dropped on long-running devnets.

  • The loop now terminates only through the slot comparison (slot < ancestor.slot) or a missing store entry, which is the correct behavior for an unbounded ancestry walk.
  • The doc comment on checkpoint_is_ancestor was not updated and still references the removed constant and walk-cap semantics, including a dangling rustdoc link.

Confidence Score: 4/5

Safe to merge — the fix correctly removes the cap that was truncating valid attestations; the only remaining nit is a stale doc comment.

The change is small and targeted: dropping a step counter and a constant that were causing legitimate attestations to be rejected after 128 slots. The loop already terminates correctly via the slot comparison and the missing-entry guard. The only issue is the doc comment still describing the removed cap, including a dangling rustdoc link to the deleted constant.

The doc comment in checkpoint_is_ancestor needs to be updated to drop references to the removed constant and walk-cap behavior.

Important Files Changed

Filename Overview
crates/blockchain/src/store.rs Removes the 128-step cap in checkpoint_is_ancestor that was incorrectly bounding the finality period; the loop now terminates only by slot comparison or a missing store entry. The doc comment was not updated and still references the removed constant.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[checkpoint_is_ancestor] --> B{ancestor.slot >= descendant.slot}
    B -- Yes --> C{slots equal AND roots match}
    C -- Yes --> D[return true]
    C -- No --> E[return false]
    B -- No --> F[current_root = descendant_header.parent_root]
    F --> G{get_block_header current_root}
    G -- None --> E
    G -- Some header --> I{header.slot == ancestor.slot}
    I -- Yes --> J{current_root == ancestor.root}
    J -- Yes --> D
    J -- No --> E
    I -- No --> K{header.slot less than ancestor.slot}
    K -- Yes --> E
    K -- No --> L[current_root = header.parent_root]
    L --> G
Loading
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
crates/blockchain/src/store.rs:119-126
The doc comment still refers to `MAX_ANCESTRY_WALK_SLOTS` and the walk-cap behavior ("up to `[MAX_ANCESTRY_WALK_SLOTS]` steps", "exceeding the walk cap, yields `false`"), both of which were removed by this PR. The dangling link will also produce a `rustdoc` warning (`[MAX_ANCESTRY_WALK_SLOTS]` resolves to nothing).

```suggestion
/// Return whether `ancestor` lies on `descendant`'s parent chain.
///
/// `descendant_header` is the descendant's already-fetched header, so the walk
/// starts from its parent without re-reading it. Walks parent links down to the
/// ancestor's slot. Empty (skipped) slots on the path are traversed
/// transparently since they carry no block.
/// Conservative: a block missing from the store yields `false`.
```

Reviews (1): Last reviewed commit: "docs: update function documentation" | Re-trigger Greptile

Comment on lines 119 to 126
/// Return whether `ancestor` lies on `descendant`'s parent chain.
///
/// `descendant_header` is the descendant's already-fetched header, so the walk
/// starts from its parent without re-reading it. Walks parent links down to the
/// ancestor's slot, up to [`MAX_ANCESTRY_WALK_SLOTS`] steps. Empty (skipped)
/// slots on the path are traversed transparently since they carry no block.
/// Conservative: a block missing from the store, or exceeding the walk cap,
/// yields `false`.
/// ancestor's slot. Empty (skipped) slots on the path are traversed transparently
/// since they carry no block. A block missing from the store yields `false`.
fn checkpoint_is_ancestor(
store: &Store,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The doc comment still refers to MAX_ANCESTRY_WALK_SLOTS and the walk-cap behavior ("up to [MAX_ANCESTRY_WALK_SLOTS] steps", "exceeding the walk cap, yields false"), both of which were removed by this PR. The dangling link will also produce a rustdoc warning ([MAX_ANCESTRY_WALK_SLOTS] resolves to nothing).

Suggested change
/// Return whether `ancestor` lies on `descendant`'s parent chain.
///
/// `descendant_header` is the descendant's already-fetched header, so the walk
/// starts from its parent without re-reading it. Walks parent links down to the
/// ancestor's slot, up to [`MAX_ANCESTRY_WALK_SLOTS`] steps. Empty (skipped)
/// slots on the path are traversed transparently since they carry no block.
/// Conservative: a block missing from the store, or exceeding the walk cap,
/// yields `false`.
/// ancestor's slot. Empty (skipped) slots on the path are traversed transparently
/// since they carry no block. A block missing from the store yields `false`.
fn checkpoint_is_ancestor(
store: &Store,
/// Return whether `ancestor` lies on `descendant`'s parent chain.
///
/// `descendant_header` is the descendant's already-fetched header, so the walk
/// starts from its parent without re-reading it. Walks parent links down to the
/// ancestor's slot. Empty (skipped) slots on the path are traversed
/// transparently since they carry no block.
/// Conservative: a block missing from the store yields `false`.
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/blockchain/src/store.rs
Line: 119-126

Comment:
The doc comment still refers to `MAX_ANCESTRY_WALK_SLOTS` and the walk-cap behavior ("up to `[MAX_ANCESTRY_WALK_SLOTS]` steps", "exceeding the walk cap, yields `false`"), both of which were removed by this PR. The dangling link will also produce a `rustdoc` warning (`[MAX_ANCESTRY_WALK_SLOTS]` resolves to nothing).

```suggestion
/// Return whether `ancestor` lies on `descendant`'s parent chain.
///
/// `descendant_header` is the descendant's already-fetched header, so the walk
/// starts from its parent without re-reading it. Walks parent links down to the
/// ancestor's slot. Empty (skipped) slots on the path are traversed
/// transparently since they carry no block.
/// Conservative: a block missing from the store yields `false`.
```

How can I resolve this? If you propose a fix, please make it concise.

@pablodeymo pablodeymo merged commit 225e6d3 into main Jun 11, 2026
2 checks passed
@pablodeymo pablodeymo deleted the fix/remove-step-cap-in-vote-validation branch June 11, 2026 19:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants