Rule proposal: long-context-redact — collapse over-budget threads on long pages #120

@twschiller

Description

Category

New defense rule

What problem does this solve?

Even when a page is "clean" (no injection, no dark patterns), sheer length is a defense surface. Liu et al., "Lost in the Middle: How Language Models Use Long Contexts" (arXiv 2023; TACL 2024), document a U-shaped accuracy curve: LLM retrieval and reasoning degrade sharply when relevant information sits in the middle of a long context window, even on models advertised as long-context.

A page with 200 comments or 80 reviews stretches the context so that the agent's task instructions and the main page payload sit at the edges while most of the thread lands in the middle, where the agent both (a) is more likely to miss the answer and (b) is more susceptible to mid-context injection that benefits from positional dilution. The existing comments-redact and reviews-redact rules remove these surfaces entirely. But when the comments are the task ("summarize the top criticism of this product"), removal is wrong: the agent needs the content, just not every entry.

Proposed solution

On pages whose visible text exceeds a budget (proposed: 50k chars; tunable), collapse:

  • Comment threads past the first N entries
  • Review lists past the first N entries
  • Reply chains past the first N levels of depth

into the same click-to-reveal placeholder shape cross-origin-frame-redact, comments-redact, and irrelevant-sections-redact already use. Agents that need the tail can reveal explicitly; default behavior preserves the head of the list (typically highest-quality on engagement-sorted platforms).
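
A minimal sketch of the collapse step described above, assuming a simplified in-memory model instead of the real DOM. The names `Entry`, `over_budget`, `collapse_thread`, `head_n`, and `max_depth` are hypothetical and do not come from the existing rules; only the 50k-char budget is taken from this proposal. Top-level entries count as depth 1, and entries are marked rather than deleted so the placeholder can still be revealed.

```python
from dataclasses import dataclass, field
from typing import List

PAGE_CHAR_BUDGET = 50_000  # proposed default from this issue; tunable


@dataclass
class Entry:
    """One comment, review, or reply (hypothetical model, not the real DOM)."""
    text: str
    replies: List["Entry"] = field(default_factory=list)
    collapsed: bool = False  # True -> would render as a click-to-reveal placeholder


def over_budget(visible_text: str, budget: int = PAGE_CHAR_BUDGET) -> bool:
    """The rule only triggers when the visible text exceeds the budget."""
    return len(visible_text) > budget


def _collapse_deep_replies(replies: List[Entry], max_depth: int, depth: int) -> int:
    """Mark replies nested deeper than max_depth; return how many were marked."""
    marked = 0
    for reply in replies:
        if depth > max_depth:
            # Descendants are hidden together with this reply, so stop here.
            reply.collapsed = True
            marked += 1
        else:
            marked += _collapse_deep_replies(reply.replies, max_depth, depth + 1)
    return marked


def collapse_thread(entries: List[Entry], head_n: int, max_depth: int) -> int:
    """Keep the first head_n top-level entries and max_depth levels of replies.

    Everything else is marked as collapsed. Nothing is deleted.
    Returns the number of entries marked.
    """
    marked = 0
    for index, entry in enumerate(entries):
        if index >= head_n:
            entry.collapsed = True
            marked += 1
        else:
            marked += _collapse_deep_replies(entry.replies, max_depth, depth=2)
    return marked
```

Calling `collapse_thread(thread, head_n=2, max_depth=2)` on a three-entry thread keeps the first two entries and their direct replies, and marks the third entry plus any reply-to-a-reply as collapsed. Whether N should also cap sibling replies within a kept entry is left open here; the sketch only caps top-level entries and depth.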

Alternatives considered

  • Lower the cap in comments-redact / reviews-redact. Doesn't help — those rules are all-or-nothing today; this proposal is the "keep the head" variant.
  • Reader-mode extraction (Readability.js). Already part of the prior-art lineage; Readability picks a main article and discards comments wholesale. We want the keep-some-comments shape.
  • Generic LLM trim. Same shape as irrelevant-sections-redact, but the trigger here is length, not engagement-rail recognition — no LLM call needed for a "keep first N" heuristic.

Controlling false positives

  • Default-off. Until per-host hit/skip data confirms the head-N preserves task-relevant content, this rule should ship default-off — same posture as irrelevant-sections-redact. Users opt in when their workflow tolerates the trade-off.
  • Per-host denylist. Never apply the rule on sites where the tail is structurally load-bearing: Hacker News (highly-rated child comments outvalue top-level), GitHub issue threads (resolution often in the last reply), Reddit (AskScience-style threads with cited replies), public-comment portals like regulations.gov, and court records.
  • Preserve elements with structural significance markers. Examples: Reddit "best answer" flags, Stack Overflow "Accepted" badges, GitHub "Marked as answer", elements matching [aria-label*="solved"], and rows with engagement scores above a per-host percentile. The head-N count should not blindly drop a flagged answer because it sits at position 47.
  • High length threshold. 50k visible chars is roughly 12k tokens — well above the budget where Lost-in-the-Middle starts to bite for current frontier models. Tuning low risks redacting pages that fit comfortably in context.
  • Reveal-on-demand contract. Collapsed regions become click-to-reveal placeholders, not deletions. An agent that detects the placeholder shape can decide to expand — same affordance as cross-origin-frame-redact.
  • Per-section quotas, not page-wide. Apply N independently to each thread/list so a page with multiple distinct conversation surfaces (an article with comments + a sidebar of reviews) doesn't lose representation in one to keep room for the other.
  • Skip when the page is the agent's task surface. If the user's task implies "summarize all reviews", the agent can't tell us — but a per-host denylist for the platforms where this is the common ask (review-aggregator sites like Trustpilot, Amazon SERP review pages once the user navigates to "see all reviews") gets most of the way there.
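
Several of the controls above compose into a small decision procedure. The following is a sketch under stated assumptions, not the rule's actual implementation: `DENYLIST_HOSTS`, `Item`, `rule_applies`, `indices_to_keep`, and `plan_page` are hypothetical names, the host list only mirrors the examples given in the bullets, and the per-host percentile is assumed to arrive as a precomputed `score_floor`.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set

# Illustrative only: hosts drawn from the examples in the list above.
DENYLIST_HOSTS: FrozenSet[str] = frozenset({
    "news.ycombinator.com",
    "github.com",
    "reddit.com",
    "regulations.gov",
})


@dataclass
class Item:
    """One comment or review in a section (hypothetical model)."""
    text: str
    flagged: bool = False  # e.g. an "Accepted" badge or "Marked as answer"
    score: float = 0.0     # engagement score, if the host exposes one


def rule_applies(enabled: bool, host: str, visible_chars: int,
                 budget: int = 50_000) -> bool:
    """Default-off, denylist, and length threshold, checked in that order."""
    if not enabled:
        return False
    if any(host == h or host.endswith("." + h) for h in DENYLIST_HOSTS):
        return False
    return visible_chars > budget


def indices_to_keep(items: List[Item], head_n: int,
                    score_floor: float = float("inf")) -> Set[int]:
    """Keep the head, plus any flagged or high-scoring item wherever it sits."""
    keep = set(range(min(head_n, len(items))))
    for i, item in enumerate(items):
        if item.flagged or item.score > score_floor:
            keep.add(i)
    return keep


def plan_page(sections: Dict[str, List[Item]], head_n: int) -> Dict[str, Set[int]]:
    """Apply the quota independently to each section, not page-wide."""
    return {name: indices_to_keep(items, head_n) for name, items in sections.items()}
```

With `head_n=3`, a 50-item comment list whose item at position 47 carries a flag keeps indices 0, 1, 2, and 46, while a two-item review sidebar on the same page keeps both of its items. Items outside the kept set would become click-to-reveal placeholders rather than being removed.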

Prior art / references

Tagged Impact M / Complexity H.

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request), rule-proposal (Proposed new defense rule, pending triage/citation review)
