Rule proposal: canvas-text-annotate — flag canvas/video text surfaces invisible to DOM walkers #123

@twschiller

Category

New defense rule (speculative / research-direction)

What problem does this solve?

Every rule that currently ships defends a DOM-readable surface — text nodes, attributes, structured data, accessibility-tree content. None defend against text that's rendered to <canvas> or video frames and only becomes "text" to the agent after OCR. Multimodal computer-use agents (Claude Computer Use, OpenAI Operator/Atlas, Browser Use's screenshot mode) increasingly run vision on rendered pages — at which point every DOM-side defense is bypassed by painting the payload to a canvas instead of writing it as DOM text.

The image-based-prompt-injection research line establishes that vision-language encoders do not distinguish "image content the user wants to show" from "instructions embedded inside an image" — the same architectural problem the indirect-prompt-injection rules already address for text. Reported attack-success rates range from ~64% under stealth constraints to higher under permissive threat models.

Proposed solution

Three escalating options (referred to below as routes 1 to 3); pick whichever clears the false-positive (FP) bar:

  1. Presence annotation only. Annotate the page when any <canvas> larger than a size threshold is rendered, or when a <video> with autoplay is present. This is a coarse "vision-only content lives here" signal that lets the agent treat DOM content as the primary source.
  2. OCR-side check. Off-thread (OffscreenCanvas + worker), rasterize the canvas, run a lightweight OCR pass (e.g. Tesseract.js / WASM, Apache 2.0), and apply the existing prompt-injection pattern set to the recognized text. Replace the canvas with a placeholder when a pattern matches (this option crosses from annotate into redact territory and should likely be a distinct sibling rule).
  3. Pixel heuristic. Skip OCR entirely. Annotate canvases that render as full-page or fill the viewport (≥ a configurable fraction). Canvas-as-content is rare outside attack surfaces and a small set of legitimate apps (Figma, Excalidraw, Google Docs canvas renderer, games).
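
A minimal sketch of the size gate shared by options 1 and 3, assuming the content script has already read the canvas's bounding rectangle and the viewport size. The `Rect` and `Viewport` shapes and the function names are hypothetical, and the 0.5 default is only borrowed from the size-threshold bullet further down; none of this is existing repo code.

```typescript
// Hypothetical shapes: in a content script these would be filled from
// canvas.getBoundingClientRect() and window.innerWidth / innerHeight.
interface Rect { left: number; top: number; width: number; height: number; }
interface Viewport { width: number; height: number; }

// Fraction (0..1) of the viewport covered by the on-screen part of the rect.
function viewportCoverage(rect: Rect, viewport: Viewport): number {
  const visibleWidth = Math.max(
    0,
    Math.min(rect.left + rect.width, viewport.width) - Math.max(rect.left, 0),
  );
  const visibleHeight = Math.max(
    0,
    Math.min(rect.top + rect.height, viewport.height) - Math.max(rect.top, 0),
  );
  const viewportArea = viewport.width * viewport.height;
  return viewportArea > 0 ? (visibleWidth * visibleHeight) / viewportArea : 0;
}

// Assumed default of 0.5; the proposal leaves the fraction configurable.
function shouldAnnotateCanvas(
  rect: Rect,
  viewport: Viewport,
  minCoverage: number = 0.5,
): boolean {
  return viewportCoverage(rect, viewport) >= minCoverage;
}
```

Clipping to the viewport before computing the area means a very large canvas that is mostly scrolled out of view is scored by what is actually visible, which seems closer to what a screenshot-driven agent would see.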

Per repo convention ("defenses against prompt injection should strip the content, not just label it"), route 2 is the only one that can credibly replace matched canvas content with a placeholder. Routes 1 and 3 can only annotate. v1 is the annotate-only floor; route 2 is the principled endpoint if it ever clears the cost bar.
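
To make the route 2 decision concrete, here is a rough sketch of the step after OCR has run: gate recognized lines on confidence, then apply whole-string patterns. Everything here is an assumption for illustration: the `OcrWord` shape is a local stand-in for whatever the OCR library returns, `ILLUSTRATIVE_PATTERNS` is a placeholder for the existing prompt-injection pattern set (which this sketch does not reproduce), and the floor of 70 is the example value from the false-positive section.

```typescript
// Hypothetical local shape for one recognized word (confidence on a 0-100 scale).
interface OcrWord { text: string; confidence: number; }
type OcrLine = OcrWord[];
type CanvasAction = "replace-with-placeholder" | "annotate-only";

// Placeholder pattern only. Anchored with ^...$ so the whole line must match,
// rather than an instruction-shaped substring inside a longer label.
const ILLUSTRATIVE_PATTERNS: RegExp[] = [
  /^ignore (all )?previous instructions[.!]?$/i,
];

// Returns the line's text only if every word clears the confidence floor.
function confidentLineText(line: OcrLine, minConfidence: number): string | null {
  if (line.length === 0) return null;
  if (!line.every((w) => w.confidence >= minConfidence)) return null;
  return line
    .map((w) => w.text.trim())
    .filter((t) => t.length > 0)
    .join(" ");
}

// Redact only when a confidently recognized line matches a pattern in full.
function decideCanvasAction(
  lines: OcrLine[],
  patterns: RegExp[] = ILLUSTRATIVE_PATTERNS,
  minConfidence: number = 70,
): CanvasAction {
  for (const line of lines) {
    const text = confidentLineText(line, minConfidence);
    if (text !== null && patterns.some((p) => p.test(text))) {
      return "replace-with-placeholder";
    }
  }
  return "annotate-only";
}
```

Dropping a whole line when any word falls below the floor is a deliberately conservative choice in this sketch: filtering out only the low-confidence words could stitch the remaining words into a string that matches by accident.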

Alternatives considered

  • Strip the canvas. Wrong threat model — many canvases are legitimate (charts, games, design tools). The defense is signalling that the content is invisible to DOM-side rules.
  • Defer to the agent's vision-encoder defenses. Reasonable, but worth shipping a coarse content-script signal in the meantime.
  • Server-side image proxy that OCRs and re-renders. Out of scope for an extension.

Controlling false positives

The dominant FP risk is annotating legitimate canvas-heavy apps. Without strong gating this rule fires on every Figma/Excalidraw/Google Docs page.

  • Origin allowlist for known canvas apps. Skip the rule entirely on *.figma.com, excalidraw.com, docs.google.com, *.tldraw.com, *.adobe.com, miro.com, *.canva.com, lucid.app, *.notion.so (canvas-rendered tables), *.codesandbox.io, common game-host origins (*.itch.io, *.poki.com), and major CAD/3D tools. Treat the allowlist as a maintenance surface, similar to how roach-motel-annotate and disguised-ad-flag lean on curated site data.
  • Size threshold. Only consider canvases that fill ≥50% of viewport (configurable). A 200×80 chart sparkline is not the attack surface.
  • Stable across mutations. Canvases redrawn every animation frame (games, video, real-time data viz) should not re-trigger annotation; debounce / annotate-once-per-load.
  • Off-screen canvases excluded. Many libraries (chart.js, three.js) maintain off-screen render buffers. Require visibility via IntersectionObserver before considering.
  • Phrase the annotation carefully. "Vision-readable content present that DOM-side defenses do not cover" — not "potential injection". Matches the same precise-statement posture as bot-cloaking-annotate.
  • For route 2 (OCR), reuse the existing prompt-injection pattern set with whole-string matching. Same precision bar as prompt-injection-redact — avoid matching axis labels in a chart that happen to contain instruction-shaped substrings.
  • For route 2, OCR confidence threshold. Tesseract.js exposes per-word confidence; require confidence above a floor (e.g., 70) before feeding to the pattern matcher. Garbage-OCR-as-injection is the most embarrassing FP mode.
  • Default-off, experimental. Same posture as bot-cloaking-annotate. Not a default-on rule under any realistic threshold today.
  • Telemetry first. Before considering default-on, gather per-host hit counts on real browsing data; promote allowlist entries based on observed false-positive sites, same way schema-trust-sanitize documents its known-syndicator short-circuit list.
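
Two of the gates above (the origin allowlist and annotate-once-per-load) can be sketched as below. This is illustrative only: the list is a subset of the hosts named in the allowlist bullet, the helper names are hypothetical, and treating `*.example.com` as also matching the bare `example.com` is an assumption this sketch makes, not something the proposal specifies.

```typescript
// Subset of the allowlist from the proposal, for illustration.
const CANVAS_APP_ALLOWLIST: string[] = [
  "*.figma.com",
  "excalidraw.com",
  "docs.google.com",
  "*.tldraw.com",
  "miro.com",
  "*.canva.com",
  "lucid.app",
];

// Matches a hostname against an exact host or a "*.domain" wildcard.
// Assumption: "*.figma.com" also matches the apex "figma.com".
function hostMatches(hostname: string, pattern: string): boolean {
  const host = hostname.toLowerCase();
  const pat = pattern.toLowerCase();
  if (pat.startsWith("*.")) {
    const base = pat.slice(2);
    // The leading dot in the suffix check stops "notfigma.com" from matching.
    return host === base || host.endsWith("." + base);
  }
  return host === pat;
}

function isAllowlistedHost(
  hostname: string,
  allowlist: string[] = CANVAS_APP_ALLOWLIST,
): boolean {
  return allowlist.some((pattern) => hostMatches(hostname, pattern));
}

// Annotate-once guard: remembers targets already annotated during this page
// load so per-frame redraws do not re-trigger the rule.
class AnnotateOnce<T extends object> {
  private seen = new WeakSet<T>();

  shouldAnnotate(target: T): boolean {
    if (this.seen.has(target)) return false;
    this.seen.add(target);
    return true;
  }
}
```

In a content script the guard would be keyed on the canvas element itself; a `WeakSet` keeps no strong reference, so canvases removed from the page can still be garbage-collected. Note that exact entries such as `excalidraw.com` do not match `www.excalidraw.com` under this sketch, so the curated list would need to say which form it means.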

Prior art / references

Tagged Impact L / Complexity H.

Labels: enhancement (New feature or request), rule-proposal (Proposed new defense rule, pending triage/citation review)
