Category
New defense rule (speculative / research-direction)
What problem does this solve?
Every rule that currently ships defends a DOM-readable surface — text nodes, attributes, structured data, accessibility-tree content. None defend against text that's rendered to <canvas> or video frames and only becomes "text" to the agent after OCR. Multimodal computer-use agents (Claude Computer Use, OpenAI Operator/Atlas, Browser Use's screenshot mode) increasingly run vision on rendered pages — at which point every DOM-side defense is bypassed by painting the payload to a canvas instead of writing it as DOM text.
The image-based-prompt-injection research line establishes that vision-language encoders do not distinguish "image content the user wants to show" from "instructions embedded inside an image" — the same architectural problem the indirect-prompt-injection rules already address for text. Reported attack-success rates range from ~64% under stealth constraints to higher under permissive threat models.
Proposed solution
Three escalating options; pick whichever clears the false-positive (FP) bar:
- Route 1: presence annotation only. Annotate the page when any <canvas> larger than a size threshold is rendered, or when a <video> with autoplay is present. This gives the agent a coarse "vision-only content lives here" signal so it can weight DOM content as the primary source.
- Route 2: OCR-side check. Off the main thread (OffscreenCanvas + worker), rasterize the canvas, run a lightweight OCR pass (e.g. Tesseract.js / WASM, Apache 2.0), and apply the existing prompt-injection pattern set to the recognized text. Replace the canvas with a placeholder when a pattern matches (this option crosses from annotate into redact territory and should likely be a distinct sibling rule).
- Route 3: pixel heuristic. Skip OCR entirely. Annotate canvases that render full-page or fill the viewport (≥ a configurable fraction). Canvas-as-content is rare outside attack surfaces and a small set of legitimate apps (Figma, Excalidraw, Google Docs canvas renderer, games).
Per repo convention ("defenses against prompt injection should strip the content, not just label it"): route 2 is the only one that can credibly replace matched canvas content with a placeholder. Routes 1 and 3 can only annotate. v1 is the annotate-only floor; route 2 is the principled endpoint if it ever clears the cost bar.
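A minimal sketch of the size gate that the annotate-only routes depend on, written as a pure function so it can be tested without a DOM. The names `Rect`, `Viewport`, `viewportCoverage`, and `exceedsCoverageThreshold` are hypothetical and not part of the existing rule set; in a content script the inputs would presumably come from `canvas.getBoundingClientRect()` and `window.innerWidth` / `window.innerHeight`. The 0.5 default is only an assumed starting point for the configurable fraction.

```typescript
// Hypothetical shapes: a canvas bounding box in viewport coordinates,
// and the viewport dimensions in CSS pixels.
interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

interface Viewport {
  width: number;
  height: number;
}

/** Fraction (0..1) of the viewport area covered by the visible part of `rect`. */
function viewportCoverage(rect: Rect, viewport: Viewport): number {
  // Clip the rectangle to the viewport so off-screen portions do not count.
  const visibleWidth = Math.max(
    0,
    Math.min(rect.left + rect.width, viewport.width) - Math.max(rect.left, 0),
  );
  const visibleHeight = Math.max(
    0,
    Math.min(rect.top + rect.height, viewport.height) - Math.max(rect.top, 0),
  );
  const viewportArea = viewport.width * viewport.height;
  return viewportArea > 0 ? (visibleWidth * visibleHeight) / viewportArea : 0;
}

/** True when the canvas covers at least `minFraction` of the viewport (assumed default 0.5). */
function exceedsCoverageThreshold(
  rect: Rect,
  viewport: Viewport,
  minFraction: number = 0.5,
): boolean {
  return viewportCoverage(rect, viewport) >= minFraction;
}
```

With a 1280×720 viewport, a 200×80 sparkline stays far below the gate while a canvas that fills the viewport passes it. Clipping to the viewport first means a large canvas that is mostly scrolled out of view is treated like a small one.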
Alternatives considered
- Strip the canvas. Wrong threat model — many canvases are legitimate (charts, games, design tools). The defense is signalling that the content is invisible to DOM-side rules.
- Defer to the agent's vision-encoder defenses. Reasonable, but worth shipping a coarse content-script signal in the meantime.
- Server-side image proxy that OCRs and re-renders. Out of scope for an extension.
Controlling false positives
The dominant FP risk is annotating legitimate canvas-heavy apps. Without strong gating this rule fires on every Figma/Excalidraw/Google Docs page.
- Origin allowlist for known canvas apps. Skip the rule entirely on *.figma.com, excalidraw.com, docs.google.com, *.tldraw.com, *.adobe.com, miro.com, *.canva.com, lucid.app, *.notion.so (canvas-rendered tables), *.codesandbox.io, common game-host origins (*.itch.io, *.poki.com), and major CAD/3D tools. Treat the allowlist as a maintenance surface, similar to how roach-motel-annotate and disguised-ad-flag lean on curated site data.
- Size threshold. Only consider canvases that fill ≥50% of viewport (configurable). A 200×80 chart sparkline is not the attack surface.
- Stable across mutations. Canvases redrawn every animation frame (games, video, real-time data viz) should not re-trigger annotation; debounce / annotate-once-per-load.
- Off-screen canvases excluded. Many libraries (chart.js, three.js) maintain off-screen render buffers. Require visibility via IntersectionObserver before considering a canvas.
- Phrase the annotation carefully. "Vision-readable content present that DOM-side defenses do not cover", not "potential injection". This matches the precise-statement posture of bot-cloaking-annotate.
- For route 2 (OCR), reuse the existing prompt-injection pattern set with whole-string matching. Hold it to the same precision bar as prompt-injection-redact, so that axis labels in a chart that happen to contain instruction-shaped substrings do not match.
- For route 2, OCR confidence threshold. Tesseract.js exposes per-word confidence; require confidence above a floor (e.g., 70) before feeding to the pattern matcher. Garbage-OCR-as-injection is the most embarrassing FP mode.
- Default-off, experimental. Same posture as bot-cloaking-annotate. Not a default-on rule under any realistic threshold today.
- Telemetry first. Before considering default-on, gather per-host hit counts on real browsing data; promote allowlist entries based on observed false-positive sites, the same way schema-trust-sanitize documents its known-syndicator short-circuit list.
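Three of the gates above (origin allowlist, OCR confidence floor, whole-string matching) can be sketched as pure functions. Everything here is an illustrative assumption rather than the existing implementation: the allowlist excerpt, the `OcrWord` / `OcrLine` shapes (loosely modelled on per-word confidence in the 0 to 100 range), the choice to drop a whole line when any word falls below the floor, and the single example pattern, which stands in for the real prompt-injection pattern set.

```typescript
// Hypothetical excerpt of the allowlist; "*." entries are assumed to cover
// the bare domain and any subdomain, other entries match exactly.
const CANVAS_APP_ALLOWLIST: string[] = [
  "*.figma.com",
  "excalidraw.com",
  "docs.google.com",
  "*.tldraw.com",
];

function hostMatches(hostname: string, pattern: string): boolean {
  const host = hostname.toLowerCase();
  const pat = pattern.toLowerCase();
  if (pat.startsWith("*.")) {
    const base = pat.slice(2);
    // The leading dot prevents "evilfigma.com" from matching "*.figma.com".
    return host === base || host.endsWith("." + base);
  }
  return host === pat;
}

function isAllowlisted(
  hostname: string,
  allowlist: string[] = CANVAS_APP_ALLOWLIST,
): boolean {
  return allowlist.some((pattern) => hostMatches(hostname, pattern));
}

// Assumed OCR output shape: words with a 0..100 confidence, grouped by line.
interface OcrWord {
  text: string;
  confidence: number;
}

interface OcrLine {
  words: OcrWord[];
}

/** Keep only lines in which every word clears the confidence floor. */
function confidentLines(lines: OcrLine[], minConfidence: number = 70): string[] {
  return lines
    .filter(
      (line) =>
        line.words.length > 0 &&
        line.words.every((word) => word.confidence >= minConfidence),
    )
    .map((line) => line.words.map((word) => word.text).join(" "));
}

// Placeholder pattern for illustration only.
const EXAMPLE_PATTERNS: RegExp[] = [/ignore (?:all )?previous instructions\.?/i];

/** True only when a pattern matches the entire trimmed string, not a substring. */
function matchesWholeString(
  text: string,
  patterns: RegExp[] = EXAMPLE_PATTERNS,
): boolean {
  const candidate = text.trim();
  return patterns.some((pattern) => {
    const flags = pattern.flags.replace(/[gy]/g, "");
    return new RegExp(`^(?:${pattern.source})$`, flags).test(candidate);
  });
}
```

Dropping a line when any word is low-confidence is the conservative choice: it avoids splicing the surviving words into a string that never appeared on the canvas. Anchoring each pattern to the whole line is what keeps a chart label that merely contains an instruction-shaped phrase from matching.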
Prior art / references
Tagged Impact L / Complexity H.