Skip to content

feat(mcp-web-fetch): expose domain filters#383

Open
ceilf6 wants to merge 2 commits into
developfrom
feat/web-fetch-domain-filters-380
Open

feat(mcp-web-fetch): expose domain filters#383
ceilf6 wants to merge 2 commits into
developfrom
feat/web-fetch-domain-filters-380

Conversation

@ceilf6

@ceilf6 ceilf6 commented Jul 1, 2026

Copy link
Copy Markdown
Owner

Linked Issue Or Context

Summary

  • Added optional allowed_domains and blocked_domains parameters to the web_fetch MCP schema.
  • Mapped allowed_domains -> engine allowHosts and blocked_domains -> engine denyHosts.
  • Added schema and handler-level no-network tests proving allow/block filters reject before fetch.

Impact Scope

  • Scope is limited to @frontagent/mcp-web-fetch MCP tool schema/handler plus focused tests.
  • No changes to URL safety policy, private IP handling, redirects, timeout/maxBytes clamps, web_search, JS rendering, or core security auto-approval.

GitNexus Impact Summary

  • Risk level: LOW
  • Critical skeleton changes: MCP boundary files touched: packages/mcp-web-fetch/src/tools.ts, packages/mcp-web-fetch/src/tools.security.test.ts
  • GitNexus impact: impact(webFetchSchema, upstream) returned LOW with direct callers 0, affected processes 0, modules 0. impact(handleWebFetchTool, upstream) returned LOW with direct upstream 1 (tools.security.test.ts), affected processes 0, modules 0. detect_changes(scope=staged) reported changed_count 3, changed_files 3, affected_count 0, risk_level low; changed symbols webFetchSchema, allWebFetchSchemas, handleWebFetchTool; affected_processes none.
  • Verification: pnpm agent:bootstrap; pnpm quality:predev; focused mcp-web-fetch tests; pnpm quality:precommit; pnpm quality:local before push and again via pre-push hook.

Verification

  • pnpm --dir packages/mcp-web-fetch test -- --run src/tools.security.test.ts
  • pnpm --dir packages/mcp-web-fetch test
  • pnpm --dir packages/mcp-web-fetch typecheck
  • pnpm quality:precommit
  • pnpm quality:local

Checklist

  • I have linked an issue or explained why this PR stands alone.
  • I have kept the diff focused on the stated change.
  • I have run pnpm quality:precommit, or explained why it could not run.
  • I have run pnpm quality:local for critical skeleton changes, or explained why it could not run.
  • I have updated docs or tests when behavior, public APIs, or Harness contracts changed.
  • For critical skeleton changes, I have filled the GitNexus impact summary with concrete results.

Copilot AI review requested due to automatic review settings July 1, 2026 10:42
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown

🛡️ ceilf6/repo-guard

代码评审报告: feat(mcp-web-fetch): expose domain filters

风险等级:
处理建议: 批准
决策摘要: 变更范围收敛于 MCP schema/handler 适配层,忠实实现了 #380 全部验收标准并有 no-network 测试覆盖,可以合并。

级联分析

  • 变更符号: webFetchSchema, allWebFetchSchemas, handleWebFetchTool
  • 受影响流程: web_fetch MCP 工具调用路径;allowed_domainsallowHostsblocked_domainsdenyHosts 映射后进入既有 fetchUrl 引擎过滤
  • 变更集外调用方: handleWebFetchTool 有 1 个上游 (tools.security.test.ts);新增 schema 字段为可选,不破坏现有 web_fetch 调用方;fetchUrl 已支持 allowHosts/denyHosts,无契约漂移
  • 置信度: medium (GitNexus 报告 affected_processes none 且 required 仍为 ['url'];引擎实际过滤行为由已有 URL-safety 测试保证,本 PR 未修改)

问题发现

无 blocking findings。

新增字段可选、默认 undefined,与既有 format/timeoutMs/maxBytes 参数模式一致;未触碰 URL safety policy、private IP、redirect、maxBytes/timeout clamps,符合 issue 的 non-goals。命名 allowed_domains/blocked_domains 与 Anthropic 文档参数对齐,满足 issue 显式约束。

行级发现

  • [packages/mcp-web-fetch/src/tools.ts:73] allowHosts/denyHosts 直接透传 args.allowed_domains as string[],未做运行时校验(如非数组、非字符串元素)。schema 声明为 array/string,但 handler 入口 argsRecord<string, unknown>,MCP 客户端若传畸形值会绕过 schema 到达引擎。非阻塞:确认 fetchUrl/URL-safety 层对非法 host 输入有防御即可;否则考虑最小校验。

Karpathy 评审

  • 假设: 依赖 fetchUrl 已正确实现 allowHosts/denyHosts 的 reject-before-fetch 语义;本 PR 未验证引擎内部,仅通过 handler 层 no-network 测试间接确认,假设合理且有既有引擎测试兜底。
  • 简洁性: 纯适配层映射,无新增抽象或间接层,方案与问题规模匹配。
  • 结构质量: 无退化。字段追加复用既有 schema 结构,tools.ts 仅 +12 行,无文件膨胀或错误层泄漏。
  • 变更范围: 外科手术式。3 文件中含 1 个实现、1 个测试、1 个 plan 文档;plan 文档为流程产物,不影响运行时。
  • 验证: 覆盖 allow 拒绝、block 拒绝、schema 形状三条路径,并断言 fetchMock 未被调用以证明 reject-before-fetch,验证与验收标准一致。

缺失覆盖

  • 无阻塞缺口。可选增强:同时传入 allowed_domainsblocked_domains(allowlist 命中但被 blocklist 拒绝)的交叉场景未测试,但该优先级逻辑属于引擎既有行为,非本 PR 引入,不作为合并前要求。

@ceilf6

ceilf6 commented Jul 1, 2026

Copy link
Copy Markdown
Owner Author

Repo-guard non-blocking note acknowledged. I am not adding extra runtime validation in this PR because the existing handleWebFetchTool pattern casts schema-validated args for format/timeoutMs/maxBytes, and this issue is scoped to exposing/mapping the Anthropic-compatible schema fields. Malformed allowed_domains/blocked_domains values fail closed before network fetch in the current path rather than widening access. The implementation keeps the change limited to the adapter layer as requested.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the @frontagent/mcp-web-fetch MCP tool to expose Anthropic-compatible domain filters (allowed_domains, blocked_domains) and wires them through to the existing engine host filtering (allowHosts, denyHosts), with added no-network security tests to prove rejections happen before any fetch attempt.

Changes:

  • Extended the web_fetch MCP input schema with optional allowed_domains / blocked_domains string-array parameters.
  • Mapped allowed_domains -> allowHosts and blocked_domains -> denyHosts in the tool handler.
  • Added schema and handler-level “no network” tests validating early rejection and that fetch is never invoked.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
packages/mcp-web-fetch/src/tools.ts Adds schema fields and maps the new tool args into engine options (allowHosts / denyHosts).
packages/mcp-web-fetch/src/tools.security.test.ts Adds tests covering schema exposure and pre-fetch rejection behavior for allow/block filters.
docs/superpowers/plans/2026-07-01-web-fetch-domain-filters.md Documents an implementation plan for exposing the domain filters and verifying behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +58 to +70
it('exposes allowed_domains and blocked_domains schema parameters', () => {
expect(webFetchSchema.inputSchema.properties.allowed_domains).toEqual({
type: 'array',
items: { type: 'string' },
description: '可选允许域名列表;映射到底层引擎 allowHosts,仅允许抓取这些主机名。',
});
expect(webFetchSchema.inputSchema.properties.blocked_domains).toEqual({
type: 'array',
items: { type: 'string' },
description: '可选阻止域名列表;映射到底层引擎 denyHosts,拒绝抓取这些主机名。',
});
expect(webFetchSchema.inputSchema.required).toEqual(['url']);
});

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 671cbd0: schema assertions now use toMatchObject so descriptions and future additive schema fields do not make the test brittle.

Comment on lines +41 to +45
allowed_domains: {
type: 'array',
items: { type: 'string' },
description: '可选允许域名列表;映射到底层引擎 allowHosts,仅允许抓取这些主机名。',
},

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 671cbd0: allowed_domains and blocked_domains now declare minItems: 1, and the handler rejects empty provided lists before any fetch attempt. Added a no-network regression test for empty allowed_domains.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛡️ ceilf6/repo-guard

代码评审报告: feat(mcp-web-fetch): expose domain filters

风险等级:
处理建议: 评论
决策摘要: 变更范围收敛、契约扩展安全、验收标准基本覆盖,可在补齐 engine 端过滤真实生效的验证后合并;当前测试对拒绝措辞的断言略脆弱且未验证 allowHosts/denyHosts 实际传入 engine。

级联分析

  • 变更符号: webFetchSchema, allWebFetchSchemas, handleWebFetchTool
  • 受影响流程: web_fetch MCP 工具调用 → fetchUrl engine 路径(host filter 已有)
  • 变更集外调用方: handleWebFetchTool 上游仅 tools.security.test.ts(1 个),webFetchSchema 直接调用方 0;新增字段为可选,向后兼容,未破坏既有调用方
  • 置信度: medium(GitNexus 报告 LOW/affected 0,与 diff 一致;但 engine 端 allowHosts/denyHosts 的实际拒绝语义未在本 PR diff 中呈现,依赖既有 engine 行为的推断)

问题发现

  1. [低] 测试未验证过滤参数真正到达 engine,且未区分“字段透传”与“engine 拒绝”

    • 证据: tools.security.test.ts:70-84:87-102 只断言 result.success === falsefetchMock 未被调用。但 handler 在把 allowHosts/denyHosts 传给 fetchUrl 之前,fetchUrl 内部对 https://example.com/ 本身可能因其它 URL safety 检查失败,从而 fetch mock 同样不会被调用——测试无法区分“确因 allowlist/denylist 拒绝”还是“因无关原因拒绝”。
    • 受影响调用方/流程: web_fetchfetchUrl。若未来 engine 忽略 allowHosts/denyHosts,这些测试仍可能因巧合通过。
    • 最小可行修复: 断言 result.error 明确匹配 allowlist/denylist 拒绝措辞(当前 allow 用例已用 /not in allowlist|allow/i,但 /allow/i 过宽),或增加一个 positive 反向用例:allowed_domains: ['example.com'] + 允许 fetch mock 返回,验证 fetch 被调用,证明 allowHosts 确实透传且未误拒。
  2. [低] 拒绝措辞断言与 engine 实际错误文案耦合,易脆

    • 证据: tools.security.test.ts:78/not in allowlist|allow/i:96/denylist|blocked|denied/i。这些正则依赖 engine 内部错误字符串,本 PR 未固定该契约。engine 文案一旦调整(非本 PR 控制),测试会 flakey。
    • 最小可行修复: 若 engine 错误文案属于稳定契约,在 engine 层补一条注释/常量引用;否则放宽为断言拒绝发生 + fetch 未调用 + 一个 positive 透传用例,减少对具体措辞的依赖。

行级发现

  • [packages/mcp-web-fetch/src/tools.ts:71] 用 args.allowed_domains as string[] | undefined 直接 cast,未校验数组元素为 string,非法输入(如 [123])会静默透传到 engine;建议在此处对元素类型做最小校验或依赖 schema 校验层明确生效。
  • [packages/mcp-web-fetch/src/tools.ts:74] 空数组被显式拒绝,但 schema 已声明 minItems: 1——若 MCP 层已按 schema 校验,此 handler 分支为冗余防御;若 schema 未强制校验,则说明不能依赖 schema,第 71 行的元素类型也同样不可信。请明确 schema 校验是否在调用前生效,并保持两处假设一致。
  • [packages/mcp-web-fetch/src/tools.security.test.ts:78] /allow/i 过于宽泛,可能匹配无关错误文案;收紧为 /not in allowlist/i 以确保断言的是 allowlist 拒绝。

Karpathy 评审

  • 假设: handler 假设 MCP 层对 schema minItems/type 已校验,但又在 handler 内重复做空数组校验,说明假设不确定;需明确校验边界归属。
  • 简洁性: 实现为最小 adapter 映射,未引入不必要抽象,符合 issue scope,良好。
  • 结构质量: 逻辑保留在既有 URL safety/engine 路径,schema+handler 仅做参数透传,无层级泄漏或重复 helper;无结构退化。
  • 变更范围: diff 聚焦,未触碰 URL safety policy、private IP、redirect、clamp、web_search、JS 渲染或 auto-approval,符合 issue non-goals。plan 文档为附带产物,非代码风险。
  • 验证: 覆盖 schema 存在性、allow/block 拒绝、空 allowlist 拒绝;缺 positive 透传用例与 denylist 边界(空 blocked_domains 已在 handler 处理但无对应测试)。

缺失覆盖

  • 缺 positive 用例:allowed_domains 命中允许主机时 fetch 应被调用,证明 allowHosts 未误拒且确实透传到 engine。
  • blocked_domains: [] 的测试:handler 有对应拒绝分支(tools.ts:79-84)但无测试覆盖。
  • 缺 allowed 与 blocked 同时提供时的行为验证(优先级/叠加语义),若属预期使用场景应补一条。

try {
switch (toolName) {
case 'web_fetch': {
const allowHosts = args.allowed_domains as string[] | undefined;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

args.allowed_domains as string[] | undefined 直接 cast,未校验数组元素为 string,非法输入(如 [123])会静默透传到 engine;建议在此处对元素类型做最小校验或依赖 schema 校验层明确生效。

const allowHosts = args.allowed_domains as string[] | undefined;
const denyHosts = args.blocked_domains as string[] | undefined;

if (Array.isArray(allowHosts) && allowHosts.length === 0) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

空数组被显式拒绝,但 schema 已声明 minItems: 1——若 MCP 层已按 schema 校验,此 handler 分支为冗余防御;若 schema 未强制校验,则说明不能依赖 schema,第 71 行的元素类型也同样不可信。请明确 schema 校验是否在调用前生效,并保持两处假设一致。

throw new Error('network attempted');
});
globalThis.fetch = fetchMock as unknown as typeof fetch;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/allow/i 过于宽泛,可能匹配无关错误文案;收紧为 /not in allowlist/i 以确保断言的是 allowlist 拒绝。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(mcp-web-fetch): expose allowed_domains and blocked_domains in web_fetch schema

2 participants