Skip to content

fix(server): associate ctx-scoped elicitation/sampling with the originating request (SEP-2260)#2228

Open
felixweinberger wants to merge 14 commits into
mainfrom
fweinberger/m1-sep-2260-association
Open

fix(server): associate ctx-scoped elicitation/sampling with the originating request (SEP-2260)#2228
felixweinberger wants to merge 14 commits into
mainfrom
fweinberger/m1-sep-2260-association

Conversation

@felixweinberger
Copy link
Copy Markdown
Contributor

@felixweinberger felixweinberger commented Jun 1, 2026

🗺️ 2026-07-28 Spec Implementation — milestone tracker · This PR: M1: SEP-2260 association hardening

Implements the 2025-lineage improvements from SEP-2260 (server-request association): ctx-scoped elicitation/sampling requests now ride the originating request's response stream, and the streamable HTTP transport gains an opt-in SSE keepalive for long human-in-the-loop waits. Adds the 4 e2e requirements that pin these behaviors.

Motivation and Context

SEP-2260 (Final) requires server→client requests to be associated with the originating client request. Tracking issue: #2186.

Scope note from spec analysis: SEP-2260's MUST-strength rules never landed in the published 2025-11-25 spec pages (which remain SHOULD-strength: server→client messages "SHOULD relate to the originating client request"), and in the current draft spec they are enforced structurally by MRTR (SEP-2322) rather than as standalone rules. This PR therefore ships what is meaningful on the 2025 lineage:

  1. Association fix: ctx.mcpReq.elicitInput() / requestSampling() were sent without relatedRequestId, so on streamable HTTP they were routed to the standalone GET stream instead of the originating POST response stream (and silently undeliverable when no GET stream was open). They now carry the handled request's id, matching ctx.mcpReq.send() semantics.
  2. SSE keepalive (opt-in keepAliveInterval option): per SEP-2260's timeout considerations, transports SHOULD use SSE keepalives so connections survive long nested waits. Default off — no behavior change unless configured.

The remaining SEP-2260 rules (client rejects unsolicited server→client requests; no API for unsolicited pushes) assert draft-era semantics and land with the MRTR implementation (#2187), where they hold by construction.

How Has This Been Tested?

4 new e2e requirements with scenarios (protocol:assoc:*), all passing:

  • nested-on-originating-stream — nested elicit/sample arrives on the originating POST SSE stream
  • keepalive-during-nested — keepalive comments observed during a nested wait
  • ping-exempt — server-initiated ping remains valid outside any request context
  • plus the association behavior under enableJsonResponse documented

Full e2e matrix: 33 files, 1,237 cells = 1,145 passed + 92 expected-fail, 0 unexpected failures; no pre-existing cell changed state. Server unit tests 68/68. Typecheck/lint clean.

Breaking Changes

None intended. Behavior change (improvement): ctx-scoped elicitation/sampling requests now arrive on the originating request's response stream instead of the standalone GET stream — this is the spec-recommended routing and also makes these requests work when no GET stream is open. Handler-supplied options still override. Caveat: under enableJsonResponse mode, nested ctx-scoped requests are not deliverable (same pre-existing limitation as ctx.mcpReq.send()).

ctx-scoped elicitation/sampling in enableJsonResponse mode (or after the response stream closed) now fails with a clear error instead of being silently dropped. Elicitation fundamentally requires a streaming response. The standalone GET stream is never used as a fallback for request-associated messages.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

  • API shape of keepAliveInterval (name, ms unit, opt-in default) is open for review feedback.
  • Part of the 2026-07-28 spec implementation plan (project 41); milestone M1.

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Jun 1, 2026

🦋 Changeset detected

Latest commit: 21aefbe

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 2 packages
Name Type
@modelcontextprotocol/server Patch
@modelcontextprotocol/node Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new Bot commented Jun 1, 2026

Open in StackBlitz

@modelcontextprotocol/client

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/client@2228

@modelcontextprotocol/codemod

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/codemod@2228

@modelcontextprotocol/server

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/server@2228

@modelcontextprotocol/server-legacy

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/server-legacy@2228

@modelcontextprotocol/express

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/express@2228

@modelcontextprotocol/fastify

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/fastify@2228

@modelcontextprotocol/hono

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/hono@2228

@modelcontextprotocol/node

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/node@2228

commit: 21aefbe

@felixweinberger felixweinberger force-pushed the fweinberger/m0-conformance-draft-baseline branch from 87f0810 to 285a0c0 Compare June 1, 2026 12:41
@felixweinberger felixweinberger force-pushed the fweinberger/m1-sep-2260-association branch from d618283 to f2f21b8 Compare June 1, 2026 12:41
@felixweinberger
Copy link
Copy Markdown
Contributor Author

Review follow-up: added 12 unit tests in @modelcontextprotocol/server (association: relatedRequestId at the transport level, explicit-override precedence, top-level APIs unaffected; keepalive: timer lifecycle, cleanup on stream completion/transport close, concurrent streams). The tests caught one real edge case: keepAliveInterval <= 0 was previously fed to setInterval (Node clamps to 1ms → keepalive flood); now treated as disabled.

Comment thread packages/server/src/server/server.ts Outdated
Comment thread test/e2e/requirements.ts Outdated
@felixweinberger felixweinberger force-pushed the fweinberger/m0-conformance-draft-baseline branch from 285a0c0 to ae4bde3 Compare June 1, 2026 13:37
@felixweinberger felixweinberger force-pushed the fweinberger/m1-sep-2260-association branch from f2f21b8 to b678f44 Compare June 1, 2026 13:38
@felixweinberger
Copy link
Copy Markdown
Contributor Author

Updated per maintainer review: dropped the verbose section comment (tests are self-explanatory); eliminated the new non-null assertions by starting the keepalive inside the stream's start callback; startKeepAlive now always returns a stop function so cleanup sites call it unconditionally. Server tests 80/80, e2e 1,237 cells / 0 unexpected.

Comment thread packages/server/src/server/streamableHttp.ts
Comment thread packages/server/src/server/server.ts Outdated
Comment thread packages/server/src/server/streamableHttp.ts
@felixweinberger felixweinberger force-pushed the fweinberger/m0-conformance-draft-baseline branch from ae4bde3 to b0ce6ce Compare June 1, 2026 13:52
@felixweinberger felixweinberger force-pushed the fweinberger/m1-sep-2260-association branch from b678f44 to a109146 Compare June 1, 2026 13:52
@felixweinberger felixweinberger force-pushed the fweinberger/m0-conformance-draft-baseline branch from b0ce6ce to 2c5acb2 Compare June 1, 2026 13:59
@felixweinberger felixweinberger force-pushed the fweinberger/m1-sep-2260-association branch from a109146 to 8cfaca3 Compare June 1, 2026 14:07
Comment thread packages/server/src/server/streamableHttp.ts Outdated
Comment thread packages/server/src/server/streamableHttp.ts
@felixweinberger felixweinberger force-pushed the fweinberger/m1-sep-2260-association branch from 8cfaca3 to 54ebc75 Compare June 1, 2026 14:22
@felixweinberger felixweinberger force-pushed the fweinberger/m0-conformance-draft-baseline branch from 8c7ecc1 to 37150a7 Compare June 1, 2026 14:23
…(SEP-2260)

Adds three requirements covering server-request association on the 2025
path: nested server-to-client requests ride the originating request's SSE
response stream, the originating stream stays alive during nested waits
when transport keepalive is enabled, and ping is exempt from the
association rule.

Test-first: the nested-on-originating-stream and keepalive-during-nested
cells are red until the corresponding server changes land.
…riginating request

ctx.mcpReq.elicitInput() and ctx.mcpReq.requestSampling() now pass
relatedRequestId for the request being handled, matching ctx.mcpReq.send()
semantics. On streamable HTTP these requests are routed onto the
originating request's SSE response stream instead of the standalone GET
stream, and they work even when no GET stream is open. Explicit
handler-supplied options still override.
Adds a keepAliveInterval option to
WebStandardStreamableHTTPServerTransport. When set, the transport
periodically writes an SSE comment frame on every open SSE stream (POST
response streams and the standalone GET stream) so idle connections are
not torn down by intermediaries, e.g. while a request handler waits on a
nested elicitation. Disabled by default.
…ge cases

Adds unit coverage for the two SEP-2260 server changes:

- server.test.ts: ctx-scoped elicitInput/requestSampling pass the handled
  request's id as relatedRequestId at the transport.send level, explicit
  handler options still override, other options keep the implicit
  association, and the top-level Server.elicitInput()/createMessage()
  remain unassociated.

- streamableHttp.test.ts: keepalive is off by default (no timer
  scheduled), comment frames are written at the configured interval,
  timers are cleared when a response stream completes and when the
  transport closes mid-stream, concurrent streams keep independent
  timers, and non-positive intervals are treated as disabled.

The non-positive interval cases exposed a gap: keepAliveInterval <= 0 was
passed straight to setInterval (which clamps to 1ms, flooding the stream).
startKeepAlive now treats non-positive values as disabled.
Avoids new non-null assertions on the stream controller and makes
startKeepAlive always return a stop function (no-op when disabled), so
cleanup sites call it unconditionally. Also drops a redundant comment in
buildContext.
@felixweinberger felixweinberger force-pushed the fweinberger/m0-conformance-draft-baseline branch from 37150a7 to e9e6a0f Compare June 1, 2026 14:35
@felixweinberger felixweinberger force-pushed the fweinberger/m1-sep-2260-association branch from 54ebc75 to d3618a9 Compare June 1, 2026 14:35
@felixweinberger felixweinberger changed the base branch from fweinberger/m0-conformance-draft-baseline to main June 1, 2026 14:36
@felixweinberger
Copy link
Copy Markdown
Contributor Author

Un-stacked: this PR has no dependency on #2226 or #2227 (zero file overlap; its requirement entries use no new manifest fields) — now based directly on main, mergeable independently.

…live

- Server->client requests related to a response stream that cannot carry
  SSE (JSON-response mode, or the stream is gone) are delivered on the
  standalone SSE stream instead of being silently dropped, matching their
  routing before they carried a relatedRequestId.
- Keepalive intervals that are not finite positive numbers (NaN, Infinity)
  are treated as disabled instead of flooding streams with 1ms timers.
- The keepalive armed for an event-replay stream is stopped when replay
  fails, instead of leaking the timer.
- startKeepAlive JSDoc matches its actual contract (always returns a stop
  function) and the disabled branch reuses the noKeepAlive constant.
@felixweinberger
Copy link
Copy Markdown
Contributor Author

@claude review

…elpers

SEP-2260 requires ctx-scoped server->client requests to be associated with
the originating request - that is the purpose of the ctx helpers vs the
top-level Server methods - so the association must not be overridable.
Spreading options before relatedRequestId also fixes the gotcha where an
explicit relatedRequestId: undefined would silently clobber the default.
Matches ctx.mcpReq.send(), which already does
{ ...options, relatedRequestId: request.id }.
@felixweinberger
Copy link
Copy Markdown
Contributor Author

felixweinberger commented Jun 1, 2026

One more change from maintainer review (b979896): the ctx-scoped helpers now spread options before relatedRequestId, making the association non-overridable — { ...options, relatedRequestId: ctx.mcpReq.id } instead of { relatedRequestId: ctx.mcpReq.id, ...options }.

Rationale: SEP-2260 requires ctx-scoped server→client requests to be associated with the originating request; that's the entire purpose of the ctx helpers vs the top-level Server methods, so the association must not be overridable. This also fixes the gotcha where a caller passing an explicit relatedRequestId: undefined would silently clobber the default. Matches ctx.mcpReq.send(), which already does { ...options, relatedRequestId: request.id }.

The "explicit override wins" unit test is inverted accordingly: it now asserts the association wins over a handler-supplied relatedRequestId.

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (outside current diff — PR may have been updated during review):

  • 🟡 packages/server/src/server/streamableHttp.ts:791-802 — Sibling of the replayEvents() keepalive leak (which the existing comment incorrectly declares safe for this path): in handlePostRequest's SSE branch the keepalive timer is armed synchronously in the stream's start callback (line ~799), before the awaited writePrimingEvent() → eventStore.storeEvent() call (line ~840) — if storeEvent rejects, the catch returns a 400 without calling cleanup()/stopKeepAlive(), leaving a running interval enqueueing ': keepalive' chunks into an orphaned, never-read stream until the transport closes. Apply the same fix as the replayEvents site (start the keepalive after writePrimingEvent resolves, or call stopKeepAlive() in the catch) so a fix to replayEvents alone doesn't leave this path behind.

    Extended reasoning...

    The bug. In handlePostRequest's SSE branch, the ReadableStream constructor's start callback runs synchronously, so stopKeepAlive = this.startKeepAlive(controller, encoder) (packages/server/src/server/streamableHttp.ts:799) arms the setInterval immediately at stream construction. Only afterwards does the code reach await this.writePrimingEvent(streamController!, encoder, streamId, clientProtocolVersion) (line ~840), which calls await this._eventStore.storeEvent(streamId, …) when an eventStore is configured and the client protocol version is ≥ 2025-11-25. writePrimingEvent has no internal try/catch, so a backing-store rejection (DB/Redis hiccup, store unavailable) propagates to the function-level catch at the end of handlePostRequest (line ~868), which only calls onerror and returns a 400 -32700 Parse error response — it never calls cleanup()/stopKeepAlive(), and the Response wrapping the readable is never constructed, so the stream is orphaned with the timer still running.

    Why nothing else stops the timer. The two normal teardown paths are both unreachable on this error path: (1) onmessage is never invoked for the request (the handler loop sits after the failed await), so no response ever flows through send() and the all-responses-ready → stream.cleanup() path never fires; (2) the readable is never returned to the client, so its cancel callback can't fire either. The keepalive's own self-cleanup only triggers when controller.enqueue() throws — but enqueueing into an unread, still-readable stream does not throw (it just buffers past the high-water mark), so the catch inside startKeepAlive never runs. The interval keeps firing and the chunk queue keeps growing.

    Step-by-step proof.

    1. Configure the transport with both eventStore and keepAliveInterval: 1000 — exactly the resumable, long-human-in-the-loop deployment this PR targets.
    2. A client POSTs a tools/call (protocol version 2025-11-25). The SSE branch constructs the ReadableStream; start runs synchronously and arms the keepalive interval.
    3. The _streamMapping/_requestToStreamMapping entries are registered, then writePrimingEvent runs and eventStore.storeEvent() rejects (transient backing-store failure).
    4. Control jumps to the catch at line ~868: onerror fires and a 400 -32700 response is returned. stopKeepAlive() is never called.
    5. Every 1000 ms the interval enqueues ': keepalive\n\n' into the discarded stream's queue, indefinitely. Each subsequent failed POST adds another running timer plus its own growing buffer. They are only reclaimed when transport.close() (or closeSSEStream(requestId)) iterates _streamMapping and invokes cleanup() — which on a long-lived stateful session may be hours or days away.

    Why this isn't already covered. The existing inline comment about the keepalive leak covers only the replayEvents() site and explicitly asserts that "the same pattern in handleGetRequest/handlePostRequest is fine because those paths have no awaited fallible call between arming the timer and registering the cleanup." That claim is wrong for handlePostRequest: the awaited, fallible writePrimingEvent sits between the cleanup registration and any code path that would ever invoke that cleanup, and the catch never invokes it. Flagging this site prevents a partial fix that only patches replayEvents. The new keepalive unit tests cover stream completion and transport close but not the priming-event failure path, so nothing in CI catches it.

    Impact and scope. The keepalive timer is new in this PR, so the running-timer + growing-queue leak on this error path is PR-introduced (only the stale mapping entries were pre-existing). Mitigations keep it minor: unlike the replayEvents case, the _streamMapping entry (whose cleanup() calls stopKeepAlive()) was registered before the await, so transport.close() can eventually reclaim it; the trigger needs eventStore + keepAliveInterval + a storeEvent rejection; and the per-failure cost is one timer plus ~13 bytes per tick. It's a bounded resource leak rather than an unbounded one.

    How to fix. Same one-liner family as the replayEvents remediation: either move the startKeepAlive() call (or defer arming the timer) until after writePrimingEvent resolves, or call stopKeepAlive()/the registered cleanup() for streamId in handlePostRequest's catch before returning the 400. Whichever shape the replayEvents fix takes, applying it to this sibling site at the same time keeps the keepalive lifecycle consistent across all three stream-creation paths.

Comment thread packages/server/src/server/server.ts Outdated
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any new issues in this revision and the earlier findings (JSON-mode routing regression, keepalive leak on failed replay, NaN guard, docs, JSDoc, override precedence) all look addressed by the follow-up commits. That said, this changes server→client request routing in the streamable HTTP transport and adds a new public transport option, so it should still get a maintainer's review (and a changeset) before merging.

Extended reasoning...

Overview

This PR makes ctx-scoped elicitInput()/requestSampling() carry the originating request's id as relatedRequestId (now non-overridable, matching ctx.mcpReq.send()), adds a fallback in WebStandardStreamableHTTPServerTransport.send() so related server→client requests fall back to the standalone GET stream when the related response stream cannot carry SSE, and introduces an opt-in keepAliveInterval SSE keepalive. It adds substantial unit coverage (association precedence, keepalive timer lifecycle, replay-failure cleanup) plus three new e2e requirements/scenarios and a docs/server.md mention.

Security risks

No direct security exposure: no auth, crypto, or input-trust changes. The main risks are availability-shaped — keepalive timer leaks or floods (both now guarded and tested) and silent message drops from routing changes (the JSON-mode regression I flagged earlier was fixed with a fallback path and a regression test).

Level of scrutiny

This touches the core message-routing path of the production streamable HTTP transport and changes default behavior for nested server→client requests, and it adds a new public API surface (keepAliveInterval, whose shape the author explicitly leaves open for review). Per the repo's conventions (burden of proof on addition, design review first), that warrants maintainer judgment rather than bot approval, even though the implementation now looks correct to me.

Other factors

All findings from my previous review rounds were addressed in 270945d, cd284ca, 94125e7, and b979896, each with accompanying tests; the latest commit also resolves the spread-order inconsistency by making the association non-overridable, consistent with the protocol-layer ctx helpers. The remaining open item is the missing changeset (already flagged by changeset-bot) — this PR changes published-package behavior and adds an option, so it likely needs one. The bug-hunting system found no new issues on this revision.

…s response stream

Server-to-client requests must ride the originating request's response
stream (SEP-2260). When that stream cannot carry SSE (JSON response mode,
or the stream has closed), throw a clear error instead of falling back to
the standalone GET stream. The standalone stream is not a valid fallback:
clients are not required to open one, so the previous behavior could hang
forever while also delivering the request unassociated.
A mutation-based audit showed four of the five ctx-association unit tests
only catch regressions that the e2e association scenarios already catch
over real streamable HTTP. Keep the override-resistance test, which is the
sole guard for handler-supplied relatedRequestId being ignored (no e2e
scenario passes an explicit relatedRequestId).
Comment thread docs/server.md Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant