feat(agent): add stream_final_turn_only parameter to stream_async#2104

Open
zhifanl wants to merge 2 commits into strands-agents:main from zhifanl:feat/stream-final-turn-only

Conversation

@zhifanl zhifanl commented Apr 9, 2026

Motivation

When using stream_async with tool-using agents, text events from every model turn are yielded to the caller, including intermediate reasoning before tool calls. For production chat UIs and SSE endpoints, this is noise. The only workaround today requires consumers to implement fragile buffering logic that depends on SDK internals like start_event_loop, raw messageStop events, and the end_turn → tool_use override.

This adds a first-class SDK option to stream only the final answer, eliminating the need for consumer-side buffering.

Resolves: #2055

Public API Changes

Agent.stream_async accepts a new stream_final_turn_only keyword argument:

# Before: consumers receive text from ALL model turns
async for event in agent.stream_async("Analyze this data"):
    if "data" in event:
        yield event["data"]  # Includes intermediate "Let me look that up..." text

# After: consumers receive text only from the final turn
async for event in agent.stream_async("Analyze this data", stream_final_turn_only=True):
    if "data" in event:
        yield event["data"]  # Only final answer tokens

When stream_final_turn_only=True, intermediate turn text events are buffered internally and discarded when the turn ends with tool use. Text from the final turn (where stop_reason == "end_turn") is flushed to both the caller and callback handler. Non-text events (lifecycle, tool use, reasoning, citations, model stream chunks) pass through unchanged regardless of this setting.

Default is False — fully backward compatible, no behavior change unless opted in.
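
The buffering behavior described above can be sketched outside the SDK. This is a minimal illustration under stated assumptions, not the PR's implementation: `final_turn_only`, `fake_stream`, and `collect` are hypothetical names, events are plain dicts, and the `{"stop": ...}` end-of-turn marker is an invented stand-in for whatever the SDK uses internally. Only the `{"data": ...}` text event shape comes from the description.

```python
import asyncio
from typing import Any, AsyncIterator

Event = dict[str, Any]


async def final_turn_only(events: AsyncIterator[Event]) -> AsyncIterator[Event]:
    """Hold back text events until the turn's stop reason is known (sketch)."""
    buffer: list[Event] = []
    async for event in events:
        if "data" in event:
            # Text token: keep it until we know how this turn ends.
            buffer.append(event)
        elif "stop" in event:
            # Hypothetical end-of-turn marker carrying the stop reason.
            if event["stop"] == "end_turn":
                for buffered in buffer:
                    yield buffered
            # Text from a turn that ended in tool use is dropped here.
            buffer.clear()
            yield event
        else:
            # Non-text events pass through unchanged.
            yield event


async def fake_stream() -> AsyncIterator[Event]:
    """Simulated two-turn agent run: one tool call, then the final answer."""
    yield {"data": "Let me look that up..."}
    yield {"tool": "search"}  # hypothetical non-text event
    yield {"stop": "tool_use"}
    yield {"data": "The answer "}
    yield {"data": "is 42."}
    yield {"stop": "end_turn"}


async def collect(stream: AsyncIterator[Event]) -> list[Event]:
    return [event async for event in stream]


if __name__ == "__main__":
    for event in asyncio.run(collect(final_turn_only(fake_stream()))):
        print(event)
```

In this sketch the intermediate "Let me look that up..." token never reaches the consumer, while the tool event and both stop markers do.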

Use Cases

  • Chat applications streaming via SSE where users should only see the final answer
  • API endpoints wrapping agents where downstream consumers expect a single coherent streamed response
  • Any production deployment where intermediate model reasoning is noise for the end user

Related Issues

#2055

Type of Change

New feature

Testing

  • 8 unit tests covering backward compatibility, single/multi-turn scenarios, callback handler behavior, empty final turns, and non-text event passthrough
  • All 408 agent tests pass
  • I ran hatch run prepare

All tests passed

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly - link
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed - will update once I gather positive feedback
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@zhifanl
Author

zhifanl commented Apr 14, 2026

can anyone help take a look at this?

@zhifanl
Author

zhifanl commented May 12, 2026

@yonib05 yonib05 added the area-async (Related to asynchronous flows or multi-threading) and area-devx (Developer experience improvements) labels May 27, 2026
Tom Li added 2 commits May 27, 2026 11:56
Add a stream_final_turn_only parameter to Agent.stream_async that buffers
intermediate turn text events and only yields text from the final model
turn. Non-text events (lifecycle, tool use, reasoning, citations) pass
through unchanged.

Closes strands-agents#2055
@JackYPCOnline JackYPCOnline force-pushed the feat/stream-final-turn-only branch from fc7c4d1 to 5815a8b Compare May 27, 2026 15:58
@github-actions github-actions Bot added size/m and removed size/m labels May 27, 2026
@JackYPCOnline JackYPCOnline self-assigned this May 27, 2026
@codecov

codecov Bot commented May 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

        continue
    elif isinstance(event, EventLoopStopEvent):
        stop_reason = event["stop"][0]
        if stop_reason == "end_turn":
Contributor

Issue: When stream_final_turn_only=True and the final turn ends with a non-end_turn stop reason (e.g., max_tokens, cancelled, content_filtered), all buffered text from that turn is silently discarded. In production, this means if a model hits its token limit on the final turn, the user receives zero text output with no indication of what happened.

Suggestion: Consider flushing buffered text for any stop reason that is not tool_use (since tool_use is the only reason that indicates "this isn't the final turn"). For example:

elif isinstance(event, EventLoopStopEvent):
    stop_reason = event["stop"][0]
    if stop_reason != "tool_use":
        for buffered in text_event_buffer:
            callback_handler(**buffered)
            yield buffered
    text_event_buffer.clear()

This way, if the agent is cancelled or hits max_tokens on the final turn, the partial text is still delivered to the caller. If you decide to keep the current behavior, please document explicitly in the docstring that text is only delivered for end_turn stop reasons (not just "final turn").
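
To make the difference between the two flush rules concrete, here is a small synchronous sketch. It models each turn as a hypothetical `(text_chunks, stop_reason)` pair rather than using the SDK's event types, and `deliver`, `flush_on_end_turn_only`, and `flush_unless_tool_use` are illustrative names, not part of the PR.

```python
from typing import Callable

# One entry per model turn: (buffered text chunks, stop reason). Hypothetical shape.
Turn = tuple[list[str], str]


def deliver(turns: list[Turn], should_flush: Callable[[str], bool]) -> str:
    """Return the text a caller would receive under the given flush rule."""
    delivered: list[str] = []
    for chunks, stop_reason in turns:
        if should_flush(stop_reason):
            delivered.extend(chunks)
        # Otherwise the buffered chunks for this turn are discarded.
    return "".join(delivered)


def flush_on_end_turn_only(stop_reason: str) -> bool:
    # Rule in the PR as reviewed: only "end_turn" releases the buffer.
    return stop_reason == "end_turn"


def flush_unless_tool_use(stop_reason: str) -> bool:
    # Rule suggested in the review: anything except "tool_use" releases it.
    return stop_reason != "tool_use"


# The final turn is cut off by the token limit instead of ending normally.
turns: list[Turn] = [
    (["Let me check. "], "tool_use"),
    (["Partial answer that was cut"], "max_tokens"),
]

if __name__ == "__main__":
    print(repr(deliver(turns, flush_on_end_turn_only)))
    print(repr(deliver(turns, flush_unless_tool_use)))
```

Under the first rule the caller receives an empty string for this run. Under the second, the partial final-turn text is delivered while the intermediate turn is still suppressed.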

text events from the final turn (where stop_reason is "end_turn"). Non-text events such as
lifecycle, tool use, reasoning, and citation events are yielded normally regardless of this
setting. When False (default), all events are yielded as they are produced with no change
in behavior.
Contributor

Issue: The docstring says "Non-text events such as lifecycle, tool use, reasoning, and citation events are yielded normally regardless of this setting." While accurate, this creates an asymmetry that may confuse users: reasoning text from intermediate turns passes through (it's a ReasoningTextStreamEvent, not a TextStreamEvent), but regular text from those same turns does not. For agents using extended thinking, users would see intermediate reasoning but not intermediate text.

Suggestion: Consider calling this out explicitly in the docstring with a brief note, e.g.:

Note: Reasoning events from intermediate turns are still yielded since they are distinct
from text stream events. Only {"data": ...} text events are buffered/filtered.
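
The asymmetry can be illustrated with a short sketch. Only the `{"data": ...}` shape is taken from the note above. The `reasoningText` and `current_tool_use` keys are assumed event shapes used for illustration, and `visible_during_intermediate_turn` is a hypothetical helper, not SDK code.

```python
from typing import Any

Event = dict[str, Any]


def visible_during_intermediate_turn(events: list[Event]) -> list[Event]:
    """Events a caller would still see from a turn that ends in tool use.

    Only {"data": ...} text events are held back; everything else passes through.
    """
    return [event for event in events if "data" not in event]


# One intermediate turn of an agent using extended thinking (assumed shapes).
intermediate_turn: list[Event] = [
    {"reasoningText": "I should call the search tool."},
    {"data": "Let me look that up..."},
    {"current_tool_use": {"name": "search"}},
]

visible = visible_during_intermediate_turn(intermediate_turn)

if __name__ == "__main__":
    for event in visible:
        print(event)
```

Here the reasoning event and the tool-use event remain visible while the plain text event is filtered, which is the behavior the suggested docstring note would spell out.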

@github-actions
Contributor

Assessment: Comment

Clean implementation of a useful feature that addresses a real pain point for production streaming use cases. The approach of buffering at the stream_async level using existing typed events is well-designed and minimally invasive.

Review Categories
  • Edge case handling: The current implementation only flushes buffered text for stop_reason == "end_turn", which means max_tokens, cancelled, and other terminal stop reasons silently discard text. This is the primary concern.
  • Testing: Good coverage of the happy path and multi-turn scenarios. Missing tests for non-end_turn final stop reasons.
  • Documentation: The docstring could better clarify the reasoning/text asymmetry for intermediate turns.

The overall design is solid and the test suite is thorough for the core scenarios.

@JackYPCOnline
Contributor

Hi @zhifanl,

Thank you for submitting this PR! I’ve gone ahead and rebased it to align with the latest changes in the main branch.

Could you please review the feedback/comments left on the PR and consider addressing the suggested fixes?

Member

This adds a first-class SDK option to stream only the final answer, eliminating the need for consumer-side buffering.

What's the use case for this versus agent.invoke? The events are buffered as is, so I'm not clear why you would use this instead of agent.invoke, which provides the completed message as well.

@yonib05 yonib05 added the python Pull requests that update python code label May 29, 2026
@yonib05 yonib05 added the enhancement New feature or request label May 29, 2026
Labels

  • area-async: Related to asynchronous flows or multi-threading
  • area-devx: Developer experience improvements
  • enhancement: New feature or request
  • python: Pull requests that update python code
  • size/m

Development

Successfully merging this pull request may close these issues.

[FEATURE] Make agent only yield final reponse

4 participants