feat(agent): add stream_final_turn_only parameter to stream_async #2104
zhifanl wants to merge 2 commits into
Can anyone help take a look at this?
Can anyone help me approve this? https://github.com/strands-agents/sdk-python/actions/runs/24648225417/job/72065246092?pr=2104
Add a stream_final_turn_only parameter to Agent.stream_async that buffers intermediate turn text events and only yields text from the final model turn. Non-text events (lifecycle, tool use, reasoning, citations) pass through unchanged. Closes strands-agents#2055
fc7c4d1 to 5815a8b
Codecov Report: ✅ All modified and coverable lines are covered by tests.
            continue
        elif isinstance(event, EventLoopStopEvent):
            stop_reason = event["stop"][0]
            if stop_reason == "end_turn":
Issue: When stream_final_turn_only=True and the final turn ends with a non-end_turn stop reason (e.g., max_tokens, cancelled, content_filtered), all buffered text from that turn is silently discarded. In production, this means if a model hits its token limit on the final turn, the user receives zero text output with no indication of what happened.
Suggestion: Consider flushing buffered text for any stop reason that is not tool_use (since tool_use is the only reason that indicates "this isn't the final turn"). For example:
    elif isinstance(event, EventLoopStopEvent):
        stop_reason = event["stop"][0]
        if stop_reason != "tool_use":
            for buffered in text_event_buffer:
                callback_handler(**buffered)
                yield buffered
        text_event_buffer.clear()
This way, if the agent is cancelled or hits max_tokens on the final turn, the partial text is still delivered to the caller. If you decide to keep the current behavior, please document explicitly in the docstring that text is only delivered for end_turn stop reasons (not just "final turn").
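To make the difference between the two flush conditions concrete, here is a minimal, self-contained sketch. It does not use the SDK: events are plain dicts, and `filter_final_turn`, the `{"stop": (reason,)}` and `{"data": ...}` shapes, and the `flush_on_any_final_stop` flag are hypothetical stand-ins for the PR's internals. The callback handler is left out for brevity.

```python
import asyncio
from typing import Any, AsyncIterator

Event = dict[str, Any]


async def simulated_events(events: list[Event]) -> AsyncIterator[Event]:
    """Stand-in for the agent's internal event stream (not the real SDK)."""
    for event in events:
        yield event


async def filter_final_turn(
    events: AsyncIterator[Event], flush_on_any_final_stop: bool
) -> AsyncIterator[Event]:
    """Buffer text events per turn; flush or drop them when the turn stops."""
    buffer: list[Event] = []
    async for event in events:
        if "data" in event:  # assumed shape of a text event
            buffer.append(event)
            continue
        if "stop" in event:  # assumed shape of a stop event
            stop_reason = event["stop"][0]
            if flush_on_any_final_stop:
                # Suggested behavior: only tool_use marks a non-final turn.
                is_final = stop_reason != "tool_use"
            else:
                # Current behavior: only end_turn counts as the final turn.
                is_final = stop_reason == "end_turn"
            if is_final:
                for buffered in buffer:
                    yield buffered
            buffer.clear()
        yield event  # non-text events always pass through


async def collect_text(events: list[Event], flush_on_any_final_stop: bool) -> str:
    stream = filter_final_turn(simulated_events(events), flush_on_any_final_stop)
    return "".join([e["data"] async for e in stream if "data" in e])


# One tool-use turn followed by a final turn that is cut off by max_tokens.
TURNS: list[Event] = [
    {"data": "Let me look that up. "},
    {"stop": ("tool_use",)},
    {"data": "The answer is 4"},
    {"stop": ("max_tokens",)},
]

current = asyncio.run(collect_text(TURNS, flush_on_any_final_stop=False))
suggested = asyncio.run(collect_text(TURNS, flush_on_any_final_stop=True))
```

With the `end_turn`-only check, `current` comes back empty for this sequence, while the `!= "tool_use"` check delivers the partial final-turn text in `suggested`.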
    text events from the final turn (where stop_reason is "end_turn"). Non-text events such as
    lifecycle, tool use, reasoning, and citation events are yielded normally regardless of this
    setting. When False (default), all events are yielded as they are produced with no change
    in behavior.
Issue: The docstring says "Non-text events such as lifecycle, tool use, reasoning, and citation events are yielded normally regardless of this setting." While accurate, this creates an asymmetry that may confuse users: reasoning text from intermediate turns passes through (it's a ReasoningTextStreamEvent, not a TextStreamEvent), but regular text from those same turns does not. For agents using extended thinking, users would see intermediate reasoning but not intermediate text.
Suggestion: Consider calling this out explicitly in the docstring with a brief note, e.g.:
Note: Reasoning events from intermediate turns are still yielded since they are distinct
from text stream events. Only {"data": ...} text events are buffered/filtered.
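As a rough illustration of that asymmetry, the sketch below splits the events of one intermediate turn into those that pass through and those that are held back. It is not SDK code: `split_intermediate_turn` is a hypothetical helper, and the `reasoningText` and `current_tool_use` keys are assumed event shapes used only for the example. The one detail taken from the note above is that `{"data": ...}` events are the ones being buffered.

```python
from typing import Any

Event = dict[str, Any]


def split_intermediate_turn(events: list[Event]) -> tuple[list[Event], list[Event]]:
    """Return (passed_through, buffered) for the events of one intermediate turn."""
    passed_through: list[Event] = []
    buffered: list[Event] = []
    for event in events:
        # Only events carrying a "data" key are treated as text events here.
        if "data" in event:
            buffered.append(event)
        else:
            passed_through.append(event)
    return passed_through, buffered


turn: list[Event] = [
    {"reasoningText": "I should call the weather tool."},  # assumed reasoning event shape
    {"data": "Checking the weather now."},                 # text event, buffered
    {"current_tool_use": {"name": "get_weather"}},         # assumed tool-use event shape
]
passed, held = split_intermediate_turn(turn)
```

A caller would therefore see the reasoning and tool-use events from this turn but not the sentence "Checking the weather now.", which is the behavior the docstring note should spell out.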
Assessment: Comment
Clean implementation of a useful feature that addresses a real pain point for production streaming use cases. The approach of buffering at the
Review Categories
The overall design is solid and the test suite is thorough for the core scenarios.
Hi @zhifanl, thank you for submitting this PR! I've gone ahead and rebased it to align with the latest changes in the main branch. Could you please review the feedback and comments left on the PR and consider addressing the suggested fixes?
This adds a first-class SDK option to stream only the final answer, eliminating the need for consumer-side buffering.
What's the use case for this versus agent.invoke? The events are buffered as is, so I'm not clear why you would use this instead of agent.invoke, which provides the completed message as well.
Motivation
When using stream_async with tool-using agents, text events from every model turn are yielded to the caller, including intermediate reasoning before tool calls. For production chat UIs and SSE endpoints, this is noise. The only workaround today requires consumers to implement fragile buffering logic that depends on SDK internals like start_event_loop, raw messageStop events, and the end_turn → tool_use override.
This adds a first-class SDK option to stream only the final answer, eliminating the need for consumer-side buffering.
Resolves: #2055
Public API Changes
Agent.stream_async accepts a new stream_final_turn_only keyword argument:
When stream_final_turn_only=True, intermediate turn text events are buffered internally and discarded when the turn ends with tool use. Text from the final turn (where stop_reason == "end_turn") is flushed to both the caller and callback handler. Non-text events (lifecycle, tool use, reasoning, citations, model stream chunks) pass through unchanged regardless of this setting.
Default is False: fully backward compatible, no behavior change unless opted in.
Use Cases
Related Issues
#2055
Type of Change
New feature
Testing
hatch run prepare
All tests passed
Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.