Fix(workflow)/add _process_content_object function in _rehydration_utils file to extract output from event.content object before assigning it to child.output, in _reconstruct_node_states#5909
Open
samarth1224 wants to merge 1 commit into
Link to Issue
1. Link to an existing issue (if applicable):
NOTE
I opened this pull request before, but it was closed when the v2 branch was merged into the main branch.
The test failures in the previous PR were due to inherent issues in the v2 branch's unit tests.
Problem:
During fresh execution of a node with `message_as_output == True`, `process_llm_agent_output` extracts text from the `Content` parts, parses it, validates it against the schema, and assigns the resulting output to `event.output`.
However, before the event is persisted to the session, `_consume_event_queue` optimizes for `message_as_output` nodes by stripping `event.output` (setting it to `None`) and only saving the raw `event.content`.
Upon workflow resumption, `_reconstruct_node_states` rebuilds the node's state. Because `event.output` is `None`, it fell back to assigning the raw `event.content` (a `Content` object) directly to `child.output`.
Solution:
Added `_process_content_object()` to extract the output from the raw `event.content` during rehydration.
This function mirrors the logic used in the `process_llm_agent_output` function in the `_llm_agent_wrapper.py` file: it extracts the text from the parts of the `Content` object and filters out `thought` parts to prevent Chain-of-Thought reasoning text from leaking into the final output.
Testing Plan
I have added new unit tests for the `_process_content_object` function. These tests cover various scenarios, including plain text extraction, JSON parsing for structured outputs, and the correct filtering of thought parts to prevent internal reasoning from leaking into the final output.
Additionally, I have updated the existing rehydration test `_test_scan_message_as_output`, which previously asserted that the raw `Content` object was returned; it now correctly verifies that the output is reconstructed during node state rehydration.
Unit Tests:
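For illustration, here is a minimal sketch of the extraction behavior these tests exercise. It uses stand-in `Part` and `Content` dataclasses and a hypothetical `process_content_object` helper with an assumed `parse_json` flag; none of these are the actual code in this PR or the real ADK types, and the JSON fallback to raw text is an assumption.

```python
import json
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Part:
    # Stand-in for a content part (hypothetical, not the real ADK type).
    text: Optional[str] = None
    thought: bool = False


@dataclass
class Content:
    # Stand-in for the persisted event.content object (hypothetical).
    parts: List[Part] = field(default_factory=list)


def process_content_object(content: Content, parse_json: bool = False) -> Any:
    """Join the text of non-thought parts; optionally parse it as JSON."""
    text = "".join(
        part.text for part in content.parts if part.text and not part.thought
    )
    if parse_json:
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            # Assumption: fall back to the raw text when it is not valid JSON.
            return text
    return text


# Thought parts are skipped, so reasoning text does not reach the output.
mixed = Content([Part("internal reasoning", thought=True), Part("Hello")])
print(process_content_object(mixed))

# Structured output is recovered from the stored text.
structured = Content([Part('{"answer": 42}')])
print(process_content_object(structured, parse_json=True))
```

The point of the sketch is only that rehydration should yield a plain string or parsed object rather than the raw `Content` object.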
Summary Unit Test
Manual End-to-End (E2E) Tests:
I have already provided the necessary setup and instructions in the description of Issue #5553, along with minimal reproducible code.
The same setup can be used as a manual E2E test.
The screenshot shows the minimal reproducible code and the successful mitigation of the bug.
Output
Additional Context
- I have tested it using static Graph-Based workflows. The behavior remains the same, as the raw content is assigned to `ctx.output` during rehydration.
- Also, the comments in the `_consume_event_queue` function in `runner.py` suggest that it intends to set `message_as_output = True` only when the `output_schema` is not provided, but the `process_llm_agent_output` function in the `llm_agent_wrapper` file sets `message_as_output = True` regardless of whether the `output_schema` is provided.
Checklist