
feat(agents): add include_sources for per-agent content source filtering#5925

Open
bofenghuang wants to merge 1 commit into google:main from bofenghuang:feat/include-sources-content-filter


bofenghuang commented Jun 1, 2026

Please ensure you have read the contribution guide before creating a pull request.

Link to Issue or Description of Change

1. Link to an existing issue (if applicable):

2. Or, if no issue exists, describe the change:

Problem:

In multi-agent pipelines, every peer agent's output is narrative-cast into a role='user' content entry ("For context: [agent_name] said: ..."). The only existing control is include_contents ('default'/'none'), which is a session-history scope control — it determines how much of the conversation history is visible. There is no agent-level source control: no way to say which agents within that history a given agent should see.

Issue #2207 documents this problem. The before_model_callback workaround proposed there has several limitations that make it a poor fit for production use:

  1. Text parsing is fragile — it regex-matches [agent_name] said: against the internal narrative format. If ADK changes that format, the workaround silently breaks with no warning.
  2. No per-source selectivity — it drops all peer entries. You cannot express "keep agent_a's output but drop agent_b's".
  3. Must be duplicated — the callback must be attached individually to every agent that needs isolation, with no composable way to express the policy.
  4. FC/FR pairs not considered — the regex can accidentally strip parts of function call/response content, and it does not maintain the function call/response pairing invariant required by the model API.
  5. Observability misalignment: before_model_callback on the agent runs after plugin callbacks (see base_llm_flow.py:_handle_before_model_callback). LLM observability tools (e.g. Braintrust) hook in as plugins and therefore capture llm_request.contents before the workaround strips the peer entries. What observability reports and what the model actually receives are different, which makes traces unreliable for debugging.
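To make limitations 1 and 2 concrete, here is a minimal, dependency-free sketch of a text-parsing filter in the spirit of the #2207 workaround. It is not the actual callback from that issue: `PEER_PATTERN`, `strip_peer_entries`, and the "changed" narrative wording are hypothetical, modeled only on the "[agent_name] said:" format quoted above.

```python
import re

# Hypothetical pattern mirroring the narrative format quoted above.
PEER_PATTERN = re.compile(r"^\[(?P<agent>[^\]]+)\] said:")


def strip_peer_entries(texts: list[str]) -> list[str]:
    """Drops every text part that looks like a narrative-cast peer message.

    The agent name is captured but never consulted, so the filter is
    all-or-nothing: it cannot keep one peer while dropping another.
    """
    return [t for t in texts if not PEER_PATTERN.match(t)]


current_format = ["Hello from user", "[upstream] said: Upstream result XYZ"]
# A hypothetical future wording of the same narrative cast.
changed_format = ["Hello from user", "upstream said: Upstream result XYZ"]

print(strip_peer_entries(current_format))  # peer entry removed
print(strip_peer_entries(changed_format))  # peer entry silently survives
```

With the changed wording nothing matches, so the filter quietly becomes a no-op instead of raising an error.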

Solution:

Add include_sources: list[str] | None to LlmAgent — a declarative per-agent source control that answers "from which agents?", orthogonal to the existing session-history scope control (include_contents).

# Full history, only user + this agent's own turns
LlmAgent(include_sources=['user', 'self'])

# Full history, user + one specific upstream agent
LlmAgent(include_sources=['user', 'summarizer_agent'])

# User messages only — stateless classifier
LlmAgent(include_sources=['user'])

Reserved names: 'user' (plain human messages) and 'self' (this agent's own prior model turns). Any other string is matched directly against event.author.
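A minimal sketch of how the reserved names could be resolved from event metadata, assuming a simplified stand-in for ADK's event type. `SimpleEvent`, `resolve_source`, `is_allowed`, and the agent names 'refactorer' and 'linter_agent' are hypothetical illustrations, not the PR's actual helpers. The 'self' branch follows the live-mode mapping described in this PR (an event whose author equals the agent's own name resolves to 'self').

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SimpleEvent:
    """Hypothetical, simplified stand-in for an ADK session event."""

    author: str
    text: str = ""


def resolve_source(event: SimpleEvent, agent_name: str) -> str:
    """Maps an event's author metadata onto an include_sources name."""
    if event.author == "user":
        return "user"  # plain human message
    if event.author == agent_name:
        # This agent's own turn. Comparing the author name directly also
        # covers live mode, where every non-user event would otherwise be
        # treated as another agent's reply.
        return "self"
    return event.author  # any other agent, matched by its name


def is_allowed(
    event: SimpleEvent, agent_name: str, include_sources: list[str] | None
) -> bool:
    """None disables filtering; otherwise the resolved source must be listed."""
    if include_sources is None:
        return True
    return resolve_source(event, agent_name) in include_sources


events = [
    SimpleEvent("user", "Refactor this function"),
    SimpleEvent("summarizer_agent", "Summary of the code"),
    SimpleEvent("refactorer", "Here is my earlier draft"),
    SimpleEvent("linter_agent", "Style warnings"),
]
kept = [e.text for e in events if is_allowed(e, "refactorer", ["user", "self"])]
```

Here `kept` holds only the user's message and the refactorer's own earlier draft; both peer outputs are excluded without inspecting any text.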

The filter runs inside _get_contents() at the event level, before _present_other_agent_message() converts authorship into embedded text. Source identity is read from event.author metadata: no text parsing and no coupling to the narrative format. Because filtering happens while the request is being built (before any callbacks run), observability plugins always see the same contents the model receives. Function call (FC) / function response (FR) pairs are preserved: responses to the current agent's own calls are tied to 'self' and dropped together with those calls, and another agent's FC responses are dropped whenever that agent's FC calls are filtered out. Live-mode sessions are handled by mapping event.author == agent_name to the 'self' reserved name.
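The FC/FR pairing rule is the subtle part. Below is a minimal sketch of one way to keep pairs consistent, assuming simplified events whose calls and responses are linked by an id. `Ev`, `source_of`, `filter_events`, and the agent names are hypothetical; the PR's actual logic in `_get_contents()` may differ. The idea: record which source issued each call, then let every response inherit its call's source, so a call and its response are always kept or dropped together.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ev:
    """Hypothetical simplified event carrying text and/or calls/responses."""

    author: str
    text: str = ""
    call_ids: list[str] = field(default_factory=list)  # function calls issued
    response_ids: list[str] = field(default_factory=list)  # calls answered here


def source_of(author: str, agent_name: str) -> str:
    """Resolves an author to 'user', 'self', or another agent's name."""
    if author == "user":
        return "user"
    return "self" if author == agent_name else author


def filter_events(
    events: list[Ev], agent_name: str, include_sources: list[str] | None
) -> list[Ev]:
    """Keeps allowed events while never separating a call from its response."""
    if include_sources is None:
        return list(events)
    allowed = set(include_sources)
    # Remember which source issued each function call.
    call_source = {
        call_id: source_of(ev.author, agent_name)
        for ev in events
        for call_id in ev.call_ids
    }
    kept = []
    for ev in events:
        if ev.response_ids:
            # A response inherits the source of the call it answers.
            # Assumes one event only answers calls from a single caller.
            fallback = source_of(ev.author, agent_name)
            src = call_source.get(ev.response_ids[0], fallback)
        else:
            src = source_of(ev.author, agent_name)
        if src in allowed:
            kept.append(ev)
    return kept


events = [
    Ev("user", text="Refactor this"),
    Ev("reviewer", call_ids=["c1"]),
    Ev("user", response_ids=["c1"]),  # user-supplied answer to reviewer's call
    Ev("refactorer", call_ids=["c2"]),
    Ev("refactorer", response_ids=["c2"]),
]
with_self = filter_events(events, "refactorer", ["user", "self"])
user_only = filter_events(events, "refactorer", ["user"])
```

Note the third event: although authored by 'user', it answers the reviewer's call, so it is dropped along with that call instead of leaving an orphaned function response in the request. With `["user"]` alone, the refactorer's own call and response disappear as a pair.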

include_sources=[] raises ValueError at construction time (use None to disable filtering).
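ADK agents are Pydantic models, so in practice this check would likely live in a field validator. To stay dependency-free, the sketch below uses a plain dataclass; `AgentConfig` is a hypothetical stand-in for the relevant LlmAgent fields, and the rationale in the comment is an assumption rather than something the PR states.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AgentConfig:
    """Hypothetical stand-in for the include_sources field on LlmAgent."""

    name: str
    include_sources: list[str] | None = None

    def __post_init__(self) -> None:
        # Assumed rationale: an empty allowlist would filter out every event,
        # which is almost certainly a mistake, so it is rejected eagerly.
        # None remains the explicit "no filtering" value.
        if self.include_sources is not None and not self.include_sources:
            raise ValueError(
                "include_sources must be non-empty; use None to disable filtering."
            )
```

Failing at construction time surfaces the misconfiguration immediately, rather than on the first model call with an empty request.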

Testing Plan

Unit Tests:

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.

New file tests/unittests/flows/llm_flows/test_contents_source_filter.py — 20 unit tests directly on _get_contents() and _get_current_turn_contents() covering: no-op (None), each reserved name, combinations, named agent filtering, FC/FR pair preservation, FC/FR co-dropping, other-agent FC response attribution, live-mode self-mapping, and _get_current_turn_contents propagation.

Extended tests/unittests/agents/test_llm_agent_include_contents.py with 4 integration tests: empty-list validation, None default, user-only isolation in a sequential pipeline (the exact case from #2207), and multi-turn composition.

tests/unittests/flows/llm_flows/test_contents_source_filter.py  20 passed
tests/unittests/agents/test_llm_agent_include_contents.py        7 passed (3 pre-existing + 4 new)
tests/unittests/flows/llm_flows/test_contents*.py               all passed
tests/unittests/agents/test_llm_agent*.py                       125 passed, 2 xfailed (pre-existing VAIS failures, unrelated)

Manual End-to-End (E2E) Tests:

Verified with a sequential pipeline (upstream → downstream) using MockModel:

=== WITHOUT include_sources  (current behaviour) ===
  [user] 'Hello from user'
  [user] "[Part(text='For context:'), Part(text='[upstream] said: Upstream result XYZ')]"

=== WITH include_sources=['user'] ===
  [user] 'Hello from user'

=== WITH include_sources=['user', 'self'] ===
  [user] 'Hello from user'
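The E2E transcript above can be approximated by a dependency-free sketch, assuming simplified (author, text) events and a rendering modeled only on that transcript. `render_contents` is hypothetical and much simpler than ADK's `_get_contents()`; it only illustrates the ordering this PR describes: filter on structured authorship first, then narrative-cast whatever survives.

```python
from __future__ import annotations


def render_contents(
    events: list[tuple[str, str]],
    agent_name: str,
    include_sources: list[str] | None = None,
) -> list[tuple[str, list[str]]]:
    """Turns (author, text) events into (role, parts) request contents."""
    contents: list[tuple[str, list[str]]] = []
    for author, text in events:
        if author == "user":
            source = "user"
        elif author == agent_name:
            source = "self"
        else:
            source = author
        # Step 1: filter on structured authorship metadata.
        if include_sources is not None and source not in include_sources:
            continue
        # Step 2: only surviving events are converted to request contents.
        if source == "user":
            contents.append(("user", [text]))
        elif source == "self":
            contents.append(("model", [text]))
        else:
            # Peer output is narrative-cast into a user-role entry.
            contents.append(("user", ["For context:", f"[{author}] said: {text}"]))
    return contents


session = [("user", "Hello from user"), ("upstream", "Upstream result XYZ")]
for sources in (None, ["user"], ["user", "self"]):
    print(sources, render_contents(session, "downstream", sources))
```

Because the downstream agent has no prior turns of its own in this session, `['user', 'self']` yields the same single user entry as `['user']`, matching the transcript.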

The exact pipeline from issue #2207 can be replaced with:

code_refactorer_agent = LlmAgent(
    name="CodeRefactorerAgent",
    include_sources=['user', 'self'],  # replaces before_model_callback workaround
    ...
)

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have manually tested my changes end-to-end.
  • Any dependent changes have been merged and published in downstream modules.

Additional context

Two known edge cases deferred to follow-up:

  • Compaction events (author='model'): treated as another agent named 'model', so they are dropped by ['user', 'self'] when compaction is enabled.
  • Mixed FC response events: a single 'user'-authored event containing FC responses to calls from multiple agents is classified as an other-agent reply, attributed to the first other-agent response found.
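To illustrate the second deferred edge case, here is a hypothetical sketch of "first other-agent response wins" attribution for one event that answers calls from several agents. `classify_mixed_response`, the call-id-to-caller mapping, and the fallback to 'self' when every response answers this agent's own calls are assumptions for illustration, not the PR's code.

```python
from __future__ import annotations


def classify_mixed_response(
    response_call_ids: list[str],
    caller_by_call_id: dict[str, str],
    agent_name: str,
) -> str:
    """Returns the source a multi-caller function-response event is filed under.

    Hypothetical model of the deferred behavior: the whole event is
    attributed to the first response whose call came from another agent.
    If every response answers this agent's own calls, it counts as 'self'.
    """
    for call_id in response_call_ids:
        caller = caller_by_call_id[call_id]
        if caller != agent_name:
            return caller
    return "self"


callers = {"c1": "writer", "c2": "reviewer", "c3": "critic"}
mixed = classify_mixed_response(["c1", "c2", "c3"], callers, agent_name="writer")
include_sources = ["user", "self", "critic"]
event_kept = mixed in include_sources
```

Here the event is filed under 'reviewer' and dropped, taking with it the responses to the writer's own call (c1) and to the critic's call (c3) even though both of those sources are allowed. Under this model, those calls would be left unanswered, which shows why the case needs a follow-up.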

/cc @Jacksunwei @rohityan

Add `include_sources: list[str] | None` to `LlmAgent` as an orthogonal
axis to the existing `include_contents` temporal-window control. Where
`include_contents` answers "how far back?", `include_sources` answers
"from whom?" — allowing agents in a multi-agent pipeline to declare an
allowlist of content sources rather than receiving every narrative-cast
peer output.

Reserved source names: 'user' (plain human messages), 'self' (this
agent's own prior model turns), and any agent name matched directly
against event.author before narrative casting occurs.

Filtering runs at the event level inside _get_contents(), before
_present_other_agent_message() converts authorship into embedded text,
so source identity is read from structured metadata rather than parsed
from "[agent_name] said:" strings.

Function call/response pairing is preserved: FC responses for the current
agent's own calls are tied to 'self' (dropped together with their calls
when 'self' is absent), and another agent's FC responses are dropped when
that agent's call is also filtered. Live-mode events are handled by
mapping event.author == agent_name to the 'self' reserved name, since
_is_other_agent_reply() returns True for all non-user events in live
sessions.

Raises ValueError when include_sources=[] (use None to disable).

google-cla Bot commented Jun 1, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

adk-bot added the core label ([Component] This issue is related to the core interface and implementation) on Jun 1, 2026

adk-bot (Collaborator) commented Jun 1, 2026

Response from ADK Triaging Agent

Hello @bofenghuang, thank you for submitting this pull request!

We noticed that the Google Contributor License Agreement (CLA) check has failed for this PR. According to our contribution guidelines, all contributions must be accompanied by a signed CLA before we can accept and review your pull request.

Please visit the Google CLA page to see your current agreements or to sign a new one. Once signed, the check should update automatically, or you can request a re-run.

Thank you for your contribution, and we look forward to reviewing your PR once the CLA is completed!


Labels

core [Component] This issue is related to the core interface and implementation

2 participants