Problem
The llm_anthropic node instantiates ChatAnthropic with no cache configuration:
# nodes/llm_anthropic/anthropic.py:110
self._llm = ChatAnthropic(
model=model, api_key=apikey, temperature=0, max_tokens=self._modelOutputTokens
)
.rocketride/schema/llm_anthropic.json exposes model / modelTotalTokens / apikey only. There is no way for a pipeline author to opt into Anthropic prompt caching.
In a 28-minute coding-agent run, every claude-opus-4-6 call re-shipped 2–5 KB of stable system prompt plus an accumulating message history. With ~180 LLM calls across the run and large stable prefixes, cache_control: {"type": "ephemeral"} would cut input-token latency on cached blocks by up to ~85% and their cost by up to ~90%, per Anthropic's published numbers. Estimated savings on that run alone: ~470 s (~30% of wall-clock time).
Proposed fix
- Add an optional caching boolean (or finer-grained struct: { system: bool, history: bool }) to services-catalog.json profiles for llm_anthropic.
- When enabled, attach cache_control: {"type": "ephemeral"} to system-prompt content blocks (and optionally to the most recent stable message tail) before passing to ChatAnthropic. Modern langchain-anthropic accepts this either via message content blocks or model_kwargs.
- Surface usage.cache_creation_input_tokens / cache_read_input_tokens from responses into the flow trace (see companion issue on token-usage emission) so users can verify cache hits.
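The first two steps could be sketched roughly as below. This is not the node's actual code: CachingOptions, parse_caching and system_content are hypothetical names, and treating a bare caching: true as "system prompt only" is an assumption about how the boolean form might map onto the struct form. The returned list uses the content-block shape mentioned above, so it should be usable as the content of a system message passed to ChatAnthropic.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass(frozen=True)
class CachingOptions:
    """Hypothetical normalized form of the proposed `caching` profile field."""
    system: bool = False
    history: bool = False


def parse_caching(profile: Dict[str, Any]) -> CachingOptions:
    """Accept `caching: true` or `caching: {system, history}`; absent means off."""
    raw = profile.get("caching")
    if raw is None or raw is False:
        # No field (or explicit false): behave exactly as today.
        return CachingOptions()
    if raw is True:
        # Assumption: the boolean form enables system-prompt caching only.
        return CachingOptions(system=True)
    if isinstance(raw, dict):
        return CachingOptions(
            system=bool(raw.get("system", False)),
            history=bool(raw.get("history", False)),
        )
    raise ValueError(f"unsupported caching value: {raw!r}")


def system_content(
    system_text: str, opts: CachingOptions
) -> Union[str, List[Dict[str, Any]]]:
    """Return system-prompt content, tagged with cache_control when enabled."""
    if not opts.system:
        # Unchanged path: plain string, no cache markers.
        return system_text
    return [
        {
            "type": "text",
            "text": system_text,
            "cache_control": {"type": "ephemeral"},
        }
    ]
```

Keeping the disabled path as a plain string means profiles without a caching field send exactly the same request as before.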
Acceptance
- A pipeline with caching: true on an llm_anthropic profile shows non-zero cache_read_input_tokens on the second and later calls in the same session.
- Default behavior (no caching field) is unchanged (backwards compatible).
- Schema + docs updated.
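The first acceptance criterion could be checked with something like the helper below. It is a sketch only: cache_hits_after_first_call is a hypothetical name, and it assumes the per-call usage has already been collected into plain dicts keyed by the field names above (how the node exposes them is left to the companion issue).

```python
from typing import Any, Dict, List


def cache_hits_after_first_call(usages: List[Dict[str, Any]]) -> bool:
    """True if every call after the first reports cache_read_input_tokens > 0.

    `usages` holds one usage dict per LLM call, in call order, for one session.
    A session with fewer than two calls cannot demonstrate a cache hit.
    """
    later_calls = usages[1:]
    if not later_calls:
        return False
    return all(u.get("cache_read_input_tokens", 0) > 0 for u in later_calls)
```

The first call is skipped because it is expected to write the cache (non-zero cache_creation_input_tokens) rather than read from it.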
Suggested labels
enhancement, performance, cost, nodes/llm_anthropic