fix(provider): enable prompt caching for DashScope-routed Qwen models #92

Open
peiwenz2 wants to merge 2 commits into browser-use:main from peiwenz2:feat/dashscope-prompt-cache
@peiwenz2 peiwenz2 commented May 24, 2026

The applyCaching gate in provider/transform.ts only fired for @ai-sdk/anthropic and @ai-sdk/alibaba. opencode's catalog wires every DashScope (Alibaba Cloud Model Studio / Bailian) model, whether under alibaba, alibaba-cn, or alibaba-coding-plan(-cn), through @ai-sdk/openai-compatible pointing at https://dashscope[-intl].aliyuncs.com/compatible-mode/v1, so no cache_control markers were ever sent on the wire.

Without caching, browser tasks on qwen3.7-max pay full price on every turn, and the model can end up more expensive than Opus 4.7, which is exactly the cost cliff hinted at in the model recommendations. Bailian's cache_control protocol is shaped like Anthropic's (5-minute TTL, at most 4 breakpoints, cache reads billed at 10% and cache writes at 125%), so the existing 4-marker strategy carries over without a separate code path.
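To make the cost cliff concrete, here is a rough back-of-the-envelope sketch using the 10% / 125% multipliers quoted above. The token counts are hypothetical, the multipliers are assumed to apply to the base input-token price, and the model is simplified: every turn resends an identical prefix and the cache stays warm within the TTL.

```typescript
// Illustrative arithmetic only; token counts are made up.
// Percentages come from the PR description (10% cache_read, 125% cache_write).
const CACHE_READ_PERCENT = 10;
const CACHE_WRITE_PERCENT = 125;

interface RelativeCost {
  uncached: number; // cost in units of "one input token at base price"
  cached: number;
}

/**
 * Relative input cost of a session in which every turn resends the same
 * `prefixTokens` prefix plus `freshTokens` new tokens.
 */
function relativeInputCost(
  prefixTokens: number,
  freshTokens: number,
  turns: number,
): RelativeCost {
  const uncached = turns * (prefixTokens + freshTokens);
  // Turn 1 writes the prefix to the cache; turns 2..n read it back.
  const cached =
    (prefixTokens * CACHE_WRITE_PERCENT) / 100 +
    ((turns - 1) * prefixTokens * CACHE_READ_PERCENT) / 100 +
    turns * freshTokens;
  return { uncached, cached };
}

// 10 turns, 20k-token shared prefix, 500 fresh tokens per turn.
const example = relativeInputCost(20_000, 500, 10);
console.log(example); // { uncached: 205000, cached: 48000 }
```

Under these assumptions the cached session costs roughly 23% of the uncached one on input tokens. Real browser sessions grow their prefix each turn, so actual savings will differ.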

Changes:

  • Add isDashScopeRoutedModel() helper covering alibaba*, dashscope, bailian and the @ai-sdk/alibaba SDK path. Include it in the applyCaching gate.
  • DashScope wants cache_control on a content block, not on the message envelope. System messages arrive here as strings; lift them into a single-block array before the content-level marker is applied so the AI SDK openai-compatible plugin can spread the cache_control field onto the wire block (per packages/openai-compatible/src/chat/convert-to-openai-compatible-chat-messages.ts in vercel/ai).
  • Add @ai-sdk/alibaba to sdkKey() so persisted providerOptions stored under the catalog's providerID (e.g. alibaba-cn) remap to the SDK's "alibaba" namespace on session reload.
  • Regression tests in test/provider/dashscope-cache.test.ts cover the alibaba-cn / alibaba / alibaba-coding-plan-cn paths, the 4-breakpoint cap, the string-to-block lift, the @ai-sdk/alibaba SDK path, and the gateway-exclusion contract.

Refs: https://www.alibabacloud.com/help/zh/model-studio/context-cache
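As a minimal sketch of the gate change in the first bullet, the check could look roughly like the following. The `ModelRef` shape, the `npm` field name, the prefix match for `alibaba*`, and the `shouldApplyCaching` wrapper are assumptions made for illustration. The actual types and conditions in provider/transform.ts may differ.

```typescript
// Hypothetical model descriptor; not the real type from provider/transform.ts.
interface ModelRef {
  providerID: string; // catalog provider ID, e.g. "alibaba-cn"
  npm?: string; // SDK package the provider is wired through (assumed field name)
}

/** True when the model is served by DashScope, whichever SDK carries it. */
function isDashScopeRoutedModel(model: ModelRef): boolean {
  const id = model.providerID.toLowerCase();
  return (
    model.npm === "@ai-sdk/alibaba" ||
    id.startsWith("alibaba") || // alibaba, alibaba-cn, alibaba-coding-plan(-cn)
    id === "dashscope" ||
    id === "bailian"
  );
}

/** Sketch of the widened gate: Anthropic SDK or any DashScope-routed model. */
function shouldApplyCaching(model: ModelRef): boolean {
  // The gateway is excluded because it handles caching itself.
  if (model.npm === "@ai-sdk/gateway") return false;
  return model.npm === "@ai-sdk/anthropic" || isDashScopeRoutedModel(model);
}
```

Keying the check on the provider rather than the model ID means every model the catalog lists under those providers is covered without maintaining a per-model list.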


Summary by cubic

Enable prompt caching for DashScope-routed models (Qwen and others) so cache_control markers are sent via the @ai-sdk/openai-compatible path. This cuts repeated-turn costs and matches Bailian’s cache protocol.

  • Bug Fixes
    • Detect DashScope models with isDashScopeRoutedModel() for alibaba*, dashscope, bailian, and @ai-sdk/alibaba, and include them in the caching gate.
    • Send cache_control on content blocks; lift string system messages to a single text block so markers attach correctly.
    • Remap persisted provider options by adding @ai-sdk/alibaba to sdkKey() (maps stored alibaba-cn options to alibaba on reload).
    • Add regression tests covering alibaba-cn/alibaba/alibaba-coding-plan-cn, the 4-breakpoint cap, the string-to-block lift, the @ai-sdk/alibaba path, gateway exclusion, and non-DashScope openai-compatible exclusion; rename the test fixtures to qwen3.7-max.
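The string-to-block lift and the 4-breakpoint cap can be sketched at the wire level as follows. This is an illustration under stated assumptions, not the PR's code: the message types are simplified stand-ins, the `{ type: "ephemeral" }` marker shape is assumed from the Anthropic-style protocol, and the choice of which messages to mark (first two system messages plus last two non-system messages) is a guess at what a 4-marker strategy might do.

```typescript
// Simplified wire-level types, defined here for illustration only.
type CacheControl = { type: "ephemeral" };
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}
interface Message {
  role: "system" | "user" | "assistant";
  content: string | TextBlock[];
}

/** Lift a plain string into a single text block so a marker has somewhere to go. */
function liftToBlocks(content: string | TextBlock[]): TextBlock[] {
  return typeof content === "string" ? [{ type: "text", text: content }] : content;
}

/**
 * Mark up to 4 messages for caching (assumed strategy: the first two system
 * messages and the last two non-system messages). Empty content passes through.
 */
function markForCaching(messages: Message[]): Message[] {
  const systemIdx: number[] = [];
  const otherIdx: number[] = [];
  messages.forEach((m, i) => (m.role === "system" ? systemIdx : otherIdx).push(i));
  // At most 2 + 2 targets, so the 4-breakpoint cap holds by construction.
  const targets = new Set<number>([...systemIdx.slice(0, 2), ...otherIdx.slice(-2)]);

  return messages.map((msg, i) => {
    if (!targets.has(i) || msg.content.length === 0) return msg;
    const blocks = liftToBlocks(msg.content);
    const last = blocks.length - 1;
    // The marker goes on the final content block, not on the message envelope.
    const marked = blocks.map((b, j): TextBlock =>
      j === last ? { ...b, cache_control: { type: "ephemeral" } } : b,
    );
    return { ...msg, content: marked };
  });
}
```

Returning new objects instead of mutating the input keeps the transform safe to run on messages that are also persisted elsewhere.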

Written for commit b49bbef.


@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 2 files

…n3.7-max

qwen3.7-max is the model the README pricing footnote calls out — and the
one users have actually been hitting the uncached cost cliff on. The cache
gate added in the previous commit is keyed on providerID (alibaba* /
dashscope / bailian / @ai-sdk/alibaba), so it already covers every model
the catalog hangs under those providers, including qwen3.7-max, the
qwen3.7-max-2026-05-20 snapshot, qwen3-max, qwen3.6-*, kimi-k2.5, and
deepseek-v3.2.

This commit only renames the example model id in the fixture so the test
output makes that coverage visible at a glance to anyone reading the diff.
No production code changes; cost numbers in the fixture are informational —
the gate does not read cost. Live probe against
dashscope.aliyuncs.com/compatible-mode/v1 on 2026-05-24 confirmed
qwen3.7-max returns prompt_tokens_details.cache_creation /
cache_creation_input_tokens / cache_type=ephemeral / cached_tokens in
response usage, so the wire-format path the gate enables is honored
server-side.
@peiwenz2
Author

Ran the new regression tests locally before pushing — full output:

$ bun test --timeout 30000 test/provider/dashscope-cache.test.ts
bun test v1.3.14 (0d9b296a)

test/provider/dashscope-cache.test.ts:

 10 pass
 0 fail
 34 expect() calls
Ran 10 tests across 1 file. [101.00ms]

Coverage: alibaba-cn (DashScope CN), alibaba (intl), alibaba-coding-plan-cn (subscription tier), and the native @ai-sdk/alibaba path — all confirmed to inject cache_control on the right content block. Also pins the 4-breakpoint cap, the string-to-block lift for system messages, empty-system passthrough, the @ai-sdk/gateway exclusion contract (gateway handles caching itself via gateway: { caching: "auto" }), and the deepseek direct passthrough (still excluded — DeepSeek's direct API uses implicit caching).

Live wire verification against the actual DashScope endpoint on 2026-05-24 confirmed the response shape matches the protocol:

"prompt_tokens_details": {
  "cache_creation": { "ephemeral_5m_input_tokens": 0 },
  "cache_creation_input_tokens": 0,
  "cache_type": "ephemeral",
  "cached_tokens": 0
}

(Zeros because the smoke prompt was 41 tokens, below Bailian's 1024-token minimum — what matters is the response schema itself, which proves the server recognises the cache_control markers this PR emits.)
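For anyone repeating the probe, a small helper like the one below could classify a response's usage block. The field names inside `prompt_tokens_details` are taken from the response shown above. The `prompt_tokens` field, the `classifyCache` helper, and the use of total prompt tokens as a proxy for the cacheable prefix length are assumptions for illustration.

```typescript
// Field names mirror the probe response above; everything is optional because
// other providers behind the same endpoint shape may omit them.
interface PromptTokensDetails {
  cache_creation?: { ephemeral_5m_input_tokens?: number };
  cache_creation_input_tokens?: number;
  cache_type?: string;
  cached_tokens?: number;
}
interface Usage {
  prompt_tokens: number; // assumed OpenAI-style usage field
  prompt_tokens_details?: PromptTokensDetails;
}

// Minimum prompt size for caching, per the comment above.
const MIN_CACHEABLE_TOKENS = 1024;

type CacheOutcome = "hit" | "write" | "below-minimum" | "none";

/** Hypothetical helper: summarise what the server did with the cache markers. */
function classifyCache(usage: Usage): CacheOutcome {
  const details = usage.prompt_tokens_details;
  if ((details?.cached_tokens ?? 0) > 0) return "hit";
  if ((details?.cache_creation_input_tokens ?? 0) > 0) return "write";
  // Rough proxy: the real threshold applies to the marked prefix, not the whole prompt.
  if (usage.prompt_tokens < MIN_CACHEABLE_TOKENS) return "below-minimum";
  return "none";
}

// The 41-token smoke prompt from the probe.
const smoke: Usage = {
  prompt_tokens: 41,
  prompt_tokens_details: {
    cache_creation: { ephemeral_5m_input_tokens: 0 },
    cache_creation_input_tokens: 0,
    cache_type: "ephemeral",
    cached_tokens: 0,
  },
};
console.log(classifyCache(smoke)); // "below-minimum"
```

A follow-up probe with a prompt above the minimum would be expected to report a write on the first call and a hit on a repeat within the TTL, though that was not part of the verification described here.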
