fix(provider): enable prompt caching for DashScope-routed Qwen models by peiwenz2 · Pull Request #92 · browser-use/browsercode

peiwenz2 · 2026-05-24T12:53:49Z

The applyCaching gate in provider/transform.ts only fired for @ai-sdk/anthropic and @ai-sdk/alibaba. opencode's catalog wires every DashScope (Alibaba Cloud Model Studio / Bailian) model — alibaba, alibaba-cn, alibaba-coding-plan(-cn) — through @ai-sdk/openai-compatible pointing at
https://dashscope[-intl].aliyuncs.com/compatible-mode/v1, so no cache_control markers were ever sent on the wire.

Without caching, browser tasks on qwen3.7-max pay full price every turn and the model can end up more expensive than Opus 4.7 — exactly the cost cliff hinted at in the model recommendations. Bailian's cache_control protocol is shaped like Anthropic's (5m TTL, 4-breakpoint cap, 10% cache_read / 125% cache_write), so the existing 4-marker strategy carries over without needing a separate code path.

Changes:

Add isDashScopeRoutedModel() helper covering alibaba*, dashscope, bailian and the @ai-sdk/alibaba SDK path. Include it in the applyCaching gate.
DashScope wants cache_control on a content block, not on the message envelope. System messages arrive here as strings; lift them into a single-block array before the content-level marker is applied so the AI SDK openai-compatible plugin can spread the cache_control field onto the wire block (per packages/openai-compatible/src/chat/convert-to-openai-compatible-chat-messages.ts in vercel/ai).
Add @ai-sdk/alibaba to sdkKey() so persisted providerOptions stored under the catalog's providerID (e.g. alibaba-cn) remap to the SDK's "alibaba" namespace on session reload.
Regression tests in test/provider/dashscope-cache.test.ts cover the alibaba-cn / alibaba / alibaba-coding-plan-cn paths, the 4-breakpoint cap, the string-to-block lift, the @ai-sdk/alibaba SDK path, and the gateway-exclusion contract.

Refs: https://www.alibabacloud.com/help/zh/model-studio/context-cache

Issue for this PR

Closes #

Type of change

Bug fix
New feature
Refactor / code improvement
Documentation

What does this PR do?

Please provide a description of the issue, the changes you made to fix it, and why they work. It is expected that you understand why your changes work and if you do not understand why at least say as much so a maintainer knows how much to value the PR.

If you paste a large clearly AI generated description here your PR may be IGNORED or CLOSED!

How did you verify your code works?

Screenshots / recordings

If this is a UI change, please include a screenshot or recording.

Checklist

I have tested my changes locally
I have not included unrelated changes in this PR

If you do not follow this template your PR will be automatically rejected.

Summary by cubic

Enable prompt caching for DashScope-routed models (Qwen and others) so cache_control markers are sent via the @ai-sdk/openai-compatible path. This cuts repeated-turn costs and matches Bailian’s cache protocol.

Bug Fixes
- Detect DashScope models with isDashScopeRoutedModel() for alibaba*, dashscope, bailian, and @ai-sdk/alibaba, and include them in the caching gate.
- Send cache_control on content blocks; lift string system messages to a single text block so markers attach correctly.
- Remap persisted provider options by adding @ai-sdk/alibaba to sdkKey() (maps stored alibaba-cn options to alibaba on reload).
- Add regression tests covering alibaba-cn/alibaba/alibaba-coding-plan-cn, the 4-breakpoint cap, string-to-block lift, the @ai-sdk/alibaba path, gateway exclusion, non-DashScope openai-compatible exclusion, and rename fixtures to qwen3.7-max.

^{Written for commit b49bbef. Summary will update on new commits. Review in cubic}

The applyCaching gate in provider/transform.ts only fired for @ai-sdk/anthropic and @ai-sdk/alibaba. opencode's catalog wires every DashScope (Alibaba Cloud Model Studio / Bailian) model — alibaba, alibaba-cn, alibaba-coding-plan(-cn) — through @ai-sdk/openai-compatible pointing at https://dashscope[-intl].aliyuncs.com/compatible-mode/v1, so no cache_control markers were ever sent on the wire. Without caching, browser tasks on qwen3-max pay full price every turn and the model can end up more expensive than Opus 4.7 — exactly the cost cliff hinted at in the model recommendations. Bailian's cache_control protocol is shaped like Anthropic's (5m TTL, 4-breakpoint cap, 10% cache_read / 125% cache_write), so the existing 4-marker strategy carries over without needing a separate code path. Changes: - Add isDashScopeRoutedModel() helper covering alibaba*, dashscope, bailian and the @ai-sdk/alibaba SDK path. Include it in the applyCaching gate. - DashScope wants cache_control on a content block, not on the message envelope. System messages arrive here as strings; lift them into a single-block array before the content-level marker is applied so the AI SDK openai-compatible plugin can spread the cache_control field onto the wire block (per packages/openai-compatible/src/chat/convert-to-openai-compatible-chat-messages.ts in vercel/ai). - Add @ai-sdk/alibaba to sdkKey() so persisted providerOptions stored under the catalog's providerID (e.g. alibaba-cn) remap to the SDK's "alibaba" namespace on session reload. - Regression tests in test/provider/dashscope-cache.test.ts cover the alibaba-cn / alibaba / alibaba-coding-plan-cn paths, the 4-breakpoint cap, the string-to-block lift, the @ai-sdk/alibaba SDK path, and the gateway-exclusion contract. Refs: https://www.alibabacloud.com/help/zh/model-studio/context-cache

cubic-dev-ai

No issues found across 2 files

_{Re-trigger cubic}

…n3.7-max qwen3.7-max is the model the README pricing footnote calls out — and the one users have actually been hitting the uncached cost cliff on. The cache gate added in the previous commit is keyed on providerID (alibaba* / dashscope / bailian / @ai-sdk/alibaba), so it already covers every model the catalog hangs under those providers, including qwen3.7-max, the qwen3.7-max-2026-05-20 snapshot, qwen3-max, qwen3.6-*, kimi-k2.5, and deepseek-v3.2. This commit only renames the example model id in the fixture so the test output makes that coverage visible at a glance to anyone reading the diff. No production code changes; cost numbers in the fixture are informational — the gate does not read cost. Live probe against dashscope.aliyuncs.com/compatible-mode/v1 on 2026-05-24 confirmed qwen3.7-max returns prompt_tokens_details.cache_creation / cache_creation_input_tokens / cache_type=ephemeral / cached_tokens in response usage, so the wire-format path the gate enables is honored server-side.

peiwenz2 · 2026-05-24T16:41:22Z

Ran the new regression tests locally before pushing — full output:

$ bun test --timeout 30000 test/provider/dashscope-cache.test.ts
bun test v1.3.14 (0d9b296a)

test/provider/dashscope-cache.test.ts:

 10 pass
 0 fail
 34 expect() calls
Ran 10 tests across 1 file. [101.00ms]

Coverage: alibaba-cn (DashScope CN), alibaba (intl), alibaba-coding-plan-cn (subscription tier), and the native @ai-sdk/alibaba path — all confirmed to inject cache_control on the right content block. Also pins the 4-breakpoint cap, the string-to-block lift for system messages, empty-system passthrough, the @ai-sdk/gateway exclusion contract (gateway handles caching itself via gateway: { caching: "auto" }), and the deepseek direct passthrough (still excluded — DeepSeek's direct API uses implicit caching).

Live wire verification against the actual DashScope endpoint on 2026-05-24 confirmed the response shape matches the protocol:

"prompt_tokens_details": {
  "cache_creation": { "ephemeral_5m_input_tokens": 0 },
  "cache_creation_input_tokens": 0,
  "cache_type": "ephemeral",
  "cached_tokens": 0
}

(Zeros because the smoke prompt was 41 tokens, below Bailian's 1024-token minimum — what matters is the response schema itself, which proves the server recognises the cache_control markers this PR emits.)

cubic-dev-ai Bot reviewed May 24, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(provider): enable prompt caching for DashScope-routed Qwen models#92

fix(provider): enable prompt caching for DashScope-routed Qwen models#92
peiwenz2 wants to merge 2 commits into
browser-use:mainfrom
peiwenz2:feat/dashscope-prompt-cache

peiwenz2 commented May 24, 2026 •

edited

Loading

Uh oh!

cubic-dev-ai Bot left a comment

Uh oh!

peiwenz2 commented May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

peiwenz2 commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issue for this PR

Type of change

What does this PR do?

How did you verify your code works?

Screenshots / recordings

Checklist

Summary by cubic

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

peiwenz2 commented May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

peiwenz2 commented May 24, 2026 •

edited

Loading