fix(provider): enable prompt caching for DashScope-routed Qwen models#92
fix(provider): enable prompt caching for DashScope-routed Qwen models#92peiwenz2 wants to merge 2 commits into
Conversation
The applyCaching gate in provider/transform.ts only fired for @ai-sdk/anthropic and @ai-sdk/alibaba. opencode's catalog wires every DashScope (Alibaba Cloud Model Studio / Bailian) model — alibaba, alibaba-cn, alibaba-coding-plan(-cn) — through @ai-sdk/openai-compatible pointing at https://dashscope[-intl].aliyuncs.com/compatible-mode/v1, so no cache_control markers were ever sent on the wire. Without caching, browser tasks on qwen3-max pay full price every turn and the model can end up more expensive than Opus 4.7 — exactly the cost cliff hinted at in the model recommendations. Bailian's cache_control protocol is shaped like Anthropic's (5m TTL, 4-breakpoint cap, 10% cache_read / 125% cache_write), so the existing 4-marker strategy carries over without needing a separate code path. Changes: - Add isDashScopeRoutedModel() helper covering alibaba*, dashscope, bailian and the @ai-sdk/alibaba SDK path. Include it in the applyCaching gate. - DashScope wants cache_control on a content block, not on the message envelope. System messages arrive here as strings; lift them into a single-block array before the content-level marker is applied so the AI SDK openai-compatible plugin can spread the cache_control field onto the wire block (per packages/openai-compatible/src/chat/convert-to-openai-compatible-chat-messages.ts in vercel/ai). - Add @ai-sdk/alibaba to sdkKey() so persisted providerOptions stored under the catalog's providerID (e.g. alibaba-cn) remap to the SDK's "alibaba" namespace on session reload. - Regression tests in test/provider/dashscope-cache.test.ts cover the alibaba-cn / alibaba / alibaba-coding-plan-cn paths, the 4-breakpoint cap, the string-to-block lift, the @ai-sdk/alibaba SDK path, and the gateway-exclusion contract. Refs: https://www.alibabacloud.com/help/zh/model-studio/context-cache
…n3.7-max qwen3.7-max is the model the README pricing footnote calls out — and the one users have actually been hitting the uncached cost cliff on. The cache gate added in the previous commit is keyed on providerID (alibaba* / dashscope / bailian / @ai-sdk/alibaba), so it already covers every model the catalog hangs under those providers, including qwen3.7-max, the qwen3.7-max-2026-05-20 snapshot, qwen3-max, qwen3.6-*, kimi-k2.5, and deepseek-v3.2. This commit only renames the example model id in the fixture so the test output makes that coverage visible at a glance to anyone reading the diff. No production code changes; cost numbers in the fixture are informational — the gate does not read cost. Live probe against dashscope.aliyuncs.com/compatible-mode/v1 on 2026-05-24 confirmed qwen3.7-max returns prompt_tokens_details.cache_creation / cache_creation_input_tokens / cache_type=ephemeral / cached_tokens in response usage, so the wire-format path the gate enables is honored server-side.
|
Ran the new regression tests locally before pushing — full output: Coverage: Live wire verification against the actual DashScope endpoint on 2026-05-24 confirmed the response shape matches the protocol: "prompt_tokens_details": {
"cache_creation": { "ephemeral_5m_input_tokens": 0 },
"cache_creation_input_tokens": 0,
"cache_type": "ephemeral",
"cached_tokens": 0
}(Zeros because the smoke prompt was 41 tokens, below Bailian's 1024-token minimum — what matters is the response schema itself, which proves the server recognises the |
The applyCaching gate in provider/transform.ts only fired for @ai-sdk/anthropic and @ai-sdk/alibaba. opencode's catalog wires every DashScope (Alibaba Cloud Model Studio / Bailian) model — alibaba, alibaba-cn, alibaba-coding-plan(-cn) — through @ai-sdk/openai-compatible pointing at
https://dashscope[-intl].aliyuncs.com/compatible-mode/v1, so no cache_control markers were ever sent on the wire.
Without caching, browser tasks on qwen3.7-max pay full price every turn and the model can end up more expensive than Opus 4.7 — exactly the cost cliff hinted at in the model recommendations. Bailian's cache_control protocol is shaped like Anthropic's (5m TTL, 4-breakpoint cap, 10% cache_read / 125% cache_write), so the existing 4-marker strategy carries over without needing a separate code path.
Changes:
Refs: https://www.alibabacloud.com/help/zh/model-studio/context-cache
Issue for this PR
Closes #
Type of change
What does this PR do?
Please provide a description of the issue, the changes you made to fix it, and why they work. It is expected that you understand why your changes work and if you do not understand why at least say as much so a maintainer knows how much to value the PR.
If you paste a large clearly AI generated description here your PR may be IGNORED or CLOSED!
How did you verify your code works?
Screenshots / recordings
If this is a UI change, please include a screenshot or recording.
Checklist
If you do not follow this template your PR will be automatically rejected.
Summary by cubic
Enable prompt caching for DashScope-routed models (Qwen and others) so
cache_controlmarkers are sent via the@ai-sdk/openai-compatiblepath. This cuts repeated-turn costs and matches Bailian’s cache protocol.isDashScopeRoutedModel()foralibaba*,dashscope,bailian, and@ai-sdk/alibaba, and include them in the caching gate.cache_controlon content blocks; lift string system messages to a single text block so markers attach correctly.@ai-sdk/alibabatosdkKey()(maps storedalibaba-cnoptions toalibabaon reload).alibaba-cn/alibaba/alibaba-coding-plan-cn, the 4-breakpoint cap, string-to-block lift, the@ai-sdk/alibabapath, gateway exclusion, non-DashScopeopenai-compatibleexclusion, and rename fixtures toqwen3.7-max.Written for commit b49bbef. Summary will update on new commits. Review in cubic