feat(zai): expose configurable max output tokens for GLM models (#161)#274
feat(zai): expose configurable max output tokens for GLM models (#161)#274proyectoauraorg wants to merge 1 commit into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (9)
🚧 Files skipped from review as they are similar to previous changes (5)
📝 WalkthroughWalkthroughAdds an optional ChangesMax Output Tokens Capability for Z.A.I GLM
Sequence DiagramsequenceDiagram
participant ThinkingBudget as ThinkingBudget (Webview UI)
participant SettingsStore as SettingsStore (setApiConfigurationField)
participant ProviderSettings as ProviderSettingsManager
participant ZAiHandler as ZAiHandler (Server)
participant ZAiAPI as Z.ai API (create)
ThinkingBudget->>SettingsStore: persist modelMaxTokens change
SettingsStore->>ProviderSettings: saved apiConfiguration/profile
ProviderSettings->>ZAiHandler: runtime loads apiConfiguration
ZAiHandler->>ZAiAPI: create(request with max_tokens = modelMaxTokens)
ZAiAPI-->>ZAiHandler: response
ZAiHandler-->>ThinkingBudget: result/state
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
packages/types/src/model.tsESLint skipped: missing config or dependency (missing-dependency). The ESLint configuration references a package that is not available in the sandbox. packages/types/src/providers/zai.tsESLint skipped: the ESLint configuration for this file references a package that is not available in the sandbox. src/api/providers/__tests__/zai.spec.tsESLint skipped: missing config or dependency (missing-dependency). The ESLint configuration references a package that is not available in the sandbox.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@webview-ui/src/components/settings/__tests__/ThinkingBudget.spec.tsx`:
- Around line 320-326: The test fixture glmApiConfiguration is widening
apiProvider to string causing a TS2322 error; fix by typing the fixture with the
ProviderSettings literal type (e.g., change the declaration of
glmApiConfiguration to use "satisfies ProviderSettings" — optionally combined
with "as const" to preserve literal types) so the apiProvider remains the
literal "zai" when passed into ThinkingBudget with defaultProps.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 995ad486-eba2-459e-9c28-6802c53ac7db
📒 Files selected for processing (5)
packages/types/src/model.tspackages/types/src/providers/zai.tssrc/api/providers/__tests__/zai.spec.tswebview-ui/src/components/settings/ThinkingBudget.tsxwebview-ui/src/components/settings/__tests__/ThinkingBudget.spec.tsx
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
|
Profile export strips The export cleanup at line 572 deletes Suggested fix: const keepMaxTokens = supportsReasoningBudget || modelInfo.supportsMaxTokens
if (!supportsReasoningBudget) {
delete configs[name].modelMaxThinkingTokens
}
if (!keepMaxTokens) {
delete configs[name].modelMaxTokens
}(This file isn't in the PR diff so can't be posted as an inline comment.) |
| ) : null | ||
|
|
||
| // Models with supportsReasoningBinary (binary reasoning) show a simple on/off toggle | ||
| if (isReasoningSupported) { |
There was a problem hiding this comment.
If a model ever has both supportsReasoningBinary: true and supportsMaxTokens: true, the standalone slider won't render — this block returns before maxOutputTokensControl is reached. Worth including {maxOutputTokensControl} here alongside the checkbox, even if no current model hits this?
There was a problem hiding this comment.
Great catch — you're right that supportsReasoningBinary models (like deepseek-v3.2) would have their modelMaxThinkingTokens incorrectly stripped under the current logic.\n\nI've updated the guard to include supportsReasoningBinary in the reasoning-capable check:\n\nts\nconst supportsReasoning =\n modelInfo.supportsReasoningBudget ||\n modelInfo.requiredReasoningBudget ||\n modelInfo.supportsReasoningBinary\n\nif (supportsReasoning) {\n // keep modelMaxThinkingTokens — budget, required, or binary\n} else {\n delete configs[name].modelMaxThinkingTokens\n}\n\n\nThis ensures binary-reasoning models retain the field while non-reasoning models still get it cleaned up. Thanks for flagging this!
|
While reviewing the round-trip behavior for this feature, I found that |
a99ea3c to
6cdbb6b
Compare
|
@edelauna Thanks for the detailed code review identifying the I've created a fix PR that addresses the scoping issue: PR #332 The fix extracts Could you please review PR #332 when you get a chance? 🙏 |
|
Fixed the Windows-only Root cause: Windows CI runs vitest with Fix: mock |
|
Actionable comments posted: 0 |
| max={modelInfo.maxTokens} | ||
| step={1024} | ||
| value={[standaloneMaxOutputTokens]} | ||
| onValueChange={([value]) => setApiConfigurationField("modelMaxTokens", value)} |
There was a problem hiding this comment.
When GLM persists modelMaxTokens here, should getModelMaxOutputTokens also honor supportsMaxTokens? Z.ai sends the override as max_tokens, but task context checks call the shared helper and still clamp non-reasoning-budget models.
| } | ||
| } | ||
| return { | ||
| buildApiHandler: (config: any) => ({ |
There was a problem hiding this comment.
Could this test use the real buildApiHandler and mock only external clients instead? Since export() depends on buildApiHandler(config).getModel().info, this fake bypasses the contract the test is trying to cover.
|
|
||
| // Standalone max output tokens slider for models that advertise `supportsMaxTokens` | ||
| // (e.g. Z.ai GLM) but do not surface the reasoning-budget control. | ||
| const maxOutputTokensControl = |
There was a problem hiding this comment.
Is it worth extracting the max-tokens slider markup into a small local render helper? This repeats the same label, Slider, value display, and modelMaxTokens wiring used in the reasoning-budget branch below.
|
|
||
| import { ProviderSettingsManager, ProviderProfiles, SyncCloudProfilesResult } from "../ProviderSettingsManager" | ||
|
|
||
| // `export()` builds an API handler per profile to read model capabilities. Mock |
There was a problem hiding this comment.
Could this comment be trimmed to the durable reason for the mock? The PR/debug-history details make the setup harder to scan than a short note about deterministic model capabilities.
…Code-Org#161) Add a standalone "max output tokens" slider for models that advertise `supportsMaxTokens` (e.g. Z.ai GLM) but do not surface a reasoning budget, and send the chosen value to the provider as `max_tokens`. - ThinkingBudget: extract the max-tokens slider into a small render helper shared by the standalone control and the reasoning-budget branch, and surface it for binary-reasoning models that also support a configurable max. - getModelMaxOutputTokens: honor the user's `modelMaxTokens` override for `supportsMaxTokens` models (capped at the model ceiling) instead of the 20% context-window clamp, so the runtime budget matches what is sent to the provider. - ProviderSettingsManager.export(): preserve `modelMaxTokens` for configurable-max models while dropping it for models that support neither reasoning budgets nor a configurable max.
710b01b to
8556b61
Compare
|
Thanks! Addressed the review:
Squashed to one commit. |
Summary
Z.ai GLM models (
glm-5.1,glm-5-turbo) default to a 20% output clamp. The runtime already honors an explicitmodelMaxTokens(createStreamWithThinking:max_tokens = options.modelMaxTokens || getModelMaxOutputTokens(...)), but the settings UI never surfaced a control — those models only exposed reasoning effort. This closes that gap.Change
supportsMaxTokensonmodelInfoSchema, set on the configurable GLM models in bothinternationalZAiModelsandmainlandZAiModels.ThinkingBudgetsettings component to render a max-output-tokens slider, gated onsupportsMaxTokens && !supportsReasoningBudget. It defaults to the runtime's existing clamp when unset (behavior unchanged until the user edits), and persists an explicit value asmodelMaxTokens.Tests
zai.spec.ts): the flag is present on the configurable GLM models (absent on glm-4.7), and an explicitmodelMaxTokensoverride is sent asmax_tokensinstead of the clamp.ThinkingBudget.spec.tsx): slider renders alongside reasoning effort, defaults to the clamp when unset, reflects an override, does NOT persist on initial render (init vs user-edit), persists on user change, and does not render when the flag is absent.Closes #161
Summary by CodeRabbit
New Features
Chores
Tests