Skip to content

feat(zai): expose configurable max output tokens for GLM models (#161)#274

Open
proyectoauraorg wants to merge 1 commit into
Zoo-Code-Org:mainfrom
proyectoauraorg:feat/161-zai-glm-max-output
Open

feat(zai): expose configurable max output tokens for GLM models (#161)#274
proyectoauraorg wants to merge 1 commit into
Zoo-Code-Org:mainfrom
proyectoauraorg:feat/161-zai-glm-max-output

Conversation

@proyectoauraorg
Copy link
Copy Markdown
Contributor

@proyectoauraorg proyectoauraorg commented May 24, 2026

Summary

Z.ai GLM models (glm-5.1, glm-5-turbo) default to a 20% output clamp. The runtime already honors an explicit modelMaxTokens (createStreamWithThinking: max_tokens = options.modelMaxTokens || getModelMaxOutputTokens(...)), but the settings UI never surfaced a control — those models only exposed reasoning effort. This closes that gap.

Change

  • New optional capability flag supportsMaxTokens on modelInfoSchema, set on the configurable GLM models in both internationalZAiModels and mainlandZAiModels.
  • Reuses the existing ThinkingBudget settings component to render a max-output-tokens slider, gated on supportsMaxTokens && !supportsReasoningBudget. It defaults to the runtime's existing clamp when unset (behavior unchanged until the user edits), and persists an explicit value as modelMaxTokens.

Tests

  • Backend (zai.spec.ts): the flag is present on the configurable GLM models (absent on glm-4.7), and an explicit modelMaxTokens override is sent as max_tokens instead of the clamp.
  • Webview (ThinkingBudget.spec.tsx): slider renders alongside reasoning effort, defaults to the clamp when unset, reflects an override, does NOT persist on initial render (init vs user-edit), persists on user change, and does not render when the flag is absent.

Closes #161

Summary by CodeRabbit

  • New Features

    • Added a model capability flag for configurable max-output-tokens; supported models show a dedicated max-output-tokens slider that defaults to a runtime-clamped value and honors explicit modelMaxTokens overrides (capped by model limits).
  • Chores

    • Export logic refined to preserve or strip modelMaxTokens and thinking-token fields based on model capabilities.
  • Tests

    • Added tests for capability advertising, slider rendering/behavior and persistence, max-token override/capping, and export rules.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 24, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: c73eb9fb-0150-4da5-af4b-be7379980e87

📥 Commits

Reviewing files that changed from the base of the PR and between 710b01b and 8556b61.

📒 Files selected for processing (9)
  • packages/types/src/model.ts
  • packages/types/src/providers/zai.ts
  • src/api/providers/__tests__/zai.spec.ts
  • src/core/config/ProviderSettingsManager.ts
  • src/core/config/__tests__/ProviderSettingsManager.spec.ts
  • src/shared/__tests__/api.spec.ts
  • src/shared/api.ts
  • webview-ui/src/components/settings/ThinkingBudget.tsx
  • webview-ui/src/components/settings/__tests__/ThinkingBudget.spec.tsx
🚧 Files skipped from review as they are similar to previous changes (5)
  • src/api/providers/tests/zai.spec.ts
  • src/core/config/ProviderSettingsManager.ts
  • packages/types/src/model.ts
  • packages/types/src/providers/zai.ts
  • webview-ui/src/components/settings/tests/ThinkingBudget.spec.tsx

📝 Walkthrough

Walkthrough

Adds an optional supportsMaxTokens model capability, marks Z.AI GLM variants as supporting it, surfaces a conditional max-output-tokens slider with runtime defaults and persistence, and adds backend and UI tests validating capability advertising and explicit override behavior.

Changes

Max Output Tokens Capability for Z.A.I GLM

Layer / File(s) Summary
Type contract and provider metadata
packages/types/src/model.ts, packages/types/src/providers/zai.ts
modelInfoSchema adds optional supportsMaxTokens; GLM-5.1 and GLM-5-turbo (international and mainland) are marked supportsMaxTokens: true.
Backend token override behavior
src/api/providers/__tests__/zai.spec.ts
Tests verify supportsMaxTokens is advertised for configurable GLMs and that explicit modelMaxTokens passed into ZAiHandler becomes max_tokens in the provider create call.
Provider settings export filtering
src/core/config/ProviderSettingsManager.ts, src/core/config/__tests__/ProviderSettingsManager.spec.ts
Splits export gating so modelMaxThinkingTokens is removed only when reasoning budgets are unsupported, while modelMaxTokens is removed only when the model supports neither reasoning budgets nor configurable max tokens; tests added for both cases.
Runtime max-output computation
src/shared/api.ts, src/shared/__tests__/api.spec.ts
getModelMaxOutputTokens now honors settings.modelMaxTokens for models with supportsMaxTokens, capping overrides to the model ceiling; tests added for override and ceiling behavior.
ThinkingBudget UI integration
webview-ui/src/components/settings/ThinkingBudget.tsx
Imports getModelMaxOutputTokens, conditionally renders a standalone max-output-tokens slider for supportsMaxTokens models, introduces renderMaxTokensSlider, and reworks reasoning-effort rendering to include the control.
UI test coverage for max-tokens behavior
webview-ui/src/components/settings/__tests__/ThinkingBudget.spec.tsx
Adds tests for slider rendering with supportsMaxTokens, default/clamped initialization, explicit override behavior, non-persistence on initial render, persistence on user interaction, and conditional hiding when flag is absent.

Sequence Diagram

sequenceDiagram
  participant ThinkingBudget as ThinkingBudget (Webview UI)
  participant SettingsStore as SettingsStore (setApiConfigurationField)
  participant ProviderSettings as ProviderSettingsManager
  participant ZAiHandler as ZAiHandler (Server)
  participant ZAiAPI as Z.ai API (create)

  ThinkingBudget->>SettingsStore: persist modelMaxTokens change
  SettingsStore->>ProviderSettings: saved apiConfiguration/profile
  ProviderSettings->>ZAiHandler: runtime loads apiConfiguration
  ZAiHandler->>ZAiAPI: create(request with max_tokens = modelMaxTokens)
  ZAiAPI-->>ZAiHandler: response
  ZAiHandler-->>ThinkingBudget: result/state
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • edelauna
  • taltas
  • hannesrudolph
  • JamesRobert20
  • navedmerchant

Poem

A rabbit twists a tiny knob, 🐇
Numbers tumble, tokens bob,
Clamp gives way to a gentle slide,
User sets how far output strides,
Hop, persist, and watch it glide.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: adding a configurable max output tokens feature for Z.ai GLM models.
Description check ✅ Passed The PR description comprehensively covers the implementation details, testing approach, and rationale for the changes.
Linked Issues check ✅ Passed All coding requirements from issue #161 are met: supportsMaxTokens flag added, UI slider rendered for applicable models, modelMaxTokens persisted and honored, default clamp preserved when unset.
Out of Scope Changes check ✅ Passed All changes directly support the stated objective of exposing configurable max output tokens for Z.ai GLM models; no unrelated modifications detected.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

packages/types/src/model.ts

ESLint skipped: missing config or dependency (missing-dependency). The ESLint configuration references a package that is not available in the sandbox.

packages/types/src/providers/zai.ts

ESLint skipped: the ESLint configuration for this file references a package that is not available in the sandbox.

src/api/providers/__tests__/zai.spec.ts

ESLint skipped: missing config or dependency (missing-dependency). The ESLint configuration references a package that is not available in the sandbox.

  • 6 others

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@webview-ui/src/components/settings/__tests__/ThinkingBudget.spec.tsx`:
- Around line 320-326: The test fixture glmApiConfiguration is widening
apiProvider to string causing a TS2322 error; fix by typing the fixture with the
ProviderSettings literal type (e.g., change the declaration of
glmApiConfiguration to use "satisfies ProviderSettings" — optionally combined
with "as const" to preserve literal types) so the apiProvider remains the
literal "zai" when passed into ThinkingBudget with defaultProps.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 995ad486-eba2-459e-9c28-6802c53ac7db

📥 Commits

Reviewing files that changed from the base of the PR and between b5c5e21 and 9bd5fbd.

📒 Files selected for processing (5)
  • packages/types/src/model.ts
  • packages/types/src/providers/zai.ts
  • src/api/providers/__tests__/zai.spec.ts
  • webview-ui/src/components/settings/ThinkingBudget.tsx
  • webview-ui/src/components/settings/__tests__/ThinkingBudget.spec.tsx

Comment thread webview-ui/src/components/settings/__tests__/ThinkingBudget.spec.tsx Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented May 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@edelauna
Copy link
Copy Markdown
Contributor

Profile export strips modelMaxTokens for GLM models (src/core/config/ProviderSettingsManager.ts:572)

The export cleanup at line 572 deletes modelMaxTokens when supportsReasoningBudget is falsy. GLM models only advertise supportsMaxTokens + supportsReasoningEffort — neither reasoning-budget flag — so any custom max-tokens value the user sets via the new slider will be silently stripped on profile export. Re-importing the profile then resets the slider to the default clamp.

Suggested fix:

const keepMaxTokens = supportsReasoningBudget || modelInfo.supportsMaxTokens
if (!supportsReasoningBudget) {
    delete configs[name].modelMaxThinkingTokens
}
if (!keepMaxTokens) {
    delete configs[name].modelMaxTokens
}

(This file isn't in the PR diff so can't be posted as an inline comment.)

) : null

// Models with supportsReasoningBinary (binary reasoning) show a simple on/off toggle
if (isReasoningSupported) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a model ever has both supportsReasoningBinary: true and supportsMaxTokens: true, the standalone slider won't render — this block returns before maxOutputTokensControl is reached. Worth including {maxOutputTokensControl} here alongside the checkbox, even if no current model hits this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch — you're right that supportsReasoningBinary models (like deepseek-v3.2) would have their modelMaxThinkingTokens incorrectly stripped under the current logic.\n\nI've updated the guard to include supportsReasoningBinary in the reasoning-capable check:\n\nts\nconst supportsReasoning =\n modelInfo.supportsReasoningBudget ||\n modelInfo.requiredReasoningBudget ||\n modelInfo.supportsReasoningBinary\n\nif (supportsReasoning) {\n // keep modelMaxThinkingTokens — budget, required, or binary\n} else {\n delete configs[name].modelMaxThinkingTokens\n}\n\n\nThis ensures binary-reasoning models retain the field while non-reasoning models still get it cleaned up. Thanks for flagging this!

@proyectoauraorg
Copy link
Copy Markdown
Contributor Author

While reviewing the round-trip behavior for this feature, I found that ProviderSettingsManager.export() strips modelMaxTokens for any model without a reasoning budget — which would silently wipe the GLM max-output value this PR introduces on export/import. I've split the two deletions so modelMaxTokens is preserved whenever the model sets supportsMaxTokens (as GLM does), while modelMaxThinkingTokens still requires a reasoning budget. Added tests covering both branches (GLM preserves, a plain model strips both).

@proyectoauraorg
Copy link
Copy Markdown
Contributor Author

@edelauna Thanks for the detailed code review identifying the modelMaxTokens bug!

I've created a fix PR that addresses the scoping issue: PR #332

The fix extracts maxOutputTokensControl as a named JSX variable declared before both the binary reasoning and budget reasoning conditional branches, making the max output tokens slider available for all reasoning models (including supportsReasoningBinary models like Zhipu GLM).

Could you please review PR #332 when you get a chance? 🙏

@proyectoauraorg
Copy link
Copy Markdown
Contributor Author

Fixed the Windows-only platform-unit-test failure (ProviderSettingsManager > Export > should preserve modelMaxTokens … (e.g. GLM)). It passed on ubuntu/local but failed on Windows.

Root cause: Windows CI runs vitest with poolOptions.forks.singleFork (all spec files share one process). ProviderSettingsManager.spec used the real buildApiHandler, so a sibling spec's module-level vi.mock("../../../api") (e.g. importExport.spec) could bleed in under reduced isolation — making modelInfo.supportsMaxTokens read as undefined, so export() dropped modelMaxTokens and the assertion failed. I reproduced the exact contamination locally by forcing --pool=forks --poolOptions.forks.singleFork --no-isolate.

Fix: mock buildApiHandler directly in ProviderSettingsManager.spec, deriving model capabilities from the real @roo-code/types definitions via vi.importActual. The Export tests now exercise the token-field filtering deterministically and are immune to cross-file api mocks, while still validating real capabilities (glm-5.1 supportsMaxTokens, claude-3-5-haiku neither). Locally: tsc + eslint clean, core/config 231/231, and the suite passes under the exact Windows singleFork config.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

max={modelInfo.maxTokens}
step={1024}
value={[standaloneMaxOutputTokens]}
onValueChange={([value]) => setApiConfigurationField("modelMaxTokens", value)}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When GLM persists modelMaxTokens here, should getModelMaxOutputTokens also honor supportsMaxTokens? Z.ai sends the override as max_tokens, but task context checks call the shared helper and still clamp non-reasoning-budget models.

}
}
return {
buildApiHandler: (config: any) => ({
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this test use the real buildApiHandler and mock only external clients instead? Since export() depends on buildApiHandler(config).getModel().info, this fake bypasses the contract the test is trying to cover.


// Standalone max output tokens slider for models that advertise `supportsMaxTokens`
// (e.g. Z.ai GLM) but do not surface the reasoning-budget control.
const maxOutputTokensControl =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth extracting the max-tokens slider markup into a small local render helper? This repeats the same label, Slider, value display, and modelMaxTokens wiring used in the reasoning-budget branch below.


import { ProviderSettingsManager, ProviderProfiles, SyncCloudProfilesResult } from "../ProviderSettingsManager"

// `export()` builds an API handler per profile to read model capabilities. Mock
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this comment be trimmed to the durable reason for the mock? The PR/debug-history details make the setup harder to scan than a short note about deterministic model capabilities.

…Code-Org#161)

Add a standalone "max output tokens" slider for models that advertise
`supportsMaxTokens` (e.g. Z.ai GLM) but do not surface a reasoning budget, and
send the chosen value to the provider as `max_tokens`.

- ThinkingBudget: extract the max-tokens slider into a small render helper shared
  by the standalone control and the reasoning-budget branch, and surface it for
  binary-reasoning models that also support a configurable max.
- getModelMaxOutputTokens: honor the user's `modelMaxTokens` override for
  `supportsMaxTokens` models (capped at the model ceiling) instead of the 20%
  context-window clamp, so the runtime budget matches what is sent to the provider.
- ProviderSettingsManager.export(): preserve `modelMaxTokens` for configurable-max
  models while dropping it for models that support neither reasoning budgets nor a
  configurable max.
@drvaquera drvaquera force-pushed the feat/161-zai-glm-max-output branch from 710b01b to 8556b61 Compare June 1, 2026 00:19
@proyectoauraorg
Copy link
Copy Markdown
Contributor Author

Thanks! Addressed the review:

  • Extracted a renderMaxTokensSlider helper in ThinkingBudget shared by the standalone control and the reasoning-budget branch, and surfaced {maxOutputTokensControl} in the binary-reasoning branch so a model with both supportsReasoningBinary and supportsMaxTokens still gets the slider.
  • getModelMaxOutputTokens now honors supportsMaxTokens: it returns the user's modelMaxTokens override (capped at the model ceiling) instead of the 20% context-window clamp, so the runtime budget matches what Z.ai receives as max_tokens. Added unit tests.
  • Trimmed the ProviderSettingsManager mock comment to the durable reason.
  • On the real buildApiHandler (vs the mock): I kept the explicit mock — it already drives the filtering off the real @roo-code/types model definitions, keeps this suite isolated from sibling specs that mock ../../../api (the singleFork cross-file leak), and the real handler → getModel().info contract for GLM is now covered directly in zai.spec.ts. Switching the Export tests to the real handler would also build a handler for the retired groq profile case. Happy to switch if you'd prefer — let me know.

Squashed to one commit. tsc, eslint, and the affected suites (api / ProviderSettingsManager / zai / ThinkingBudget — 142 tests) pass locally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ENHANCEMENT] Make Z.ai GLM max output configurable in settings

2 participants