Allow autoevals to support both zod 3 and zod 4 #155

Merged
Caitlin Pinn (cpinn) merged 37 commits into main from caitlin/update-zod4
Jun 8, 2026

@cpinn Caitlin Pinn (cpinn) commented Dec 23, 2025

Contributor

Changes

Allow autoevals to install either zod 3 or zod 4.

The TypeScript SDK was updated to make zod a peer dependency so that it works with either zod 3 or zod 4.

This PR makes a similar update to the autoevals package and runs a matrix with zod 3 and zod 4 over the existing tests.

Our internal use of autoevals does not allow a direct upgrade to zod v4 at this time, but the peer dependency should unblock users who want to use autoevals with zod 4.

@github-actions

github-actions Bot commented Dec 23, 2025

Braintrust eval report

Autoevals (caitlin/update-zod4-1780606355)

| Score | Average | Improvements | Regressions |
| --- | --- | --- | --- |
| NumericDiff | 78.1% (-1pp) | 8 🟢 | 10 🔴 |
| Time_to_first_token | 9.97tok (+1.39tok) | 110 🟢 | 109 🔴 |
| Llm_calls | 1.55 (+0) | - | - |
| Tool_calls | 0 (+0) | - | - |
| Errors | 0 (+0) | - | - |
| Llm_errors | 0 (+0) | - | - |
| Tool_errors | 0 (+0) | - | - |
| Prompt_tokens | 528.42tok (-2.58tok) | 1 🟢 | - |
| Prompt_cached_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_5m_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_1h_tokens | 0tok (+0tok) | - | - |
| Completion_tokens | 472.62tok (-3.23tok) | 114 🟢 | 89 🔴 |
| Completion_reasoning_tokens | 360.15tok (-4.98tok) | 92 🟢 | 73 🔴 |
| Completion_accepted_prediction_tokens | 0tok (+0tok) | - | - |
| Completion_rejected_prediction_tokens | 0tok (+0tok) | - | - |
| Completion_audio_tokens | 0tok (+0tok) | - | - |
| Total_tokens | 1001.04tok (-5.81tok) | 115 🟢 | 88 🔴 |
| Estimated_cost | 0$ (0$) | 102 🟢 | 78 🔴 |
| Duration | 9.97s (+1.39s) | 110 🟢 | 109 🔴 |
| Llm_duration | 10.68s (+1.41s) | 110 🟢 | 109 🔴 |

Caitlin Pinn (cpinn) and others added 6 commits December 23, 2025 16:24
Use native toJSONSchema() method from Zod v4 instead of relying on
zod-to-json-schema library which is not compatible with Zod v4.

Fixes "Invalid schema for function" errors where schemas had
'type: "None"' instead of 'type: "object"'.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Replace direct zodToJsonSchema call with schemaToJson helper for
  classify_statements function to properly use Zod v4's native
  toJSONSchema() method
- Format JSON dataset files and pnpm-lock.yaml with prettier

This completes the Zod v4 compatibility fixes for OpenAI function
calling schemas.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
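
The commit messages above name a `schemaToJson` helper but do not show its body. The following is a minimal sketch of how such a helper might choose a converter, not the code from this PR. It assumes that Zod 4 schemas can be recognized by a `_zod` property, and it takes both converters as parameters (the `Converters` interface and `isZod4Schema` are hypothetical names) so that the sketch needs no package imports.

```typescript
// Sketch only: converters are injected so the block has no package imports.
type JsonSchema = Record<string, unknown>;

interface Converters {
  // For Zod 3 schemas, e.g. `zodToJsonSchema` from the zod-to-json-schema package.
  v3: (schema: unknown) => JsonSchema;
  // For Zod 4 schemas, e.g. `z.toJSONSchema` from zod 4.
  v4: (schema: unknown) => JsonSchema;
}

// Assumption: Zod 4 schemas carry a `_zod` property and Zod 3 schemas do not.
function isZod4Schema(schema: unknown): boolean {
  return typeof schema === "object" && schema !== null && "_zod" in schema;
}

// Hypothetical shape for the `schemaToJson` helper named in the commit message:
// route each schema to the converter that matches its Zod major version.
function schemaToJson(schema: unknown, converters: Converters): JsonSchema {
  return isZod4Schema(schema) ? converters.v4(schema) : converters.v3(schema);
}
```

With the real libraries, the two slots would presumably be filled by thin wrappers around `zodToJsonSchema` and `z.toJSONSchema`. Injecting them keeps the version check separate from either dependency.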
@cpinn Caitlin Pinn (cpinn) marked this pull request as ready for review December 29, 2025 16:37
@cpinn Caitlin Pinn (cpinn) marked this pull request as draft December 29, 2025 16:38
Comment thread .github/workflows/python.yaml Outdated
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]

Contributor Author

There is an issue with the SDK's tuple behavior in Python 3.8, and 3.8 has been end-of-life since 2024.

@cpinn Caitlin Pinn (cpinn) changed the title from "update to zod 4" to "update to zod 4, stay on zod /v3 for now" Dec 29, 2025
Upgrade to zod 4.2.1 in preparation for the zod 4+ migration.
Export from zod/v3 until everything is ready in the Braintrust backend.

Added Zod as a peer dependency accepting both v3 and v4 (^3.0.0 || ^4.0.0).
This ensures consumers have a compatible Zod version installed while allowing
flexibility for projects using either Zod 3 or 4.

Zod remains in dependencies for build/test purposes, but declaring it as a
peer dependency prevents version conflicts when autoevals is used in projects
with their own Zod version.
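
To make the peer range `^3.0.0 || ^4.0.0` concrete, here is a small self-contained sketch of which installed Zod versions it admits. It is a simplification written for illustration, not the resolver npm or pnpm actually uses: it handles only plain `x.y.z` versions and caret ranges with a nonzero major, and the function names are hypothetical.

```typescript
type Version = [number, number, number];

// Parse a plain "x.y.z" version; prerelease and build tags are not handled.
function parseVersion(text: string): Version {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text.trim());
  if (!match) throw new Error(`unsupported version: ${text}`);
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

// For a nonzero major, "^X.Y.Z" means: same major, and at least X.Y.Z.
function satisfiesCaret(version: Version, caret: string): boolean {
  const floor = parseVersion(caret.trim().replace(/^\^/, ""));
  if (floor[0] === 0) throw new Error("0.x caret ranges are out of scope for this sketch");
  return version[0] === floor[0] && compareVersions(version, floor) >= 0;
}

// A range such as "^3.0.0 || ^4.0.0" is satisfied when any alternative matches.
function satisfiesPeerRange(version: string, range: string): boolean {
  const parsed = parseVersion(version);
  return range.split("||").some((part) => satisfiesCaret(parsed, part));
}

const peerRange = "^3.0.0 || ^4.0.0";
console.log(satisfiesPeerRange("3.25.76", peerRange)); // true
console.log(satisfiesPeerRange("4.2.1", peerRange)); // true
console.log(satisfiesPeerRange("5.0.0", peerRange)); // false
```

Both versions mentioned in this PR (3.25.76 and 4.2.1) fall inside the range, while any other major version would trigger a peer dependency warning or error, depending on the package manager.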
@cpinn Caitlin Pinn (cpinn) changed the title from "update to zod 4, stay on zod /v3 for now" to "update to zod 4, stay on zod /v3 for compatibility" Dec 29, 2025
@cpinn Caitlin Pinn (cpinn) marked this pull request as ready for review December 29, 2025 21:16
Comment thread package.json
"openai": "^6.3.0",
"zod": "^3.25.76",
"zod-to-json-schema": "^3.24.6"
"openai": "^6.7.0",

Contributor Author

Version 6.7.0 of openai is needed to properly support zod 4 with fallbacks to zod 3.

@cpinn Caitlin Pinn (cpinn) marked this pull request as draft January 13, 2026 02:27
@cpinn

Contributor Author

Sadly, this change is still failing some internal integration tests and I haven't been able to figure out why yet.

There is still a lot more to be done on the overall zod upgrade.

@cpinn Caitlin Pinn (cpinn) changed the title from "Upgrade autoevals to zod 4" to "Make zod a peer dependency in the autoevals sdk" Jan 13, 2026
@cpinn Caitlin Pinn (cpinn) marked this pull request as ready for review June 4, 2026 19:45
@cpinn Caitlin Pinn (cpinn) changed the title from "Make zod a peer dependency in the autoevals sdk" to "Allow autoeval to support both zod 3 and zod 4" Jun 4, 2026
@cpinn Caitlin Pinn (cpinn) changed the title from "Allow autoeval to support both zod 3 and zod 4" to "Allow autoevals to support both zod 3 and zod 4" Jun 4, 2026
@github-actions

github-actions Bot commented Jun 8, 2026

Braintrust eval report

Autoevals (HEAD-1780948726)

| Score | Average | Improvements | Regressions |
| --- | --- | --- | --- |
| NumericDiff | 79.7% (+2pp) | 8 🟢 | 5 🔴 |
| Time_to_first_token | 10.94tok (+0.97tok) | 44 🟢 | 175 🔴 |
| Llm_calls | 1.55 (+0) | - | - |
| Tool_calls | 0 (+0) | - | - |
| Errors | 0 (+0) | - | - |
| Llm_errors | 0 (+0) | - | - |
| Tool_errors | 0 (+0) | - | - |
| Prompt_tokens | 528.42tok (+0tok) | - | - |
| Prompt_cached_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_5m_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_1h_tokens | 0tok (+0tok) | - | - |
| Completion_tokens | 467.6tok (-5.02tok) | 104 🟢 | 102 🔴 |
| Completion_reasoning_tokens | 356.65tok (-3.49tok) | 90 🟢 | 86 🔴 |
| Completion_accepted_prediction_tokens | 0tok (+0tok) | - | - |
| Completion_rejected_prediction_tokens | 0tok (+0tok) | - | - |
| Completion_audio_tokens | 0tok (+0tok) | - | - |
| Total_tokens | 996.02tok (-5.02tok) | 104 🟢 | 102 🔴 |
| Estimated_cost | 0$ (0$) | 90 🟢 | 93 🔴 |
| Duration | 10.94s (+0.97s) | 44 🟢 | 175 🔴 |
| Llm_duration | 11.67s (+0.98s) | 44 🟢 | 175 🔴 |

@cpinn Caitlin Pinn (cpinn) merged commit 9eba0fe into main Jun 8, 2026
15 checks passed
@github-actions

github-actions Bot commented Jun 8, 2026

Braintrust eval report

Autoevals (main-1780952944)

| Score | Average | Improvements | Regressions |
| --- | --- | --- | --- |
| NumericDiff | 79.7% (0pp) | 7 🟢 | 3 🔴 |
| Time_to_first_token | 10.55tok (-0.39tok) | 111 🟢 | 108 🔴 |
| Llm_calls | 1.55 (+0) | - | - |
| Tool_calls | 0 (+0) | - | - |
| Errors | 0 (+0) | - | - |
| Llm_errors | 0 (+0) | - | - |
| Tool_errors | 0 (+0) | - | - |
| Prompt_tokens | 528.42tok (+0tok) | - | - |
| Prompt_cached_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_5m_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_1h_tokens | 0tok (+0tok) | - | - |
| Completion_tokens | 480.89tok (+13.29tok) | 90 🟢 | 107 🔴 |
| Completion_reasoning_tokens | 368tok (+11.35tok) | 77 🟢 | 88 🔴 |
| Completion_accepted_prediction_tokens | 0tok (+0tok) | - | - |
| Completion_rejected_prediction_tokens | 0tok (+0tok) | - | - |
| Completion_audio_tokens | 0tok (+0tok) | - | - |
| Total_tokens | 1009.31tok (+13.29tok) | 90 🟢 | 107 🔴 |
| Estimated_cost | 0$ (+0$) | 78 🟢 | 97 🔴 |
| Duration | 10.55s (-0.39s) | 111 🟢 | 108 🔴 |
| Llm_duration | 11.25s (-0.42s) | 116 🟢 | 103 🔴 |

3 participants