Feat: hud-python sdk v6 by Parth220 · Pull Request #421 · hud-evals/hud-python

Parth220 · 2026-06-15T18:30:29Z

Note

High Risk
This is a major SDK and protocol shift (v5 agents cannot drive v6-served environments) plus CI test setup changes that drop browser/Playwright provisioning, which can hide regressions in computer-use paths if those tests still exist.

Overview
This PR ships HUD Python SDK v6 as the primary surface: environments expose a thin control channel with capabilities (ssh, mcp, cdp, rfb, robot) and tasks (@env.template() generators), while agent harnesses own the tools. User-facing narrative moves from v5 scenarios/MCP tools to protocol-first manifest → tasks.start → tasks.grade, with Task.run(agent) returning a Job/Run instead of hud.eval() / env("scenario", ...).
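The prompt-then-grade shape of a template can be illustrated with a plain Python async generator. This is only a sketch and not the SDK's implementation: `drive_template` is a hypothetical helper, the `@env.template()` decorator is omitted, and the two steps only loosely mirror `tasks.start` and `tasks.grade`, assuming a template yields a prompt first and a score second.

```python
import asyncio


async def count_letter(word="strawberry", letter="r"):
    # First yield: the prompt handed to the agent.
    answer = yield f"How many '{letter}'s are in '{word}'? Reply with just the number."
    # Second yield: the score computed from the agent's answer.
    yield 1.0 if answer and str(word.count(letter)) in answer else 0.0


async def drive_template(template, answer_fn):
    """Hypothetical driver: start the template, obtain an answer, then grade it."""
    prompt = await template.__anext__()   # "start" step: run to the first yield
    answer = answer_fn(prompt)            # stand-in for an agent run
    score = await template.asend(answer)  # "grade" step: resume with the answer
    await template.aclose()               # release the suspended generator
    return prompt, score


# A fixed answer of "3" stands in for a real agent.
prompt, score = asyncio.run(drive_template(count_letter(), lambda _prompt: "3"))
print(score)
```

Keeping the prompt and the grader in one generator means the grading logic can close over the same arguments that produced the prompt.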

Documentation is restructured on Mintlify: default v6 nav (docs/v6/), v5 tagged Legacy under docs/v5/, redirects from old paths, new Migrate to v6 guide, agent skill doc, and refreshed site styling (docs.json, custom.css). Several long-form cookbooks are removed from the old tree and replaced or relocated (e.g. v6 coding-agent, ops-diagnostics, a2a-chat, robot-benchmark).

Runnable examples land under cookbooks/ (A2A chat server moved out of the SDK as reference code; codex-style agent; v6 chat_env using EvaluationResult and templates). README and CONTRIBUTING are rewritten for v6 workflows (hud init, hud deploy, hud eval without --rootdir=hud).

CI/dev ergonomics: GitHub Actions drops Xvfb/Playwright install from the test matrix; .githooks/pre-push is removed. .gitignore expands for local/experimental dirs. Adds AGENTS.md (and CLAUDE.md pointer) for contributor/agent guidance.

Reviewed by Cursor Bugbot for commit c673f40. Bugbot is set up for automated code reviews on this repo.

Decouple agent native tools from environment primitives

# Conflicts:
#   docs/reference/agents.mdx
#   hud/environment/environment.py
#   hud/environment/tests/test_environment.py
#   hud/tools/computer/base.py
#   hud/tools/computer/gemini.py
#   hud/tools/executors/xdo.py
#   hud/tools/tests/test_computer.py

cursor · 2026-06-16T06:12:22Z

+from hud.eval import Taskset, group_relative
+
+agent = create_agent("claude-sonnet-4-5")
+job = await Taskset(count_letter(word=w) for w in words).run(agent, group=16)


Taskset constructor misused

Medium Severity

The training example calls Taskset(count_letter(word=w) for w in words), which passes the generator as the taskset name, leaving tasks empty. .run(agent, group=16) then schedules no tasks, so the GRPO snippet does nothing useful.
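The failure mode described here can be reproduced with a small stand-in class. The `Taskset` below is hypothetical and is not the SDK class. It only assumes, as this comment states, that the first positional parameter is the name and that tasks are passed separately.

```python
from dataclasses import dataclass, field


@dataclass
class Taskset:
    """Hypothetical stand-in: the first positional argument is the name."""
    name: object = "unnamed"
    tasks: list = field(default_factory=list)


words = ["strawberry", "banana", "cherry"]

# Passing a generator positionally binds it to `name`, so `tasks` stays empty.
misused = Taskset(f"count_letter({w})" for w in words)

# Naming the set and passing the tasks explicitly populates it.
fixed = Taskset("count-letter", [f"count_letter({w})" for w in words])

print(len(misused.tasks), len(fixed.tasks))  # 0 3
```

Under that assumed signature, the fix in the docs snippet would be to give the taskset a name and pass the tasks as the second argument.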

Reviewed by Cursor Bugbot for commit 88ba14d.

cursor · 2026-06-16T06:12:22Z

+@env.template()
+async def count_letter(word: str = "strawberry", letter: str = "r"):
+    answer = yield f"How many '{letter}'s are in '{word}'? Reply with just the number."
+    yield 1.0 if answer and str(word.count(letter)) in answer else 0.0


Letter count case mismatch

Low Severity

The sample grader uses case-sensitive word.count(letter) and then checks whether that count appears in the agent's answer. Mixed-case inputs (e.g. "Strawberry" / "s") can score 0.0 even when the answer is correct, unlike the prior lowercased logic.
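A minimal sketch of the difference, using plain functions rather than the `@env.template()` generator. Both function names are hypothetical, and lowercasing both inputs is one possible way to restore the earlier behavior.

```python
def grade_case_sensitive(word, letter, answer):
    # Mirrors the sample grader: counts only exact-case matches.
    return 1.0 if answer and str(word.count(letter)) in answer else 0.0


def grade_case_insensitive(word, letter, answer):
    # Normalizes both inputs before counting.
    expected = word.lower().count(letter.lower())
    return 1.0 if answer and str(expected) in answer else 0.0


# "Strawberry" contains one "s" if case is ignored, none if it is not.
print(grade_case_sensitive("Strawberry", "s", "1"))    # 0.0
print(grade_case_insensitive("Strawberry", "s", "1"))  # 1.0
```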

Reviewed by Cursor Bugbot for commit 88ba14d.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 4 total unresolved issues (including 2 from previous reviews).

Reviewed by Cursor Bugbot for commit 704bca4.

jdchawla29 and others added 30 commits April 27, 2026 16:07

drop v4 task compatibility

d2e8a8d

Merge pull request #403 from hud-evals/codex/drop-v4-support

2e937d4

[codex] drop v4 task compatibility

Align docs with v4 support removal

4f37307

Fix public docs SDK imports

0f19561

v5 regression tests

66afab0

Decouple agent native tools from environment primitives

a2bb01c

tool updates

63165d0

Merge pull request #407 from hud-evals/decouple-agent-tools

7bfbdc6

Decouple agent native tools from environment primitives

small gitignore

a43c5c0

refactor OpenAIChatAgent into openai_compatible package

eeef96f

agent updates

9366a1a

Merge pull request #413 from hud-evals/j/agent-updates

18306c5

Refactor Agents

add AGENTS.md

2330b9e

add init env

9442766

Merge branch 'v6' of https://github.com/hud-evals/hud-python into l/v…

1576dee

…6-env

simplify fx

78f5461

fx

0c84a19

Update .gitignore

9d7696f

Isolate agent run state

c8d3a1b

add more testing guideliens to AGENTS.md

89c3138

fix imports

4f494b0

simplify tool name handling

93ce003

agent context with top-level system prompt and citation options

70de8c7

tests updated

f92e707

restructure + claude [in progress, openai/gemini not done]

e1d420c

rfb + runnable test [in progress}

e285d66

refactor openai + gemini

beecc36

fx

8181d2e

imp and warmup

f33c7ee

lorenss-m and others added 19 commits June 13, 2026 16:11

refactor and improve docs cadence

c308d1a

update endpoint

68ea5b6

docs

e72a3eb

docs adjustment

2a07225

Merge branch 'v6-l-clean' of https://github.com/hud-evals/hud-python …

a472623

…into v6-robot

align robot and docs, format and fixes

db58f86

fxs

e34335c

Merge pull request #419 from hud-evals/v6-robot

9f67834

Robot capability: environment.robots, episode recorder, telemetry, ensembler

thread runner add

5962b07

capability rename

bc06c18

small tweak in proc + flush line

57aceb5

Merge pull request #420 from hud-evals/v6-robot-2

b451efd

Rename Robot Capability + Add MainThreadSimRunner

linter

1aa4e17

improve telem exporter

4925ec9

docs fixes

68007e6

fix rubric based grader and windows local, add convenience imports

39970b0

local teleme export + windows local test

48309ff

env var merge and proper win support

d7f6cc5

upgrade settings links

1f449da

cursor Bot reviewed Jun 15, 2026

Comment thread cookbooks/codex-coding/codex_agent.py

solvemproblr and others added 3 commits June 16, 2026 02:22

fix: env name resolution now uses env.py declared name, instead of se…

a4a78c7

…arching global ast

Merge pull request #422 from hud-evals/asa/environment-name-fix

b72a944

Resolve registry name from the served Environment, not a source scan

improve local observability

88ba14d

mintlify Bot deployed to staging - docs June 16, 2026 06:09

cursor Bot reviewed Jun 16, 2026

add better remote guidance, docs and bump version

704bca4

mintlify Bot deployed to staging - docs June 17, 2026 00:42

cursor Bot reviewed Jun 17, 2026

Comment thread .hud/config.json

Comment thread docs/v6/cookbooks/coding-agent.mdx

small adjustments

c673f40

jdchawla29 marked this pull request as draft June 17, 2026 21:24

Parth220 wants to merge 148 commits into main from v6

Parth220 commented Jun 15, 2026 • edited by cursor Bot

5 participants