Feat: hud-python sdk v6#421

Draft
Parth220 wants to merge 148 commits into main from v6

Conversation

Parth220 (Contributor) commented Jun 15, 2026

Note

High Risk
This is a major SDK and protocol shift (v5 agents cannot drive v6-served environments) plus CI test setup changes that drop browser/Playwright provisioning, which can hide regressions in computer-use paths if those tests still exist.

Overview
This PR ships HUD Python SDK v6 as the primary surface: environments expose a thin control channel with capabilities (ssh, mcp, cdp, rfb, robot) and tasks (@env.template() generators), while agent harnesses own the tools. User-facing narrative moves from v5 scenarios/MCP tools to protocol-first manifest → tasks.start → tasks.grade, with Task.run(agent) returning a Job/Run instead of hud.eval() / env("scenario", ...).
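
The start-then-grade flow described above can be pictured with a plain-Python sketch. This is not the SDK's implementation: the `@env.template()` decorator is omitted, and `run_once` is a hypothetical driver that only assumes a template is an async generator that yields a prompt first and a score second, as in the README sample quoted later in this thread.

```python
import asyncio


async def count_letter(word: str = "strawberry", letter: str = "r"):
    # First yield hands out the prompt and receives the agent's answer.
    answer = yield f"How many '{letter}'s are in '{word}'? Reply with just the number."
    # Second yield reports the score for that answer.
    yield 1.0 if answer and str(word.count(letter)) in answer else 0.0


async def run_once(template, answer: str):
    """Hypothetical driver (an assumption, not a HUD API)."""
    prompt = await template.__anext__()   # roughly the "tasks.start" step
    score = await template.asend(answer)  # roughly the "tasks.grade" step
    await template.aclose()
    return prompt, score


prompt, score = asyncio.run(run_once(count_letter(), "3"))
```

Under these assumptions the environment only produces a prompt and a score, and everything between the two yields belongs to the agent harness.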

Documentation is restructured on Mintlify: default v6 nav (docs/v6/), v5 tagged Legacy under docs/v5/, redirects from old paths, new Migrate to v6 guide, agent skill doc, and refreshed site styling (docs.json, custom.css). Several long-form cookbooks are removed from the old tree and replaced or relocated (e.g. v6 coding-agent, ops-diagnostics, a2a-chat, robot-benchmark).

Runnable examples land under cookbooks/ (A2A chat server moved out of the SDK as reference code; codex-style agent; v6 chat_env using EvaluationResult and templates). README and CONTRIBUTING are rewritten for v6 workflows (hud init, hud deploy, hud eval without --rootdir=hud).

CI/dev ergonomics: GitHub Actions drops Xvfb/Playwright install from the test matrix; .githooks/pre-push is removed. .gitignore expands for local/experimental dirs. Adds AGENTS.md (and CLAUDE.md pointer) for contributor/agent guidance.

Reviewed by Cursor Bugbot for commit c673f40.

jdchawla29 and others added 30 commits April 27, 2026 16:07
Decouple agent native tools from environment primitives
# Conflicts:
#	docs/reference/agents.mdx
#	hud/environment/environment.py
#	hud/environment/tests/test_environment.py
#	hud/tools/computer/base.py
#	hud/tools/computer/gemini.py
#	hud/tools/executors/xdo.py
#	hud/tools/tests/test_computer.py
Comment thread cookbooks/codex-coding/codex_agent.py
Comment thread README.md
from hud.eval import Taskset, group_relative

agent = create_agent("claude-sonnet-4-5")
job = await Taskset(count_letter(word=w) for w in words).run(agent, group=16)

Taskset constructor misused

Medium Severity

The training example calls Taskset(count_letter(word=w) for w in words), which passes the generator as the taskset name, leaving tasks empty. .run(agent, group=16) then schedules no tasks, so the GRPO snippet does nothing useful.
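
The failure mode can be reproduced with a toy class. `ToyTaskset` below is a hypothetical stand-in, not `hud.eval.Taskset`; it only assumes, as the comment above states, that the first positional parameter is the name and that tasks default to an empty list.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class ToyTaskset:
    # Hypothetical stand-in: first positional parameter is the name.
    name: Any
    tasks: List[Any] = field(default_factory=list)


words = ["strawberry", "banana"]

# Misuse: the generator expression binds to `name`, so `tasks` stays empty.
misused = ToyTaskset(w.upper() for w in words)

# Explicit name plus a materialized list of tasks.
fixed = ToyTaskset("letters", [w.upper() for w in words])
```

If the real constructor has this shape, passing the name explicitly (or the tasks by keyword) would avoid the silent no-op; the actual signature should be checked in the SDK.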

Reviewed by Cursor Bugbot for commit 88ba14d.

Comment thread README.md
@env.template()
async def count_letter(word: str = "strawberry", letter: str = "r"):
    answer = yield f"How many '{letter}'s are in '{word}'? Reply with just the number."
    yield 1.0 if answer and str(word.count(letter)) in answer else 0.0

Letter count case mismatch

Low Severity

The sample grader uses case-sensitive word.count(letter) and then checks whether that count appears in the agent's answer. Mixed-case inputs can score 0.0 even when the answer is correct, unlike the prior lowercased logic. For example, "Strawberry" / "s" yields a case-sensitive count of 0, so a correct answer of "1" is graded 0.0.
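
One possible fix, sketched as a standalone function rather than as the template itself: fold both the word and the letter to a common case before counting. The name `grade` is hypothetical, and the substring check on the answer is kept as in the sample.

```python
from typing import Optional


def grade(word: str, letter: str, answer: Optional[str]) -> float:
    # Compare case-insensitively so "Strawberry" / "s" counts one match.
    expected = str(word.casefold().count(letter.casefold()))
    # Same lenient check as the sample: the count must appear in the answer.
    return 1.0 if answer and expected in answer else 0.0
```

Inside the template this would amount to replacing `word.count(letter)` with the case-folded count.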

Reviewed by Cursor Bugbot for commit 88ba14d.

cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 4 total unresolved issues (including 2 from previous reviews).

Reviewed by Cursor Bugbot for commit 704bca4.

Comment thread .hud/config.json
Comment thread docs/v6/cookbooks/coding-agent.mdx
jdchawla29 marked this pull request as draft June 17, 2026 21:24

5 participants