feat(robot): OpenPI integration, batched inference, H.264 traces, and cloud runtimes by lukass16 · Pull Request #424 · hud-evals/hud-python

lukass16 · 2026-06-17T06:18:31Z

Issue

v6 had no first-class way to run VLA policies against robot sims: no low-latency obs→action loop, no OpenPI/LeRobot compatibility, no batched GPU inference across episodes, and no cloud placement beyond Docker.

Solution

Adds the robot capability (env serves frames over WebSocket; agent runs the policy) and the v6-robot-3 harness improvements:

- RobotAgent harness: observe → infer → act loop with LeRobotModel/LeRobotAdapter; integrates with v6 Task/Taskset/Job
- RemoteModel + OpenPIAdapter: drive rollouts from a stock OpenPI WebSocket policy server; slash-delimited observation/... keys end-to-end
- BatchedAgent/BatchedModel: batch concurrent ainfer() calls into one GPU forward; isolated episode state per rollout
- H.264 video traces: per-camera CMAF streaming instead of per-tick JPEGs (av added to [robot] extra)
- ModalRuntime / DaytonaRuntime: per-rollout cloud sandboxes as Providers alongside DockerRuntime ([modal], [daytona] extras)
- Misc: connect ready_timeout 120s→240s for slow env boots; env name from Environment(...) declaration; rubric grader + Windows local fixes; bump to 0.6.0
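
For orientation, the observe → infer → act loop named in the Solution list can be sketched as follows. This is a minimal illustration, not the PR's RobotAgent: `CountingEnv`, `constant_policy`, and `run_episode` are hypothetical names, and the only assumption taken from the PR text is that the policy returns a `[T, A]` action chunk that is executed one step at a time.

```python
import numpy as np


class CountingEnv:
    """Hypothetical stand-in for a robot sim; not part of the PR."""

    def __init__(self, horizon: int = 5) -> None:
        self.t = 0
        self.horizon = horizon

    def observe(self) -> dict:
        return {"observation/state": np.array([float(self.t)])}

    def act(self, action: np.ndarray) -> bool:
        self.t += 1
        return self.t >= self.horizon  # True once the episode is done


def constant_policy(obs: dict) -> np.ndarray:
    """Return a [T, A] action chunk (T=3 steps, A=1 dim) for one observation."""
    return np.tile(obs["observation/state"], (3, 1))


def run_episode(env, infer, max_steps: int = 100) -> int:
    """Observe, infer a chunk, then act on each row until done."""
    steps, done = 0, False
    while not done and steps < max_steps:
        obs = env.observe()
        chunk = infer(obs)
        for action in chunk:
            done = env.act(action)
            steps += 1
            if done or steps >= max_steps:
                break  # drop the rest of the chunk once the episode ends
    return steps


steps_taken = run_episode(CountingEnv(horizon=5), constant_policy)
```

With a horizon of 5 and chunks of 3, the loop re-observes once and stops partway through the second chunk.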

Outcome / Verification

- LIBERO + pi0.5 runs via the robot-benchmark cookbook
- OpenPI policy server works with RemoteModel + OpenPIAdapter, no custom agent code
- Concurrent rollouts batch inference correctly; camera frames show as video segments in traces
- Modal/Daytona create and tear down one sandbox per rollout

pip install -e ".[robot,dev]"
pytest hud/agents/robot/ hud/capabilities/ -q

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Large public docs and URL redirect changes affect every external link; CI no longer runs browser/Playwright setup locally. Robot/cloud runtime paths (if shipped with this PR) add new optional deps and per-rollout sandbox behavior worth validating before release.
> 
> **Overview**
> **Docs & DX (bulk of the diff):** Mintlify is restructured around **v6** (new quickstart, reference, run guides, cookbooks, migration page) with **v5** kept as a legacy tab and URL redirects from old paths. The README is rewritten around the v6 protocol (manifest → `tasks.start` → capabilities → `tasks.grade`), `@env.template()`, deploy/eval flow, and GRPO-oriented training. New **`AGENTS.md`** (and `CLAUDE.md` pointer) codifies repo map and quality bar; **`docs/skill.md`** adds an agent skill for v6 env building and task-signal doctrine. Platform doc links are updated to `/v5/...` or `/v6/...`; old standalone cookbook MDX (codex, opencode, ops) is removed in favor of v6 cookbook pages. **`docs/custom.css`** and **`docs.json`** switch theme/fonts/styling toward HUD marketing look.
> 
> **Cookbooks:** **`cookbooks/a2a-chat`** and **`cookbooks/codex-coding`** are added as standalone uv projects (v6 `Chat`, `@env.template()`, `LocalRuntime`, A2A `server.py`). Harbor integration doc moves under **`docs/v6/advanced/harbor-convert.mdx`**.
> 
> **CI & contrib:** GitHub Actions drops **Xvfb / Playwright install** and runs **`pytest` without `--rootdir=hud`**; **`.githooks/pre-push` is deleted** (CONTRIBUTING still mentions enabling `.githooks`). **`.gitignore`** expands for local dev artifacts. Minor **`CONTRIBUTING.md`** test command alignment.
> 
> **Per PR scope (not in this diff excerpt):** **`0.6.0`** robot/VLA work—`RobotAgent` loop, LeRobot/OpenPI (`RemoteModel`, `OpenPIAdapter`), **`BatchedAgent`** GPU batching, H.264 trace video, **`ModalRuntime` / `DaytonaRuntime`**, longer env connect timeout—documented in the robot-benchmark v6 cookbook and PR verification steps.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 78d91fcf0068a7f4815c4cd91e7e5ba3c1503bed. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Conflicts:

- docs/reference/agents.mdx
- hud/environment/environment.py
- hud/environment/tests/test_environment.py
- hud/tools/computer/base.py
- hud/tools/computer/gemini.py
- hud/tools/executors/xdo.py
- hud/tools/tests/test_computer.py

Raise the connect ready_timeout default from 120s to 240s: for slow envs like Isaac Sim, Docker publishes the port before @env.initialize finishes, so hello retries can run past 120s on slow container boots.
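
The effect of the longer deadline can be illustrated with a small retry loop. This is a sketch under stated assumptions, not the client's actual connect code: `wait_until_ready`, `FakeClock`, and `make_probe` are hypothetical, and the 150-second boot time is an invented figure chosen only to fall between the two timeouts.

```python
import time


def wait_until_ready(probe, ready_timeout=240.0, interval=0.5,
                     clock=time.monotonic, sleep=time.sleep):
    """Retry probe() until it succeeds or ready_timeout seconds elapse."""
    deadline = clock() + ready_timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            probe()
            return attempts
        except ConnectionError:
            if clock() >= deadline:
                raise TimeoutError(f"env not ready after {ready_timeout:.0f}s")
            sleep(interval)


class FakeClock:
    """Deterministic clock so the example runs instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def make_probe(clock, boot_s):
    """Probe that fails until the simulated env has booted for boot_s seconds."""
    def probe():
        if clock() < boot_s:
            raise ConnectionError("port is open but initialize is still running")
    return probe


# Simulated env that needs 150 s to finish initializing (invented figure).
clock = FakeClock()
try:
    wait_until_ready(make_probe(clock, 150.0), ready_timeout=120.0,
                     interval=10.0, clock=clock, sleep=clock.sleep)
    timed_out_at_120 = False
except TimeoutError:
    timed_out_at_120 = True

clock = FakeClock()
attempts_at_240 = wait_until_ready(make_probe(clock, 150.0), ready_timeout=240.0,
                                   interval=10.0, clock=clock, sleep=clock.sleep)
```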

Add a weightless Model that queries a remote policy server over the OpenPI msgpack/WebSocket protocol: the adapter builds the request dict, the server owns all pre/post-processing + the forward, and infer() ships it and returns the [T, A] chunk. connect() is lazy and idempotent (blocks until the server is up); response_key covers "actions" (stock OpenPI) vs "action" (Cosmos).
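
The lazy, idempotent connect and the configurable response key described above can be sketched without a real server. The transport here is a plain callable rather than the msgpack/WebSocket client the commit describes, and `RemotePolicySketch`, `dial`, and `flaky_dial` are hypothetical names introduced only for illustration.

```python
import time

import numpy as np


class RemotePolicySketch:
    """Weightless model: ships a request dict and returns the action chunk."""

    def __init__(self, dial, response_key="actions", retry_s=0.1):
        self._dial = dial              # hypothetical: returns a send(request) callable
        self._response_key = response_key
        self._retry_s = retry_s
        self._send = None

    def connect(self):
        if self._send is not None:     # idempotent: later calls are no-ops
            return
        while True:                    # block until the server is up
            try:
                self._send = self._dial()
                return
            except ConnectionError:
                time.sleep(self._retry_s)

    def infer(self, request):
        self.connect()                 # lazy: first infer() triggers the dial
        reply = self._send(request)
        return np.asarray(reply[self._response_key])  # [T, A]


dial_attempts = {"n": 0}


def flaky_dial():
    """Fails twice, then returns a stub server that answers with a [4, 7] chunk."""
    dial_attempts["n"] += 1
    if dial_attempts["n"] < 3:
        raise ConnectionError("server not up yet")
    return lambda request: {"actions": [[0.0] * 7 for _ in range(4)]}


model = RemotePolicySketch(flaky_dial, response_key="actions", retry_s=0.0)
chunk = model.infer({"prompt": "pick up the cube"})
model.connect()  # already connected, so this does not dial again
```

Passing `response_key="action"` instead would read the singular key that the commit message attributes to Cosmos.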

…erence

BatchedModel wraps any Model and coalesces concurrent ainfer() calls into a single stacked forward: a lazily-started worker drains up to batch_size queued calls (or flushes after max_wait_s for the suite tail), runs one inner.infer, and scatters the [N, T, A] rows back to each caller. BatchedAgent wraps a RobotAgent and shallow-clones it per run so each rollout keeps isolated episode state while sharing the one batched model. Usage stays a one-liner: BatchedAgent(agent, batch_size=8) with max_concurrent set to match.

Migrate the robot harness to OpenPI-standard, slash-delimited observation keys end-to-end, and add a thin OpenPIAdapter so a generic OpenPI policy server drives the harness with no agent code changes.
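
As a rough illustration of what slash-delimited keys look like in a request dict, the helper below namespaces camera frames and state under `observation/`. The function name, the camera names, and the `prompt` key are assumptions made for this sketch; the PR text only states that keys are slash-delimited `observation/...` names.

```python
import numpy as np


def build_request(images: dict, state, prompt: str) -> dict:
    """Assemble a flat request dict with slash-delimited observation keys."""
    request = {f"observation/{name}": np.asarray(frame) for name, frame in images.items()}
    request["observation/state"] = np.asarray(state, dtype=np.float32)
    request["prompt"] = prompt  # assumed key name for the task instruction
    return request


request = build_request(
    images={"image": np.zeros((224, 224, 3), dtype=np.uint8),
            "wrist_image": np.zeros((224, 224, 3), dtype=np.uint8)},
    state=[0.0] * 8,
    prompt="put the bowl on the plate",
)
request_keys = sorted(request)
```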

Replace per-tick JPEG observation images with per-camera H.264/CMAF video streaming for robot traces:

- Add hud/agents/robot/video.py (SegmentEncoder/VideoStreamer): encode each camera on a background thread, emitting CMAF fragments as VideoSegmentStep spans without blocking the act loop.
- RobotAgent starts/finalizes the streamer at the env control rate; finalize in `finally` so a crashed run still leaves video.
- ObservationStep.from_obs records only numeric state now; camera frames travel as video.
- Step.emit accepts an explicit trace_id so the encoder thread (no contextvars trace context) attributes spans correctly.
- Add RobotClient.get_control_rate(); add "video_segment" RobotStepSource; add PyAV (av>=12) to the robot extra.
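
Three of the points above (background encoding, finalizing in `finally`, and passing an explicit trace_id to the worker thread) can be shown together in a toy streamer. No real H.264 encoding happens here: `StreamerSketch`, `emit`, and `run_episode` are hypothetical, and a "segment" is just a count of frames.

```python
import contextvars
import queue
import threading

current_trace = contextvars.ContextVar("current_trace", default=None)
emitted = []  # (trace_id, kind, frames_in_segment)


def emit(kind, payload, trace_id=None):
    """Record a span; use the explicit trace_id when one is given."""
    resolved = trace_id if trace_id is not None else current_trace.get()
    emitted.append((resolved, kind, payload))


class StreamerSketch:
    """Groups pushed frames into fixed-size segments on a background thread."""

    _STOP = object()

    def __init__(self, trace_id, frames_per_segment=3):
        self._trace_id = trace_id  # captured on the caller's thread
        self._n = frames_per_segment
        self._q = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def push(self, frame):
        self._q.put(frame)  # unbounded queue, so the act loop never blocks here

    def finalize(self):
        self._q.put(self._STOP)
        self._thread.join()

    def _run(self):
        pending = []
        while True:
            item = self._q.get()
            if item is self._STOP:
                break
            pending.append(item)
            if len(pending) == self._n:
                emit("video_segment", len(pending), trace_id=self._trace_id)
                pending = []
        if pending:  # flush the partial tail segment
            emit("video_segment", len(pending), trace_id=self._trace_id)


def run_episode():
    current_trace.set("trace-123")
    streamer = StreamerSketch(trace_id=current_trace.get())
    streamer.start()
    try:
        for tick in range(7):
            streamer.push(tick)
        raise RuntimeError("simulated crash mid-episode")
    finally:
        streamer.finalize()  # the tail segment is still written


try:
    run_episode()
except RuntimeError:
    pass
```

Even though the episode raises, all seven frames end up in segments attributed to the right trace.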

Add ModalRuntime as a Provider alongside DockerRuntime: resolve image once (from_name or lazy build), create an isolated Sandbox per rollout, expose the env control channel over raw TCP, terminate on exit. Export from hud.eval and add optional [modal] extra.

…oxes

Add DaytonaRuntime as a Provider alongside ModalRuntime: resolve snapshot once (build from image if missing), create an isolated sandbox per rollout, start the env server in a background session, reach it via an asyncssh local-forward (Daytona exposes only HTTPS previews, while connect dials tcp://), delete on exit. workdir defaults to /app to match the scaffolded Dockerfile.hud. Export from hud.eval and add optional [daytona] extra.
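
Both runtime commits describe the same lifecycle: resolve the image or snapshot once, create one sandbox per rollout, and tear it down on exit. A minimal sketch of that shape follows. It does not call the Modal or Daytona SDKs; `FakeBackend` and `SandboxRuntimeSketch` are hypothetical stand-ins, and the real Provider interface may differ.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeBackend:
    """Hypothetical in-memory stand-in for a cloud sandbox API."""

    def __init__(self):
        self.builds = 0
        self.created = 0
        self.live = set()

    async def build_image(self):
        self.builds += 1
        await asyncio.sleep(0)
        return "image-1"

    async def create(self, image):
        self.created += 1
        sandbox = f"{image}/sandbox-{self.created}"
        self.live.add(sandbox)
        return sandbox

    async def terminate(self, sandbox):
        self.live.discard(sandbox)


class SandboxRuntimeSketch:
    def __init__(self, backend):
        self._backend = backend
        self._image = None
        self._lock = asyncio.Lock()

    async def _resolve_image(self):
        async with self._lock:  # concurrent rollouts share one build
            if self._image is None:
                self._image = await self._backend.build_image()
        return self._image

    @asynccontextmanager
    async def rollout(self):
        image = await self._resolve_image()
        sandbox = await self._backend.create(image)
        try:
            yield sandbox
        finally:
            await self._backend.terminate(sandbox)  # runs even if the rollout fails


async def main():
    backend = FakeBackend()
    runtime = SandboxRuntimeSketch(backend)

    async def one_rollout():
        async with runtime.rollout() as sandbox:
            await asyncio.sleep(0)  # the episode would run here
            return sandbox

    names = await asyncio.gather(*(one_rollout() for _ in range(3)))
    return backend, names


backend, sandbox_names = asyncio.run(main())
```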

mintlify · 2026-06-17T06:18:41Z

Preview deployment for your docs.

| Project | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| hud | 🟢 Ready | View Preview | Jun 17, 2026, 6:19 AM |

cursor

Cursor Bugbot has reviewed your changes and found 3 potential issues.

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 78d91fc.

cursor · 2026-06-17T06:21:30Z

+                arr = await asyncio.to_thread(self.inner.infer, stacked)  # [N, T, A]
+                for (_, fut), chunk in zip(items, arr, strict=True):
+                    if not fut.done():
+                        fut.set_result(chunk)


Remote batching breaks OpenPI

High Severity

When BatchedModel coalesces multiple concurrent ainfer calls, it runs one inner.infer and splits the result with zip(items, arr, strict=True). RemoteModel.infer always performs a single WebSocket request and returns a [1, T, A] array, not one row per queued rollout. With two or more live rollouts, splitting fails or assigns the wrong chunk, breaking batched OpenPI concurrent inference.

Additional Locations (1)

hud/agents/robot/model.py#L98-L103

cursor · 2026-06-17T06:21:30Z

@@ -0,0 +1,3 @@
+{
+  "tasksetId": "de5f3062-2587-4b33-a547-27995df213bd"
+}


Committed local taskset config

Medium Severity

A new .hud/config.json stores a specific tasksetId UUID. That file is meant for local hud sync state, but it is not gitignored, so every clone inherits one developer's platform taskset and may sync tasks to the wrong remote target.

cursor · 2026-06-17T06:21:30Z

+            "task": prompt,
+        }
+        for model_key, env_key in zip(self.model_image_keys, self.image_keys, strict=False):
+            batch[model_key] = torch_mod.from_numpy(data[env_key]).permute(2, 0, 1).float() / 255.0


LeRobot adapter missing state

Medium Severity

LeRobotAdapter.adapt_observation indexes data[self.state_key] without checking that bind found a state feature. Environments with only camera observations leave state_key as None, causing a TypeError or KeyError on the first inference step instead of a clear error or optional state handling.
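
One way to handle the camera-only case is to treat the state key as optional and fail with a descriptive message when a configured key is absent. The class below is a hypothetical stand-in written with NumPy instead of torch; `AdapterSketch` and the `observation.state` output key are assumptions for illustration, not the PR's LeRobotAdapter.

```python
import numpy as np


class AdapterSketch:
    """Hypothetical adapter with optional state handling."""

    def __init__(self, image_keys, state_key=None):
        self.image_keys = list(image_keys)
        self.state_key = state_key  # None when the env exposes no state feature

    def adapt_observation(self, data: dict, prompt: str) -> dict:
        batch = {"task": prompt}
        if self.state_key is not None:
            if self.state_key not in data:
                raise KeyError(
                    f"observation is missing state key {self.state_key!r}; "
                    f"available keys: {sorted(data)}"
                )
            batch["observation.state"] = np.asarray(data[self.state_key], dtype=np.float32)
        for key in self.image_keys:
            # HWC uint8 -> CHW float in [0, 1]
            batch[key] = np.transpose(data[key], (2, 0, 1)).astype(np.float32) / 255.0
        return batch


camera_only = AdapterSketch(image_keys=["cam"], state_key=None)
batch = camera_only.adapt_observation(
    {"cam": np.full((2, 2, 3), 255, dtype=np.uint8)}, prompt="open the drawer"
)

with_state = AdapterSketch(image_keys=[], state_key="state")
try:
    with_state.adapt_observation({}, prompt="open the drawer")
    missing_state_reported = False
except KeyError:
    missing_state_reported = True
```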

jdchawla29 and others added 30 commits April 27, 2026 16:07

drop v4 task compatibility

d2e8a8d

Merge pull request #403 from hud-evals/codex/drop-v4-support

2e937d4

[codex] drop v4 task compatibility

Align docs with v4 support removal

4f37307

Fix public docs SDK imports

0f19561

v5 regression tests

66afab0

Decouple agent native tools from environment primitives

a2bb01c

tool updates

63165d0

Merge pull request #407 from hud-evals/decouple-agent-tools

7bfbdc6

Decouple agent native tools from environment primitives

small gitignore

a43c5c0

refactor OpenAIChatAgent into openai_compatible package

eeef96f

agent updates

9366a1a

Merge pull request #413 from hud-evals/j/agent-updates

18306c5

Refactor Agents

add AGENTS.md

2330b9e

add init env

9442766

Merge branch 'v6' of https://github.com/hud-evals/hud-python into l/v…

1576dee

…6-env

simplify fx

78f5461

fx

0c84a19

Update .gitignore

9d7696f

Isolate agent run state

c8d3a1b

add more testing guideliens to AGENTS.md

89c3138

fix imports

4f494b0

simplify tool name handling

93ce003

agent context with top-level system prompt and citation options

70de8c7

tests updated

f92e707

restructure + claude [in progress, openai/gemini not done]

e1d420c

rfb + runnable test [in progress}

e285d66

refactor openai + gemini

beecc36

fx

8181d2e

imp and warmup

f33c7ee

lorenss-m and others added 26 commits June 13, 2026 17:49

fxs

e34335c

Merge pull request #419 from hud-evals/v6-robot

9f67834

Robot capability: environment.robots, episode recorder, telemetry, ensembler

thread runner add

5962b07

capability rename

bc06c18

small tweak in proc + flush line

57aceb5

Merge pull request #420 from hud-evals/v6-robot-2

b451efd

Rename Robot Capability + Add MainThreadSimRunner

linter

1aa4e17

improve telem exporter

4925ec9

docs fixes

68007e6

fix rubric based grader and windows local, add convenience imports

39970b0

local teleme export + windows local test

48309ff

env var merge and proper win support

d7f6cc5

upgrade settings links

1f449da

fix: env name resolution now uses env.py declared name, instead of se…

a4a78c7

…arching global ast

Merge pull request #422 from hud-evals/asa/environment-name-fix

b72a944

Resolve registry name from the served Environment, not a source scan

improve local observability

88ba14d

add better remote guidance, docs and bump version

704bca4

small adjustments

c673f40

fix(clients): raise connect ready_timeout default to 240s

27fa8fd

feat(robot): adopt OpenPI wire-key convention + OpenPIAdapter

c001f8e

Merge remote-tracking branch 'origin/lukass/modal-daytona-runtimes' i…

78d91fc

…nto v6-robot-3

mintlify Bot deployed to staging - docs June 17, 2026 06:19

lukass16 closed this Jun 17, 2026

cursor Bot reviewed Jun 17, 2026

lukass16 wants to merge 156 commits into main from v6-robot-3

4 participants
