feat(robot): OpenPI integration, batched inference, H.264 traces, and cloud runtimes#424

Closed
lukass16 wants to merge 156 commits into main from v6-robot-3


lukass16 commented Jun 17, 2026

Issue

v6 had no first-class way to run VLA policies against robot sims: no low-latency obs→action loop, no OpenPI/LeRobot compatibility, no batched GPU inference across episodes, and no cloud placement beyond Docker.

Solution

Adds the robot capability (env serves frames over WebSocket; agent runs the policy) and the v6-robot-3 harness improvements:

  • RobotAgent harness: observe → infer → act loop with LeRobotModel/LeRobotAdapter; integrates with v6 Task/Taskset/Job
  • RemoteModel + OpenPIAdapter: drive rollouts from a stock OpenPI WebSocket policy server; slash-delimited observation/... keys end-to-end
  • BatchedAgent/BatchedModel: batch concurrent ainfer() calls into one GPU forward pass; isolated episode state per rollout
  • H.264 video traces: per-camera CMAF streaming instead of per-tick JPEGs (av added to the [robot] extra)
  • ModalRuntime / DaytonaRuntime: per-rollout cloud sandboxes as Providers alongside DockerRuntime ([modal], [daytona] extras)
  • Misc: connect ready_timeout raised from 120s to 240s for slow env boots; env name resolved from the Environment(...) declaration; rubric grader and Windows local fixes; version bump to 0.6.0

Outcome / Verification

  • LIBERO + pi0.5 runs via the robot-benchmark cookbook
  • OpenPI policy server works with RemoteModel + OpenPIAdapter, no custom agent code
  • Concurrent rollouts batch inference correctly; camera frames show as video segments in traces
  • Modal/Daytona create and tear down one sandbox per rollout

    pip install -e ".[robot,dev]"
    pytest hud/agents/robot/ hud/capabilities/ -q

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Large public docs and URL redirect changes affect every external link; CI no longer runs browser/Playwright setup locally. Robot/cloud runtime paths (if shipped with this PR) add new optional deps and per-rollout sandbox behavior worth validating before release.
> 
> **Overview**
> **Docs & DX (bulk of the diff):** Mintlify is restructured around **v6** (new quickstart, reference, run guides, cookbooks, migration page) with **v5** kept as a legacy tab and URL redirects from old paths. The README is rewritten around the v6 protocol (manifest → `tasks.start` → capabilities → `tasks.grade`), `@env.template()`, deploy/eval flow, and GRPO-oriented training. New **`AGENTS.md`** (and `CLAUDE.md` pointer) codifies repo map and quality bar; **`docs/skill.md`** adds an agent skill for v6 env building and task-signal doctrine. Platform doc links are updated to `/v5/...` or `/v6/...`; old standalone cookbook MDX (codex, opencode, ops) is removed in favor of v6 cookbook pages. **`docs/custom.css`** and **`docs.json`** switch theme/fonts/styling toward HUD marketing look.
> 
> **Cookbooks:** **`cookbooks/a2a-chat`** and **`cookbooks/codex-coding`** are added as standalone uv projects (v6 `Chat`, `@env.template()`, `LocalRuntime`, A2A `server.py`). Harbor integration doc moves under **`docs/v6/advanced/harbor-convert.mdx`**.
> 
> **CI & contrib:** GitHub Actions drops **Xvfb / Playwright install** and runs **`pytest` without `--rootdir=hud`**; **`.githooks/pre-push` is deleted** (CONTRIBUTING still mentions enabling `.githooks`). **`.gitignore`** expands for local dev artifacts. Minor **`CONTRIBUTING.md`** test command alignment.
> 
> **Per PR scope (not in this diff excerpt):** **`0.6.0`** robot/VLA work—`RobotAgent` loop, LeRobot/OpenPI (`RemoteModel`, `OpenPIAdapter`), **`BatchedAgent`** GPU batching, H.264 trace video, **`ModalRuntime` / `DaytonaRuntime`**, longer env connect timeout—documented in the robot-benchmark v6 cookbook and PR verification steps.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 78d91fcf0068a7f4815c4cd91e7e5ba3c1503bed. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

jdchawla29 and others added 30 commits April 27, 2026 16:07
Decouple agent native tools from environment primitives
# Conflicts:
#	docs/reference/agents.mdx
#	hud/environment/environment.py
#	hud/environment/tests/test_environment.py
#	hud/tools/computer/base.py
#	hud/tools/computer/gemini.py
#	hud/tools/executors/xdo.py
#	hud/tools/tests/test_computer.py
lorenss-m and others added 26 commits June 13, 2026 17:49
Robot capability: environment.robots, episode recorder, telemetry, ensembler
Rename Robot Capability + Add MainThreadSimRunner
Resolve registry name from the served Environment, not a source scan
For slow envs like Isaac Sim, Docker publishes the port before @env.initialize finishes, so hello retries
can exceed 120s on slow container boots.
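
The retry behaviour described above can be sketched as a probe loop with a deadline. This is an illustration only, not the harness's connect code: `wait_until_ready`, `probe`, and the injectable `clock`/`sleep` arguments are hypothetical names, and only the 120s and 240s figures come from the PR.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    ready_timeout: float = 240.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call probe() until it succeeds or ready_timeout elapses; return the attempt count."""
    deadline = clock() + ready_timeout
    attempts = 0
    while True:
        attempts += 1
        if probe():
            return attempts
        if clock() >= deadline:
            raise TimeoutError(f"env not ready after {ready_timeout}s ({attempts} attempts)")
        sleep(interval)


class FakeClock:
    """Deterministic clock so the example runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


# Simulate an env whose initialization finishes 150s after the port is published.
clock = FakeClock()
attempts = wait_until_ready(
    lambda: clock.now >= 150.0,
    ready_timeout=240.0,
    interval=10.0,
    clock=clock,
    sleep=clock.sleep,
)
```

In this simulation a 150s boot fits inside the 240s budget, while the same probe with `ready_timeout=120.0` raises `TimeoutError`.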
Add a weightless Model that queries a remote policy server over the OpenPI
msgpack/WebSocket protocol: the adapter builds the request dict, the server
owns all pre/post-processing + the forward, and infer() ships it and returns
the [T, A] chunk. connect() is lazy and idempotent (blocks until the server
is up); response_key covers "actions" (stock OpenPI) vs "action" (Cosmos).
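
A minimal sketch of the response_key idea, assuming the server reply has already been decoded into a dict. The helper name `extract_chunk` and the list payloads are hypothetical; only the two key names ("actions" and "action") come from the commit message, and the real RemoteModel may handle this differently.

```python
from typing import Any


def extract_chunk(response: dict[str, Any], response_key: str = "actions") -> Any:
    """Return the action chunk stored under response_key, failing loudly if it is absent."""
    if response_key not in response:
        raise KeyError(
            f"policy server response has no {response_key!r} key; "
            f"available keys: {sorted(response)}"
        )
    return response[response_key]


# Illustrative payloads: an OpenPI-style reply ("actions") and a Cosmos-style reply ("action").
openpi_response = {"actions": [[0.1, 0.2], [0.3, 0.4]]}
cosmos_response = {"action": [[0.5, 0.6]]}

openpi_chunk = extract_chunk(openpi_response)
cosmos_chunk = extract_chunk(cosmos_response, response_key="action")
```

Naming the available keys in the error makes a mismatched server easier to diagnose than a bare KeyError.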
…erence

BatchedModel wraps any Model and coalesces concurrent ainfer() calls into a
single stacked forward: a lazily-started worker drains up to batch_size queued
calls (or flushes after max_wait_s for the suite tail), runs one inner.infer,
and scatters the [N, T, A] rows back to each caller.

BatchedAgent wraps a RobotAgent and shallow-clones it per run so each rollout
keeps isolated episode state while sharing the one batched model. Usage stays a
one-liner: BatchedAgent(agent, batch_size=8) with max_concurrent set to match.
Migrate the robot harness to OpenPI-standard, slash-delimited observation
keys end-to-end, and add a thin OpenPIAdapter so a generic OpenPI policy
server drives the harness with no agent code changes.
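
As an illustration of what slash-delimited keys look like, the sketch below flattens a nested observation dict into that form. The helper `flatten_keys` and the sample key names are assumptions for the example, not the harness's actual schema or the OpenPIAdapter implementation.

```python
from typing import Any, Mapping


def flatten_keys(nested: Mapping[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested mappings into a single dict with slash-delimited keys."""
    flat: dict[str, Any] = {}
    for key, value in nested.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, Mapping):
            flat.update(flatten_keys(value, path))  # recurse into sub-dicts
        else:
            flat[path] = value
    return flat


# Hypothetical observation; real frames would be arrays rather than strings.
nested_obs = {
    "observation": {"image": "<frame>", "state": [0.0, 0.1]},
    "prompt": "pick up the cup",
}
flat_obs = flatten_keys(nested_obs)
```

Here `flat_obs` has the keys `observation/image`, `observation/state`, and `prompt`.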
Replace per-tick JPEG observation images with per-camera H.264/CMAF video
streaming for robot traces:

- Add hud/agents/robot/video.py (SegmentEncoder/VideoStreamer): encode each
  camera on a background thread, emitting CMAF fragments as VideoSegmentStep
  spans without blocking the act loop.
- RobotAgent starts/finalizes the streamer at the env control rate; finalize
  in `finally` so a crashed run still leaves video.
- ObservationStep.from_obs records only numeric state now; camera frames travel
  as video.
- Step.emit accepts an explicit trace_id so the encoder thread (no contextvars
  trace context) attributes spans correctly.
- Add RobotClient.get_control_rate(); add "video_segment" RobotStepSource;
  add PyAV (av>=12) to the robot extra.
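
Two of the points above, passing trace_id explicitly to the encoder thread and finalizing in `finally`, can be sketched without any video library. `SegmentWorker` and `run_episode` are hypothetical stand-ins for the real SegmentEncoder/VideoStreamer, and the "encoding" here only records byte counts.

```python
import queue
import threading
from typing import Optional


class SegmentWorker:
    """Background thread that processes frames and tags each emitted span with a trace_id."""

    def __init__(self, trace_id: str, emitted: list) -> None:
        self.trace_id = trace_id  # captured on the caller's thread and passed explicitly
        self.emitted = emitted    # stand-in for a trace sink
        self._frames: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def push(self, frame: bytes) -> None:
        self._frames.put(frame)  # unbounded queue, so the act loop never blocks here

    def _run(self) -> None:
        while True:
            frame = self._frames.get()
            if frame is None:  # sentinel: no more frames
                return
            self.emitted.append({"trace_id": self.trace_id, "bytes": len(frame)})

    def finalize(self) -> None:
        self._frames.put(None)
        self._thread.join()  # wait until everything queued has been emitted


def run_episode(trace_id: str, frames: list, emitted: list,
                fail_at: Optional[int] = None) -> None:
    worker = SegmentWorker(trace_id, emitted)
    try:
        for i, frame in enumerate(frames):
            if i == fail_at:
                raise RuntimeError("simulated crash")
            worker.push(frame)
    finally:
        worker.finalize()  # runs even after a crash, so earlier frames are kept


emitted: list = []
try:
    run_episode("trace-123", [b"aa", b"bbb", b"c"], emitted, fail_at=2)
except RuntimeError:
    pass
```

After the simulated crash on the third frame, `emitted` still holds the first two spans, both tagged `trace-123`.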
Add ModalRuntime as a Provider alongside DockerRuntime: resolve image once
(from_name or lazy build), create an isolated Sandbox per rollout, expose
the env control channel over raw TCP, terminate on exit. Export from
hud.eval and add optional [modal] extra.
…oxes

Add DaytonaRuntime as a Provider alongside ModalRuntime: resolve snapshot once (build from image if missing), create an isolated sandbox per rollout, start the env server in a background session, reach it via an asyncssh local-forward (Daytona exposes only HTTPS previews, connect dials tcp://), delete on exit. workdir defaults to /app to match the scaffolded Dockerfile.hud. Export from hud.eval and add optional [daytona] extra.
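
The per-rollout lifecycle shared by both runtimes (create an isolated sandbox, run, tear down on exit) maps naturally onto an async context manager. The sketch below uses a hypothetical `FakeSandbox` and does not call the Modal or Daytona SDKs.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeSandbox:
    """Hypothetical stand-in for a cloud sandbox handle."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True

    async def terminate(self) -> None:
        self.alive = False


@asynccontextmanager
async def rollout_sandbox(name: str, registry: list):
    sandbox = FakeSandbox(name)  # one sandbox per rollout
    registry.append(sandbox)
    try:
        yield sandbox
    finally:
        await sandbox.terminate()  # torn down on success, error, or cancellation


async def run_rollout(index: int, registry: list) -> str:
    async with rollout_sandbox(f"rollout-{index}", registry) as sandbox:
        await asyncio.sleep(0)  # placeholder for the actual episode
        return sandbox.name


async def main():
    registry: list = []
    names = await asyncio.gather(*(run_rollout(i, registry) for i in range(3)))
    return names, registry


names, registry = asyncio.run(main())
```

Each of the three rollouts gets its own sandbox, and all three are terminated by the time `main()` returns.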
mintlify Bot commented Jun 17, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

| Project | Status | Preview | Updated (UTC) |
|---|---|---|---|
| hud | 🟢 Ready | View Preview | Jun 17, 2026, 6:19 AM |

cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 78d91fc. Configure here.

arr = await asyncio.to_thread(self.inner.infer, stacked)  # [N, T, A]
for (_, fut), chunk in zip(items, arr, strict=True):
    if not fut.done():
        fut.set_result(chunk)

Remote batching breaks OpenPI

High Severity

When BatchedModel coalesces multiple concurrent ainfer calls, it runs one inner.infer and splits the result with zip(items, arr, strict=True). RemoteModel.infer always performs a single WebSocket request and returns a [1, T, A] array, not one row per queued rollout. With two or more live rollouts, splitting fails or assigns the wrong chunk, breaking batched OpenPI concurrent inference.
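
One possible guard, sketched here with plain lists, is to check the leading dimension before scattering so the mismatch surfaces as an explicit error. `scatter_rows` is a hypothetical helper, not the fix adopted in this PR; another option would be to issue one remote request per queued rollout.

```python
from typing import Sequence, TypeVar

Row = TypeVar("Row")


def scatter_rows(num_requests: int, rows: Sequence[Row]) -> list[Row]:
    """Return one row per queued request, or raise if the model returned a different count."""
    if len(rows) != num_requests:
        raise ValueError(
            f"batched forward returned {len(rows)} row(s) for {num_requests} request(s); "
            "the wrapped model may not support batched inputs"
        )
    return list(rows)


# Two requests and two rows: each caller gets its own chunk.
ok = scatter_rows(2, [["a0"], ["a1"]])

# Two requests but a single [1, T, A]-style result, as in the reported bug.
try:
    scatter_rows(2, [["only-one-row"]])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
```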

Comment thread .hud/config.json
@@ -0,0 +1,3 @@
{
  "tasksetId": "de5f3062-2587-4b33-a547-27995df213bd"
}

Committed local taskset config

Medium Severity

A new .hud/config.json stores a specific tasksetId UUID. That file is meant for local hud sync state, but it is not gitignored, so every clone inherits one developer's platform taskset and may sync tasks to the wrong remote target.

"task": prompt,
}
for model_key, env_key in zip(self.model_image_keys, self.image_keys, strict=False):
    batch[model_key] = torch_mod.from_numpy(data[env_key]).permute(2, 0, 1).float() / 255.0

LeRobot adapter missing state

Medium Severity

LeRobotAdapter.adapt_observation indexes data[self.state_key] without checking that bind found a state feature. Environments with only camera observations leave state_key as None, causing a TypeError or KeyError on the first inference step instead of a clear error or optional state handling.
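
A guard along the lines the comment suggests might look like the sketch below. `read_state` and its `required` flag are hypothetical and are not taken from LeRobotAdapter; the point is only that an unbound `state_key` produces a descriptive error or an explicit `None`.

```python
from typing import Any, Mapping, Optional


def read_state(data: Mapping[str, Any], state_key: Optional[str],
               required: bool = True) -> Any:
    """Fetch the state vector, handling an unbound or missing state key explicitly."""
    if state_key is None:
        if required:
            raise ValueError(
                "no state feature was bound for this environment; "
                "the policy expects a state input but only camera observations were found"
            )
        return None  # camera-only environment with optional state
    if state_key not in data:
        raise KeyError(f"observation is missing state key {state_key!r}")
    return data[state_key]


# Hypothetical observations for illustration.
with_state = read_state({"observation/state": [0.0, 0.1]}, "observation/state")
camera_only = read_state({"observation/image": "<frame>"}, None, required=False)
```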
