feat(robot): OpenPI integration, batched inference, H.264 traces, and cloud runtimes #424
lukass16 wants to merge 156 commits into
[codex] drop v4 task compatibility
Decouple agent native tools from environment primitives
# Conflicts:
#   docs/reference/agents.mdx
#   hud/environment/environment.py
#   hud/environment/tests/test_environment.py
#   hud/tools/computer/base.py
#   hud/tools/computer/gemini.py
#   hud/tools/executors/xdo.py
#   hud/tools/tests/test_computer.py
Refactor Agents
Robot capability: environment.robots, episode recorder, telemetry, ensembler
Rename Robot Capability + Add MainThreadSimRunner
…arching global ast
Resolve registry name from the served Environment, not a source scan
Docker for slow envs like Isaac Sim publishes the port before @env.initialize finishes, so hello retries can exceed 120s on slow container boots.
Add a weightless Model that queries a remote policy server over the OpenPI msgpack/WebSocket protocol: the adapter builds the request dict, the server owns all pre/post-processing + the forward, and infer() ships it and returns the [T, A] chunk. connect() is lazy and idempotent (blocks until the server is up); response_key covers "actions" (stock OpenPI) vs "action" (Cosmos).
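The lazy, idempotent `connect()` and the `response_key` lookup described above can be sketched as follows. This is a minimal illustration, not the PR's `RemoteModel`: the class name `RemoteModelSketch`, the `dial` callable, and the `retry_s` parameter are hypothetical, and the msgpack/WebSocket transport is abstracted behind `dial` so the sketch runs without a server.

```python
import time
from typing import Any, Callable


class RemoteModelSketch:
    """Weightless model sketch: holds no parameters, only a server connection."""

    def __init__(
        self,
        dial: Callable[[], Callable[[dict], dict]],  # hypothetical: opens the connection, returns send(request) -> response
        response_key: str = "actions",
        retry_s: float = 0.01,  # hypothetical retry interval
    ) -> None:
        self._dial = dial
        self._send = None
        self.response_key = response_key
        self.retry_s = retry_s

    def connect(self) -> None:
        # Idempotent: once connected, later calls skip the loop entirely.
        while self._send is None:
            try:
                self._send = self._dial()
            except ConnectionError:
                # Server is not up yet: block and retry.
                time.sleep(self.retry_s)

    def infer(self, request: dict[str, Any]) -> Any:
        self.connect()  # lazy: the first infer() triggers the connection
        response = self._send(request)
        # "actions" vs "action" depends on which server is on the other end.
        return response[self.response_key]
```

Passing `response_key="action"` would cover a server that replies under the singular key, as the commit message describes for Cosmos.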
…erence BatchedModel wraps any Model and coalesces concurrent ainfer() calls into a single stacked forward: a lazily-started worker drains up to batch_size queued calls (or flushes after max_wait_s for the suite tail), runs one inner.infer, and scatters the [N, T, A] rows back to each caller. BatchedAgent wraps a RobotAgent and shallow-clones it per run so each rollout keeps isolated episode state while sharing the one batched model. Usage stays a one-liner: BatchedAgent(agent, batch_size=8) with max_concurrent set to match.
Migrate the robot harness to OpenPI-standard, slash-delimited observation keys end-to-end, and add a thin OpenPIAdapter so a generic OpenPI policy server drives the harness with no agent code changes.
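Because the harness already carries slash-delimited keys, a thin adapter has little to do beyond forwarding them and attaching the instruction. The sketch below illustrates that idea only; the function name `build_request` and the `"prompt"` key are assumptions, and the exact keys a given policy server expects depend on its configuration.

```python
from typing import Any


def build_request(obs: dict[str, Any], prompt: str) -> dict[str, Any]:
    """Forward slash-delimited observation keys unchanged and add the prompt.

    Assumption: the server reads the instruction from a "prompt" key.
    """
    # Keep only keys in the "observation/..." namespace; drop anything else.
    request = {k: v for k, v in obs.items() if k.startswith("observation/")}
    request["prompt"] = prompt
    return request
```

For example, an observation holding `observation/state` and `observation/image` passes through untouched, while harness-internal keys such as a hypothetical `debug/tick` are not sent to the server.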
Replace per-tick JPEG observation images with per-camera H.264/CMAF video streaming for robot traces:
- Add hud/agents/robot/video.py (SegmentEncoder/VideoStreamer): encode each camera on a background thread, emitting CMAF fragments as VideoSegmentStep spans without blocking the act loop.
- RobotAgent starts/finalizes the streamer at the env control rate; finalize in `finally` so a crashed run still leaves video.
- ObservationStep.from_obs records only numeric state now; camera frames travel as video.
- Step.emit accepts an explicit trace_id so the encoder thread (no contextvars trace context) attributes spans correctly.
- Add RobotClient.get_control_rate(); add "video_segment" RobotStepSource; add PyAV (av>=12) to the robot extra.
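The threading pattern in this commit (background encoder, explicit `trace_id`, finalize in `finally`) can be illustrated without PyAV. In the sketch below a "segment" is just a list of frames standing in for an encoded CMAF fragment; `SegmentStreamerSketch`, `run_episode`, `emit`, and `frames_per_segment` are hypothetical names, not the PR's `VideoStreamer` API.

```python
import queue
import threading
from typing import Any, Callable, Optional


class SegmentStreamerSketch:
    def __init__(self, trace_id: str, emit: Callable[[str, list], None],
                 frames_per_segment: int = 4) -> None:
        # Capture the trace id on the caller's thread and pass it explicitly,
        # since the worker thread has no trace context of its own.
        self.trace_id = trace_id
        self.emit = emit
        self.frames_per_segment = frames_per_segment
        self._frames: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def push(self, frame: Any) -> None:
        # Unbounded queue: the act loop never blocks on encoding.
        self._frames.put(frame)

    def finalize(self) -> None:
        self._frames.put(None)  # sentinel: flush and stop
        self._thread.join()

    def _run(self) -> None:
        pending: list = []
        while True:
            frame = self._frames.get()
            if frame is None:
                break
            pending.append(frame)
            if len(pending) == self.frames_per_segment:
                self.emit(self.trace_id, pending)
                pending = []
        if pending:
            # Emit the partial tail so a crashed run still leaves video.
            self.emit(self.trace_id, pending)


def run_episode(streamer: SegmentStreamerSketch, frames: list,
                fail_at: Optional[int] = None) -> None:
    try:
        for i, frame in enumerate(frames):
            if i == fail_at:
                raise RuntimeError("simulated crash")
            streamer.push(frame)
    finally:
        streamer.finalize()
```

Calling `finalize()` from `finally` is what preserves the frames captured before a failure: the sentinel is queued behind them, so the worker flushes everything already pushed before it exits.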
Add ModalRuntime as a Provider alongside DockerRuntime: resolve image once (from_name or lazy build), create an isolated Sandbox per rollout, expose the env control channel over raw TCP, terminate on exit. Export from hud.eval and add optional [modal] extra.
…oxes Add DaytonaRuntime as a Provider alongside ModalRuntime: resolve snapshot once (build from image if missing), create an isolated sandbox per rollout, start the env server in a background session, reach it via an asyncssh local-forward (Daytona exposes only HTTPS previews, connect dials tcp://), delete on exit. workdir defaults to /app to match the scaffolded Dockerfile.hud. Export from hud.eval and add optional [daytona] extra.
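Both cloud runtimes follow the same lifecycle: resolve the image or snapshot once, create an isolated sandbox per rollout, and tear it down on exit. The sketch below shows that lifecycle against a stand-in backend. Every name in it (`SandboxProviderSketch`, `backend.build_image`, `backend.create_sandbox`, `backend.terminate`) is hypothetical and does not correspond to the Modal or Daytona SDKs.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Any


class SandboxProviderSketch:
    def __init__(self, backend: Any) -> None:
        self.backend = backend  # hypothetical cloud client
        self._image = None
        self._lock = asyncio.Lock()

    async def _resolve_image(self) -> Any:
        # The lock ensures concurrent rollouts trigger at most one build.
        async with self._lock:
            if self._image is None:
                self._image = await self.backend.build_image()
        return self._image

    @asynccontextmanager
    async def rollout(self):
        image = await self._resolve_image()
        sandbox = await self.backend.create_sandbox(image)
        try:
            yield sandbox  # one isolated sandbox per rollout
        finally:
            # Always clean up, even if the rollout raised.
            await self.backend.terminate(sandbox)
```

A caller would wrap each rollout in `async with provider.rollout() as sandbox:`, so a failed episode cannot leak a running sandbox.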
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Reviewed by Cursor Bugbot for commit 78d91fc.
    arr = await asyncio.to_thread(self.inner.infer, stacked)  # [N, T, A]
    for (_, fut), chunk in zip(items, arr, strict=True):
        if not fut.done():
            fut.set_result(chunk)
Remote batching breaks OpenPI
High Severity
When BatchedModel coalesces multiple concurrent ainfer calls, it runs one inner.infer and splits the result with zip(items, arr, strict=True). RemoteModel.infer always performs a single WebSocket request and returns a [1, T, A] array, not one row per queued rollout. With two or more live rollouts, splitting fails or assigns the wrong chunk, breaking batched OpenPI concurrent inference.
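One possible remedy, sketched under assumptions rather than taken from the PR, is to validate the row count before scattering and to fan out one request per queued rollout when the inner model only handles single requests. The helper names `scatter_rows` and `infer_per_request` are hypothetical, and `infer_one` stands in for a single-request `infer` that returns a chunk with a leading batch dimension of 1.

```python
from typing import Any, Callable, Sequence


def scatter_rows(items: Sequence[Any], rows: Sequence[Any]) -> list[tuple[Any, Any]]:
    """Pair each queued item with its row, failing loudly on a count mismatch."""
    rows = list(rows)
    if len(rows) != len(items):
        raise ValueError(
            f"inner model returned {len(rows)} rows for {len(items)} queued calls"
        )
    return list(zip(items, rows))


def infer_per_request(infer_one: Callable[[Any], Sequence[Any]],
                      requests: Sequence[Any]) -> list[Any]:
    """Fallback for single-request backends: one call per rollout.

    Each call returns a [1, T, A]-shaped chunk, so index 0 strips the
    leading batch dimension and yields one [T, A] row per request.
    """
    return [infer_one(req)[0] for req in requests]
```

The explicit length check turns a silent mis-assignment into a clear error, and the fan-out path trades the single stacked forward for correctness when the server cannot batch.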
    @@ -0,0 +1,3 @@
    {
      "tasksetId": "de5f3062-2587-4b33-a547-27995df213bd"
    }
Committed local taskset config
Medium Severity
A new .hud/config.json stores a specific tasksetId UUID. That file is meant for local hud sync state, but it is not gitignored, so every clone inherits one developer's platform taskset and may sync tasks to the wrong remote target.
        "task": prompt,
    }
    for model_key, env_key in zip(self.model_image_keys, self.image_keys, strict=False):
        batch[model_key] = torch_mod.from_numpy(data[env_key]).permute(2, 0, 1).float() / 255.0
LeRobot adapter missing state
Medium Severity
LeRobotAdapter.adapt_observation indexes data[self.state_key] without checking that bind found a state feature. Environments with only camera observations leave state_key as None, causing a TypeError or KeyError on the first inference step instead of a clear error or optional state handling.
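A guard along the following lines would make the state feature optional and produce a clear error when a configured key is absent. It is a sketch of the idea, not the adapter's actual code: the standalone function signature is hypothetical, tensor conversion is omitted, and the `"observation.state"` output key is an assumption about what the policy expects.

```python
from typing import Any, Optional, Sequence


def adapt_observation(data: dict[str, Any], state_key: Optional[str],
                      image_keys: Sequence[str], prompt: str) -> dict[str, Any]:
    batch: dict[str, Any] = {"task": prompt}
    if state_key is not None:
        # A state feature was bound: require it, with a readable message.
        if state_key not in data:
            raise KeyError(f"observation is missing state feature {state_key!r}")
        batch["observation.state"] = data[state_key]  # assumed output key
    # With state_key=None (camera-only environments) the state entry is skipped.
    for key in image_keys:
        batch[key] = data[key]
    return batch
```

Whether camera-only environments should be supported or rejected at `bind` time is a design decision for the adapter; the sketch only shows that either choice can fail with an explicit message instead of a `TypeError` on the first inference step.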


Issue
v6 had no first-class way to run VLA policies against robot sims: no low-latency obs→action loop, no OpenPI/LeRobot compatibility, no batched GPU inference across episodes, and no cloud placement beyond Docker.
Solution
Adds the robot capability (env serves frames over WebSocket; agent runs the policy) and the v6-robot-3 harness improvements:
- `RobotAgent` harness: `observe → infer → act` loop with `LeRobotModel`/`LeRobotAdapter`; integrates with v6 `Task`/`Taskset`/`Job`
- `RemoteModel` + `OpenPIAdapter`: drive rollouts from a stock OpenPI WebSocket policy server; slash-delimited `observation/...` keys end-to-end
- `BatchedAgent`/`BatchedModel`: batch concurrent `ainfer()` calls into one GPU forward; isolated episode state per rollout
- H.264 traces (`av` added to `[robot]` extra)
- `ModalRuntime`/`DaytonaRuntime`: per-rollout cloud sandboxes as `Provider`s alongside `DockerRuntime` (`[modal]`, `[daytona]` extras)
- `ready_timeout` 120s→240s for slow env boots; env name from `Environment(...)` declaration; rubric grader + Windows local fixes; bump to 0.6.0

Outcome / Verification
`RemoteModel` + `OpenPIAdapter`, no custom agent code