feat(eval): add Modal and Daytona runtime providers for per-rollout cloud sandboxes #423

Open
lukass16 wants to merge 8 commits into v6 from lukass/modal-daytona-runtimes

lukass16 commented Jun 17, 2026

Issue

The engine could place rollouts locally, in Docker, on a borrowed substrate, or on
HUD-hosted infrastructure, but not in on-demand cloud sandboxes. We want an isolated
cloud environment (Modal or Daytona) per rollout, so rollouts can run in parallel.

Solution

Two new Providers in hud/eval/runtime.py with the same shape as DockerRuntime
(acquire → yield Runtime → tear down), so rollout(), connect(), and the scheduler
are unchanged:

  • ModalRuntime: Sandbox.create per rollout from a pre-built image, control
    channel over raw TCP (unencrypted_ports), terminate on exit. The image resolves
    once behind a lock (from_name, or a lazy image= build) so concurrent rollouts
    can't race a build.
  • DaytonaRuntime: sandbox from a snapshot (built once from image= if
    missing), env server in a background session, reached via an asyncssh
    local-forward (Daytona exposes only HTTPS previews, but connect() dials tcp://).
    The SSH token is internal; users only need DAYTONA_API_KEY. workdir defaults
    to /app (the scaffold's WORKDIR).
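The acquire → yield Runtime → tear down shape can be sketched as an async context manager. This is a minimal illustration, not the code in hud/eval/runtime.py: `Runtime`, `FakeSandboxClient`, `sandbox_runtime`, and `demo_lifecycle` are hypothetical stand-ins, and the fake client only records calls. What it shows is that teardown sits in a `finally`, so the sandbox is released even when the rollout raises.

```python
import asyncio
from contextlib import asynccontextmanager
from dataclasses import dataclass, field
from typing import Any, AsyncIterator, Dict, List


@dataclass
class Runtime:
    """Hypothetical stand-in for the Runtime a provider yields."""

    url: str  # control-channel address that connect() would dial
    provider: str  # which provider created this runtime
    metadata: Dict[str, Any] = field(default_factory=dict)  # e.g. sandbox id for teardown


class FakeSandboxClient:
    """Hypothetical cloud client that records calls instead of provisioning."""

    def __init__(self) -> None:
        self.created: List[str] = []
        self.terminated: List[str] = []

    async def create(self, image: str) -> str:
        sandbox_id = f"sb-{len(self.created)}"
        self.created.append(sandbox_id)
        return sandbox_id

    async def terminate(self, sandbox_id: str) -> None:
        self.terminated.append(sandbox_id)


@asynccontextmanager
async def sandbox_runtime(
    client: FakeSandboxClient, image: str, port: int = 8765
) -> AsyncIterator[Runtime]:
    # Acquire: one isolated sandbox per rollout.
    sandbox_id = await client.create(image)
    try:
        # Yield: the rollout only ever sees a Runtime.
        yield Runtime(
            url=f"tcp://127.0.0.1:{port}",
            provider="fake",
            metadata={"sandbox_id": sandbox_id},
        )
    finally:
        # Tear down: runs on success, error, or cancellation, so nothing leaks.
        await client.terminate(sandbox_id)


async def demo_lifecycle() -> FakeSandboxClient:
    client = FakeSandboxClient()
    async with sandbox_runtime(client, "my-image") as rt:
        assert rt.metadata["sandbox_id"] == "sb-0"
    try:
        async with sandbox_runtime(client, "my-image"):
            raise RuntimeError("rollout failed")  # teardown still runs
    except RuntimeError:
        pass
    return client


if __name__ == "__main__":
    print(asyncio.run(demo_lifecycle()).terminated)  # ['sb-0', 'sb-1']
```

Because callers only see the yielded object, swapping Docker for Modal or Daytona changes what happens inside the context manager and nothing outside it.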

The only user-facing handle is the image/snapshot name. Both runtimes are exported
from hud.eval and gated behind optional [modal]/[daytona] extras. Also adds
modal_deploy.py to build and publish the libero image.
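Gating behind optional extras usually means the cloud SDK is imported lazily, only when the runtime is used, so importing hud.eval keeps working without modal or daytona installed. A minimal sketch of that pattern, assuming a hypothetical `require_extra` helper that is not taken from the PR:

```python
import importlib
from types import ModuleType


def require_extra(module: str, extra: str) -> ModuleType:
    """Import an optional dependency, or explain which extra provides it.

    Hypothetical helper: the module and extra names passed in are assumptions.
    """
    try:
        return importlib.import_module(module)
    except ImportError as exc:  # ModuleNotFoundError is a subclass
        raise ImportError(
            f"{module!r} is not installed; install the optional [{extra}] extra "
            "to use this runtime"
        ) from exc
```

A runtime would call something like `require_extra("modal", "modal")` inside its acquire step rather than at module import time.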

Outcome / Verification

  • Drops in via Taskset.run(runtime=...); no engine/client/protocol changes.
  • Lint clean; new deps are optional extras.
  • Follow-ups: a --runtime modal|daytona CLI flag, a ws:// transport (to drop the
    SSH hop), and a warm pool to amortize cold starts.

Note

Medium Risk
New cloud provisioning paths depend on external APIs, credentials, and SSH tunneling (Daytona disables host key verification); failures could leak resources or block parallel rollouts, but the existing rollout/connect contract is unchanged.

Overview
Adds Modal and Daytona as eval placement providers alongside local/Docker/HUD-hosted, so Task.run / Taskset.run(..., runtime=...) can spin up one isolated cloud sandbox per rollout without changing rollout() or connect().

ModalRuntime creates a Modal Sandbox from a named image (or a one-time lazy image= build behind a lock), runs hud serve env.py behind a tunneled raw TCP port, and terminates the sandbox on exit. DaytonaRuntime creates an ephemeral sandbox from a snapshot (creating the snapshot from image= if it is missing), starts the env server in a background session, and exposes the control channel through an asyncssh local port forward to loopback, because Daytona only offers HTTPS previews. Both yield a Runtime carrying provider metadata for teardown.

ModalRuntime and DaytonaRuntime are exported from hud.eval; optional [modal] and [daytona] extras declare the new dependencies. Smaller fixes: Environment registers capabilities after _started / _hooks_done are initialized; Windows CLI re-wraps stdout/stderr as UTF-8 for Rich; trivial formatting in the Claude SDK agent.
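For the Windows CLI fix, one way to give Rich a UTF-8 stream is to reconfigure the standard streams at startup. A sketch under assumptions: the PR says stdout/stderr are re-wrapped, whereas this uses `io.TextIOWrapper.reconfigure` (Python 3.7+) instead of a new wrapper, and `force_utf8` / `ensure_utf8_stdio` are hypothetical names.

```python
import io
import sys
from typing import TextIO


def force_utf8(stream: TextIO) -> None:
    """Switch a text stream to UTF-8 so Rich can print box-drawing glyphs.

    reconfigure() flushes the stream before changing the codec.
    """
    if isinstance(stream, io.TextIOWrapper) and stream.encoding.lower() != "utf-8":
        stream.reconfigure(encoding="utf-8", errors="replace")


def ensure_utf8_stdio() -> None:
    # Legacy Windows code pages (for example cp1252) can't encode many Rich glyphs.
    if sys.platform == "win32":
        force_utf8(sys.stdout)
        force_utf8(sys.stderr)
```

Reconfiguring in place keeps the existing `sys.stdout` object, so anything that captured a reference to it earlier still writes through the UTF-8 codec.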

Reviewed by Cursor Bugbot for commit 4a31f50. Bugbot is set up for automated code reviews on this repo.

lukass16 added 2 commits June 17, 2026 05:08
Add ModalRuntime as a Provider alongside DockerRuntime: resolve image once
(from_name or lazy build), create an isolated Sandbox per rollout, expose
the env control channel over raw TCP, terminate on exit. Export from
hud.eval and add optional [modal] extra.
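Resolving the image once behind a lock is a double-checked pattern around `asyncio.Lock`. The sketch below is hypothetical: `ResolveOnce` and `demo_resolve_once` are not names from the PR, and `slow_build` stands in for `from_name` or a lazy `image=` build.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Set, Tuple


class ResolveOnce:
    """Resolve an image/snapshot handle at most once across concurrent rollouts."""

    def __init__(self, resolver: Callable[[], Awaitable[str]]) -> None:
        self._resolver = resolver
        self._lock = asyncio.Lock()
        self._value: Optional[str] = None

    async def get(self) -> str:
        # Fast path: already resolved, no need to take the lock.
        if self._value is not None:
            return self._value
        async with self._lock:
            # Re-check: another rollout may have resolved it while we waited.
            if self._value is None:
                self._value = await self._resolver()
            return self._value


async def demo_resolve_once(n: int = 8) -> Tuple[int, Set[str]]:
    calls = 0

    async def slow_build() -> str:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # simulate a slow image build
        return "image-123"

    # Created inside the running loop, which also keeps older Pythons happy.
    image = ResolveOnce(slow_build)
    results = await asyncio.gather(*(image.get() for _ in range(n)))
    return calls, set(results)


if __name__ == "__main__":
    print(asyncio.run(demo_resolve_once()))  # (1, {'image-123'})
```

If the resolver raises, the cached value stays unset, so the next rollout retries instead of caching the failure.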
…oxes

Add DaytonaRuntime as a Provider alongside ModalRuntime: resolve snapshot once (build from image if missing), create an isolated sandbox per rollout, start the env server in a background session, reach it via an asyncssh local-forward (Daytona exposes only HTTPS previews, connect dials tcp://), delete on exit. workdir defaults to /app to match the scaffolded Dockerfile.hud. Export from hud.eval and add optional [daytona] extra.
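The local-forward step might look roughly like this with asyncssh's `connect` and `forward_local_port`. Treat it as a sketch: `ssh_host` and `ssh_user` are placeholders (the PR doesn't show how the internal SSH token and endpoint are obtained), it assumes the destination `127.0.0.1:remote_port` is resolved inside the sandbox, and `known_hosts=None` disables host key verification, matching the risk note. `control_url`, `open_forward`, and `close_forward` are hypothetical names.

```python
from typing import Any, Tuple


def control_url(port: int, host: str = "127.0.0.1") -> str:
    """The tcp:// address connect() would dial for a loopback forward."""
    return f"tcp://{host}:{port}"


async def open_forward(ssh_host: str, ssh_user: str, remote_port: int) -> Tuple[Any, Any, str]:
    """Forward a free loopback port to remote_port inside the sandbox over SSH."""
    import asyncssh  # optional dependency, imported lazily

    # known_hosts=None turns off server host key validation.
    conn = await asyncssh.connect(ssh_host, username=ssh_user, known_hosts=None)
    try:
        # listen_port=0 lets the OS pick a free local port for this rollout.
        listener = await conn.forward_local_port("127.0.0.1", 0, "127.0.0.1", remote_port)
    except Exception:
        conn.close()
        raise
    return conn, listener, control_url(listener.get_port())


async def close_forward(conn: Any, listener: Any) -> None:
    # Stop accepting local connections, then close the SSH session.
    listener.close()
    await listener.wait_closed()
    conn.close()
    await conn.wait_closed()
```

Binding listen port 0 lets parallel rollouts on one host each get their own forward without coordinating port numbers.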
lukass16 and others added 5 commits June 17, 2026 06:08
Environment(capabilities=[...]) called add_capability() before _hooks_done
was initialized, raising AttributeError; move the flag init above the loop.
Also apply ruff format to satisfy CI (runtime.py, claude sdk agent, cli init).

Co-authored-by: Cursor <cursoragent@cursor.com>
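The capability fix is an initialization-order issue. A trimmed, hypothetical `Environment` shows its shape; the real class has more state, and the guard inside `add_capability` is an assumption included only so the flag is actually read.

```python
from typing import List, Optional


class Environment:
    """Hypothetical, trimmed-down Environment illustrating the init-order fix."""

    def __init__(self, capabilities: Optional[List[str]] = None) -> None:
        # Fix: initialize the flags add_capability() depends on before the loop.
        self._started = False
        self._hooks_done = False
        self.capabilities: List[str] = []
        for name in capabilities or []:
            self.add_capability(name)

    def add_capability(self, name: str) -> None:
        # Reading self._hooks_done raised AttributeError when the loop
        # above ran before the attribute existed.
        if self._hooks_done:  # assumed guard, for illustration only
            raise RuntimeError("register capabilities before hooks run")
        self.capabilities.append(name)
```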
The env server binds all interfaces inside the sandbox; the tunnel is the
only ingress, so the all-interfaces bind is intentional.

Co-authored-by: Cursor <cursoragent@cursor.com>
…smatch

The default command hardcoded --port 8765 while the SSH forward used the
port arg, so a non-default port left the tunnel pointing at a dead port.
Build the default command from port; an explicit command still overrides.

Co-authored-by: Cursor <cursoragent@cursor.com>
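Deriving the default command from `port` keeps the server and the SSH forward on the same port. A hypothetical helper in that spirit; the exact default command is assumed from the PR's `hud serve env.py` and `--port 8765` wording, and passing the override as a string is also an assumption.

```python
import shlex
from typing import List, Optional


def server_command(port: int = 8765, command: Optional[str] = None) -> List[str]:
    """Build the env-server command for the sandbox (hypothetical helper)."""
    if command is not None:
        # An explicit command still overrides the default verbatim.
        return shlex.split(command)
    # Use the same port value the SSH forward targets, so the two can't drift.
    return ["hud", "serve", "env.py", "--port", str(port)]
```

The forward would then be opened against the same `port` passed here, rather than a separately hardcoded value.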

cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 5977d5b.
