l/training by lorenss-m · Pull Request #426 · hud-evals/hud-python

lorenss-m · 2026-06-18T01:20:25Z

Note

High Risk
Introduces a new training surface that mutates model weights via remote RL APIs and changes eval defaults for platform tasksets; incorrect loss/optim usage or checkpoint head changes can affect production model behavior.

Overview
Adds HUD-managed on-policy RL via a new hud.train.TrainingClient (exported from hud): forward_backward, optim_step, and step calls against the RL service, with built-in losses computed server-side and forward_backward_custom for client-side torch losses. Replaces the old eval-layer HudTrainingClient / group_relative advantage-posting path with in-place weight promotion behind a single gateway model slug.
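The split between forward_backward (compute and accumulate gradients) and optim_step (apply the accumulated update), with step doing both, follows the familiar gradient-accumulation pattern. The sketch below is a purely local, dependency-free analogy, not hud.train.TrainingClient. The class ToyTrainer, its one-parameter linear model, the squared-error loss, and the SGD update are all hypothetical stand-ins chosen to make the control flow visible.

```python
from dataclasses import dataclass
from typing import List, Tuple

Batch = List[Tuple[float, float]]  # (x, y) pairs


@dataclass
class ToyTrainer:
    """Hypothetical local stand-in; NOT the hud.train.TrainingClient API."""

    weight: float = 0.0        # single model parameter
    lr: float = 0.1            # learning rate applied by optim_step
    grad: float = 0.0          # gradient accumulated across forward_backward calls
    pending_batches: int = 0   # number of batches that contributed to grad

    def forward_backward(self, batch: Batch) -> float:
        """Compute mean squared error on the batch and accumulate its gradient."""
        n = len(batch)
        loss = sum((self.weight * x - y) ** 2 for x, y in batch) / n
        # d/dw of mean (w*x - y)^2 is mean 2*x*(w*x - y)
        self.grad += sum(2 * x * (self.weight * x - y) for x, y in batch) / n
        self.pending_batches += 1
        return loss  # weights are NOT changed here

    def optim_step(self) -> None:
        """Apply one SGD update with the averaged accumulated gradient, then reset it."""
        if self.pending_batches == 0:
            return
        self.weight -= self.lr * self.grad / self.pending_batches
        self.grad = 0.0
        self.pending_batches = 0

    def step(self, batch: Batch) -> float:
        """Convenience wrapper: one forward_backward followed by one optim_step."""
        loss = self.forward_backward(batch)
        self.optim_step()
        return loss


trainer = ToyTrainer()
data = [(1.0, 2.0), (2.0, 4.0)]  # points on y = 2x
for _ in range(50):
    trainer.step(data)
print(round(trainer.weight, 6))
```

Keeping the two calls separate lets a caller run forward_backward over several batches before a single optim_step, which is the main reason such APIs expose both.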

Ships the cookbooks/rl-training walkthrough: it runs locally or against HUD_TASKSET with HUDRuntime, uses an arithmetic env plus an optional multi-turn 2048 env, and contrasts simple_train with a GLM-style custom IS loss.
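The exact GLM-style loss is not spelled out here. Assuming "IS" means importance sampling, the sketch below shows a generic token-level truncated-IS policy-gradient objective of the kind a client-side custom loss might compute: each token's ratio between the current policy and the rollout policy, exp(logp_new - logp_old), is capped at a hypothetical `cap`, held constant, and multiplied by the advantage. The helper name and formula are illustrative assumptions, not the cookbook's loss; a torch version would detach the capped ratio so gradients flow only through logp_new.

```python
import math
from typing import List, Tuple


def truncated_is_pg_loss(
    logp_new: List[float],
    logp_old: List[float],
    advantages: List[float],
    cap: float = 2.0,
) -> Tuple[float, List[float]]:
    """Hypothetical generic truncated-IS policy-gradient loss (not the cookbook's).

        w_t  = min(exp(logp_new_t - logp_old_t), cap)   # held constant (stop-gradient)
        loss = -(1/T) * sum_t w_t * A_t * logp_new_t

    Returns the loss value and d(loss)/d(logp_new_t) for each token.
    """
    if not logp_new or not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must be non-empty and the same length")
    n = len(logp_new)
    # Importance weights correct for tokens sampled under the old (rollout) policy;
    # capping them bounds the variance when the two policies drift apart.
    weights = [min(math.exp(new - old), cap) for new, old in zip(logp_new, logp_old)]
    loss = -sum(w * a * lp for w, a, lp in zip(weights, advantages, logp_new)) / n
    # Because w_t is treated as a constant, the gradient is simply -w_t * A_t / n.
    grads = [-w * a / n for w, a in zip(weights, advantages)]
    return loss, grads


loss, grads = truncated_is_pg_loss([-1.0, -2.0], [-1.0, -2.0], [1.0, -1.0])
print(loss, grads)
```

When the rollout and current policies agree (as in the usage line), every weight is 1 and the objective reduces to plain REINFORCE with advantages.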

CLI & eval: hud models becomes a Typer group (list, fork, checkpoints, head); hud eval loads platform tasksets by name or id (Taskset.from_api) and defaults the runtime to local when the source is on disk and to hud otherwise.
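The runtime default described above can be sketched as a one-line path check. The helper default_runtime and its runtime override parameter are hypothetical; the assumption that an explicitly chosen runtime takes precedence is ours, and the real hud eval logic may differ.

```python
from pathlib import Path
from typing import Optional


def default_runtime(source: str, runtime: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the described rule for hud eval.

    An explicit runtime wins (assumption); otherwise an on-disk source runs
    locally and anything else (e.g. a platform taskset name or id) runs on HUD.
    """
    if runtime is not None:
        return runtime
    return "local" if Path(source).exists() else "hud"


print(default_runtime("."), default_runtime("some-platform-taskset"))
```

Path.exists() is a cheap heuristic; note that a taskset name which happens to match a local file or directory would resolve to local.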

Rollout robustness: an optional rollout_timeout on Taskset.run / rollout (one shared deadline; on agent-loop timeout the partial trajectory is still graded); trace exit now sends metadata (e.g. stop_reason). The grading wire forwards full EvaluationResult frames.
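A minimal asyncio sketch of a shared rollout deadline with partial grading, assuming the deadline covers all agent turns together (rather than a fresh timeout per turn) and that grading runs afterwards on whatever steps were collected. RolloutResult, rollout, the toy agents, and the "done"/"timeout" stop_reason values are hypothetical illustrations, not hud's Taskset.run implementation.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, List, Optional


@dataclass
class RolloutResult:
    steps: List[str] = field(default_factory=list)  # actions taken before stopping
    reward: float = 0.0
    stop_reason: str = "done"  # hypothetical values: "done" or "timeout"


async def rollout(
    agent_step: Callable[[int], Awaitable[Optional[str]]],
    grade: Callable[[List[str]], Awaitable[float]],
    rollout_timeout: float,
) -> RolloutResult:
    """Run a multi-turn agent loop under one deadline, then grade what was collected."""
    result = RolloutResult()

    async def agent_loop() -> None:
        turn = 0
        while True:
            action = await agent_step(turn)
            if action is None:  # the agent signals it is finished
                return
            result.steps.append(action)
            turn += 1

    try:
        # One deadline covers every turn, instead of a fresh timeout per turn.
        await asyncio.wait_for(agent_loop(), timeout=rollout_timeout)
    except asyncio.TimeoutError:
        result.stop_reason = "timeout"  # keep the partial trajectory

    # Grade whatever the agent produced, complete or partial.
    result.reward = await grade(result.steps)
    return result


async def slow_agent(turn: int) -> Optional[str]:
    await asyncio.sleep(0.05)
    return f"move-{turn}"  # never finishes on its own


async def three_move_agent(turn: int) -> Optional[str]:
    return None if turn >= 3 else f"move-{turn}"


async def count_moves(steps: List[str]) -> float:
    return float(len(steps))


timed_out = asyncio.run(rollout(slow_agent, count_moves, rollout_timeout=0.3))
finished = asyncio.run(rollout(three_move_agent, count_moves, rollout_timeout=1.0))
print(timed_out.stop_reason, len(timed_out.steps), finished.stop_reason, finished.reward)
```

Recording stop_reason on the result is what makes it possible to report why a rollout ended alongside its (possibly partial) grade.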

Docs: new reference training page, rewritten Train on rewards guide, HUD_RL_URL setting, optional hud-python[train] extra for torch.

Reviewed by Cursor Bugbot for commit 810380f. Bugbot is set up for automated code reviews on this repo.

hud/train/: TrainingClient (forward_backward, optim_step, step, custom forward/backward) over the HUD training service, keyed by model id. New 'hud models' CLI group (list, fork, checkpoints, head --set). Settings: add hud_rl_url; drop the old eval/training.py BYO helper. Docs: v6 training how-to rewritten for the managed trainer, plus a new reference/training page; rl-training cookbook. Co-authored-by: Cursor <cursoragent@cursor.com>
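To make the checkpoints/head vocabulary concrete, here is a hypothetical in-memory model of one gateway slug whose served checkpoint (the "head") can be moved, which is what in-place weight promotion behind a stable slug amounts to. ModelRecord, its fields, and its error behavior are illustrative assumptions; they do not describe the HUD service or the exact semantics of hud models head --set, and fork is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelRecord:
    """Hypothetical in-memory model entry; not the HUD service's data model."""

    slug: str                                        # stable gateway name callers use
    checkpoints: List[str] = field(default_factory=list)
    head: Optional[str] = None                       # checkpoint the slug currently serves

    def add_checkpoint(self, name: str) -> None:
        # Reject duplicate names so a replayed save cannot silently overwrite one.
        if name in self.checkpoints:
            raise ValueError(f"checkpoint {name!r} already exists")
        self.checkpoints.append(name)

    def set_head(self, name: str) -> None:
        """Promote a checkpoint in place: the slug stays the same, the weights change."""
        if name not in self.checkpoints:
            raise KeyError(name)
        self.head = name

    def resolve(self) -> str:
        """Return the checkpoint a request to this slug would be served by."""
        if self.head is None:
            raise LookupError(f"{self.slug} has no head checkpoint")
        return self.head


model = ModelRecord(slug="my-org/my-model")  # hypothetical slug
model.add_checkpoint("step-10")
model.add_checkpoint("step-20")
model.set_head("step-10")
served_before = model.resolve()
model.set_head("step-20")  # promote newer weights behind the same slug
served_after = model.resolve()
print(model.slug, served_before, served_after)
```

Callers keep using the same slug throughout; only the head pointer moves, which is also why a wrong head change affects everyone using that slug.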

Training POSTs (forward_backward/optim_step/backward) are non-idempotent, so make_request now uses max_retries=0 for them (a silent retry could apply the optimizer step or gradient twice, or collide on the checkpoint name). Adds the 2048 RL cookbook example. Co-authored-by: Cursor <cursoragent@cursor.com>
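Why retries are unsafe for these calls: if the server applies an update but the response is lost, a client-side retry replays the side effect. The sketch below uses a hypothetical call_with_retries helper and a fake trainer (neither is hud's make_request) to show the step being applied twice with retries enabled and exactly once, with the error surfaced to the caller, when max_retries=0.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    send: Callable[[], T],
    max_retries: int,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    backoff_s: float = 0.0,
) -> T:
    """Hypothetical helper: call send(), retrying up to max_retries times on transient errors."""
    attempt = 0
    while True:
        try:
            return send()
        except retry_on:
            if attempt >= max_retries:
                raise  # out of retries: surface the error to the caller
            attempt += 1
            time.sleep(backoff_s * attempt)


class FakeTrainer:
    """Simulates a server that applies the update but whose first response is lost."""

    def __init__(self) -> None:
        self.steps_applied = 0
        self._drop_first_response = True

    def optim_step(self) -> str:
        self.steps_applied += 1  # the side effect happens server-side first
        if self._drop_first_response:
            self._drop_first_response = False
            raise TimeoutError("response lost after the update was applied")
        return "ok"


retried = FakeTrainer()
call_with_retries(retried.optim_step, max_retries=2)  # silently applies the step twice

safe = FakeTrainer()
try:
    call_with_retries(safe.optim_step, max_retries=0)  # error reaches the caller instead
except TimeoutError:
    pass

print(retried.steps_applied, safe.steps_applied)
```

With retries disabled, the caller learns the outcome is uncertain and can inspect server state (for example, the latest checkpoint) before deciding whether to resend.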

…inker-training

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit e4c573c.

lorenss-m and others added 5 commits June 17, 2026 17:48

add small notes

ff80752

add timeout safety

b970570

also remote taskset run via cli, report trace info

128d61a

lorenss-m marked this pull request as ready for review June 18, 2026 21:50

cursor Bot reviewed Jun 18, 2026


Comment thread cookbooks/rl-training/game2048_env.py Outdated

Merge branch 'v6' of https://github.com/hud-evals/hud-python into l/t…

e2b1164

…inker-training

cursor Bot reviewed Jun 18, 2026


Comment thread cookbooks/rl-training/train_2048.py Outdated

lorenss-m added 2 commits June 18, 2026 14:57

fx

f1c00cc

fx 2

af4e1db

lorenss-m changed the title ~~L/tinker training~~ l/training Jun 18, 2026

cursor Bot reviewed Jun 18, 2026


Comment thread cookbooks/rl-training/game2048_env.py

Comment thread hud/eval/run.py

fix scoring and timeouts

e4c573c

cursor Bot reviewed Jun 18, 2026


Comment thread hud/eval/run.py Outdated

small fix

810380f

lorenss-m wants to merge 10 commits into v6 from l/tinker-training

lorenss-m commented Jun 18, 2026 (edited by cursor Bot)
