l/training#426

Open
lorenss-m wants to merge 10 commits into v6 from l/tinker-training

lorenss-m (Contributor) commented Jun 18, 2026

Note

High Risk
Introduces a new training surface that mutates model weights via remote RL APIs and changes eval defaults for platform tasksets; incorrect loss/optim usage or checkpoint head changes can affect production model behavior.

Overview
Adds HUD-managed on-policy RL via a new hud.train.TrainingClient (exported from hud): forward_backward / optim_step / step against the RL service, built-in losses server-side, and forward_backward_custom for client-side torch losses. Replaces the old eval-layer HudTrainingClient / group_relative advantage-posting path with in-place weight promotion behind one gateway model slug.
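To illustrate the split between `forward_backward`, `optim_step`, and `step` described above, here is a minimal stand-in. It is not `hud.train.TrainingClient`: the class `ToyTrainer`, its constructor arguments, and the squared-error loss are all assumptions made for illustration, and the real client talks to a remote RL service rather than fitting a scalar locally.

```python
from typing import List, Tuple


class ToyTrainer:
    """Hypothetical local stand-in; NOT the hud.train.TrainingClient API."""

    def __init__(self, weight: float = 0.0, lr: float = 0.1) -> None:
        self.weight = weight
        self.lr = lr
        self.grad = 0.0  # gradient accumulated since the last optimizer step

    def forward_backward(self, batch: List[Tuple[float, float]]) -> float:
        """Compute mean squared error of weight * x against y and accumulate its gradient."""
        n = len(batch)
        loss = sum((self.weight * x - y) ** 2 for x, y in batch) / n
        self.grad += sum(2 * (self.weight * x - y) * x for x, y in batch) / n
        return loss

    def optim_step(self) -> None:
        """Apply the accumulated gradient once (plain SGD), then clear it."""
        self.weight -= self.lr * self.grad
        self.grad = 0.0

    def step(self, batch: List[Tuple[float, float]]) -> float:
        """Convenience wrapper: one forward/backward pass followed by one optimizer step."""
        loss = self.forward_backward(batch)
        self.optim_step()
        return loss


trainer = ToyTrainer()
batch = [(1.0, 2.0), (2.0, 4.0)]  # targets follow y = 2x
losses = [trainer.step(batch) for _ in range(50)]
```

Keeping the two halves separate lets a caller accumulate gradients over several batches before a single optimizer step; `step` covers the common one-batch case.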

Ships the cookbooks/rl-training walkthrough (local or HUD_TASKSET + HUDRuntime, arithmetic + optional 2048 multi-turn env, simple_train vs GLM-style custom IS loss).

CLI & eval: hud models becomes a Typer group (list, fork, checkpoints, head); hud eval loads platform tasksets by name/id (Taskset.from_api) and defaults runtime to local vs hud from whether the source is on disk.
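The runtime default described above can be sketched as a single check on the source argument. The function name `default_runtime` and the exact string values are assumptions for illustration; the PR text only states that the default depends on whether the source is on disk.

```python
from pathlib import Path


def default_runtime(source: str) -> str:
    """Hypothetical helper: pick "local" for an on-disk source, "hud" otherwise."""
    # A path that exists is treated as a local taskset; anything else is
    # assumed to be a platform taskset name or id.
    return "local" if Path(source).exists() else "hud"
```

Under this sketch, a name that does not resolve to a path falls through to the platform runtime, so a mistyped local path would be treated as a taskset name.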

Rollout robustness: optional rollout_timeout on Taskset.run / rollout (shared deadline, partial grade on agent-loop timeout); trace exit sends metadata (e.g. stop_reason). Grading wire forwards full EvaluationResult frames.
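One way to read "shared deadline, partial grade on agent-loop timeout" is sketched below with `asyncio`. The function `rollout`, its `setup` / `agent_loop` / `grade` callables, and the returned dictionary are hypothetical and do not reflect the actual `Taskset.run` signature; only the `rollout_timeout` and `stop_reason` names come from the PR description.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional

State = Dict[str, Any]


async def rollout(
    setup: Callable[[State], Awaitable[None]],
    agent_loop: Callable[[State], Awaitable[None]],
    grade: Callable[[State], float],
    rollout_timeout: Optional[float] = None,
) -> Dict[str, Any]:
    """Hypothetical sketch: both phases draw from one deadline."""
    loop = asyncio.get_running_loop()
    deadline = None if rollout_timeout is None else loop.time() + rollout_timeout

    def remaining() -> Optional[float]:
        # Time left on the shared budget; None means "no limit".
        if deadline is None:
            return None
        return max(0.0, deadline - loop.time())

    state: State = {}
    # A timeout during setup propagates to the caller in this sketch.
    await asyncio.wait_for(setup(state), timeout=remaining())

    stop_reason = "completed"
    try:
        await asyncio.wait_for(agent_loop(state), timeout=remaining())
    except asyncio.TimeoutError:
        # The agent ran out of time: keep whatever state it reached.
        stop_reason = "timeout"

    # Grade the (possibly partial) state instead of discarding the rollout.
    return {"reward": grade(state), "stop_reason": stop_reason}
```

Computing the deadline once and passing the remaining time to each phase means time spent in setup is deducted from the agent loop's budget rather than giving each phase a fresh timeout.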

Docs: new reference training page, rewritten Train on rewards guide, HUD_RL_URL setting, optional hud-python[train] extra for torch.

Reviewed by Cursor Bugbot for commit 810380f.

lorenss-m and others added 5 commits June 17, 2026 17:48
hud/train/: TrainingClient (forward_backward, optim_step, step, custom forward/backward) over the HUD training service, keyed by model id. New 'hud models' CLI group (list, fork, checkpoints, head --set). settings: hud_rl_url; drop the old eval/training.py BYO helper. Docs: v6 training how-to rewritten for the managed trainer + new reference/training page; rl-training cookbook.

Co-authored-by: Cursor <cursoragent@cursor.com>
Training POSTs (forward_backward/optim_step/backward) are non-idempotent, so make_request now uses max_retries=0 there (a silent retry would double-apply the optimizer/gradient or collide on the checkpoint name). Adds the 2048 RL cookbook example.
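The retry rule in this commit message can be sketched as a per-endpoint retry budget. This is not the actual `make_request` implementation: `call_with_retries`, `max_retries_for`, the default of 3, and the `list_models` endpoint name used in the usage note are assumptions for illustration.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Endpoints named in the commit message as non-idempotent.
NON_IDEMPOTENT = {"forward_backward", "optim_step", "backward"}


def max_retries_for(endpoint: str, default: int = 3) -> int:
    """Hypothetical policy: never retry a call that mutates training state."""
    return 0 if endpoint in NON_IDEMPOTENT else default


def call_with_retries(fn: Callable[[], T], endpoint: str, default: int = 3) -> T:
    """Call fn, retrying on ConnectionError only as often as the policy allows."""
    attempts = max_retries_for(endpoint, default) + 1
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # budget exhausted: surface the failure to the caller
    raise AssertionError("unreachable")
```

With this policy, a transient failure on a hypothetical read-only endpoint such as `list_models` is retried, while the same failure on `optim_step` is raised immediately so the caller can decide whether the step was applied before trying again.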

Co-authored-by: Cursor <cursoragent@cursor.com>
lorenss-m marked this pull request as ready for review June 18, 2026 21:50
Comment thread cookbooks/rl-training/game2048_env.py Outdated
Comment thread cookbooks/rl-training/train_2048.py Outdated
lorenss-m changed the title from "L/tinker training" to "l/training" Jun 18, 2026
Comment thread cookbooks/rl-training/game2048_env.py
Comment thread hud/eval/run.py

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit e4c573c.

Comment thread hud/eval/run.py Outdated