l/training#426
Open
lorenss-m wants to merge 10 commits into
hud/train/: TrainingClient (forward_backward, optim_step, step, custom forward/backward) over the HUD training service, keyed by model id. New 'hud models' CLI group (list, fork, checkpoints, head --set). settings: hud_rl_url; drop the old eval/training.py BYO helper. Docs: v6 training how-to rewritten for the managed trainer + new reference/training page; rl-training cookbook.
Co-authored-by: Cursor <cursoragent@cursor.com>
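The forward_backward / optim_step / step split named in the commit above can be illustrated with a purely local toy. This is a minimal sketch, not the hud.train API: `ToyTrainingClient`, its one-parameter linear model, and the plain SGD update are all hypothetical. It only shows the assumed semantics in which forward_backward computes the loss and accumulates gradients without touching the weights, optim_step applies and clears them, and step does both in one call.

```python
from dataclasses import dataclass


@dataclass
class ToyTrainingClient:
    """Hypothetical local stand-in for a remote trainer; NOT hud.train.TrainingClient.

    Model: y_hat = w * x, trained with mean squared error and plain SGD.
    """

    w: float = 0.0
    lr: float = 0.1
    grad: float = 0.0  # accumulated dL/dw, pending until optim_step
    steps_applied: int = 0

    def forward_backward(self, batch: list[tuple[float, float]]) -> float:
        """Return the batch loss and accumulate its gradient; w is left unchanged."""
        n = len(batch)
        loss = sum((self.w * x - y) ** 2 for x, y in batch) / n
        self.grad += sum(2 * (self.w * x - y) * x for x, y in batch) / n
        return loss

    def optim_step(self) -> None:
        """Apply one SGD update from the accumulated gradient, then clear it."""
        self.w -= self.lr * self.grad
        self.grad = 0.0
        self.steps_applied += 1

    def step(self, batch: list[tuple[float, float]]) -> float:
        """Convenience wrapper: forward_backward followed by optim_step."""
        loss = self.forward_backward(batch)
        self.optim_step()
        return loss


# Usage: fit w on data generated by y = 2x.
client = ToyTrainingClient()
batch = [(1.0, 2.0), (2.0, 4.0)]
for _ in range(50):
    client.step(batch)
print(round(client.w, 3))  # prints 2.0
```

Calling forward_backward several times before a single optim_step sums gradients across micro-batches; that is one plausible reason to expose the two calls separately, though the PR does not state it.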
Training POSTs (forward_backward/optim_step/backward) are non-idempotent, so make_request now uses max_retries=0 for those calls (a silent retry would double-apply the gradient or optimizer step, or collide on the checkpoint name). Adds the 2048 RL cookbook example.
Co-authored-by: Cursor <cursoragent@cursor.com>
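The double-apply hazard described in that commit can be reproduced with a small local simulation. This is a sketch under stated assumptions: `FakeTrainerServer`, `ResponseLost`, and `request_with_retries` are hypothetical and do not mirror hud's make_request; the only premise carried over is that the server performs the update before its reply can be lost in transit.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ResponseLost(Exception):
    """Simulated transport failure: the server acted, but its reply never arrived."""


class FakeTrainerServer:
    """Hypothetical server whose optim_step mutates state before replying."""

    def __init__(self) -> None:
        self.updates_applied = 0
        self._drop_next_reply = True  # lose the first reply only

    def optim_step(self) -> dict:
        self.updates_applied += 1  # side effect happens first
        if self._drop_next_reply:
            self._drop_next_reply = False
            raise ResponseLost("connection reset after the update was applied")
        return {"ok": True}


def request_with_retries(call: Callable[[], T], max_retries: int) -> T:
    """Hypothetical retrying helper (not hud's make_request): retry on ResponseLost."""
    attempt = 0
    while True:
        try:
            return call()
        except ResponseLost:
            if attempt >= max_retries:
                raise
            attempt += 1


retrying = FakeTrainerServer()
request_with_retries(retrying.optim_step, max_retries=2)
print(retrying.updates_applied)  # prints 2: the retry re-applied the update

no_retry = FakeTrainerServer()
try:
    request_with_retries(no_retry.optim_step, max_retries=0)
except ResponseLost:
    pass  # the error surfaces to the caller instead of being retried silently
print(no_retry.updates_applied)  # prints 1
```

With retries disabled, the caller sees the failure and can decide how to recover (for example by inspecting server state) rather than having the client blindly repeat a side-effecting request.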
…inker-training
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit e4c573c.

Note
High Risk
Introduces a new training surface that mutates model weights via remote RL APIs and changes eval defaults for platform tasksets; incorrect loss/optim usage or checkpoint head changes can affect production model behavior.
Overview
Adds HUD-managed on-policy RL via a new `hud.train.TrainingClient` (exported from `hud`): `forward_backward`/`optim_step`/`step` against the RL service, built-in losses server-side, and `forward_backward_custom` for client-side torch losses. Replaces the old eval-layer `HudTrainingClient`/`group_relative` advantage-posting path with in-place weight promotion behind one gateway model slug.

Ships the `cookbooks/rl-training` walkthrough (local or `HUD_TASKSET` + `HUDRuntime`, arithmetic + optional 2048 multi-turn env, `simple_train` vs GLM-style custom IS loss).

CLI & eval: `hud models` becomes a Typer group (`list`, `fork`, `checkpoints`, `head`); `hud eval` loads platform tasksets by name/id (`Taskset.from_api`) and defaults the runtime to `local` or `hud` depending on whether the source is on disk.

Rollout robustness: optional `rollout_timeout` on `Taskset.run/rollout` (shared deadline, partial grade on agent-loop timeout); trace exit sends `metadata` (e.g. `stop_reason`). The grading wire forwards full `EvaluationResult` frames.

Docs: new reference training page, rewritten "Train on rewards" guide, `HUD_RL_URL` setting, optional `hud-python[train]` extra for torch.

Reviewed by Cursor Bugbot for commit 810380f.
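The shared-deadline behavior summarized for rollout_timeout can be sketched with asyncio. Everything here is hypothetical (`RolloutResult`, `agent_step`, `grade`, `rollout`); it is not the Taskset.run/rollout implementation, only an illustration of one deadline shared across all agent-loop turns, where a timeout records a stop_reason and the partial trajectory is still graded.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class RolloutResult:
    actions: list[str] = field(default_factory=list)
    stop_reason: str = "done"
    reward: float = 0.0


async def agent_step(i: int, delay: float) -> str:
    """Hypothetical agent turn; the sleep stands in for model and env latency."""
    await asyncio.sleep(delay)
    return f"action-{i}"


def grade(actions: list[str], planned_steps: int) -> float:
    """Hypothetical grader: fraction of planned turns that completed."""
    return len(actions) / planned_steps if planned_steps else 0.0


async def rollout(step_delays: list[float], rollout_timeout: float) -> RolloutResult:
    """Run agent turns under one deadline shared by the whole rollout."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + rollout_timeout  # set once, never reset per turn
    result = RolloutResult()
    for i, delay in enumerate(step_delays):
        remaining = deadline - loop.time()
        if remaining <= 0:
            result.stop_reason = "timeout"
            break
        try:
            action = await asyncio.wait_for(agent_step(i, delay), timeout=remaining)
        except asyncio.TimeoutError:
            result.stop_reason = "timeout"
            break
        result.actions.append(action)
    # Grade what was completed, so a timed-out rollout still gets a partial score.
    result.reward = grade(result.actions, len(step_delays))
    return result


full = asyncio.run(rollout([0.1, 0.1], rollout_timeout=1.0))
print(full.stop_reason, full.reward)
cut = asyncio.run(rollout([0.1, 0.1, 0.1, 0.1], rollout_timeout=0.25))
print(cut.stop_reason, len(cut.actions), cut.reward)
```

Each 0.1 s turn fits under 0.25 s, so a per-turn timeout would never fire here; with the shared deadline the second run should stop after two turns with stop_reason "timeout" and a reward of 0.5 (a heavily loaded machine could shift the timing).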