fix: remove preset_reward_function from RLVR hyperparameters to prevent empty value being sent to service#5889

Closed
lucasjia-aws wants to merge 3 commits into aws:master from lucasjia-aws:unskip-integ-tests-train

lucasjia-aws (Collaborator) commented May 21, 2026

Summary

Fix RLVRTrainer failing with "ValidationException: Preset reward function is not supported" when custom_reward_function is not specified.

Problem

When RLVRTrainer is initialized, it fetches the model's override params JSON from S3 (via _get_fine_tuning_options_and_model_arn). For RLVR models, this JSON includes a preset_reward_function field with an empty string default value.

When the user does not pass custom_reward_function, hyperparameters.to_dict() still serializes preset_reward_function: "" and sends it to the CreateTrainingJob API. The SageMaker service rejects this with: ValidationException: Preset reward function is not supported.
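The failure mode can be reproduced in isolation. The sketch below is not the SDK's code: the JSON layout, the `Hyperparameters` class, and the `learning_rate` entry are hypothetical stand-ins, chosen only to show how an empty-string default survives serialization and ends up in the request payload.

```python
import json

# Hypothetical excerpt of the override params JSON; the real schema is an assumption.
OVERRIDE_PARAMS_JSON = """
{
  "learning_rate": {"default": "0.0001"},
  "preset_reward_function": {"default": ""}
}
"""


class Hyperparameters:
    """Hypothetical stand-in for the SDK's hyperparameters object."""

    def __init__(self, spec):
        # Each field starts at its default value, including empty strings.
        self._values = {name: field["default"] for name, field in spec.items()}

    def to_dict(self):
        # Every key is serialized, even when its value is an empty string.
        return dict(self._values)


hyperparameters = Hyperparameters(json.loads(OVERRIDE_PARAMS_JSON))
payload = hyperparameters.to_dict()

# The empty preset_reward_function is still present in the payload.
print(payload)
```

Under these assumptions, nothing between loading the defaults and building the request drops the empty key, so it reaches the service unchanged.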

Root Cause

_process_hyperparameters() is designed to remove hyperparameter keys that are controlled by constructor inputs (to avoid them being passed as raw hyperparameters). It already removes:

  • reward_lambda_arn — controlled by custom_reward_function, passed via ServerlessJobConfig.evaluator_arn
  • data_s3_path / data_path — controlled by training_dataset, passed via InputDataConfig
  • output_path — controlled by s3_output_path, passed via OutputDataConfig

preset_reward_function belongs to the same category but was missing from the removal list.

Fix

Add preset_reward_function to the list of keys removed in _process_hyperparameters(), consistent with the existing pattern for reward_lambda_arn and other constructor-controlled keys.
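The removal pattern described here might look roughly like the following. This is a minimal sketch, not the actual body of _process_hyperparameters(): the constant name, the standalone function, and the `learning_rate` entry are assumptions made for illustration, and only the key names come from this description.

```python
# Keys controlled by constructor inputs; they must not be sent as raw hyperparameters.
# The constant name is hypothetical; the key names are the ones listed in this PR.
CONSTRUCTOR_CONTROLLED_KEYS = (
    "reward_lambda_arn",       # custom_reward_function -> ServerlessJobConfig.evaluator_arn
    "data_s3_path",            # training_dataset -> InputDataConfig
    "data_path",               # training_dataset -> InputDataConfig
    "output_path",             # s3_output_path -> OutputDataConfig
    "preset_reward_function",  # newly added by this fix
)


def process_hyperparameters(hyperparameters: dict) -> dict:
    """Return a copy of the hyperparameters without constructor-controlled keys."""
    return {
        key: value
        for key, value in hyperparameters.items()
        if key not in CONSTRUCTOR_CONTROLLED_KEYS
    }


raw = {
    "learning_rate": "0.0001",
    "preset_reward_function": "",
    "reward_lambda_arn": "",
}
cleaned = process_hyperparameters(raw)
print(cleaned)
```

Filtering by key rather than by empty value keeps the behavior consistent with the other constructor-controlled keys: they are dropped whether or not they carry a value.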

Customer Impact

  • Before fix: RLVR training without custom_reward_function always fails with ValidationException
  • After fix: Empty preset_reward_function is no longer sent to the API; training starts normally

No negative impact — if a user provides a reward function via custom_reward_function, it is correctly passed through ServerlessJobConfig.evaluator_arn, not as a hyperparameter.

Testing

Re-ran the test_rlvr_trainer_lora_complete_workflow integration test to verify the fix.

@lucasjia-aws lucasjia-aws deleted the unskip-integ-tests-train branch May 27, 2026 21:49