fix: remove preset_reward_function from RLVR hyperparameters to prevent empty value being sent to service #5889
Closed
lucasjia-aws wants to merge 3 commits into
Summary
Fix `RLVRTrainer` failing with `ValidationException: Preset reward function is not supported` when `custom_reward_function` is not specified.

Problem
When `RLVRTrainer` is initialized, it fetches the model's override params JSON from S3 (via `_get_fine_tuning_options_and_model_arn`). For RLVR models, this JSON includes a `preset_reward_function` field with an empty string default value.

When the user does not pass `custom_reward_function`, `hyperparameters.to_dict()` still serializes `preset_reward_function: ""` and sends it to the `CreateTrainingJob` API. The SageMaker service rejects this with: `ValidationException: Preset reward function is not supported`.

Root Cause
`_process_hyperparameters()` is designed to remove hyperparameter keys that are controlled by constructor inputs, so that they are not passed as raw hyperparameters. It already removes:

- `reward_lambda_arn`: controlled by `custom_reward_function`, passed via `ServerlessJobConfig.evaluator_arn`
- `data_s3_path` / `data_path`: controlled by `training_dataset`, passed via `InputDataConfig`
- `output_path`: controlled by `s3_output_path`, passed via `OutputDataConfig`

`preset_reward_function` belongs to the same category but was missing from the removal list.

Fix
Add `preset_reward_function` to the list of keys removed in `_process_hyperparameters()`, consistent with the existing pattern for `reward_lambda_arn` and other constructor-controlled keys.

Customer Impact
- Before: training without `custom_reward_function` always fails with `ValidationException`.
- After: `preset_reward_function` is no longer sent to the API; training starts normally.

No negative impact: if a user provides a reward function via `custom_reward_function`, it is correctly passed through `ServerlessJobConfig.evaluator_arn`, not as a hyperparameter.

Testing
Re-ran the `test_rlvr_trainer_lora_complete_workflow` integration test to verify the fix.
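
For reviewers, the removal pattern described under Root Cause and Fix can be sketched roughly as below. This is an illustrative stand-in, not the SDK's actual `_process_hyperparameters()` code: the helper name `strip_constructor_controlled_keys`, the constant `CONSTRUCTOR_CONTROLLED_KEYS`, and the sample `learning_rate` entry are all assumptions made for the example. Only the key names in the removal list come from this PR.

```python
# Hypothetical sketch of the key-removal pattern; names here are illustrative
# and do not mirror the real SDK implementation.
CONSTRUCTOR_CONTROLLED_KEYS = (
    "reward_lambda_arn",       # set via custom_reward_function
    "data_s3_path",            # set via training_dataset
    "data_path",               # set via training_dataset
    "output_path",             # set via s3_output_path
    "preset_reward_function",  # the key this PR adds to the removal list
)


def strip_constructor_controlled_keys(hyperparameters: dict) -> dict:
    """Return a copy of the dict without constructor-controlled keys."""
    return {
        key: value
        for key, value in hyperparameters.items()
        if key not in CONSTRUCTOR_CONTROLLED_KEYS
    }


# Sample override params: an empty preset_reward_function default,
# as described in the Problem section, plus an assumed ordinary entry.
overrides = {
    "learning_rate": "1e-5",
    "preset_reward_function": "",
    "output_path": "",
}

cleaned = strip_constructor_controlled_keys(overrides)
print(cleaned)  # {'learning_rate': '1e-5'}
```

With the key in the removal list, the empty `preset_reward_function` value never reaches the serialized hyperparameters, which is the behavior the integration test above is meant to confirm.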