add: unified_reward and qwen_image_bench#1471
Conversation
There was a problem hiding this comment.
Code Review
This pull request adds support for three new image quality metrics: Qwen-Image-Bench, UnifiedReward 2.0, and UnifiedReward Edit, including their respective models, metrics, state dict converters, documentation, and example scripts. The code review identified several critical issues: an accidental deletion of hidream_o1_image_series from the model configurations, a potential ValueError crash when parsing pairwise rank winners in UnifiedRewardEditModel, a potential AttributeError in fix_score_json if the input is not a dictionary, a greedy regex pattern that could break JSON extraction, and a redundant regex character class [::]. Additionally, the input normalization logic in UnifiedRewardEditModel can be simplified.
| if len(prompts) == 1 and len(images) > 1 and not isinstance(images[0], (list, tuple)): | ||
| prompts = prompts * len(images) | ||
| if len(prompts) != len(images): | ||
| if len(prompts) == 1: | ||
| prompts = prompts * len(images) | ||
| elif len(images) == 1: | ||
| images = images * len(prompts) | ||
| if len(prompts) != len(images): | ||
| raise ValueError(f"Expected the same number of prompts and images, got {len(prompts)} and {len(images)}.") |
There was a problem hiding this comment.
The input normalization logic here is redundant and can be simplified significantly. The check not isinstance(images[0], (list, tuple)) prevents duplication when images is a list of lists (which is the case for pairwise tasks), but then the fallback if len(prompts) == 1: duplicates it anyway. We can simplify this to match the clean pattern used in QwenImageBenchModel._normalize_inputs.
if len(prompts) == 1 and len(images) > 1:
prompts = prompts * len(images)
if len(images) == 1 and len(prompts) > 1:
images = images * len(prompts)
if len(prompts) != len(images):
raise ValueError(f"Expected the same number of prompts and images, got {len(prompts)} and {len(images)}.")| def fix_score_json(score_json, l1_dim): | ||
| """Fix flat structure, L3 misplacement, and L3 typos based on checklists.py hierarchy.""" | ||
| if not score_json: | ||
| return score_json |
There was a problem hiding this comment.
To prevent potential AttributeError crashes if the model output fails to parse as a dictionary (e.g., if it parses as a list or a string), we should add a defensive check to ensure score_json is a dictionary before calling .values() or .items().
| def fix_score_json(score_json, l1_dim): | |
| """Fix flat structure, L3 misplacement, and L3 typos based on checklists.py hierarchy.""" | |
| if not score_json: | |
| return score_json | |
| def fix_score_json(score_json, l1_dim): | |
| """Fix flat structure, L3 misplacement, and L3 typos based on checklists.py hierarchy.""" | |
| if not isinstance(score_json, dict): | |
| return {} |
| def _extract_first_json(text: str): | ||
| match = re.search(r"\{.*\}", text, flags=re.S) | ||
| if not match: |
There was a problem hiding this comment.
The regex pattern r"\{.*\}" is greedy and will match from the first opening brace { to the very last closing brace } in the entire text. If the model output contains additional explanations or text with braces after the JSON block, this greedy match will capture them and fail to parse as valid JSON. Using a non-greedy match r"\{.*?\}" is much more robust.
@staticmethod
def _extract_first_json(text: str):
match = re.search(r"\{.*?\}", text, flags=re.S)
No description provided.