[WIP][CI][Accuracy] Add HunyuanImage3 pixel accuracy test and nightly CI #3657

Closed
BLANKETusers wants to merge 0 commits into vllm-project:main from BLANKETusers:main

Conversation

@BLANKETusers
Contributor

Summary

  • Add assert_images_pixel_close helper for full-image pixel-level comparison
    with mean/p99 absolute channel difference metrics and detailed diagnostics
  • Add test_hunyuan_image3_pixel_accuracy that generates images via offline
    end2end.py and compares output against a pre-saved baseline image
  • Add nightly CI step (4× H100) in the Diffusion X2I group to gate pixel
    accuracy regressions
  • Rename diffusers_image to baseline_image across accuracy helper APIs
    (assert_similarity, assert_image_sequence_similarity)
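
To make the first bullet concrete, here is a minimal sketch of how mean and p99 absolute channel differences could be computed. It is not the PR's `assert_images_pixel_close`: the names `pixel_metrics` and `assert_pixel_close_sketch` are hypothetical, the inputs are assumed to be raw 8-bit buffers of identical size and layout (for example from Pillow's `Image.tobytes()`), and differences are assumed to be normalised to [0, 1]. A real helper would more likely use NumPy.

```python
from dataclasses import dataclass


@dataclass
class PixelMetrics:
    mean_abs_diff: float
    p99_abs_diff: float


def pixel_metrics(actual: bytes, baseline: bytes) -> PixelMetrics:
    """Compare two raw 8-bit image buffers with identical size and layout."""
    if not actual or len(actual) != len(baseline):
        raise ValueError("buffers must be non-empty and the same length")
    # Per-channel absolute difference, normalised to [0, 1] (an assumption).
    diffs = sorted(abs(a - b) / 255.0 for a, b in zip(actual, baseline))
    n = len(diffs)
    mean = sum(diffs) / n
    # Nearest-rank 99th percentile, using integer arithmetic for ceil(0.99 * n).
    rank = (99 * n + 99) // 100
    return PixelMetrics(mean_abs_diff=mean, p99_abs_diff=diffs[rank - 1])


def assert_pixel_close_sketch(
    actual: bytes,
    baseline: bytes,
    mean_threshold: float = 0.02,
    p99_threshold: float = 0.10,
) -> PixelMetrics:
    """Fail with a diagnostic message when either metric exceeds its threshold."""
    m = pixel_metrics(actual, baseline)
    assert m.mean_abs_diff <= mean_threshold, (
        f"mean_abs_diff={m.mean_abs_diff:.6f} exceeds {mean_threshold}"
    )
    assert m.p99_abs_diff <= p99_threshold, (
        f"p99_abs_diff={m.p99_abs_diff:.6f} exceeds {p99_threshold}"
    )
    return m
```

The default thresholds mirror the 0.02 and 0.10 limits reported under Pixel Metrics.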

Files changed

| File | Change |
|---|---|
| .buildkite/test-nightly.yml | +38 lines: new CI step "vllm-omni · HunyuanImage3 · Accuracy Test" |
| tests/assets/hunyuan/hunyuan_baseline.png | Baseline reference image (1024×1024) |
| tests/e2e/accuracy/helpers.py | +68 lines: assert_images_pixel_close; rename params |
| tests/e2e/accuracy/test_hunyuan_image3_pixel_accuracy.py | +142 lines: new test |

Test plan

HUNYUAN_IMAGE3_DEPLOY_CONFIG=../hunyuan_image3_dit_copy.yaml pytest -s -v tests/e2e/accuracy/test_hunyuan_image3_pixel_accuracy.py --run-level
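
The command above overrides the deploy config through an environment variable. A minimal sketch of how a test might resolve that override is shown here; `resolve_deploy_config` is a hypothetical helper, not a function from the repository, and the fallback behaviour is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_deploy_config(
    default: Path,
    env_var: str = "HUNYUAN_IMAGE3_DEPLOY_CONFIG",
    environ: Optional[Mapping[str, str]] = None,
) -> Path:
    """Return the path named by env_var, or default when it is unset or empty."""
    env = os.environ if environ is None else environ
    override = env.get(env_var)
    return Path(override).expanduser() if override else default
```

Passing `environ` explicitly keeps the helper easy to test without mutating the process environment.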

Test Result

1 passed, 18 warnings in 101.49s (0:01:41)

Pixel Metrics

tencent/HunyuanImage-3.0-Instruct — (offline vs baseline)

| Metric | Value | Threshold | Status |
|---|---|---|---|
| mean_abs_diff | 0.000000 | ≤ 0.02 | |
| p99_abs_diff | 0.000000 | ≤ 0.10 | |
| p50 | 0.000000 | | |
| p90 | 0.000000 | | |
| p95 | 0.000000 | | |
| p99 | 0.000000 | | |
| p99.9 | 0.000000 | | |

Mismatch ratios (pixel_ratio / channel_ratio)

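| Threshold (1/255) | 0 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 |
|---|---|---|---|---|---|---|---|---|---|
| pixel_ratio | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| channel_ratio | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |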
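
One plausible reading of the mismatch ratios above can be sketched as follows. The definitions are assumptions, not taken from the PR: `channel_ratio` is taken to be the fraction of channel values whose absolute difference exceeds the threshold, and `pixel_ratio` the fraction of pixels where at least one channel does. The function name `mismatch_ratios` is hypothetical.

```python
from typing import Dict, Sequence


def mismatch_ratios(
    actual: bytes,
    baseline: bytes,
    channels: int = 3,
    thresholds: Sequence[int] = (0, 1, 2, 4, 8, 16, 32, 64, 128),
) -> Dict[int, Dict[str, float]]:
    """Fraction of pixels/channels differing by more than each threshold (in 1/255 units)."""
    if not actual or len(actual) != len(baseline) or len(actual) % channels:
        raise ValueError("buffers must be non-empty, equal length, and whole pixels")
    n_values = len(actual)
    n_pixels = n_values // channels
    diffs = [abs(a - b) for a, b in zip(actual, baseline)]
    # Largest channel difference within each pixel.
    pixel_max = [max(diffs[i:i + channels]) for i in range(0, n_values, channels)]
    return {
        t: {
            "pixel_ratio": sum(d > t for d in pixel_max) / n_pixels,
            "channel_ratio": sum(d > t for d in diffs) / n_values,
        }
        for t in thresholds
    }
```

Under these definitions, a bit-identical pair of images yields 0.0 in every cell, which matches the table.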

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ce7719d521

Comment thread tests/e2e/accuracy/helpers.py Outdated
  model_name: str,
  vllm_image: Image.Image,
- diffusers_image: Image.Image,
+ baseline_image: Image.Image,

P1 Badge Preserve existing helper keyword names

When the existing accuracy tests call assert_similarity(..., diffusers_image=...) or assert_image_sequence_similarity(..., diffusers_images=...) (for example tests/e2e/accuracy/test_qwen_image.py:114 and test_qwen_image_layered.py:142), this signature change makes pytest fail immediately with TypeError: got an unexpected keyword argument before any image comparison runs. Please keep backward-compatible keyword aliases or update all existing call sites in the same change.
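
The backward-compatible alias suggested here could look roughly like the sketch below. It is illustrative only: `assert_similarity_sketch` is a hypothetical stand-in for the real helper, its parameter list is reduced to the arguments visible in the diff, and the comparison itself is omitted.

```python
import warnings


def assert_similarity_sketch(
    *,
    model_name,
    vllm_image,
    baseline_image=None,
    diffusers_image=None,  # deprecated alias kept for existing call sites
):
    """Accept the old keyword, warn, and forward it to the new one."""
    if diffusers_image is not None:
        if baseline_image is not None:
            raise TypeError("pass either baseline_image or diffusers_image, not both")
        warnings.warn(
            "diffusers_image is deprecated; use baseline_image",
            DeprecationWarning,
            stacklevel=2,
        )
        baseline_image = diffusers_image
    if baseline_image is None:
        raise TypeError("missing required keyword argument: 'baseline_image'")
    # The real image comparison would run here; the sketch returns the resolved arguments.
    return model_name, vllm_image, baseline_image
```

The alternative named in the review, updating every call site in the same change, avoids carrying the alias at all.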


_REPO_ROOT = Path(__file__).resolve().parent.parent.parent.parent
BASELINE_PATH = _REPO_ROOT / "tests" / "assets" / "hunyuan" / "hunyuan_baseline.png"
_DEFAULT_DEPLOY_CONFIG = _REPO_ROOT / "vllm_omni" / "deploy" / "hunyuan_image3.yaml"
Collaborator

maybe you can use get_deploy_config_path in stage_config.py

"--stage-init-timeout", "300",
"--init-timeout", "900",
]
with OmniServer(model, server_args, use_omni=True) as omni_server:
Collaborator

maybe you can use omni_server fixtures

)

# online vs baseline_image
# assert_images_pixel_close(
Collaborator

Is this redundant code?

@congw729
Collaborator

Does this PR need to be closed?

@BLANKETusers
Contributor Author

New PR: #3790
