
[Frontend/Model] Support Optional Prompt Upscale#3783

Open
alex-jw-brooks wants to merge 11 commits into
vllm-project:mainfrom
alex-jw-brooks:prompt_upscale


@alex-jw-brooks
Contributor

Purpose

FIX #3713

Exposes a param to turn prompt upscaling on/off and unifies the behavior for the following diffusion models:

  • Flux2Dev
  • LongCat
  • Ernie Image

When a model's prompt upscaler is external (of the models above, only Ernie Image), the download of the extra files is gated on whether the upscaler is actually going to be used, and the upscaler is not loaded until you make a request that uses it, since it takes a nontrivial amount of extra VRAM. Models like this require an additional opt-in flag, enable_external_prompt_upscaler (or the corresponding CLI arg), which is False by default. If you make a request with prompt upscale on while the flag is disabled, and the model has an external upscaler component, you get a warning and the upscale step is skipped:

WARNING 05-21 00:09:43 [pipeline_ernie_image.py:198] Requested prompt upscaling on a model with an external prompt upscaler, but enable_external_prompt_upscaler is not set in the server config; prompt upscaling will be skipped
  • Also worth considering: technically we could expose sampling params for the upscaling / rewrite step too, but I opted not to expose those here, to minimize the number of new params / flags in this PR. Open to discussion / exploring this in potential follow-ups though.
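For reviewers who want the gating behavior in one place, here is a minimal sketch of the logic described above. It is not the code in this PR: the class name `ExternalUpscalerGate`, the `loader` callable, and the `upscale_requested` argument are hypothetical and only illustrate the idea. The flag name `enable_external_prompt_upscaler` and the warning text are taken from the description.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("prompt_upscale_sketch")

Upscaler = Callable[[str], str]


class ExternalUpscalerGate:
    """Hypothetical helper, NOT the actual pipeline code in this PR."""

    def __init__(
        self,
        loader: Callable[[], Upscaler],
        enable_external_prompt_upscaler: bool = False,
    ) -> None:
        # `loader` is an assumed factory that downloads and loads the
        # external upscaler; it is only called on first real use.
        self._loader = loader
        self._enabled = enable_external_prompt_upscaler
        self._upscaler: Optional[Upscaler] = None

    def maybe_upscale(self, prompt: str, upscale_requested: bool) -> str:
        if not upscale_requested:
            return prompt
        if not self._enabled:
            # Warn and skip instead of loading extra weights into VRAM.
            logger.warning(
                "Requested prompt upscaling on a model with an external "
                "prompt upscaler, but enable_external_prompt_upscaler is "
                "not set in the server config; prompt upscaling will be "
                "skipped"
            )
            return prompt
        if self._upscaler is None:
            # Lazy load: pay the download / VRAM cost on the first request.
            self._upscaler = self._loader()
        return self._upscaler(prompt)
```

With a gate like this, a server started without the flag never calls the loader, and one started with the flag calls it at most once, on the first request that asks for upscaling.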

Test Plan

Validated for all 3 models that, with upscale enabled, the outputs do not match the raw generation with the same seed, on each of:

  • Offline path
  • Chat completions endpoint
  • Image generation endpoint
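The check on each path boils down to a seeded comparison, which can be sketched as below. This is an illustration only: `fake_generate` is a hypothetical stand-in for a real generation call (it hashes the effective prompt rather than producing an image), and the `upscale` keyword is an assumed name, not the param added in this PR.

```python
import hashlib
from typing import Callable


def fake_generate(prompt: str, seed: int, upscale: bool = False) -> bytes:
    # Hypothetical stand-in for a model call: the "upscaler" here just
    # appends text, and the "image" is a hash of seed + effective prompt.
    effective = (prompt + ", highly detailed") if upscale else prompt
    return hashlib.sha256(f"{seed}:{effective}".encode()).digest()


def upscale_changes_output(
    generate: Callable[..., bytes], prompt: str, seed: int
) -> bool:
    raw = generate(prompt, seed, upscale=False)
    raw_again = generate(prompt, seed, upscale=False)
    upscaled = generate(prompt, seed, upscale=True)
    # The raw path must be reproducible for the comparison to mean anything;
    # the upscaled output should then differ because the prompt was rewritten.
    return raw == raw_again and raw != upscaled
```

Checking that the raw path reproduces itself first rules out the case where outputs differ only because generation is nondeterministic.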

Test Result

Upscale results differ from the raw generation (i.e., due to prompt manipulation) on all paths for each of the models. Added additional tests for Ernie, since the other changes were very straightforward. Also verified when running the server online that GPU usage stayed low (~24 GiB) until I made a request with the upscale param, which increased it (to ~30 GiB), and that when the flag is disabled we get a warning instead of blowing up the memory. For Flux2/LongCat there is no need for an external opt-in flag, since the upscaling component is always loaded anyway, so upscale requests work without it.

@RuixiangMa @retowyss could you please take a look?

Signed-off-by: Alex Brooks <albrooks@redhat.com>
@retowyss

Works wonderfully for Ernie-Image-Turbo. Thanks!

@alex-jw-brooks
Contributor Author

Great! Thanks for trying it out @retowyss 🙏


Development

Successfully merging this pull request may close these issues.

[Feature]: Do not load PE for Ernie-Image
