[bugfix, rl] Fix sleep do not release full memory in custom pipeline by knlnguyen1802 · Pull Request #3818 · vllm-project/vllm-omni

knlnguyen1802 · 2026-05-22T09:43:09Z

Purpose

Starting with vLLM 0.20.0, safetensors uses a direct-to-GPU fast path that calls cudaMalloc through the driver API, bypassing PyTorch's caching allocator. Memory allocated this way is invisible to CuMemAllocator, so sleep() can neither offload nor unmap it. That GPU memory stays pinned instead of being reclaimed.
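The failure mode can be shown with a toy accounting model in plain Python. Everything in it is hypothetical. ToyDevice stands in for a GPU's physical memory, ToySleepAllocator for a sleep-capable pool such as CuMemAllocator, and direct_malloc for the safetensors fast path. It sketches only the bookkeeping argument and is not any real allocator API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDevice:
    """Hypothetical stand-in for one GPU: total bytes physically in use."""

    used: int = 0


@dataclass
class ToySleepAllocator:
    """Hypothetical sleep-capable pool: it can only release what it allocated."""

    device: ToyDevice
    tracked: dict = field(default_factory=dict)

    def malloc(self, name: str, nbytes: int) -> None:
        self.device.used += nbytes
        self.tracked[name] = nbytes

    def sleep(self) -> None:
        # Release every region this pool knows about. Memory allocated
        # elsewhere never entered `tracked`, so it is left untouched.
        for nbytes in self.tracked.values():
            self.device.used -= nbytes
        self.tracked.clear()


def direct_malloc(device: ToyDevice, nbytes: int) -> None:
    """Hypothetical fast path: allocates on the device without the pool."""
    device.used += nbytes


gpu = ToyDevice()
pool = ToySleepAllocator(gpu)
pool.malloc("transformer", 8_000)  # allocated through the tracked pool
direct_malloc(gpu, 2_000)  # e.g. encoder weights loaded via the fast path
pool.sleep()
print(gpu.used)  # 2000: the bypassed bytes stay resident after sleep()
```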

cc: @SamitHuang

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan. Please provide the test scripts & test commands. If your change doesn't need additional test scripts, please explain why. For test file guidelines, please check the test style doc
The test results. Please paste a before/after comparison of the results, or the e2e results.
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation edits to ./docs.
(Optional) Release notes update. If your change is user-facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

chatgpt-codex-connector · 2026-05-22T09:43:15Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

SamitHuang

The diagnosis matches what I would expect for RL colocated rollouts with custom_pipeline: constructing the pipeline inside a with target_device: block makes safetensors take the direct-to-GPU fast path during the from_pretrained() calls in the pipeline's __init__ (e.g. the QwenImage text encoder and VAE). Those allocations bypass CuMemAllocator and leave memory pinned after sleep().

Moving custom_pipeline init out of the CUDA default-device context is consistent with the existing HSDP path in _load_model_with_hsdp(), which already avoids with target_device: for the same reason. The follow-up model.to(target_device) keeps non-CPU-offload paths unchanged.

Blocking gaps

No test plan or before/after evidence. The PR template checklist is entirely unchecked. For a sleep-mode memory bug, please include peak VRAM / CuMemAllocator.get_current_usage() before and after sleep(level=1) on a custom-pipeline RL path (e.g. QwenImagePipelineWithLogProb + enable_sleep_mode=True).
No regression test. tests/e2e/offline_inference/custom_pipeline/test_async_omni_collective_rpc.py::test_sleep_wake_up_inline_mode exercises custom pipeline sleep/wake but does not set enable_sleep_mode=True and does not assert physical memory is reclaimed. Please extend that test (or add a unit test around DiffusersPipelineLoader.load_model(..., load_format="custom_pipeline")) to assert allocator-tracked usage drops after sleep.
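For the requested before/after evidence and regression test, a small pair of helpers can keep the measurement separate from the assertion. This is a hypothetical sketch: measure_sleep and assert_sleep_reclaims are not vllm-omni functions, and the 0.9 threshold is an arbitrary placeholder. get_usage could be the reviewer's suggested CuMemAllocator.get_current_usage() or a device-wide reading. The leaked regions are by construction not tracked by the allocator, so a device-wide number (total minus free, from torch.cuda.mem_get_info()) may be the more telling signal for this bug. FakeEngine exists only so the sketch runs without a GPU.

```python
from typing import Callable, Tuple


def measure_sleep(get_usage: Callable[[], int],
                  sleep: Callable[..., None],
                  level: int = 1) -> Tuple[int, int]:
    """Return (bytes in use before sleep, bytes in use after sleep)."""
    before = get_usage()
    sleep(level=level)
    after = get_usage()
    return before, after


def assert_sleep_reclaims(before: int, after: int, min_fraction: float = 0.9) -> None:
    """Fail unless sleep() released at least `min_fraction` of `before`."""
    assert before > 0, "nothing was in use before sleep(); the check is vacuous"
    reclaimed = (before - after) / before
    assert reclaimed >= min_fraction, (
        f"sleep() reclaimed only {reclaimed:.1%} ({before} -> {after} bytes)"
    )


class FakeEngine:
    """Hypothetical stand-in for an engine with enable_sleep_mode=True."""

    def __init__(self, resident: int, leaked: int) -> None:
        self.resident = resident  # bytes sleep() can release
        self.leaked = leaked  # bytes that bypassed the allocator

    def usage(self) -> int:
        return self.resident + self.leaked

    def sleep(self, level: int = 1) -> None:
        self.resident = 0


fixed = FakeEngine(resident=10_000, leaked=0)
assert_sleep_reclaims(*measure_sleep(fixed.usage, fixed.sleep))  # passes

buggy = FakeEngine(resident=6_000, leaked=4_000)
before, after = measure_sleep(buggy.usage, buggy.sleep)
print(before, after)  # 10000 4000; only 60% reclaimed, so the assertion would fail
```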

Non-blocking notes

The long NOTE is justified; consider trimming to 4–5 lines if you add a test that encodes the invariant.
This fix addresses loader-level default device context only. Custom pipelines that pass device= into from_pretrained() during __init__ may still bypass the allocator; worth a one-line caveat in the comment.
model.to(target_device) is redundant for pipelines that already .to(self.device) in __init__, but harmless.
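The caveat in the second note can be illustrated with another toy model. All names are hypothetical: ToyWeights stands in for a diffusers component, and toy_from_pretrained stands in for a from_pretrained() call that accepts device=. A pipeline that passes device= explicitly may still load straight onto the accelerator even when the loader builds it outside any default-device context. Loading on CPU and moving afterwards avoids that path.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyWeights:
    """Hypothetical loaded component (e.g. a text encoder or VAE)."""

    name: str
    device: str
    loaded_via_fast_path: bool

    def to(self, device: str) -> "ToyWeights":
        # A later move is modeled as an ordinary, allocator-visible copy.
        return replace(self, device=device)


def toy_from_pretrained(name: str, device: str = "cpu") -> ToyWeights:
    """Hypothetical loader: a non-CPU device= models the direct-to-GPU path."""
    return ToyWeights(name, device, loaded_via_fast_path=(device != "cpu"))


class PipelineWithExplicitDevice:
    def __init__(self, device: str = "cuda:0") -> None:
        # device= is passed straight through, so removing the loader's
        # default-device context does not change where this lands.
        self.vae = toy_from_pretrained("vae", device=device)


class PipelineLoadThenMove:
    def __init__(self, device: str = "cuda:0") -> None:
        # Load on CPU first, then move explicitly.
        self.vae = toy_from_pretrained("vae").to(device)


risky, safe = PipelineWithExplicitDevice(), PipelineLoadThenMove()
print(risky.vae.device, risky.vae.loaded_via_fast_path)  # cuda:0 True
print(safe.vae.device, safe.vae.loaded_via_fast_path)  # cuda:0 False
```

Under these assumptions, such pipelines would need the fix inside their own __init__ rather than in the loader.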

Please add the regression test and test results.

…yen1802/vllm-omni into fix_sleep_custom_pipeline

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

hsliuustc0106

BLOCKING:

Correctness — pre-commit check is failing. Please run pre-commit run --all-files locally and fix any issues before proceeding with the review.


knlnguyen1802 added 2 commits May 19, 2026 11:16

Fix sleep on custom_pipeline

86aeca5

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

Add post init for to_cpu

183695d

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

knlnguyen1802 requested review from Isotr0py, RuixiangMa, SamitHuang, ZJY0516, david6666666, princepride and wtomin as code owners May 22, 2026 09:43

Merge branch 'main' into fix_sleep_custom_pipeline

b93167d

SamitHuang reviewed May 22, 2026


knlnguyen1802 added 2 commits May 22, 2026 17:58

Merge branch 'fix_sleep_custom_pipeline' of https://github.com/knlngu…

c4ae084

…yen1802/vllm-omni into fix_sleep_custom_pipeline

Add test

10f200a

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

knlnguyen1802 requested a review from yenuo26 as a code owner May 22, 2026 10:10

hsliuustc0106 reviewed May 22, 2026


knlnguyen1802 wants to merge 5 commits into vllm-project:main from knlnguyen1802:fix_sleep_custom_pipeline


3 participants
