[Bugfix] Fix qwen3-tts create_causal_mask kwarg for transformers >=5.9.0 by Yadan-Wei · Pull Request #3786 · vllm-project/vllm-omni

Yadan-Wei · 2026-05-21T01:30:05Z

Summary

Qwen3TTSTokenizerV2DecoderTransformerModel.forward constructs a mask_kwargs dict with key "input_embeds" (singular) and unpacks it into transformers.masking_utils.create_causal_mask. transformers renamed this kwarg to inputs_embeds (plural) in 5.5.1, kept input_embeds as a deprecated alias via @deprecate_kwarg, and removed the alias in 5.9.0 (released 2026-05-20).

Result: every qwen3-tts request fails on the first forward with:

File ".../qwen3_tts/tokenizer_12hz/modeling_qwen3_tts_tokenizer_v2.py", line 576, in forward
    "full_attention": create_causal_mask(**mask_kwargs),
TypeError: create_causal_mask() got an unexpected keyword argument 'input_embeds'

This is a single-character typo — the rest of the file (function signature on line 532, every other usage in this file) already uses the plural inputs_embeds. Line 568 is a stray.
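The failure mode described above can be reproduced without transformers installed. The following is a minimal sketch, assuming a hypothetical stand-in function (`create_causal_mask_stub`, not part of any library) whose signature accepts only the plural name; it shows that a misspelled key in an unpacked dict is rejected at call time.

```python
def create_causal_mask_stub(config, inputs_embeds, attention_mask=None):
    """Hypothetical stand-in: accepts only the plural `inputs_embeds`."""
    return "mask"


# Same shape as the failing call: a dict whose key uses the singular name.
mask_kwargs = {"config": None, "input_embeds": [0.0], "attention_mask": None}

try:
    create_causal_mask_stub(**mask_kwargs)
    error_message = None
except TypeError as exc:
    # Unknown keys in an unpacked dict are rejected when the call is made.
    error_message = str(exc)

print(error_message)
```

The exact wording of the message can vary between Python versions, but it names the offending key, which is how the stack trace above points straight at `input_embeds`.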

Change

             mask_kwargs = {
                 "config": self.config,
-                "input_embeds": inputs_embeds,
+                "inputs_embeds": inputs_embeds,
                 "attention_mask": attention_mask,
                 "cache_position": cache_position,
                 "past_key_values": past_key_values,
                 "position_ids": position_ids,
             }

Compatibility

transformers 5.5.1..5.8.x: still works (the new kwarg name has existed since 5.5.1).
transformers 5.9.0+: now works (was broken before).
transformers 5.5.0 and earlier: not affected (predates the rename and predates vllm core 0.21.0's transformers floor of 4.56.0).

Test plan

Reproduced the failure on transformers 5.9.0 with vllm-omni 0.21.0rc1 + Qwen3-TTS-12hz-1.7B-Base on an L4 GPU. Stack trace matches the report above.
Verify qwen3-tts smoke test passes after applying this patch (will run downstream once merged into a release; can also be cherry-picked into 0.21.0 patch).

References

transformers 5.9.0 release: https://github.com/huggingface/transformers/releases/tag/v5.9.0
transformers @deprecate_kwarg introduction: 5.5.1 in src/transformers/masking_utils.py
transformers alias removal: between 5.8.1 and 5.9.0 in src/transformers/masking_utils.py (@deprecate_kwarg decorator removed)


linyueqian

Verified on an H20 against the real Qwen3-TTS-12Hz-0.6B-Base speech tokenizer, transformers 4.57.6 / 5.8.1 / 5.9.0.

The bug is real. transformers 5.9.0 (current PyPI latest) removed the deprecated input_embeds alias from create_causal_mask, and vllm 0.21.0 pins transformers!=5.0.*,...,!=5.5.0,>=4.56.0 with no upper bound, so a fresh install resolves to 5.9.0 and every qwen3-tts decode fails as reported.

But this patch does not fix it, and it regresses transformers 4.x. Two things changed in 5.9.0, not one: the input_embeds to inputs_embeds rename, and removal of cache_position from both create_causal_mask and create_sliding_window_causal_mask. mask_kwargs still passes cache_position, so after this rename the decode still raises TypeError: create_causal_mask() got an unexpected keyword argument 'cache_position' on 5.9.0. And transformers 4.56 to 4.57.x, also allowed by vllm 0.21.0, only accept input_embeds (singular), so renaming to inputs_embeds breaks 4.x.

| `mask_kwargs` | tfm 4.57.6 | tfm 5.8.1 | tfm 5.9.0 |
| --- | --- | --- | --- |
| current main | works | works | fails (`input_embeds`) |
| this PR | fails (`inputs_embeds`) | works | fails (`cache_position`) |
| signature-filtered | works | works | works |

The matrix above is create_causal_mask binding plus end-to-end decode() on real Qwen3-TTS-12Hz-0.6B-Base weights. This PR trades a 5.9.0-only break for a 4.x break that still fails on 5.9.0. Requesting changes; the fix needs to be version-aware. See the inline comment.
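The first two rows of the matrix can be illustrated with stand-in functions. This is a sketch only: `mask_4x`, `mask_58` and `mask_59` are hypothetical, simplified signatures built from what this thread states about each release (singular name only, plural name plus deprecated alias, plural name without `cache_position`), not copies of the real transformers signatures. `inspect.Signature.bind` checks whether a kwargs dict would be accepted without calling the function.

```python
import inspect


# Hypothetical stand-ins for the three signature shapes described in this thread.
def mask_4x(config, input_embeds, attention_mask, cache_position,
            past_key_values, position_ids=None):
    return "mask"


def mask_58(config, inputs_embeds=None, attention_mask=None, cache_position=None,
            past_key_values=None, position_ids=None, input_embeds=None):
    # `input_embeds` models the deprecated alias that was still accepted.
    return "mask"


def mask_59(config, inputs_embeds, attention_mask, past_key_values,
            position_ids=None):
    return "mask"


def binds(fn, kwargs):
    """Return True if `fn(**kwargs)` would pass argument binding."""
    try:
        inspect.signature(fn).bind(**kwargs)
        return True
    except TypeError:
        return False


shared = {"config": None, "attention_mask": None, "cache_position": None,
          "past_key_values": None, "position_ids": None}
current_main = {**shared, "input_embeds": "x"}   # singular key
this_pr = {**shared, "inputs_embeds": "x"}       # plural key

versions = (mask_4x, mask_58, mask_59)
matrix = {
    "current main": [binds(fn, current_main) for fn in versions],
    "this PR": [binds(fn, this_pr) for fn in versions],
}
for label, row in matrix.items():
    print(label, row)
```

Under these assumed signatures neither static dict binds against all three shapes, which is the reason the review asks for a signature-aware fix.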

linyueqian · 2026-05-21T03:47:42Z

            mask_kwargs = {
                "config": self.config,
-                "input_embeds": inputs_embeds,
+                "inputs_embeds": inputs_embeds,


[blocking] One rename is not enough, and this also breaks transformers 4.x. transformers 5.9.0 removed two kwargs from create_causal_mask and create_sliding_window_causal_mask: the input_embeds alias and cache_position. mask_kwargs still feeds cache_position into both helpers (lines 576 and 580), so on 5.9.0 the decode still raises TypeError: ... unexpected keyword argument 'cache_position'. And transformers 4.56 to 4.57.x only accept input_embeds (singular), so the plural name fails there. No static dict works on both 4.x and 5.9.0; filter by the live signature instead:

    # add `import inspect` at module top
    mask_kwargs = {
        "config": self.config,
        "inputs_embeds": inputs_embeds,
        "attention_mask": attention_mask,
        "cache_position": cache_position,
        "past_key_values": past_key_values,
        "position_ids": position_ids,
    }

    def _mask_args(fn):
        params = inspect.signature(fn).parameters
        args = {k: v for k, v in mask_kwargs.items() if k in params}
        if "inputs_embeds" not in params and "input_embeds" in params:
            args["input_embeds"] = args.pop("inputs_embeds")
        return args

    causal_mask_mapping = {"full_attention": create_causal_mask(**_mask_args(create_causal_mask))}
    if self.has_sliding_layers:
        causal_mask_mapping["sliding_attention"] = create_sliding_window_causal_mask(
            **_mask_args(create_sliding_window_causal_mask)
        )

Verified end-to-end on real Qwen3-TTS weights: this passes on both transformers 4.57.6 and 5.9.0.

Yadan-Wei · May 21, 2026

Thanks for the careful matrix. You're right that the rename alone wasn't enough and that 4.x needs the singular form. Adopted the signature-filtered helper exactly as suggested in 77ee11b: added import inspect, kept the full mask_kwargs, and routed both create_causal_mask and create_sliding_window_causal_mask through _mask_args(fn) so each call only forwards kwargs the installed transformers version accepts (drops cache_position on 5.9.0, renames to input_embeds on 4.x). PTAL when you have a moment.

linyueqian · 2026-05-21T14:40:35Z

Pushed a one-line fixup as 5f3fdb7.

The _mask_args helper from my earlier review had a bug. On transformers 4.x the parameter is the singular input_embeds, so the {k: v ... if k in params} comprehension already filtered inputs_embeds out of args, and args.pop("inputs_embeds") then raised KeyError: 'inputs_embeds'. The fixup reads the value from mask_kwargs instead, which always holds it. That was my mistake in the review code, apologies.

Verified end to end on Qwen3-TTS-12Hz-0.6B-Base: decode() now passes on both transformers 4.57.6 and 5.9.0. For reference, current main passes on 4.x and fails on 5.9.0, and 77ee11b passed on 5.9.0 but raised KeyError on 4.x.
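The KeyError and its fix can be sketched side by side. The stand-ins `mask_4x` and `mask_59` are hypothetical simplified signatures, and `mask_args_fixed` is an assumption about what "reads the value from mask_kwargs instead" looks like; the actual line in 5f3fdb7 is not shown in this thread.

```python
import inspect


# Hypothetical stand-ins: singular name with cache_position (4.x shape),
# plural name without cache_position (5.9.0 shape).
def mask_4x(config, input_embeds, attention_mask, cache_position,
            past_key_values, position_ids=None):
    return "mask-4x"


def mask_59(config, inputs_embeds, attention_mask, past_key_values,
            position_ids=None):
    return "mask-59"


mask_kwargs = {"config": None, "inputs_embeds": "x", "attention_mask": None,
               "cache_position": None, "past_key_values": None,
               "position_ids": None}


def mask_args_buggy(fn):
    params = inspect.signature(fn).parameters
    args = {k: v for k, v in mask_kwargs.items() if k in params}
    if "inputs_embeds" not in params and "input_embeds" in params:
        # "inputs_embeds" was already filtered out above, so pop() raises.
        args["input_embeds"] = args.pop("inputs_embeds")
    return args


def mask_args_fixed(fn):
    params = inspect.signature(fn).parameters
    args = {k: v for k, v in mask_kwargs.items() if k in params}
    if "inputs_embeds" not in params and "input_embeds" in params:
        # Assumed fix: read from the unfiltered dict, which always has the key.
        args["input_embeds"] = mask_kwargs["inputs_embeds"]
    return args


try:
    mask_args_buggy(mask_4x)
    buggy_error = None
except KeyError as exc:
    buggy_error = exc.args[0]

result_4x = mask_4x(**mask_args_fixed(mask_4x))
result_59 = mask_59(**mask_args_fixed(mask_59))
print(buggy_error, result_4x, result_59)
```

Reading from the unfiltered dict keeps the filtering step and the rename step independent, so the order in which they run no longer matters.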

@Gaohan123 could you take another look, since you own this tokenizer model. The change is small but it sits on the codec decode path that runs for every request.

linyueqian · 2026-05-21T21:17:52Z

fix dco please

The Qwen3TTSTokenizer V2 forward path passes mask_kwargs to transformers.masking_utils.create_causal_mask with a key named "input_embeds" (singular). transformers renamed this kwarg to "inputs_embeds" in 5.5.1, kept "input_embeds" as a deprecated alias via @deprecate_kwarg, and removed the alias in 5.9.0 (released 2026-05-20, https://github.com/huggingface/transformers/releases/tag/v5.9.0).

After the alias removal, qwen3-tts inference fails on first request:

    File ".../qwen3_tts/tokenizer_12hz/modeling_qwen3_tts_tokenizer_v2.py", line 576
      causal_mask_mapping = { "full_attention": create_causal_mask(**mask_kwargs), ...}
    TypeError: create_causal_mask() got an unexpected keyword argument 'input_embeds'

Rename the dict key to "inputs_embeds" so the unpacked kwargs match the current upstream signature. Every other reference to inputs_embeds in this file (including the function signature on line 532) already uses the plural form; line 568 was a stray typo.

This restores compatibility with transformers >=5.9.0 while remaining compatible with 5.5.1..5.8.x (the deprecation alias path).

Signed-off-by: Yadan Wei <yadanwei@amazon.com>

Reviewer pointed out two problems with the prior single-rename fix:

- transformers 5.9.0 also dropped `cache_position` from create_causal_mask and create_sliding_window_causal_mask, so renaming alone still fails on 5.9.0 with `unexpected keyword argument 'cache_position'`.
- transformers 4.56-4.57.x (also allowed by vllm 0.21.0) only accept the singular `input_embeds`, so the static rename regresses 4.x.

Inspect each helper's live signature and forward only the kwargs it accepts; rename `inputs_embeds` -> `input_embeds` when only the singular form exists.

Verified by reviewer end-to-end on Qwen3-TTS-12Hz-0.6B-Base weights against transformers 4.57.6, 5.8.1, and 5.9.0.

Signed-off-by: Yadan Wei <yadanwei@amazon.com>

…mers 4.x

The _mask_args helper popped "inputs_embeds" from the already-filtered args dict, but on transformers 4.x the parameter is the singular "input_embeds" so that key was never added to args, making the remap raise KeyError: 'inputs_embeds'. Read the value from mask_kwargs, which always holds it, instead.

Verified end to end on Qwen3-TTS-12Hz-0.6B-Base: decode() passes on transformers 4.57.6 and 5.9.0.

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
Signed-off-by: Yadan Wei <yadanwei@amazon.com>

Yadan-Wei · 2026-05-21T23:05:41Z

> fix dco please

Fixed.

Yadan-Wei requested review from ZeldaHuang, linyueqian, princepride and yuanheng-zhao as code owners May 21, 2026 01:30

linyueqian requested changes May 21, 2026


Yadan-Wei requested a review from linyueqian May 21, 2026 05:27

Yadan-Wei mentioned this pull request May 21, 2026

feat(vllm-omni): prepare 0.21.0rc1 release branch aws/deep-learning-containers#6110

Merged

6 tasks

Yadan Wei and others added 3 commits May 21, 2026 14:44

Yadan-Wei force-pushed the fix-qwen3-tts-create-causal-mask-kwarg branch from a767a11 to 44a0788 Compare May 21, 2026 21:45

Yadan-Wei added 2 commits May 21, 2026 19:20

Merge branch 'main' into fix-qwen3-tts-create-causal-mask-kwarg

7e36bc2

Merge branch 'main' into fix-qwen3-tts-create-causal-mask-kwarg

8b7aecd

Yadan-Wei wants to merge 5 commits into vllm-project:main from Yadan-Wei:fix-qwen3-tts-create-causal-mask-kwarg

2 participants
