Summary
vllm/model_executor/models/deepseek_v4.py has an unbound-local bug in DeepseekV4ForCausalLM.load_weights()'s expert branch. The variable name_mapped is assigned inside a for mapping in expert_mapping loop, but referenced after the loop. When no entry in expert_mapping matches the incoming tensor name (all iterations hit the early continue), name_mapped is never bound — but the subsequent loaded_params.add(name_mapped) still runs:
      File "vllm/model_executor/models/deepseek_v4.py", line 1557, in load_weights
        loaded_params.add(name_mapped)
    UnboundLocalError: cannot access local variable 'name_mapped' where it is not associated with a value
Environment
- vLLM: 0.21.0 (also reproducible on main as of 2026-05-15)
- Transformers: 4.57.1
- Hardware: 2× A100 80GB, TP=2
- Trigger: any DSV4 checkpoint containing an expert weight tensor whose name matches no entry in get_expert_mapping() (e.g., custom quantization-aware suffixes like .tq_packed).
Affected code
vllm/model_executor/models/deepseek_v4.py around lines 1532-1558 (line numbers from main as of 2026-05-15, may drift slightly):
    if ".experts." in name:
        # E8M0 scales special handling...
        if (
            "weight_scale" in name
            and loaded_weight.dtype == torch.float8_e8m0fnu
        ):
            loaded_weight = loaded_weight.view(torch.uint8)
        for mapping in expert_mapping:
            param_name, weight_name, expert_id, shard_id = mapping
            if weight_name not in name:
                continue  # <-- early continue
            name_mapped = name.replace(weight_name, param_name)
            # ... weight loader call ...
            if success:
                name = name_mapped
                break
        loaded_params.add(name_mapped)  # <-- unbound if loop never set it
        continue
If every iteration continues on the if weight_name not in name guard, the for loop exits without break AND without ever assigning name_mapped. loaded_params.add(name_mapped) then raises UnboundLocalError.
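The failure mode does not depend on vLLM and can be reproduced in a few lines of plain Python. The sketch below is a stand-in, not the real loader: `load_expert_weight`, the mapping values, and the tensor names are hypothetical, and the weight loader call is omitted entirely.

```python
# Minimal stand-in for the expert branch; this is not vLLM code.
# The tuples mirror the (param_name, weight_name, expert_id, shard_id)
# shape shown in the issue, but the values are made up for illustration.
EXPERT_MAPPING = [
    ("experts.w13_weight", "experts.0.gate_proj.weight", 0, "w1"),
    ("experts.w2_weight", "experts.0.down_proj.weight", 0, "w2"),
]


def load_expert_weight(name: str, loaded_params: set[str]) -> None:
    for param_name, weight_name, expert_id, shard_id in EXPERT_MAPPING:
        if weight_name not in name:
            continue  # no match: name_mapped is not assigned on this path
        name_mapped = name.replace(weight_name, param_name)
        break
    # Reached even when every iteration took the `continue` above.
    loaded_params.add(name_mapped)


loaded: set[str] = set()

# A name that matches a mapping entry loads without error.
load_expert_weight("layers.3.experts.0.gate_proj.weight", loaded)

# A decorated name matches nothing, so name_mapped is never bound.
try:
    load_expert_weight("layers.3.experts.0.gate_proj.tq_packed", loaded)
    raised = False
except UnboundLocalError:
    raised = True
```

Because `name_mapped` is assigned somewhere in the function body, Python treats it as a local for the whole function, which is why the lookup fails with `UnboundLocalError` rather than falling back to an outer scope.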
Repro
Easiest path: load any DSV4 checkpoint that includes an expert tensor name not present in get_expert_mapping()'s output. We hit it loading a TQ3-native variant where weight names have a .tq_packed suffix, but it would also fire on:
- Custom quantization schemes that decorate weight names
- Any checkpoint shape mismatch where .experts. is in the name but the suffix doesn't match the expected gate_proj/up_proj/down_proj/w1/w2/w3 patterns
Proposed fix
Initialize name_mapped = None before the loop, then short-circuit if it stayed unset:
    if ".experts." in name:
        if (
            "weight_scale" in name
            and loaded_weight.dtype == torch.float8_e8m0fnu
        ):
            loaded_weight = loaded_weight.view(torch.uint8)
        name_mapped = None  # <-- ADDED
        for mapping in expert_mapping:
            param_name, weight_name, expert_id, shard_id = mapping
            if weight_name not in name:
                continue
            name_mapped = name.replace(weight_name, param_name)
            # ... weight loader call ...
            if success:
                name = name_mapped
                break
        if name_mapped is None:  # <-- ADDED
            continue  # <-- ADDED
        loaded_params.add(name_mapped)
        continue
Three-line change. Skips unrecognized expert tensors cleanly instead of raising. If a maintainer prefers raising explicitly for unrecognized expert names, the continue could be raise ValueError(f"unrecognized expert tensor name: {name}") — that's a stylistic choice.
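The two behaviours (skip vs. raise) can be compared in a standalone sketch. Everything here is hypothetical and for illustration only: `resolve_expert_name`, the `strict` flag, and the mapping values do not exist in vLLM, and the weight loader call is left out.

```python
from typing import Optional

# Hypothetical entries in the (param_name, weight_name, expert_id, shard_id) shape.
EXPERT_MAPPING = [
    ("experts.w13_weight", "experts.0.gate_proj.weight", 0, "w1"),
    ("experts.w2_weight", "experts.0.down_proj.weight", 0, "w2"),
]


def resolve_expert_name(name: str, strict: bool = False) -> Optional[str]:
    """Return the mapped parameter name, or None if no entry matched.

    With strict=True, an unmatched name raises instead of being skipped.
    """
    name_mapped = None  # sentinel: stays None if no mapping entry matches
    for param_name, weight_name, _expert_id, _shard_id in EXPERT_MAPPING:
        if weight_name not in name:
            continue
        name_mapped = name.replace(weight_name, param_name)
        break
    if name_mapped is None and strict:
        raise ValueError(f"unrecognized expert tensor name: {name}")
    return name_mapped
```

With the default `strict=False` the caller would skip the tensor when `None` comes back, which matches the proposed `continue`. With `strict=True` a name mismatch surfaces at load time, at the cost of rejecting checkpoints that carry extra expert tensors.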
Workaround we're using
Patching the installed deepseek_v4.py at runtime via a setup script (sed-style replace with the diff above). Three-line surgical patch.
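For anyone who wants the same stopgap, a text-level patch along these lines is one way to do it. This is a sketch, not our actual setup script: `patch_source` is a hypothetical helper, and it assumes the installed file contains the loop header and the `loaded_params.add(name_mapped)` call exactly as shown in the excerpt above, each at most once in the position that matters.

```python
import re


def patch_source(source: str) -> str:
    """Apply the three-line fix to the text of deepseek_v4.py.

    Idempotent: a file that already contains the guard is returned unchanged.
    Assumes the code matches the excerpt quoted in this issue.
    """
    if "if name_mapped is None:" in source:
        return source  # already patched

    # Initialize the sentinel right before the mapping loop,
    # reusing the loop's own indentation (captured as group 1).
    source, n_loop = re.subn(
        r"^([ \t]*)for mapping in expert_mapping:\n",
        r"\1name_mapped = None\n\1for mapping in expert_mapping:\n",
        source,
        count=1,
        flags=re.MULTILINE,
    )

    # Guard the post-loop use so unmatched tensors are skipped.
    source, n_add = re.subn(
        r"^([ \t]*)loaded_params\.add\(name_mapped\)",
        r"\1if name_mapped is None:\n\1    continue\n\1loaded_params.add(name_mapped)",
        source,
        count=1,
        flags=re.MULTILINE,
    )

    if n_loop != 1 or n_add != 1:
        # Upstream code has drifted from the excerpt; do not guess.
        raise RuntimeError("expected code pattern not found; refusing to patch")
    return source
```

A setup script would read the installed module's text, pass it through `patch_source`, and write the result back. Failing loudly when the pattern is missing is deliberate, so the patch cannot silently no-op after an upgrade.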
Context
We've been validating a TQ3-native (3-bit weight quantization) version of DeepSeek-V4-Flash on vLLM. This is the second deepseek_v4.py bug we've hit during that work; the first was reported as #42741 (compress_ratios attribute access vs the new transformers normalization). Happy to submit a PR with both fixes together if a maintainer wants.