Summary
vllm/model_executor/models/deepseek_v4.py:960 reads config.compress_ratios[layer_id] directly. In transformers ≥ 4.57, DeepseekV4Config.__init__ normalizes the legacy compress_ratios JSON field into layer_types (a list of strings) plus compress_rates (a dict), and the original attribute is no longer exposed on the config object.
Result: Every DeepSeek V4 model fails to load on vLLM ≥ 0.20.2 when paired with transformers ≥ 4.57:
AttributeError: 'DeepseekV4Config' object has no attribute 'compress_ratios'.
Did you mean: 'compress_rates'?
The error fires inside the multiproc worker during EngineCore init, propagating as Engine core initialization failed.
Repro
    from vllm import LLM

    llm = LLM(
        model="deepseek-ai/DeepSeek-V4-Flash",  # or any DSV4 checkpoint
        quantization=None,
        tensor_parallel_size=2,
        max_model_len=2048,
        gpu_memory_utilization=0.85,
        trust_remote_code=True,
    )
Full traceback
File "/root/.venv/lib/python3.12/site-packages/vllm/model_executor/models/deepseek_v4.py", line 960, in __init__
self.compress_ratio = max(1, config.compress_ratios[layer_id])
AttributeError: 'DeepseekV4Config' object has no attribute 'compress_ratios'.
Did you mean: 'compress_rates'?
Root cause
Two-sided version skew:
vLLM's DSV4 model code (added in [Feat] DeepSeek V4 Rebased #40860, 2026-05-13) reads the legacy form: config.compress_ratios[layer_id] (an int per layer).
Transformers' DeepseekV4Config (≥ 4.57) reshapes the same JSON field into the normalized form: layer_types (["full_attention", "compressed_sparse_attention", "heavily_compressed_attention", ...]) + compress_rates ({"compressed_sparse_attention": 4, "heavily_compressed_attention": 128}).
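The failure mode can be reproduced without vLLM or a GPU. The following is a minimal sketch, assuming a stand-in object in place of the real DeepseekV4Config: `normalized` and `legacy_read` are hypothetical names, and the three-layer layout is made up for illustration. Only the field names and the two rate values come from the description above.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a config in the normalized shape (not the real class).
normalized = SimpleNamespace(
    num_hidden_layers=3,
    layer_types=[
        "full_attention",
        "compressed_sparse_attention",
        "heavily_compressed_attention",
    ],
    compress_rates={
        "compressed_sparse_attention": 4,
        "heavily_compressed_attention": 128,
    },
)


def legacy_read(config, layer_id):
    # Mirrors the legacy-form read quoted in the traceback.
    return max(1, config.compress_ratios[layer_id])


try:
    legacy_read(normalized, 1)
    failed = False
    message = ""
except AttributeError as exc:
    # The normalized shape has no per-layer list, so the read fails.
    failed = True
    message = str(exc)
```

The stand-in raises the same kind of AttributeError as the real config because the per-layer list simply does not exist on an object that carries only layer_types and compress_rates.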
Proposed fix
Update vllm/model_executor/models/deepseek_v4.py to read from the normalized form when present, falling back to the legacy form:
    # vllm/model_executor/models/deepseek_v4.py, around lines 957-962
    if hasattr(config, "layer_types") and hasattr(config, "compress_rates"):
        # Modern form: transformers ≥ 4.57 normalizes compress_ratios into these.
        rates = config.compress_rates or {}
        if layer_id < config.num_hidden_layers:
            layer_type = config.layer_types[layer_id]
            self.compress_ratio = max(1, rates.get(layer_type, 0))
        else:
            self.compress_ratio = 1
    else:
        # Legacy form: pre-4.57 transformers / raw config.
        if layer_id < config.num_hidden_layers:
            self.compress_ratio = max(1, config.compress_ratios[layer_id])
        else:
            self.compress_ratio = 1
This handles both transformers versions and keeps existing behavior for anyone pinning a pre-4.57 stack. The other references in the file (lines 1019, 1039, 1049, 1081) use self.compress_ratio (computed once), so only this construction site needs touching.
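One way to make the fix easy to review is to express the same branching as a free function and exercise it against both config shapes. This is only a sketch: `resolve_compress_ratio` is a hypothetical helper name, the SimpleNamespace objects stand in for real configs, and the legacy values `[0, 4, 128]` are assumed for illustration rather than taken from a real checkpoint.

```python
from types import SimpleNamespace


def resolve_compress_ratio(config, layer_id):
    """Return the per-layer compression ratio for either config shape."""
    # Layers beyond num_hidden_layers fall back to 1, as in the proposed fix.
    if layer_id >= config.num_hidden_layers:
        return 1
    if hasattr(config, "layer_types") and hasattr(config, "compress_rates"):
        # Normalized form: look the rate up by layer type.
        rates = config.compress_rates or {}
        return max(1, rates.get(config.layer_types[layer_id], 0))
    # Legacy form: one int per layer.
    return max(1, config.compress_ratios[layer_id])


# Stand-in configs (assumed values, not from a real checkpoint).
legacy_cfg = SimpleNamespace(num_hidden_layers=3, compress_ratios=[0, 4, 128])
normalized_cfg = SimpleNamespace(
    num_hidden_layers=3,
    layer_types=[
        "full_attention",
        "compressed_sparse_attention",
        "heavily_compressed_attention",
    ],
    compress_rates={
        "compressed_sparse_attention": 4,
        "heavily_compressed_attention": 128,
    },
)

legacy_result = [resolve_compress_ratio(legacy_cfg, i) for i in range(4)]
normalized_result = [resolve_compress_ratio(normalized_cfg, i) for i in range(4)]
```

Both lists come out identical for these stand-ins, which is the property the fix relies on: the construction site produces the same self.compress_ratio regardless of which transformers version built the config.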
Workaround we're using
We monkey-patch DeepseekV4Config.__init__ to re-derive compress_ratios from layer_types + compress_rates before vLLM init. The patch lives in a small module in our eval harness.
Importing that module before from vllm import LLM works around the issue, but it's a downstream hack. It would be much cleaner if vLLM read the normalized fields directly.
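A patch of this kind might look roughly like the sketch below. It is not the harness code itself: `add_legacy_compress_ratios` and `FakeConfig` are hypothetical names, the stand-in class replaces the real DeepseekV4Config so the example runs without transformers installed, and filling uncompressed layers with 0 is an assumption (vLLM's max(1, ...) clamps it to 1 either way).

```python
import functools


def add_legacy_compress_ratios(config_cls):
    """Wrap config_cls.__init__ so instances also expose `compress_ratios`."""
    original_init = config_cls.__init__

    @functools.wraps(original_init)
    def patched_init(self, *args, **kwargs):
        original_init(self, *args, **kwargs)
        if hasattr(self, "compress_ratios"):
            return  # Legacy attribute already present; leave it alone.
        layer_types = getattr(self, "layer_types", None)
        if layer_types is None:
            return  # Nothing to derive from.
        rates = getattr(self, "compress_rates", None) or {}
        # Assumption: layer types without a rate map to 0 in the legacy list.
        self.compress_ratios = [rates.get(t, 0) for t in layer_types]

    config_cls.__init__ = patched_init
    return config_cls


class FakeConfig:
    """Hypothetical stand-in for a config that only has the normalized fields."""

    def __init__(self, layer_types, compress_rates):
        self.layer_types = layer_types
        self.compress_rates = compress_rates


add_legacy_compress_ratios(FakeConfig)
cfg = FakeConfig(
    ["full_attention", "compressed_sparse_attention", "heavily_compressed_attention"],
    {"compressed_sparse_attention": 4, "heavily_compressed_attention": 128},
)
```

In a real harness the function would be applied to the actual config class before vLLM is imported. Whether the real class tolerates the extra attribute during serialization has not been checked here, which is one more reason to prefer the in-tree fix.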
Context
We're publishing a TQ3-native checkpoint at varjosoft/DeepSeek-V4-Flash-TQ3-native and hit this on first load. Happy to submit a PR with the fix above if a maintainer is in agreement on the shape.