[Bug]: DeepSeek V4 model fails to load with transformers ≥ 4.57 — compress_ratios attribute removed #42741

@varjoranta

Summary

vllm/model_executor/models/deepseek_v4.py:960 reads config.compress_ratios[layer_id] directly. Transformers ≥ 4.57 normalizes the legacy compress_ratios JSON field on DeepseekV4Config.__init__ into layer_types (list of strings) + compress_rates (dict). The original attribute is no longer exposed on the config object.

Result: Every DeepSeek V4 model fails to load on vLLM ≥ 0.20.2 when paired with transformers ≥ 4.57:

AttributeError: 'DeepseekV4Config' object has no attribute 'compress_ratios'.
Did you mean: 'compress_rates'?

The error fires inside the multiproc worker during EngineCore init, propagating as Engine core initialization failed.

Environment

  • vLLM: 0.21.0 (also reproducible on 0.20.2; the affected code landed in "[Feat] DeepSeek V4 Rebased", #40860)
  • Transformers: 4.57.1 (and almost certainly any ≥ 4.57)
  • Model: deepseek-ai/DeepSeek-V4-Flash (likely all DSV4 family)
  • Hardware: 2× A100 80GB, TP=2, trust_remote_code=True
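To check whether a given environment matches the combination above, the installed versions can be read with the standard library. This is a minimal sketch; the helper name installed_version is hypothetical, and it only reports versions without deciding whether the stack is affected.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the two packages whose versions matter for this issue.
for name in ("vllm", "transformers"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```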

Repro

from vllm import LLM
llm = LLM(
    model="deepseek-ai/DeepSeek-V4-Flash",  # or any DSV4 checkpoint
    quantization=None,
    tensor_parallel_size=2,
    max_model_len=2048,
    gpu_memory_utilization=0.85,
    trust_remote_code=True,
)

Full traceback

File "/root/.venv/lib/python3.12/site-packages/vllm/model_executor/models/deepseek_v4.py", line 960, in __init__
    self.compress_ratio = max(1, config.compress_ratios[layer_id])
AttributeError: 'DeepseekV4Config' object has no attribute 'compress_ratios'.
Did you mean: 'compress_rates'?

Root cause

Two-sided version skew:

  • vLLM's DSV4 model code (added in "[Feat] DeepSeek V4 Rebased", #40860, 2026-05-13) reads the legacy form: config.compress_ratios[layer_id] (an int per layer).
  • Transformers' DeepseekV4Config (≥ 4.57) reshapes the same JSON field into the normalized form: layer_types (["full_attention", "compressed_sparse_attention", "heavily_compressed_attention", ...]) + compress_rates ({"compressed_sparse_attention": 4, "heavily_compressed_attention": 128}).

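The legacy per-layer list can be derived directly from the normalized form. Layer types with no entry in compress_rates (such as full_attention) map to 0: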

compress_ratios[i] = compress_rates.get(layer_types[i], 0)
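The same mapping as a standalone function, for illustration. The helper name to_legacy_compress_ratios is hypothetical (it exists in neither vLLM nor transformers), and the three-layer example reuses the values quoted above rather than a real checkpoint config.

```python
from typing import Mapping, Optional, Sequence


def to_legacy_compress_ratios(
    layer_types: Sequence[str],
    compress_rates: Optional[Mapping[str, int]],
) -> list[int]:
    """Rebuild the legacy per-layer list from the normalized fields.

    Layer types without an entry in compress_rates (e.g. "full_attention")
    map to 0, matching compress_rates.get(layer_types[i], 0).
    """
    rates = compress_rates or {}
    return [rates.get(layer_type, 0) for layer_type in layer_types]


# Example values taken from the normalized form described above.
layer_types = [
    "full_attention",
    "compressed_sparse_attention",
    "heavily_compressed_attention",
]
compress_rates = {
    "compressed_sparse_attention": 4,
    "heavily_compressed_attention": 128,
}
ratios = to_legacy_compress_ratios(layer_types, compress_rates)
print(ratios)  # [0, 4, 128]
```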

Proposed fix

Update vllm/model_executor/models/deepseek_v4.py to read from the normalized form when present, falling back to the legacy form:

# vllm/model_executor/models/deepseek_v4.py, around line 957-962
if layer_id >= config.num_hidden_layers:
    # Layer index past num_hidden_layers: same default as before.
    self.compress_ratio = 1
elif hasattr(config, "layer_types") and hasattr(config, "compress_rates"):
    # Modern form: transformers ≥ 4.57 normalizes compress_ratios into these.
    rates = config.compress_rates or {}
    self.compress_ratio = max(1, rates.get(config.layer_types[layer_id], 0))
else:
    # Legacy form: pre-4.57 transformers / raw config.
    self.compress_ratio = max(1, config.compress_ratios[layer_id])

This handles both transformers versions and keeps existing behavior for anyone pinning a pre-4.57 stack. The other references in the file (lines 1019, 1039, 1049, 1081) use self.compress_ratio (computed once), so only this construction site needs touching.
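As a sanity check on the proposed branch logic, here is a minimal sketch that pulls it into a free function and runs it against stand-in config objects. Everything here is an assumption for illustration: resolve_compress_ratio is a hypothetical helper, and the SimpleNamespace objects only mimic the attributes discussed in this issue, not real DeepseekV4Config instances.

```python
from types import SimpleNamespace


def resolve_compress_ratio(config, layer_id: int) -> int:
    """Hypothetical helper mirroring the branch logic proposed above."""
    if layer_id >= config.num_hidden_layers:
        return 1
    if hasattr(config, "layer_types") and hasattr(config, "compress_rates"):
        # Normalized form: look the rate up by layer type.
        rates = config.compress_rates or {}
        return max(1, rates.get(config.layer_types[layer_id], 0))
    # Legacy form: one int per layer.
    return max(1, config.compress_ratios[layer_id])


# Stand-in configs exposing only the attributes relevant here.
normalized = SimpleNamespace(
    num_hidden_layers=3,
    layer_types=[
        "full_attention",
        "compressed_sparse_attention",
        "heavily_compressed_attention",
    ],
    compress_rates={
        "compressed_sparse_attention": 4,
        "heavily_compressed_attention": 128,
    },
)
legacy = SimpleNamespace(num_hidden_layers=3, compress_ratios=[0, 4, 128])

# Index 3 is deliberately past num_hidden_layers to exercise that branch.
normalized_ratios = [resolve_compress_ratio(normalized, i) for i in range(4)]
legacy_ratios = [resolve_compress_ratio(legacy, i) for i in range(4)]
print(normalized_ratios, legacy_ratios)
```

Both forms should yield the same per-layer ratios, which is the property the fix relies on.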

Workaround we're using

We monkey-patch DeepseekV4Config.__init__ to re-derive compress_ratios from layer_types + compress_rates before vLLM initializes. This runs in our eval harness:

from transformers import DeepseekV4Config
_orig_init = DeepseekV4Config.__init__
def _patched_init(self, *args, **kwargs):
    _orig_init(self, *args, **kwargs)
    if not hasattr(self, "compress_ratios") and hasattr(self, "layer_types"):
        rates = self.compress_rates or {}
        self.compress_ratios = [rates.get(t, 0) for t in self.layer_types]
DeepseekV4Config.__init__ = _patched_init

Importing this module before from vllm import LLM works around the issue, but it's a downstream hack. It would be much cleaner if vLLM read the normalized fields directly.
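If the patch module might be imported or reloaded more than once, one possible refinement is to make the patch idempotent so wrappers do not stack. The sketch below shows the idea on a stand-in class so it runs without transformers installed; StandInConfig, patch_compress_ratios, and the _compress_ratios_patched marker are all hypothetical names, not part of any library.

```python
import functools


class StandInConfig:
    """Stand-in for DeepseekV4Config: exposes only the normalized fields."""

    def __init__(self, layer_types, compress_rates):
        self.layer_types = layer_types
        self.compress_rates = compress_rates


def patch_compress_ratios(config_cls):
    """Wrap config_cls.__init__ once so instances regain compress_ratios."""
    if getattr(config_cls.__init__, "_compress_ratios_patched", False):
        return  # already patched; avoid stacking wrappers
    orig_init = config_cls.__init__

    @functools.wraps(orig_init)
    def patched_init(self, *args, **kwargs):
        orig_init(self, *args, **kwargs)
        if not hasattr(self, "compress_ratios") and hasattr(self, "layer_types"):
            rates = self.compress_rates or {}
            self.compress_ratios = [rates.get(t, 0) for t in self.layer_types]

    patched_init._compress_ratios_patched = True
    config_cls.__init__ = patched_init


patch_compress_ratios(StandInConfig)
patch_compress_ratios(StandInConfig)  # second call is a no-op
cfg = StandInConfig(
    ["full_attention", "compressed_sparse_attention"],
    {"compressed_sparse_attention": 4},
)
print(cfg.compress_ratios)  # [0, 4]
```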

Context

We're publishing a TQ3-native checkpoint at varjosoft/DeepSeek-V4-Flash-TQ3-native and hit this on first load. Happy to submit a PR with the fix above if a maintainer agrees with the approach.

