[Bug]: DeepSeek V4 model fails to load with transformers ≥ 4.57 (`compress_ratios` attribute removed) #42741

### Summary

`vllm/model_executor/models/deepseek_v4.py:960` reads `config.compress_ratios[layer_id]` directly. In transformers ≥ 4.57, `DeepseekV4Config.__init__` normalizes the legacy `compress_ratios` JSON field into `layer_types` (a list of strings) and `compress_rates` (a dict), and the original attribute is no longer exposed on the config object.

**Result:** DeepSeek V4 models fail to load on vLLM ≥ 0.20.2 when paired with transformers ≥ 4.57 (confirmed with `deepseek-ai/DeepSeek-V4-Flash`; other DSV4 checkpoints likely behave the same):

```
AttributeError: 'DeepseekV4Config' object has no attribute 'compress_ratios'.
Did you mean: 'compress_rates'?
```

The error fires inside the multiproc worker during `EngineCore` init and propagates as `Engine core initialization failed`.

### Environment

- vLLM: 0.21.0 (also reproducible on 0.20.2; the affected code landed in #40860)
- Transformers: 4.57.1 (and almost certainly any ≥ 4.57)
- Model: `deepseek-ai/DeepSeek-V4-Flash` (likely all DSV4 family)
- Hardware: 2× A100 80GB, TP=2, `trust_remote_code=True`
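
To check whether an environment falls in the affected range, the installed versions can be printed with the standard library. This is a minimal sketch: `installed_version` is a helper name introduced here, and the distribution names `vllm` and `transformers` are assumed to match what pip reports.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


for name in ("vllm", "transformers"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```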

### Repro

```python
from vllm import LLM
llm = LLM(
    model="deepseek-ai/DeepSeek-V4-Flash",  # or any DSV4 checkpoint
    quantization=None,
    tensor_parallel_size=2,
    max_model_len=2048,
    gpu_memory_utilization=0.85,
    trust_remote_code=True,
)
```

### Full traceback

```
File "/root/.venv/lib/python3.12/site-packages/vllm/model_executor/models/deepseek_v4.py", line 960, in __init__
    self.compress_ratio = max(1, config.compress_ratios[layer_id])
AttributeError: 'DeepseekV4Config' object has no attribute 'compress_ratios'.
Did you mean: 'compress_rates'?
```

### Root cause

Two-sided version skew:

- vLLM's DSV4 model code (added in #40860, 2026-05-13) reads the **legacy** form: `config.compress_ratios[layer_id]` (an int per layer).
- Transformers' `DeepseekV4Config` (≥ 4.57) reshapes the same JSON field into the **normalized** form: `layer_types` (`["full_attention", "compressed_sparse_attention", "heavily_compressed_attention", ...]`) + `compress_rates` (`{"compressed_sparse_attention": 4, "heavily_compressed_attention": 128}`).

The legacy per-layer value can be derived directly from the normalized form:

```python
compress_ratios[i] = compress_rates.get(layer_types[i], 0)
```
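
As a worked example of that mapping, the sketch below rebuilds a legacy-style list from the normalized fields. The function name `to_legacy_compress_ratios` and the four-layer layout are made up for illustration; only the two rate values (4 and 128) come from the example config above.

```python
def to_legacy_compress_ratios(layer_types, compress_rates):
    """Rebuild the per-layer legacy list from the normalized fields."""
    rates = compress_rates or {}
    # Layer types without an entry (e.g. "full_attention") map to 0.
    return [rates.get(layer_type, 0) for layer_type in layer_types]


# Illustrative layout, not taken from a real checkpoint.
layer_types = [
    "full_attention",
    "compressed_sparse_attention",
    "heavily_compressed_attention",
    "compressed_sparse_attention",
]
compress_rates = {
    "compressed_sparse_attention": 4,
    "heavily_compressed_attention": 128,
}

legacy = to_legacy_compress_ratios(layer_types, compress_rates)
print(legacy)  # [0, 4, 128, 4]
```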

### Proposed fix

Update `vllm/model_executor/models/deepseek_v4.py` to read from the normalized form when present, falling back to the legacy form:

```python
# vllm/model_executor/models/deepseek_v4.py, around line 957-962
if hasattr(config, "layer_types") and hasattr(config, "compress_rates"):
    # Modern form: transformers ≥ 4.57 normalizes compress_ratios into these.
    rates = config.compress_rates or {}
    if layer_id < config.num_hidden_layers:
        layer_type = config.layer_types[layer_id]
        self.compress_ratio = max(1, rates.get(layer_type, 0))
    else:
        self.compress_ratio = 1
else:
    # Legacy form: pre-4.57 transformers / raw config.
    if layer_id < config.num_hidden_layers:
        self.compress_ratio = max(1, config.compress_ratios[layer_id])
    else:
        self.compress_ratio = 1
```

This handles both transformers versions and keeps existing behavior for anyone pinning a pre-4.57 stack. The other references in the file (lines 1019, 1039, 1049, 1081) use `self.compress_ratio` (computed once), so only this construction site needs touching.
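
One way to sanity-check the shape of the fix outside vLLM is to pull the branching into a small function and run it against stand-in configs. The sketch below is only an assumption about how that could look: `resolve_compress_ratio` is a hypothetical helper, and the `SimpleNamespace` objects mimic just the attributes the fix reads, not the real `DeepseekV4Config`.

```python
from types import SimpleNamespace


def resolve_compress_ratio(config, layer_id: int) -> int:
    """Return the compression ratio for one layer, clamped to at least 1."""
    # Layers at or beyond num_hidden_layers get ratio 1, as in the proposed fix.
    if layer_id >= config.num_hidden_layers:
        return 1
    if hasattr(config, "layer_types") and hasattr(config, "compress_rates"):
        # Normalized form: look the rate up by layer type.
        rates = config.compress_rates or {}
        return max(1, rates.get(config.layer_types[layer_id], 0))
    # Legacy form: one integer per layer.
    return max(1, config.compress_ratios[layer_id])


# Stand-in configs; the three-layer layout is made up for illustration.
normalized = SimpleNamespace(
    num_hidden_layers=3,
    layer_types=[
        "full_attention",
        "compressed_sparse_attention",
        "heavily_compressed_attention",
    ],
    compress_rates={
        "compressed_sparse_attention": 4,
        "heavily_compressed_attention": 128,
    },
)
legacy = SimpleNamespace(num_hidden_layers=3, compress_ratios=[0, 4, 128])

# Both forms should resolve to the same ratio for every layer index.
for layer_id in range(4):
    assert resolve_compress_ratio(normalized, layer_id) == resolve_compress_ratio(
        legacy, layer_id
    )

ratios = [resolve_compress_ratio(normalized, i) for i in range(4)]
print(ratios)  # [1, 4, 128, 1]
```

Keeping the logic in one function would also make it straightforward to cover both config forms in a unit test, if maintainers prefer that over inlining the branches.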

### Workaround we're using

We monkey-patch `DeepseekV4Config.__init__` to re-derive `compress_ratios` from `layer_types` and `compress_rates` before vLLM init. This runs in our eval harness:

```python
from transformers import DeepseekV4Config

_orig_init = DeepseekV4Config.__init__


def _patched_init(self, *args, **kwargs):
    _orig_init(self, *args, **kwargs)
    # Re-derive the legacy per-layer list only when it is missing.
    if not hasattr(self, "compress_ratios") and hasattr(self, "layer_types"):
        rates = self.compress_rates or {}
        self.compress_ratios = [rates.get(t, 0) for t in self.layer_types]


DeepseekV4Config.__init__ = _patched_init

Importing this module before `from vllm import LLM` works around the issue, but it's a downstream hack. It would be much cleaner if vLLM read the normalized fields directly.
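
To see the effect of the patch without installing transformers, the same pattern can be applied to a stand-in class. `FakeDeepseekV4Config` below is hypothetical and exposes only the normalized fields; it is not the real config class, and the two-layer layout is made up.

```python
class FakeDeepseekV4Config:
    """Hypothetical stand-in that only exposes the normalized fields."""

    def __init__(self, layer_types, compress_rates):
        self.layer_types = layer_types
        self.compress_rates = compress_rates


_orig_init = FakeDeepseekV4Config.__init__


def _patched_init(self, *args, **kwargs):
    _orig_init(self, *args, **kwargs)
    # Add the legacy attribute only if the original __init__ did not set it.
    if not hasattr(self, "compress_ratios") and hasattr(self, "layer_types"):
        rates = self.compress_rates or {}
        self.compress_ratios = [rates.get(t, 0) for t in self.layer_types]


FakeDeepseekV4Config.__init__ = _patched_init

cfg = FakeDeepseekV4Config(
    layer_types=["full_attention", "compressed_sparse_attention"],
    compress_rates={"compressed_sparse_attention": 4},
)
print(cfg.compress_ratios)  # [0, 4]
```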

### Context

We're publishing a TQ3-native checkpoint at `varjosoft/DeepSeek-V4-Flash-TQ3-native` and hit this on first load. Happy to submit a PR with the fix above if a maintainer agrees on the shape.
