[Models] Support MLA_SWA functionality by chang-wenbin · Pull Request #8049 · PaddlePaddle/FastDeploy

chang-wenbin · 2026-06-15T12:02:46Z

Motivation

Support MLA sliding window attention (MLA_SWA) functionality for DeepSeek V3 style MLA layers.

Modifications

Add SWA top-k indexer construction and a static SWA attention path in fastdeploy/model_executor/models/deepseek_v3.py.
Add DSAAttentionBackend.forward_static() and reuse it for DSA/MLA SWA FlashMLA calls.
Adjust MLA KV cache shape/dtype for layers selected by window_attn_skip_freq.
Update GPU model runner KV cache allocation to create uint8 MLA cache for SWA layers.
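
The per-layer cache selection described above can be sketched as follows. This is a minimal illustration, not the code in this PR: it assumes that an entry of `1` in `window_attn_skip_freq` marks a layer as an SWA layer (the CI bot's comment refers to the `window_attn_skip_freq[layer_id] == 1` branch), that SWA layers get a `uint8` cache, and that all other layers keep a default dtype. The function name `select_mla_cache_dtypes` and the `"bfloat16"` default are hypothetical.

```python
from typing import List


def select_mla_cache_dtypes(
    window_attn_skip_freq: List[int],
    default_dtype: str = "bfloat16",  # assumed default, not taken from the PR
) -> List[str]:
    """Return one KV cache dtype name per layer.

    Assumption: layer i is an SWA layer when window_attn_skip_freq[i] == 1,
    and SWA layers store their MLA cache as uint8.
    """
    return [
        "uint8" if flag == 1 else default_dtype
        for flag in window_attn_skip_freq
    ]


# Hypothetical 4-layer model where layers 0, 2 and 3 are SWA layers.
print(select_mla_cache_dtypes([1, 0, 1, 1]))
```

Keeping the decision in one pure function like this would also make the dtype branch easy to unit test without allocating real GPU buffers.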

Usage or Command

N/A

Accuracy Tests

N/A

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

codecov-commenter · 2026-06-15T12:38:59Z

Codecov Report

❌ Patch coverage is 26.92308% with 95 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@cbb0811). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/model_executor/models/deepseek_v3.py	8.04%	80 Missing ⚠️
fastdeploy/worker/gpu_model_runner.py	45.45%	4 Missing and 2 partials ⚠️
...executor/layers/attention/dsa_attention_backend.py	80.76%	2 Missing and 3 partials ⚠️
...executor/layers/attention/mla_attention_backend.py	33.33%	3 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #8049   +/-   ##
==========================================
  Coverage           ?   67.47%           
==========================================
  Files              ?      475           
  Lines              ?    66739           
  Branches           ?    10293           
==========================================
  Hits               ?    45031           
  Misses             ?    18831           
  Partials           ?     2877

Flag	Coverage Δ
GPU	`77.43% <26.92%> (?)`
XPU	`6.97% <0.00%> (?)`


PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-06-16 02:13:02

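Overall assessment

This round reviewed, in risk-priority order, the MLA_SWA attention forward pass, the SWA indexer, KV cache allocation, and the newly added tests. There are no new blocking inline findings. However, this change touches the core forward path and KV cache layout of DeepSeekV3 MLA/DSA, and the PR still lacks accuracy/regression tests that demonstrate MLA_SWA is semantically correct. We recommend adding them before merging.

Issues

🟡 Suggestion: please add real semantic/accuracy regression coverage for MLA_SWA. The current tests mainly verify the mocked DSA output shape and the default MLA cache shape. They do not cover forward_swa_static() on layers selected by window_attn_skip_freq, the SWA top-k indexer, or prefill/decode/mixed-batch outputs. In particular, there is no evidence of logits or attention outputs aligned against a non-SWA or reference implementation. Because this PR changes the model's core attention and the KV cache dtype/shape, we recommend adding at least prefill, decode, and mixed-batch cases for SWA layers, plus alignment results for sliding_window boundaries and prefix/chunked prefill scenarios.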
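
One way to approach the sliding_window boundary cases mentioned above is to compare the indexer output against a small pure-Python reference. The sketch below assumes a generic causal sliding-window convention: each query position attends to itself and the preceding `window - 1` positions, and short rows are padded with `-1`. The function name `sliding_window_indices`, the inclusive window, and the padding value are assumptions for illustration; the SWA top-k indexer in this PR may use a different layout.

```python
from typing import List


def sliding_window_indices(seq_len: int, window: int, pad: int = -1) -> List[List[int]]:
    """Reference key indices for causal sliding-window attention.

    Row q lists the key positions query q may attend to: the last `window`
    positions up to and including q, right-padded with `pad` to a fixed width.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    rows = []
    for q in range(seq_len):
        start = max(0, q - window + 1)  # first visible key for this query
        keys = list(range(start, q + 1))
        rows.append(keys + [pad] * (window - len(keys)))
    return rows


# Boundary check: with window=2, position 0 sees only itself.
for row in sliding_window_indices(seq_len=4, window=2):
    print(row)
```

A regression test could feed the same `seq_len` and `window` to the real indexer and assert that the valid (non-padding) entries match this reference, with extra cases where `seq_len` is smaller than, equal to, and larger than `window`.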

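PR convention check

The title [Models] Support MLA_SWA functionality conforms to the tag convention, and the PR description is structurally complete. Accuracy Tests is currently N/A, but because this PR changes model forward semantics, we recommend adding actual accuracy/alignment results or a clearly reproducible verification command to the description.

Status of historical findings

Rechecked the previously unresolved items: f_d5017472, f_7eff4a88, f_28e9c24f, f_f824c9a1, and f_2ac6ab5a are still not fully fixed in the current diff/code. The issue behind f_c57369dc, where the KV cache quantization config of non-SWA layers was cleared, has been fixed by restoring quant_config.kv_cache_quant_type inside the else branch. This round does not resend inline comments for these historical issues.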

PaddlePaddle-bot · 2026-06-16T02:56:19Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-16 10:55:13 UTC+08:00

CI report generated from the following code (updated every 30 minutes):
PR commit: cc20efd | Merge base: cbb0811 (branch: develop)

1 Required tasks: 9/10 passed

Total runs (reruns)	Total tasks	✅ Passed	❌ Failed	⏳ Running	⏸️ Waiting	Skipped
42(0)	42	38	4	0	0	0

Task	Error type	Confidence	Log
`Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage`	PR issue: diff coverage 31% is below 80%	High	Job

2 Failure details

🔴 Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: PR issue (confidence: high)

Analyzer: generic analysis (fallback)
Failed case: coverage check

Case	Error summary
`diff-cover python_coverage_all.xml --fail-under=80`	PR diff coverage is 31%, below the 80% threshold

Key log:

TEST_EXIT_CODE: 0
COVERAGE_EXIT_CODE: 9
Failure. Coverage is below 80%.
fastdeploy/model_executor/layers/attention/mla_attention_backend.py (50.0%): Missing lines 546,626-627
fastdeploy/model_executor/models/deepseek_v3.py (8.0459%): Missing lines 92,94,96-117,128-134,302-303,449-677 etc.
fastdeploy/worker/gpu_model_runner.py (63.6363%): Missing lines 1656-1657,1668-1669
fastdeploy/model_executor/layers/attention/dsa_attention_backend.py (92.3077%): Missing lines 386,411
total_num_lines: 130, total_num_violations: 89, total_percent_covered: 31
Process completed with exit code 9.

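Root cause summary: insufficient coverage of the new MLA_SWA code
The test stage passed; the failure occurred in Verify Code Coverage Threshold (80%). The MLA_SWA path, the KV cache shape/dtype branches, and the DSA static attention branch added or modified by this PR lack corresponding test coverage. As a result, diff-cover counts 89 uncovered lines out of 130 changed lines, for a total diff coverage of only 31%.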
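
The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers in the log (`total_num_lines: 130`, `total_num_violations: 89`) and the 80% threshold; the helper names are hypothetical, and treating the reported 31 as a truncated integer is an inference from the log, not documented diff-cover behavior.

```python
def diff_coverage_percent(total_lines: int, violations: int) -> float:
    """Percentage of changed lines that are covered by tests."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    return 100.0 * (total_lines - violations) / total_lines


def extra_lines_needed(total_lines: int, violations: int, threshold_pct: int) -> int:
    """How many more changed lines must be covered to reach the threshold."""
    covered = total_lines - violations
    # Integer ceiling of threshold_pct% of total_lines, avoiding float rounding.
    required = (threshold_pct * total_lines + 99) // 100
    return max(0, required - covered)


pct = diff_coverage_percent(130, 89)
print(f"{pct:.2f}%")                    # 41 of 130 changed lines are covered
print(extra_lines_needed(130, 89, 80))  # additional lines to cover for 80%
```

Under these assumptions, 41 of the 130 changed lines are covered, and 63 more would need coverage for the check to pass.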

Suggested fixes:

Add MLA_SWA unit tests for fastdeploy/model_executor/models/deepseek_v3.py that cover the encoder/decoder branches of get_swa_indexer_top_k and the path where forward_swa_static calls DSAAttentionBackend.forward_static.
Extend tests/layers/test_mla_attention_kv_cache.py, or add new cases, to cover the fp8 cache shape branch of MLAAttentionBackend.get_kv_cache_shape() when window_attn_skip_freq[layer_id] == 1.
Extend the GPU runner initialization tests to cover the branch in fastdeploy/worker/gpu_model_runner.py where MLA SWA layers select a uint8 cache. Also cover the q head padding prefill/decode branches of DSAAttentionBackend.forward_static.

Related changes: fastdeploy/model_executor/models/deepseek_v3.py, fastdeploy/model_executor/layers/attention/mla_attention_backend.py, fastdeploy/model_executor/layers/attention/dsa_attention_backend.py, fastdeploy/worker/gpu_model_runner.py

support mla_swa

9ba4dba

chang-wenbin had a problem deploying to Metax_ci June 15, 2026 12:02 — with GitHub Actions Failure

fix bug

11d9841

chang-wenbin had a problem deploying to Metax_ci June 15, 2026 13:08 — with GitHub Actions Failure

chang-wenbin requested a review from PaddlePaddle-bot June 15, 2026 13:16

chang-wenbin changed the title ~~Support MLA_SWA functionality~~ [Models] Support MLA_SWA functionality Jun 15, 2026

fix unitest

cc20efd

chang-wenbin had a problem deploying to Metax_ci June 15, 2026 17:32 — with GitHub Actions Failure

PaddlePaddle-bot reviewed Jun 15, 2026

chang-wenbin wants to merge 3 commits into PaddlePaddle:develop from chang-wenbin:mla-swa