Support swa mha #8053
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review | 2026-06-16 14:07:52
📋 Review Summary
PR overview: Attempts to add SWA MHA support to the DeepSeek/DSA/MLA attention paths and adjusts the MLA baseline mask.
Scope of changes: fastdeploy/model_executor/layers/attention/, fastdeploy/model_executor/models/deepseek_v3.py, DSA layer tests
Impact tags: [Models] [OP]
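Issues

| Severity | File | Summary |
|---|---|---|
| 🔴 Bug | fastdeploy/model_executor/layers/attention/mla_attention_backend.py:878 | mha_baseline gains a window_size parameter, but the call site always passes -1, so Blackwell prefill still never applies the SWA mask |
| 🟡 Suggestion | fastdeploy/model_executor/models/deepseek_v3.py:610 | The new SWA sparse attention path is fully disabled by `if False` and is unreachable at runtime |
| 🟡 Suggestion | tests/layers/test_dsa_attention_backend.py | The DSA backend unit test is deleted in the same PR that changes DSA attention behavior, leaving no regression protection |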
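📝 PR Convention Check
The title lacks an official tag, the description is still the template placeholder, and the checklist has not been ticked to reflect the actual state of the PR.
Suggested title (copy as-is):
[Feature] Support SWA MHA attention
Suggested PR description (copy as-is):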
## Motivation
Support sliding-window attention (SWA) for MHA/DSA attention paths used by DeepSeek/DSA models.
## Modifications
- Add SWA indexer top-k helper and a static DSA attention forwarding path.
- Build DSA prefill KV from compressed KV and RoPE key, and pad query heads to FlashMLA's head alignment.
- Add `window_size` handling to `MLAAttentionBackend.mha_baseline`.
- Wire DeepSeek DSA attention to pass `indexer_top_k` through the DSA backend.
- Remove `tests/layers/test_dsa_attention_backend.py`.
## Usage or Command
N/A
## Accuracy Tests
N/A (this PR does not provide accuracy results)
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
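- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

Overall assessment
This round of review prioritized risk and covered attention forward semantics, the DeepSeek call chain, test coverage, and PR conventions. The current implementation still has a blocking issue: the SWA mask is not wired into the only Blackwell baseline call. Fix this and add accuracy and regression tests before merging.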
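The regression and accuracy tests requested above could start from a small reference check. The following is a minimal pure-Python sketch, not the content of the deleted test file: `reference_attention` is a hypothetical helper, and the window convention (a value `<= 0` disables the window, a positive value lets each query see itself plus the previous `window_size - 1` keys) is an assumption that would need to match `mha_baseline`.

```python
import math


def reference_attention(q, k, v, window_size=-1):
    """Naive causal attention over lists of vectors.

    Assumed convention: window_size <= 0 means full causal attention;
    otherwise query i attends to keys max(0, i - window_size + 1) .. i.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    dim = len(v[0])
    out = []
    for i, qi in enumerate(q):
        lo = 0 if window_size <= 0 else max(0, i - window_size + 1)
        scores = [
            scale * sum(a * b for a, b in zip(qi, k[j]))
            for j in range(lo, i + 1)
        ]
        # Subtract the max score before exponentiating for numerical stability.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[lo + idx][d] for idx, w in enumerate(weights)) / total
            for d in range(dim)
        ])
    return out


# Toy inputs: 4 tokens, head dimension 2.
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
v = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, -1.0]]

full = reference_attention(q, k, v, window_size=-1)
windowed = reference_attention(q, k, v, window_size=2)
```

With a window of 2, the first two positions see the same keys as full causal attention, so only positions 2 and later should differ. A backend test could compare the kernel output against such a reference on sequences longer than the window.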
    forward_meta.cu_seqlens_q,
    forward_meta.cu_seqlens_k,
    causal=self.causal,
    window_size=-1,
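🔴 Bug: The new `mha_baseline(..., window_size, ...)` parameter is hard-coded to -1 at its only call site, so the SWA mask never takes effect.
Attention already sets layer.sliding_window according to layer_types, but this call never reads it. As a result, Blackwell prefill still runs full causal attention, the new `window_size > 0` branch in mha_baseline is unreachable, and the SWA MHA output will not match the expected sliding-window semantics.
Suggested fix: pass the current layer's window in forward_mixed, for example: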
    window_size = getattr(layer, "sliding_window", 0) or -1
    fmha_out = MLAAttentionBackend.mha_baseline(..., window_size=window_size, **self.flash_attn_kwargs)

Also add a prefill alignment test case for sliding-window layers.
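To illustrate why a hard-coded `window_size=-1` leaves the window branch unreachable, here is a minimal pure-Python sketch. It assumes that a value `<= 0` means "no window" and that a positive value lets each query see itself plus the previous `window_size - 1` keys; the exact boundary convention inside `mha_baseline` is not shown in this thread. `resolve_window_size`, `swa_causal_mask`, and the `SimpleNamespace` layers are hypothetical stand-ins.

```python
from types import SimpleNamespace


def resolve_window_size(layer):
    """Mirror the suggested fix: missing, None, or 0 all map to -1 (no window)."""
    return getattr(layer, "sliding_window", 0) or -1


def swa_causal_mask(seq_len, window_size):
    """Return mask[q][k] == True where query q may attend to key k.

    Assumed convention (not confirmed by the PR): window_size <= 0 means
    plain causal attention; otherwise query q sees keys in (q - window_size, q].
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            visible = k <= q
            if window_size > 0:
                visible = visible and (q - k) < window_size
            row.append(visible)
        mask.append(row)
    return mask


# Stand-ins for attention layers: one full-attention layer, one sliding-window layer.
full_layer = SimpleNamespace(sliding_window=None)
swa_layer = SimpleNamespace(sliding_window=2)

full_mask = swa_causal_mask(4, resolve_window_size(full_layer))
swa_mask = swa_causal_mask(4, resolve_window_size(swa_layer))
```

Under these assumptions, passing -1 always yields the lower-triangular causal mask, while reading the per-layer value narrows the last row of a 4-token sequence to the two most recent keys.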
    else:
        attn_out = fmqa_out

    if False:
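🟡 Suggestion: The newly added SWA sparse attention branch is permanently disabled by `if False` and never executes at runtime.
The get_swa_indexer_top_k helper, the DSAAttentionBackend.forward_static call, and the 512-token window logic added in this PR are used only inside this dead branch. The actual DeepseekV3MLAAttention.forward still returns through the regular MLA path above, so the SWA behavior here can be neither exercised nor verified.
Suggested fix: either delete this disabled code, or wire it in behind an explicit model config or attention backend switch, and submit it together with coverage of the mixed prefill/decode path and accuracy tests.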
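One way to replace the `if False` guard with an explicit switch is sketched below. This is only an illustration under stated assumptions: `AttentionRouteConfig`, its `use_swa_sparse_attention` flag, and `select_attention_path` are hypothetical names, not existing FastDeploy options. The 512 default is the window size mentioned in the review.

```python
from dataclasses import dataclass


@dataclass
class AttentionRouteConfig:
    # Hypothetical flag; not an existing FastDeploy option.
    use_swa_sparse_attention: bool = False
    # Window size taken from the review comment about the dead branch.
    swa_window: int = 512


def select_attention_path(config, layer_is_sliding):
    """Pick the attention route for one layer.

    Returns a (path_name, window_size) pair; -1 means no sliding window.
    """
    if config.use_swa_sparse_attention and layer_is_sliding:
        return ("swa_sparse", config.swa_window)
    return ("mla", -1)


default_route = select_attention_path(AttentionRouteConfig(), layer_is_sliding=True)
enabled_route = select_attention_path(
    AttentionRouteConfig(use_swa_sparse_attention=True), layer_is_sliding=True
)
```

Keeping the decision in one small function makes both routes reachable from tests, so the SWA path can be covered without changing the default behavior.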
Codecov Report: ❌ Patch coverage is …

Additional details and impacted files

| Coverage Diff | develop | #8053 | +/- |
|---|---|---|---|
| Coverage | ? | 67.34% | |
| Files | ? | 475 | |
| Lines | ? | 66711 | |
| Branches | ? | 10289 | |
| Hits | ? | 44926 | |
| Misses | ? | 18919 | |
| Partials | ? | 2866 | |
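CI report generated from the following code (refreshed every 30 minutes):
1. Required tasks: 9/10 passed
2. Failure details: 🔴 Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage. Classified as a PR issue (confidence: high). Analyzer: generic analysis (fallback). Failed case: coverage threshold check failed (unit tests passed).
Key logs:
This PR adds SWA/DSA logic, and the violating lines listed in the coverage report fall within these added or modified sections:
Suggested fix:
Related changes: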