Support swa mha by zhoutianzi666 · Pull Request #8053 · PaddlePaddle/FastDeploy

zhoutianzi666 · 2026-06-16T05:51:44Z

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-06-16 14:07:52

📋 Review Summary

PR overview: Attempts to add SWA MHA support to DeepSeek/DSA/MLA attention and adjusts the MLA baseline mask.
Scope of changes: fastdeploy/model_executor/layers/attention/, fastdeploy/model_executor/models/deepseek_v3.py, DSA layer tests
Impact tags: [Models] [OP]

Issues

Severity	File	Summary
🔴 Bug	`fastdeploy/model_executor/layers/attention/mla_attention_backend.py:878`	`mha_baseline` gains a `window_size` parameter, but the call site always passes `-1`, so Blackwell prefill still does not apply the SWA mask
🟡 Suggestion	`fastdeploy/model_executor/models/deepseek_v3.py:610`	The new SWA sparse attention path is fully disabled by `if False` and is unreachable at runtime
🟡 Suggestion	`tests/layers/test_dsa_attention_backend.py`	The DSA backend unit test is deleted in the same PR that changes DSA attention behavior, leaving no regression protection

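📝 PR Convention Check

The title lacks an official tag, the description is still the template placeholder text, and the Checklist has not been ticked to reflect the actual state of the PR.

Suggested title (can be copied directly):

[Feature] Support SWA MHA attention

Suggested PR description (click to expand, can be copied directly)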

## Motivation
Support sliding-window attention (SWA) for MHA/DSA attention paths used by DeepSeek/DSA models.

## Modifications
- Add SWA indexer top-k helper and a static DSA attention forwarding path.
- Build DSA prefill KV from compressed KV and RoPE key, and pad query heads to FlashMLA's head alignment.
- Add `window_size` handling to `MLAAttentionBackend.mha_baseline`.
- Wire DeepSeek DSA attention to pass `indexer_top_k` through the DSA backend.
- Remove `tests/layers/test_dsa_attention_backend.py`.

## Usage or Command
N/A

## Accuracy Tests
N/A (this PR does not yet provide accuracy results)

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]       
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

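Overall Assessment

This round reviewed, in order of risk, the attention forward semantics, the DeepSeek call chain, test coverage, and PR conventions. The current implementation still has a blocking issue: the SWA mask is not wired into the only Blackwell baseline call. We recommend fixing this and adding accuracy and regression tests before merging.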

PaddlePaddle-bot · 2026-06-16T06:13:49Z

                    forward_meta.cu_seqlens_q,
                    forward_meta.cu_seqlens_k,
                    causal=self.causal,
+                    window_size=-1,


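🔴 Bug: The newly added mha_baseline(..., window_size, ...) is always passed -1 at its only call site, so the SWA mask never takes effect.

Attention already sets layer.sliding_window according to layer_types, but that value is not read here. As a result, Blackwell prefill still runs full causal attention, the new window_size > 0 branch in mha_baseline is unreachable, and the SWA MHA output will not match the expected sliding-window semantics.

Suggested fix: in forward_mixed, pass the current layer's window, for example:

window_size = getattr(layer, "sliding_window", 0) or -1
fmha_out = MLAAttentionBackend.mha_baseline(..., window_size=window_size, **self.flash_attn_kwargs)

Also add a prefill alignment (parity) test case for sliding-window layers.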
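To make the difference between the `-1` sentinel and a positive window concrete, here is a minimal pure-Python sketch of a causal mask with an optional sliding window. It is not the `mha_baseline` implementation: `build_attention_mask` and `resolve_window_size` are hypothetical helper names, and the convention that the window includes the current token is an assumption.

```python
def build_attention_mask(seq_len, window_size=-1):
    """Return a seq_len x seq_len boolean mask; True means query i may attend to key j.

    Assumption: window_size <= 0 means "no window" (plain causal attention),
    and a positive window counts the current token as part of the window.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            visible = j <= i  # causal: never attend to future keys
            if window_size > 0:
                # sliding window: keep only the last `window_size` keys, including i
                visible = visible and (i - j) < window_size
            row.append(visible)
        mask.append(row)
    return mask


def resolve_window_size(layer):
    """Map a missing, None, or 0 `sliding_window` attribute to the -1 sentinel."""
    return getattr(layer, "sliding_window", 0) or -1
```

With a hard-coded `-1`, every row of the mask stays fully causal, which is why a layer that declares a sliding window would still see the entire prefix.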

PaddlePaddle-bot · 2026-06-16T06:13:49Z

            else:
                attn_out = fmqa_out
+
+        if False:


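🟡 Suggestion: The newly added SWA sparse attention branch is hard-disabled by if False and never executes at runtime.

The get_swa_indexer_top_k helper, the DSAAttentionBackend.forward_static call, and the 512-window logic added in this PR are used only inside this dead branch. The actual DeepseekV3MLAAttention.forward still returns via the regular MLA path above, so the SWA behavior here cannot be exercised or verified.

Suggested fix: either delete this disabled code, or wire it in through an explicit model config / attention backend switch, and submit it together with coverage of the mixed prefill/decode path and accuracy tests.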
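One way to read the "explicit switch" option is sketched below. Everything in it is hypothetical: `AttentionConfig`, `use_swa_sparse_attention`, and `select_attention_path` are illustrative names, not FastDeploy options, and the default window of 512 simply mirrors the value mentioned in the comment.

```python
from dataclasses import dataclass


@dataclass
class AttentionConfig:
    # Hypothetical switch; not an existing FastDeploy configuration field.
    use_swa_sparse_attention: bool = False
    sliding_window: int = 512


def select_attention_path(config, layer_sliding_window):
    """Pick the attention path from configuration instead of a literal `if False`."""
    if config.use_swa_sparse_attention and layer_sliding_window > 0:
        return "swa_sparse"
    return "mla"
```

A switch like this keeps the default behavior unchanged while letting tests turn the SWA path on, so the branch is reachable and can be covered.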

codecov-commenter · 2026-06-16T06:33:34Z

Codecov Report

❌ Patch coverage is 15.00000% with 51 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@02a0042). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/model_executor/models/deepseek_v3.py	20.00%	24 Missing ⚠️
...executor/layers/attention/dsa_attention_backend.py	8.00%	23 Missing ⚠️
...executor/layers/attention/mla_attention_backend.py	20.00%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #8053   +/-   ##
==========================================
  Coverage           ?   67.34%           
==========================================
  Files              ?      475           
  Lines              ?    66711           
  Branches           ?    10289           
==========================================
  Hits               ?    44926           
  Misses             ?    18919           
  Partials           ?     2866

Flag	Coverage Δ
GPU	`77.29% <15.00%> (?)`
XPU	`6.97% <0.00%> (?)`

PaddlePaddle-bot · 2026-06-16T06:40:34Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-18 18:41:34 UTC+08:00

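The CI report is generated from the following code (refreshed every 30 minutes):
PR commit: 6daadc4 | Merge base: 02a0042 (branch: develop)

1 Required jobs: 9/10 passed

Total runs (reruns)	Total jobs	✅ Passed	❌ Failed	⏳ Running	⏸️ Pending	Skipped
41(0)	41	36	5	0	0	0

Job	Error type	Confidence	Log
`Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage`	PR issue: diff coverage is only 15%	High	Job

2 Failure details

🔴 Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: PR issue (confidence: high)

Analyzer: generic analysis (fallback)

Failed case: coverage threshold check failed (unit tests passed)

Case	Error summary
`Verify Code Coverage Threshold (80%)`	`COVERAGE_EXIT_CODE=9`; diff coverage is 15%, with 51 of 60 lines uncovered

Key logs: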

TEST_EXIT_CODE: 0
COVERAGE_EXIT_CODE: 9
GPU Patch Coverage Details:
total_num_lines=60, total_num_violations=51, total_percent_covered=15
fastdeploy/model_executor/models/deepseek_v3.py: 24 violation lines
fastdeploy/model_executor/layers/attention/dsa_attention_backend.py: 23 violation lines
fastdeploy/model_executor/layers/attention/mla_attention_backend.py: 4 violation lines

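Root cause summary: code added or modified by the PR lacks coverage

This PR adds SWA/DSA logic, and the violating lines listed in the coverage report fall on those additions and modifications: deepseek_v3.py:92-134,610; dsa_attention_backend.py:338-431; mla_attention_backend.py:1195-1200. The log shows that the unit test stage passed, but the coverage check found a diff coverage of only 15%, below the 80% threshold. The PR also deletes tests/layers/test_dsa_attention_backend.py, and no new tests currently cover these paths.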
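The figures in the log can be checked with a few lines of arithmetic. The sketch below is not the CI script; the function names are hypothetical and only the numbers (60 changed lines, 51 uncovered, 80% threshold, per-file counts) come from the report.

```python
def patch_coverage_percent(total_lines, uncovered_lines):
    """Percentage of changed lines that are executed by tests."""
    return 100.0 * (total_lines - uncovered_lines) / total_lines


def extra_lines_needed(total_lines, uncovered_lines, threshold_percent=80):
    """How many more changed lines must be covered to reach the threshold."""
    covered = total_lines - uncovered_lines
    # Integer ceiling of threshold_percent * total_lines / 100.
    required = (threshold_percent * total_lines + 99) // 100
    return max(0, required - covered)


# Uncovered line counts per file, as listed in the coverage log.
uncovered_per_file = {
    "deepseek_v3.py": 24,
    "dsa_attention_backend.py": 23,
    "mla_attention_backend.py": 4,
}
total_uncovered = sum(uncovered_per_file.values())
```

Under these numbers, 9 of 60 changed lines are covered, and the threshold of 80% corresponds to 48 covered lines.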

Fix suggestions:

Add unit tests for get_swa_indexer_top_k / get_swa_indexer_top_k_kernel that cover both the encoder and decode index calculations, covering deepseek_v3.py:92-134.
Add tests for DSAAttentionBackend.forward_static covering the prefill, decode, ceil64_num_heads != q_num_heads, and cache shape paths, covering dsa_attention_backend.py:338-431.
Add window mask cases for mha_baseline(..., window_size, ...) with window_size > 0 and window_size <= 0, covering mla_attention_backend.py:1195-1200.
If deleting tests/layers/test_dsa_attention_backend.py is intentional, submit new tests with equivalent or higher coverage; otherwise restore the original test and update it for the new interface.
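For the `ceil64_num_heads != q_num_heads` case mentioned above, a test needs an input whose head count is not already a multiple of the alignment. The sketch below illustrates the rounding and padding idea only. The alignment of 64 is inferred from the identifier name, zero-padding is an assumption, and `ceil_to_multiple` and `pad_query_heads` are hypothetical helpers rather than code from the PR.

```python
def ceil_to_multiple(value, multiple=64):
    """Round `value` up to the nearest multiple of `multiple`."""
    return ((value + multiple - 1) // multiple) * multiple


def pad_query_heads(query_heads, multiple=64):
    """Pad a list of per-head vectors with zero heads up to the aligned head count.

    Returns the padded list and the original head count, so the caller can
    slice the attention output back to the real heads afterwards.
    """
    num_heads = len(query_heads)
    head_dim = len(query_heads[0])
    padded_count = ceil_to_multiple(num_heads, multiple)
    zero_heads = [[0.0] * head_dim for _ in range(padded_count - num_heads)]
    return query_heads + zero_heads, num_heads
```

A test for this path would pick, for example, 16 heads, check that 64 heads reach the kernel, and check that only the first 16 heads of the output are kept.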

Related changes: fastdeploy/model_executor/models/deepseek_v3.py, fastdeploy/model_executor/layers/attention/dsa_attention_backend.py, fastdeploy/model_executor/layers/attention/mla_attention_backend.py, tests/layers/test_dsa_attention_backend.py

zhoutianzi666 added 5 commits June 15, 2026 13:57

clean code

df50d04

clean code

5dfb98c

clean code

c41940b

clean code

5a4d660

commit

6daadc4

zhoutianzi666 had a problem deploying to Metax_ci June 16, 2026 05:51 — with GitHub Actions Failure

PaddlePaddle-bot suggested changes Jun 16, 2026

Support swa mha #8053
zhoutianzi666 wants to merge 5 commits into PaddlePaddle:develop from zhoutianzi666:support_swa_mha

3 participants
