Support swa mha #8053

Open
zhoutianzi666 wants to merge 5 commits into PaddlePaddle:develop from zhoutianzi666:support_swa_mha
@zhoutianzi666

Collaborator

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-06-16 14:07:52

📋 Review Summary

PR overview: Attempts to add SWA MHA support to DeepSeek/DSA/MLA attention and adjusts the MLA baseline mask.
Scope of changes: fastdeploy/model_executor/layers/attention/, fastdeploy/model_executor/models/deepseek_v3.py, DSA layer tests
Impact tags: [Models] [OP]

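Issues

| Severity | File | Summary |
|---|---|---|
| 🔴 Bug | fastdeploy/model_executor/layers/attention/mla_attention_backend.py:878 | `mha_baseline` gains a `window_size` parameter, but the call site always passes -1, so Blackwell prefill still does not apply the SWA mask |
| 🟡 Suggestion | fastdeploy/model_executor/models/deepseek_v3.py:610 | The new SWA sparse attention path is completely disabled by `if False` and is unreachable at runtime |
| 🟡 Suggestion | tests/layers/test_dsa_attention_backend.py | DSA attention behavior is modified while the DSA backend unit test is deleted, leaving no regression protection |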

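📝 PR Convention Check

The title lacks an official tag, the description is still the unfilled template, and the checklist has not been ticked to reflect the actual state.

Suggested title (can be copied directly):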

  • [Feature] Support SWA MHA attention
Suggested PR description (can be copied directly):
## Motivation
Support sliding-window attention (SWA) for MHA/DSA attention paths used by DeepSeek/DSA models.

## Modifications
- Add SWA indexer top-k helper and a static DSA attention forwarding path.
- Build DSA prefill KV from compressed KV and RoPE key, and pad query heads to FlashMLA's head alignment.
- Add `window_size` handling to `MLAAttentionBackend.mha_baseline`.
- Wire DeepSeek DSA attention to pass `indexer_top_k` through the DSA backend.
- Remove `tests/layers/test_dsa_attention_backend.py`.

## Usage or Command
N/A

## Accuracy Tests
N/A (this PR does not provide accuracy results)

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]       
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

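Overall Assessment

This round reviewed, in risk-first order, the attention forward semantics, the DeepSeek call chain, test coverage, and PR conventions. The current implementation still has a blocking issue: the SWA mask is not wired into the only Blackwell baseline call. Fix this and add accuracy and regression tests before merging.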

forward_meta.cu_seqlens_q,
forward_meta.cu_seqlens_k,
causal=self.causal,
window_size=-1,

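🔴 Bug: The newly added `mha_baseline(..., window_size, ...)` always receives -1 at its only call site, so the SWA mask never takes effect.

Attention already sets `layer.sliding_window` according to `layer_types`, but that value is not read here. As a result, Blackwell prefill still runs full causal attention, the new `window_size > 0` branch in `mha_baseline` is unreachable, and the SWA MHA output will not match the expected sliding-window semantics.

Suggested fix: pass the current layer's window in `forward_mixed`, for example: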

window_size = getattr(layer, "sliding_window", 0) or -1
fmha_out = MLAAttentionBackend.mha_baseline(..., window_size=window_size, **self.flash_attn_kwargs)

Also add a prefill alignment test case for sliding-window layers.
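To make the expected semantics concrete, the following is a minimal pure-Python sketch, not the PR's implementation. `resolve_window_size` mirrors the `getattr(layer, "sliding_window", 0) or -1` idiom suggested above, and `causal_window_mask` is a hypothetical reference mask. It assumes a positive window keeps the `window_size` most recent keys including the current position; the actual kernel may use a different boundary convention.

```python
from types import SimpleNamespace


def resolve_window_size(layer):
    """Return the layer's window, or -1 when it is absent, None, or 0."""
    return getattr(layer, "sliding_window", 0) or -1


def causal_window_mask(seq_len, window_size):
    """Build mask[i][j], True when query i may attend to key j.

    Assumption (not confirmed by the PR): a positive window keeps the
    window_size most recent keys, current position included;
    window_size <= 0 means plain causal attention.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            visible = j <= i  # causal constraint
            if window_size > 0:
                visible = visible and (i - j) < window_size
            row.append(visible)
        mask.append(row)
    return mask


# Stand-in layer objects for illustration only.
full_layer = SimpleNamespace(sliding_window=None)
swa_layer = SimpleNamespace(sliding_window=2)

full_mask = causal_window_mask(4, resolve_window_size(full_layer))
swa_mask = causal_window_mask(4, resolve_window_size(swa_layer))
```

With a hardcoded `window_size=-1`, every layer gets `full_mask`, which is the behavior the review flags.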

else:
attn_out = fmqa_out

if False:

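🟡 Suggestion: The newly added SWA sparse attention branch is permanently disabled by `if False` and never executes at runtime.

The `get_swa_indexer_top_k` helper, the `DSAAttentionBackend.forward_static` call, and the 512-window logic added in this PR are used only inside this dead branch. The actual `DeepseekV3MLAAttention.forward` still returns through the ordinary MLA path above, so the SWA behavior here cannot be exercised or verified.

Suggested fix: either delete this disabled code, or wire it in behind an explicit model-config or attention-backend switch and submit it together with coverage of the mixed prefill/decode path and accuracy tests.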
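One way to replace the `if False` guard with an explicit switch could look like the sketch below. The `AttentionConfig` class and its `use_swa_sparse_attention` field are hypothetical names invented for illustration; they are not existing FastDeploy options. The default of 512 comes from the window size mentioned in the review.

```python
from dataclasses import dataclass


@dataclass
class AttentionConfig:
    # Hypothetical switch, not a real FastDeploy option.
    use_swa_sparse_attention: bool = False
    # Window size mentioned in the review of the disabled branch.
    sliding_window: int = 512


def select_attention_path(config):
    """Pick the attention path from configuration instead of a constant."""
    if config.use_swa_sparse_attention and config.sliding_window > 0:
        return "swa_sparse"
    return "mla"


default_path = select_attention_path(AttentionConfig())
swa_path = select_attention_path(AttentionConfig(use_swa_sparse_attention=True))
```

Because the branch is now selected by data, a unit test can enable it and cover the SWA path without editing the model source.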

@codecov-commenter

codecov-commenter commented Jun 16, 2026


Codecov Report

❌ Patch coverage is 15.00000% with 51 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@02a0042). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| fastdeploy/model_executor/models/deepseek_v3.py | 20.00% | 24 Missing ⚠️ |
| ...executor/layers/attention/dsa_attention_backend.py | 8.00% | 23 Missing ⚠️ |
| ...executor/layers/attention/mla_attention_backend.py | 20.00% | 4 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #8053   +/-   ##
==========================================
  Coverage           ?   67.34%           
==========================================
  Files              ?      475           
  Lines              ?    66711           
  Branches           ?    10289           
==========================================
  Hits               ?    44926           
  Misses             ?    18919           
  Partials           ?     2866           
| Flag | Coverage Δ |
|---|---|
| GPU | 77.29% <15.00%> (?) |
| XPU | 6.97% <0.00%> (?) |

Flags with carried forward coverage won't be shown. Click here to find out more.

@PaddlePaddle-bot

PaddlePaddle-bot commented Jun 16, 2026


🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-18 18:41:34 UTC+08:00

The CI report is generated from the following code (updated every 30 minutes):
PR commit: 6daadc4 | Merge base: 02a0042 (branch: develop)


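1 Required tasks: 9/10 passed

| Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Pending | Skipped |
|---|---|---|---|---|---|---|
| 41(0) | 41 | 36 | 5 | 0 | 0 | 0 |

| Task | Error type | Confidence | Log |
|---|---|---|---|
| Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage | PR issue: diff coverage is only 15% | | Job |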

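2 Failure Details

🔴 Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: PR issue (confidence: high)

Analyzer: generic analysis (fallback)

Failed case: coverage threshold check failed (unit tests passed)

| Case | Error summary |
|---|---|
| Verify Code Coverage Threshold (80%) | COVERAGE_EXIT_CODE=9, diff coverage 15%, 51/60 lines uncovered |

Key logs: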

TEST_EXIT_CODE: 0
COVERAGE_EXIT_CODE: 9
GPU Patch Coverage Details:
total_num_lines=60, total_num_violations=51, total_percent_covered=15
fastdeploy/model_executor/models/deepseek_v3.py: 24 violation lines
fastdeploy/model_executor/layers/attention/dsa_attention_backend.py: 23 violation lines
fastdeploy/model_executor/layers/attention/mla_attention_backend.py: 4 violation lines
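  • Root cause summary: code added or modified by the PR lacks coverage

This PR adds SWA/DSA-related logic, and the violation lines listed in the coverage report fall in these added or modified regions: deepseek_v3.py:92-134,610, dsa_attention_backend.py:338-431, and mla_attention_backend.py:1195-1200. The logs show that the unit-test stage passed, but the coverage check found a diff coverage of only 15%, below the 80% threshold. The PR also deletes tests/layers/test_dsa_attention_backend.py, and no new tests currently cover these paths.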
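The 15% figure can be reproduced from the numbers in the log. The sketch below is only a worked calculation; the function name and the dictionary are illustrative and are not part of the CI tooling.

```python
def patch_coverage_percent(total_lines, missing_lines):
    """Percentage of changed lines that were executed by tests."""
    covered = total_lines - missing_lines
    return 100.0 * covered / total_lines


# Violation counts per file, copied from the key logs.
missing_by_file = {
    "deepseek_v3.py": 24,
    "dsa_attention_backend.py": 23,
    "mla_attention_backend.py": 4,
}

total_missing = sum(missing_by_file.values())        # 51 uncovered lines
percent = patch_coverage_percent(60, total_missing)  # 9 of 60 lines covered
meets_threshold = percent >= 80.0                    # 80% threshold from the CI job
```

Only 9 of the 60 changed lines are covered, so reaching the threshold would require covering at least 48 of them.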

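Suggested fixes:

  1. Add unit tests for get_swa_indexer_top_k / get_swa_indexer_top_k_kernel that exercise both encoder and decode index computation, covering deepseek_v3.py:92-134
  2. Add tests for DSAAttentionBackend.forward_static covering the prefill, decode, ceil64_num_heads != q_num_heads, and cache-shape paths, covering dsa_attention_backend.py:338-431
  3. Add window-mask cases for mha_baseline(..., window_size, ...) with both window_size > 0 and window_size <= 0, covering mla_attention_backend.py:1195-1200
  4. If deleting tests/layers/test_dsa_attention_backend.py is intentional, submit new tests with equivalent or higher coverage; otherwise restore the original test and update it for the new interface.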
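For the window-mask cases in item 3, a test could compare the backend output against a small reference. The sketch below shows only such a reference and its two cases in pure Python; it does not call `mha_baseline`, whose full signature is not shown in this PR page. `reference_attention` is a hypothetical helper, and it assumes the window includes the current token.

```python
import math


def reference_attention(scores, values, window_size):
    """Softmax attention over visible keys, one scalar value per key.

    Assumption: causal masking, and a positive window keeps the
    window_size most recent keys including the current position.
    """
    outputs = []
    for i, row in enumerate(scores):
        visible = [
            j for j in range(len(row))
            if j <= i and (window_size <= 0 or i - j < window_size)
        ]
        weights = [math.exp(row[j]) for j in visible]
        total = sum(weights)
        outputs.append(sum(w * values[j] for w, j in zip(weights, visible)) / total)
    return outputs


def test_window_cases():
    # Uniform scores make each output the mean of the visible values.
    scores = [[0.0] * 4 for _ in range(4)]
    values = [1.0, 2.0, 3.0, 4.0]
    full = reference_attention(scores, values, window_size=-1)
    windowed = reference_attention(scores, values, window_size=2)
    assert math.isclose(full[3], 2.5)      # mean of 1, 2, 3, 4
    assert math.isclose(windowed[3], 3.5)  # mean of 3, 4
    assert math.isclose(full[0], windowed[0])  # first token sees only itself
```

A real regression test would feed the same inputs to the backend and assert closeness to this reference for both window settings.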

Related changes: fastdeploy/model_executor/models/deepseek_v3.py, fastdeploy/model_executor/layers/attention/dsa_attention_backend.py, fastdeploy/model_executor/layers/attention/mla_attention_backend.py, tests/layers/test_dsa_attention_backend.py
