[Models] Support MLA_SWA functionality #8049
Codecov Report
❌ Patch coverage is …
Additional details and impacted files
@@ Coverage Diff @@
## develop #8049 +/- ##
==========================================
Coverage ? 67.47%
==========================================
Files ? 475
Lines ? 66739
Branches ? 10293
==========================================
Hits ? 45031
Misses ? 18831
Partials ? 2877
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review | 2026-06-16 02:13:02
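Overall Assessment
This round reviewed, in order of risk, the MLA_SWA attention forward pass, the SWA indexer, KV cache allocation, and the new tests. There are no new blocking inline findings. However, the change touches the core DeepSeekV3 MLA/DSA forward path and the KV cache layout, and the PR still lacks accuracy/regression tests that demonstrate MLA_SWA is semantically correct. We recommend adding them before merging.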
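Issues
- 🟡 Suggestion: please add real semantic/accuracy regression coverage for MLA_SWA. The current tests mainly check the shape of the mocked DSA output and the default MLA cache shape. They do not cover forward_swa_static() on the layers selected by window_attn_skip_freq, the SWA top-k indexer, or prefill/decode/mixed-batch outputs, and in particular they provide no evidence that logits or attention outputs align with the non-SWA path or a reference implementation. Because this PR changes the model's core attention as well as the KV cache dtype/shape, we recommend adding at least prefill, decode, and mixed-batch cases for SWA layers, plus alignment results for sliding_window boundaries and prefix/chunked-prefill scenarios.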
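The review asks for alignment against a reference implementation. Below is a minimal sketch of such a reference for single-head causal sliding-window attention, written in plain Python so it runs anywhere. It assumes a window of window tokens that includes the current token (a query at absolute position p sees keys p - window < j <= p); that convention, the function name sliding_window_attention, and the q_offset parameter are hypothetical and not taken from FastDeploy. It deliberately ignores MLA latent compression, the SWA top-k indexer, paged KV cache layout, and multi-head batching, so it only illustrates the kind of prefill/decode/chunked-prefill and window-boundary checks being requested.

```python
import math
from typing import List, Sequence

Vector = List[float]


def sliding_window_attention(
    q: Sequence[Vector],
    k: Sequence[Vector],
    v: Sequence[Vector],
    window: int,
    q_offset: int = 0,
) -> List[Vector]:
    """Reference single-head causal sliding-window attention (loop-based, test only).

    Hypothetical helper. Query row i sits at absolute position p = q_offset + i
    and attends to key positions j with p - window < j <= p (assumed convention).
    k and v hold the full K/V history indexed by absolute position.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    scale = 1.0 / math.sqrt(len(q[0]))
    out: List[Vector] = []
    for i, qi in enumerate(q):
        pos = q_offset + i
        keys = range(max(0, pos - window + 1), pos + 1)  # visible key positions
        scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in keys]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(
            [sum(w * v[j][c] for w, j in zip(weights, keys)) / total
             for c in range(len(v[0]))]
        )
    return out


if __name__ == "__main__":
    n, d, w = 6, 4, 3
    q = [[math.sin(i + c) for c in range(d)] for i in range(n)]
    k = [[math.cos(i * c + 1) for c in range(d)] for i in range(n)]
    v = [[float(i - c) for c in range(d)] for i in range(n)]
    prefill = sliding_window_attention(q, k, v, window=w)
    # One decode step at the last position must reproduce the last prefill row.
    decode = sliding_window_attention(q[-1:], k, v, window=w, q_offset=n - 1)
    # A chunked prefill of rows 2..5 must reproduce the same rows of the full pass.
    chunk = sliding_window_attention(q[2:], k, v, window=w, q_offset=2)
    assert decode[0] == prefill[-1]
    assert chunk == prefill[2:]
    print("decode and chunked prefill match full prefill")
```

Because each query row performs exactly the same arithmetic whether it is computed in a full prefill, a chunk, or a single decode step, this pure-Python reference can use exact equality; comparing it against a fused GPU kernel running in reduced precision would typically need a tolerance instead.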
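PR Convention Check
The title [Models] Support MLA_SWA functionality follows the tag convention, and the PR description has all required sections. Accuracy Tests is currently N/A, but because this PR changes model forward semantics, we recommend adding actual accuracy/alignment results to the description, or clearly stating reproducible verification commands.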
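Status of Previous Findings
Re-checked the previously unresolved items: f_d5017472, f_7eff4a88, f_28e9c24f, f_f824c9a1, and f_2ac6ab5a are still not fully fixed in the current diff/code. f_c57369dc (the KV cache quantization config being cleared for non-SWA layers) has been fixed by restoring quant_config.kv_cache_quant_type in the else branch. This round does not resend inline comments for these previous findings.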
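CI report generated from the following code (refreshed every 30 minutes):
1. Required jobs: 9/10 passed
2. Failure details
🔴 Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: PR issue (confidence: high); analyzer: generic analysis (fallback)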
Key logs:
Suggested fix:
Related changes:
Motivation
Support MLA sliding window attention (MLA_SWA) functionality for DeepSeek V3 style MLA layers.
Modifications
- … fastdeploy/model_executor/models/deepseek_v3.py.
- … DSAAttentionBackend.forward_static() and reuse it for DSA/MLA SWA FlashMLA calls.
- … window_attn_skip_freq.
Usage or Command
N/A
Accuracy Tests
N/A
Checklist
- … [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- … pre-commit before commit.
- … release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.