swa by zhoutianzi666 · Pull Request #8054 · PaddlePaddle/FastDeploy

zhoutianzi666 · 2026-06-16T05:53:51Z

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

- Add at least a tag in the PR title.
  - Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
  - You can add new tags based on the PR content, but the semantics must be clear.
- Format your code, run `pre-commit` before commit.
- Add unit tests. Please write the reason in this PR if no unit tests.
- Provide accuracy results.
- If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-06-16 14:07:55

📋 Review Summary

PR overview: Adjusts the causal / sliding-window mask construction in the MLA attention baseline.
Scope of changes: fastdeploy/model_executor/layers/attention/mla_attention_backend.py
Impact tag: [OP]

Issues

Severity	File	Summary
🔴 Bug	`fastdeploy/model_executor/layers/attention/mla_attention_backend.py:878`	SM10 MLA mixed prefill always passes `window_size=-1`, so SWA layers still run full attention

📝 PR Convention Check

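The PR title lacks an official tag, and the description still contains the template placeholders with its core sections left empty. Suggested replacements follow.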

Suggested title (ready to copy):

[OP] Fix MLA baseline SWA mask propagation

Suggested PR description (ready to copy):

## Motivation
Improve the causal/SWA mask construction in the MLA attention baseline for mixed prefill / prefix cache scenarios, so the mask semantics stay correct when the cached KV length differs from the current query length.

## Modifications
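- Add a `window_size` parameter to `MLAAttentionBackend.mha_baseline`.
- Compute the KV boundary visible to query row `i` as `kv_len - q_len + i + 1`, and restrict visibility to the sliding window when `window_size > 0`.
- Pass the `window_size` argument when the SM10 mixed prefill branch calls `mha_baseline`.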
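The visible-boundary rule above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the FastDeploy implementation: `build_swa_causal_mask` is a hypothetical name, the new query tokens are assumed to occupy the last `q_len` positions of a `kv_len`-long sequence (cached prefix first), and a positive `window_size` is assumed to count the query's own position, which may differ by one from the actual kernel convention.

```python
from typing import List


def build_swa_causal_mask(q_len: int, kv_len: int, window_size: int = -1) -> List[List[bool]]:
    """Hypothetical sketch: mask[i][j] is True when query row i may attend to key j.

    Assumes the q_len new tokens occupy the last q_len positions of the kv_len
    sequence (cached prefix first), so query row i sits at position kv_len - q_len + i.
    """
    if q_len > kv_len:
        raise ValueError("q_len must not exceed kv_len")
    mask = []
    for i in range(q_len):
        # Exclusive upper bound: keys up to and including the query's own position.
        end = kv_len - q_len + i + 1
        # Assumption: a positive window keeps only the last `window_size` keys
        # (including the query itself); window_size <= 0 means full causal attention.
        start = max(0, end - window_size) if window_size > 0 else 0
        mask.append([start <= j < end for j in range(kv_len)])
    return mask


# Mixed prefill with a 3-token cached prefix: 2 new queries over 5 keys, window of 2.
for row in build_swa_causal_mask(q_len=2, kv_len=5, window_size=2):
    print("".join("x" if visible else "." for visible in row))

# Alignment check: the mixed-prefill rows must equal the tail of a full prefill.
full = build_swa_causal_mask(q_len=5, kv_len=5, window_size=2)
assert build_swa_causal_mask(q_len=2, kv_len=5, window_size=2) == full[-2:]
```

The loop prints `..xx.` and `...xx`. Anchoring the mask at the top-left instead (`j <= i`) would hide most of the cached prefix from the new queries whenever `kv_len > q_len`, which is the mismatch the `kv_len - q_len + i + 1` offset avoids.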

## Usage or Command
N/A

## Accuracy Tests
N/A (this PR does not provide accuracy results)

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

Overall Assessment

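The change aims to fix the MLA baseline mask, but the call site does not pass the window that SWA layers already carry, so the new logic has no effect in the target scenario. Please fix this propagation first, and also address the convention issues.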

PaddlePaddle-bot · 2026-06-16T06:10:27Z

                    forward_meta.cu_seqlens_q,
                    forward_meta.cu_seqlens_k,
                    causal=self.causal,
+                    window_size=-1,


🔴 Bug: `window_size` is hard-coded to -1 here, so the new SWA mask branch never takes effect at its only call site.

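The Attention layer already stores `layer.sliding_window` when `layer_types == "sliding_attention"`, and `append_attn_backend` passes a positive `sliding_window` to the attention kernel. However, MLA mixed prefill on SM10 goes through this `mha_baseline` branch, and with the value fixed at -1 even a sliding-attention layer runs full causal attention, so the output no longer matches the SWA configuration.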

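Suggested fix: before the call, compute the current layer's window following the existing attention conventions. For example, pass `layer.sliding_window` for sliding layers (or, consistent with the append backend, prefer the backend/model-level `self.sliding_window`), and pass -1/0 for non-sliding layers to denote full attention. Also add a mixed prefill + SWA alignment test with `kv_len > q_len`.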
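The suggested call-site fix can be sketched as a small selection helper. Everything here is hypothetical: `resolve_window_size` is not a FastDeploy function, and its arguments stand in for the `layer_types == "sliding_attention"` check, `layer.sliding_window`, and the backend-level `self.sliding_window` mentioned above. It follows the reviewer's "prefer the backend/model-level value" option and uses -1 for full attention.

```python
from typing import Optional


def resolve_window_size(
    is_sliding_layer: bool,
    layer_window: Optional[int],
    backend_window: Optional[int] = None,
) -> int:
    """Hypothetical helper: pick the window_size to pass to mha_baseline.

    Returns -1 (full causal attention) for non-sliding layers. For sliding
    layers it prefers a positive backend/model-level window, then the
    per-layer window.
    """
    if not is_sliding_layer:
        return -1
    for candidate in (backend_window, layer_window):
        if candidate is not None and candidate > 0:
            return candidate
    # Design choice for this sketch: fail loudly instead of silently falling
    # back to full attention, which is exactly the behavior being reported.
    raise ValueError("sliding_attention layer has no positive sliding_window")
```

Raising on a sliding layer without a positive window is a choice made for this sketch; returning -1 there would reproduce the silent full-attention behavior the review flags.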

codecov-commenter · 2026-06-16T06:39:37Z

Codecov Report

❌ Patch coverage is 20.00000% with 4 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@02a0042). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
...executor/layers/attention/mla_attention_backend.py	20.00%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #8054   +/-   ##
==========================================
  Coverage           ?   67.50%           
==========================================
  Files              ?      475           
  Lines              ?    66660           
  Branches           ?    10284           
==========================================
  Hits               ?    45000           
  Misses             ?    18789           
  Partials           ?     2871

Flag	Coverage Δ
GPU	`77.48% <20.00%> (?)`
XPU	`6.98% <0.00%> (?)`

Flags with carried forward coverage won't be shown.

PaddlePaddle-bot · 2026-06-16T06:41:31Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-16 17:07:52

This CI report was generated from the following code (updated every 30 minutes):
PR commit: 8222968 | Merge base: 02a0042 (branch: develop)

1 Required jobs: 9/10 passed

Total runs (reruns)	Total jobs	✅ Passed	❌ Failed	⏳ Running	⏸️ Pending	Skipped
41(0)	41	36	4	0	1	0

Job	Error type	Confidence	Log
`Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage`	Environment issue: Mooncake port 9003 already in use	High	Job

2 Failure details

🔴 Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: Environment issue (confidence: high)

Analyzer: generic analysis (fallback)

Failed cases:

Case	Error summary
`tests/e2e/test_ernie_03b_pd_router_v1_rdma_global_cache.py::test_metrics_config`, `::test_chat_usage_stream`, `::test_multi_turn_global_cache_pooling`	The fixture failed to start Mooncake Master and raised `RuntimeError: Mooncake Master did not start`

Key log lines:

E0616 14:38:13.400389 17718 rpc_service.cpp:234] Failed to start master admin server on port 9003
E0616 14:38:13.400404 17718 master.cpp:1104] Failed to start master admin server
RuntimeError: Mooncake Master did not start
tests/e2e/test_ernie_03b_pd_router_v1_rdma_global_cache.py:183: RuntimeError

Root cause summary: Mooncake admin port 9003 already in use

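In the main test job, 419 of 420 pytest files passed; the only failing file is tests/e2e/test_ernie_03b_pd_router_v1_rdma_global_cache.py. Its session fixture first cleans up ports and then starts mooncake_master, but the Mooncake process log shows that the admin server failed to start on the default metrics_port=9003. The fixture then raises RuntimeError at line 183, so 3 cases fail during setup.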

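This PR only changes the MHA baseline/window_size logic in fastdeploy/model_executor/layers/attention/mla_attention_backend.py; it does not touch this e2e fixture, the Mooncake startup arguments, or the port cleanup logic. The failure happens before the service starts, so it is attributed to a port conflict / isolation problem in the CI environment.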

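Suggested fix:

Environment issue; please rerun.
If the error keeps recurring, consider giving mooncake_master an explicit, dedicated --metrics_port in tests/e2e/test_ernie_03b_pd_router_v1_rdma_global_cache.py and adding that port to PORTS_TO_CLEAN.

Related changes: fastdeploy/model_executor/layers/attention/mla_attention_backend.py; not directly related to the Mooncake master port startup failure.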
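If the port conflict recurs, a fixture could check the admin port before launching mooncake_master, or pick a free one to pass through `--metrics_port` as suggested above. A minimal sketch using only the Python standard library; `is_port_free` and `pick_free_port` are hypothetical helpers, and how the actual fixture builds the Mooncake command line and `PORTS_TO_CLEAN` is not shown in this report.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Hypothetical helper: True if a TCP socket can bind host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # EADDRINUSE and similar: another process (e.g. a stale Mooncake master) holds it.
            return False
        return True


def pick_free_port(host: str = "127.0.0.1") -> int:
    """Hypothetical helper: let the OS choose an unused ephemeral port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))  # port 0 asks the kernel for any free port
        return s.getsockname()[1]
```

Both checks only probe the given host address, and there is still a short window between picking a port and Mooncake binding it, so retrying on a failed start may still be worthwhile.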

commit

8222968

zhoutianzi666 had a problem deploying to Metax_ci June 16, 2026 05:53 — with GitHub Actions Failure

zhoutianzi666 changed the title from commit to swa on Jun 16, 2026

PaddlePaddle-bot suggested changes Jun 16, 2026

zhoutianzi666 wants to merge 1 commit into PaddlePaddle:develop from zhoutianzi666:support_swa_mha1

3 participants
