[Refactor] Remove redundant code in MLA cache management by HayzelHan · Pull Request #8050 · PaddlePaddle/FastDeploy

HayzelHan · 2026-06-15T13:00:37Z

No description provided.

CLAassistant · 2026-06-15T13:01:06Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ zhoutianzi666
❌ HayzelHan
You have signed the CLA already but the status is still pending? Let us recheck it.

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-06-16 18:26:27

📋 Review Summary

PR overview: Renames the MLA write-cache custom op and removes the two redundant registration entry points (prefill and decode).
Scope of changes: custom_ops/gpu_ops/append_attn/, custom_ops/gpu_ops/cpp_extensions.cc, fastdeploy/model_executor/layers/attention/
Impact tag: [OP]

Issues

No new blocking issues found. PR convention issues are reported in the section below and are not repeated here.

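Status of Previous Findings

| Finding | Issue | Status |
| --- | --- | --- |
| F1 | Only `mla_write_cache` is kept here, but not all consumers of the old MLA write-cache ops have been migrated. | ⚠️ Still present |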

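📝 PR Convention Check

The title uses the unofficial tag [Refactor], and the PR description is missing the complete template that FastDeploy requires.

Suggested title (ready to copy):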

[OP] Remove redundant code in MLA cache management

Suggested description (ready to copy):

Full PR description template

## Motivation
Remove redundant metadata parameter passing in MLA cache management and keep prefill cache shape metadata derived from the input tensors directly.

## Modifications
- Update `custom_ops/gpu_ops/append_attn/mla_cache_kernel.cu` so `PrefillMLAWriteCache` derives `max_blocks_per_seq`, `num_tokens`, `block_size`, `kv_num_heads`, and head dimensions from `block_tables`, `kv_nope`, and `kv_cache`.
- Remove the `meta_data` argument from the BF16/FP16 prefill write-cache dispatch calls.

## Usage or Command
N/A

## Accuracy Tests
N/A (no changes to model accuracy logic)

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
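
The shape derivation described under Modifications in the suggested template can be illustrated with a minimal Python sketch. It is not the kernel code from this PR. The tensor layouts are assumptions made for illustration: `block_tables` as `[batch, max_blocks_per_seq]`, `kv_nope` as `[num_tokens, kv_num_heads, nope_size]`, and `kv_cache` as `[num_blocks, kv_num_heads, block_size, head_dim]`. The names `PrefillCacheMeta` and `derive_prefill_meta` are hypothetical.

```python
from typing import NamedTuple, Sequence


class PrefillCacheMeta(NamedTuple):
    """Shape metadata derived from tensor shapes (hypothetical container)."""
    max_blocks_per_seq: int
    num_tokens: int
    kv_num_heads: int
    block_size: int
    nope_size: int
    cache_head_dim: int


def derive_prefill_meta(
    block_tables_shape: Sequence[int],
    kv_nope_shape: Sequence[int],
    kv_cache_shape: Sequence[int],
) -> PrefillCacheMeta:
    """Derive write-cache metadata from shapes instead of a separate meta_data argument.

    Assumed layouts (not confirmed by the PR):
      block_tables: [batch, max_blocks_per_seq]
      kv_nope:      [num_tokens, kv_num_heads, nope_size]
      kv_cache:     [num_blocks, kv_num_heads, block_size, head_dim]
    """
    if len(block_tables_shape) != 2:
        raise ValueError("block_tables must be 2-D")
    if len(kv_nope_shape) != 3:
        raise ValueError("kv_nope must be 3-D")
    if len(kv_cache_shape) != 4:
        raise ValueError("kv_cache must be 4-D")

    _, max_blocks_per_seq = block_tables_shape
    num_tokens, nope_heads, nope_size = kv_nope_shape
    _, kv_num_heads, block_size, cache_head_dim = kv_cache_shape

    # The input and the cache must agree on the number of KV heads.
    if nope_heads != kv_num_heads:
        raise ValueError("kv_nope and kv_cache disagree on kv_num_heads")

    return PrefillCacheMeta(
        max_blocks_per_seq=max_blocks_per_seq,
        num_tokens=num_tokens,
        kv_num_heads=kv_num_heads,
        block_size=block_size,
        nope_size=nope_size,
        cache_head_dim=cache_head_dim,
    )


# Example shapes chosen only for illustration.
meta = derive_prefill_meta((4, 64), (128, 1, 512), (1024, 1, 64, 576))
```

Deriving these values from the tensors removes one way for the caller's metadata and the actual buffers to disagree, at the cost of fixing the expected layouts in the function itself.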

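Overall Assessment

This round checked, in order of risk, custom op registration, Python call-site dispatch, the MLA prefill/decode write-cache paths, and the status of previous findings. No new issues were found, but the previous finding is still unresolved: `forward_extend` / `forward_decode` still reference the old op names, while this PR keeps and imports only `mla_write_cache`.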
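
The kind of mismatch described in finding F1 (call sites referencing op names that are no longer registered) can be caught with a simple text scan. The following is a minimal sketch, not part of FastDeploy or its CI. The helper `find_unregistered_ops` is hypothetical, and the old op names in the sample (`prefill_mla_write_cache`, `decode_mla_write_cache`) are placeholders assumed for illustration, since the review does not name them.

```python
import re
from typing import List, Set


def find_unregistered_ops(
    source: str, registered: Set[str], suffix: str = "mla_write_cache"
) -> List[str]:
    """Return identifiers ending in `suffix` that appear in `source`
    but are not in the set of registered op names."""
    pattern = re.compile(r"\b\w*" + re.escape(suffix) + r"\b")
    found = {match.group(0) for match in pattern.finditer(source)}
    return sorted(found - registered)


# Sample call-site text; the two prefixed op names are hypothetical placeholders.
SAMPLE_SOURCE = """
def forward_extend(x):
    return prefill_mla_write_cache(x)

def forward_decode(x):
    return decode_mla_write_cache(x)

def forward_mixed(x):
    return mla_write_cache(x)
"""

stale = find_unregistered_ops(SAMPLE_SOURCE, registered={"mla_write_cache"})
print(stale)  # ['decode_mla_write_cache', 'prefill_mla_write_cache']
```

A scan like this only matches names textually, so it would also flag occurrences in comments or strings. It is meant as a quick pre-merge check rather than a substitute for importing the module and exercising each forward path.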

[Refactor] Remove redundant code in MLA cache management

b9ce07b

HayzelHan had a problem deploying to Metax_ci June 15, 2026 13:00 — with GitHub Actions Failure

[OP] simplify code

4e8ed01

HayzelHan had a problem deploying to Metax_ci June 16, 2026 08:47 — with GitHub Actions Error

change inferface name

ac4fe40

HayzelHan had a problem deploying to Metax_ci June 16, 2026 08:53 — with GitHub Actions Failure

reformat

6f05b47

HayzelHan had a problem deploying to Metax_ci June 16, 2026 09:52 — with GitHub Actions Failure

PaddlePaddle-bot reviewed Jun 16, 2026

HayzelHan wants to merge 4 commits into PaddlePaddle:develop from HayzelHan:refactor-ops

4 participants
