[Refactor] Remove redundant code in MLA cache management #8050

Open
HayzelHan wants to merge 4 commits into PaddlePaddle:develop from HayzelHan:refactor-ops

@HayzelHan

No description provided.

@CLAassistant

CLAassistant commented Jun 15, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ zhoutianzi666
❌ HayzelHan

PaddlePaddle-bot

This comment was marked as outdated.

PaddlePaddle-bot

This comment was marked as outdated.

@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-06-16 18:26:27

📋 Review Summary

PR overview: Renames the MLA write-cache custom op and removes the two redundant prefill/decode registration entry points.
Scope of changes: `custom_ops/gpu_ops/append_attn/`, `custom_ops/gpu_ops/cpp_extensions.cc`, `fastdeploy/model_executor/layers/attention/`
Impact tag: `[OP]`

Issues

No new blocking issues found. PR convention issues are reported in the section below and are not repeated here.

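Status of Previous Findings

| Finding | Issue | Status |
| --- | --- | --- |
| F1 | Only `mla_write_cache` is kept here, but not all consumers of the old MLA write-cache ops have been migrated. | ⚠️ Still present |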

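📝 PR Convention Check

The title uses the unofficial tag `[Refactor]`, and the PR description lacks the full template required by FastDeploy.

Suggested title (can be copied directly):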

[OP] Remove redundant code in MLA cache management

Suggested description (can be copied directly):

Full PR description template
## Motivation
Remove redundant metadata parameter passing in MLA cache management and keep prefill cache shape metadata derived from the input tensors directly.

## Modifications
- Update `custom_ops/gpu_ops/append_attn/mla_cache_kernel.cu` so `PrefillMLAWriteCache` derives `max_blocks_per_seq`, `num_tokens`, `block_size`, `kv_num_heads`, and head dimensions from `block_tables`, `kv_nope`, and `kv_cache`.
- Remove the `meta_data` argument from the BF16/FP16 prefill write-cache dispatch calls.

## Usage or Command
N/A

## Accuracy Tests
N/A (no changes to model accuracy logic)

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
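
To illustrate the Modifications entry in the template above, here is a minimal Python sketch of deriving the cache metadata from tensor shapes instead of passing a separate `meta_data` argument. The tensor layouts in the comments, the `PrefillCacheMeta` type, and the `derive_prefill_meta` helper are assumptions made for illustration; they are not taken from `mla_cache_kernel.cu` and may not match the actual kernel.

```python
from typing import NamedTuple, Sequence


class PrefillCacheMeta(NamedTuple):
    """Hypothetical container for the values the template says are derived."""
    max_blocks_per_seq: int
    num_tokens: int
    kv_num_heads: int
    block_size: int
    nope_dim: int
    cache_head_dim: int


def derive_prefill_meta(
    block_tables_shape: Sequence[int],
    kv_nope_shape: Sequence[int],
    kv_cache_shape: Sequence[int],
) -> PrefillCacheMeta:
    # Assumed layouts (illustrative only, not confirmed by the PR):
    #   block_tables: [batch, max_blocks_per_seq]
    #   kv_nope:      [num_tokens, kv_num_heads * nope_dim]
    #   kv_cache:     [num_blocks, kv_num_heads, block_size, cache_head_dim]
    if len(block_tables_shape) != 2 or len(kv_nope_shape) != 2 or len(kv_cache_shape) != 4:
        raise ValueError("unexpected tensor rank for the assumed layouts")

    _, kv_num_heads, block_size, cache_head_dim = kv_cache_shape
    num_tokens, nope_width = kv_nope_shape
    if nope_width % kv_num_heads != 0:
        raise ValueError("kv_nope width is not divisible by kv_num_heads")

    return PrefillCacheMeta(
        max_blocks_per_seq=block_tables_shape[1],
        num_tokens=num_tokens,
        kv_num_heads=kv_num_heads,
        block_size=block_size,
        nope_dim=nope_width // kv_num_heads,
        cache_head_dim=cache_head_dim,
    )


meta = derive_prefill_meta((2, 8), (10, 512), (64, 1, 64, 576))
print(meta)
```

Reading every value from the inputs means the caller cannot pass metadata that disagrees with the tensors, which is presumably the redundancy the PR title refers to.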

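Overall Assessment

This round was prioritized by risk and checked custom op registration, Python call-side dispatch, the MLA prefill/decode write-cache paths, and the status of previous findings. No new issues were found, but the previous finding is still unresolved: `forward_extend` / `forward_decode` still reference the old op names, while this PR keeps and imports only `mla_write_cache`.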
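
A finding of this kind can be checked mechanically before merge. The sketch below is one possible approach, not part of the FastDeploy tooling: it parses Python source with the standard `ast` module and reports call-site references to op names that no longer exist. The names `old_prefill_write_cache` and `old_decode_write_cache`, the `ops` module, and the sample source are hypothetical placeholders, since the review does not name the removed ops.

```python
import ast
from typing import List, Set, Tuple


def find_stale_op_references(source: str, removed_ops: Set[str]) -> List[Tuple[str, int]]:
    """Return (name, line) pairs where the source still uses a removed op.

    Only bare names and attribute accesses are checked; import
    statements themselves are not inspected in this sketch.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and node.id in removed_ops:
            hits.append((node.id, node.lineno))
        elif isinstance(node, ast.Attribute) and node.attr in removed_ops:
            hits.append((node.attr, node.lineno))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical attention-layer source that imports only the new op
# but still calls the old ones.
SAMPLE = '''
from ops import mla_write_cache

def forward_extend(x):
    return old_prefill_write_cache(x)

def forward_decode(x):
    return old_decode_write_cache(x)
'''

# Placeholder names; substitute the ops actually removed by the PR.
REMOVED = {"old_prefill_write_cache", "old_decode_write_cache"}

for name, line in find_stale_op_references(SAMPLE, REMOVED):
    print(f"line {line}: stale reference to {name}")
```

Running a check like this over `fastdeploy/model_executor/layers/attention/` would flag any remaining consumer before it fails at import or call time.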


4 participants