[Refactor] Remove redundant code in MLA cache management #8050
HayzelHan wants to merge 4 commits into
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review | 2026-06-16 18:26:27
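📋 Review Summary
PR overview: Renames the MLA write-cache custom op and removes the two redundant registration entry points for prefill and decode.
Scope of changes: custom_ops/gpu_ops/append_attn/, custom_ops/gpu_ops/cpp_extensions.cc, fastdeploy/model_executor/layers/attention/
Impact tag: [OP]
Issues
No new blocking issues found. PR convention issues are reported in the section below and are not repeated here.
Status of previous findings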
| Finding | Issue | Status |
|---|---|---|
| F1 | Only `mla_write_cache` is kept here, but not all consumers of the old MLA write-cache ops have been migrated yet. | Not fixed |
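One way to confirm F1 is resolved before merging is to scan the attention-layer sources for any remaining use of the old op names. The following is a hedged sketch, not part of the PR: the old names `prefill_mla_write_cache` and `decode_mla_write_cache` are assumptions inferred from the removed prefill/decode entry points, and the sample file name and contents are hypothetical.

```python
import re

# Assumed old op names (hypothetical): the review does not spell them out.
OLD_OPS = ("prefill_mla_write_cache", "decode_mla_write_cache")

# Whole-word match so the new op name `mla_write_cache` is never reported.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, OLD_OPS)) + r")\b")


def find_stale_references(sources):
    """Return (filename, line_number, op_name) for every use of an old op name.

    `sources` maps a file name to that file's text.
    """
    hits = []
    for name, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in _PATTERN.finditer(line):
                hits.append((name, lineno, match.group(1)))
    return hits


# Hypothetical file contents: the import was migrated, the call site was not.
example = {
    "mla_attention_backend.py": (
        "from ops import mla_write_cache\n"
        "def forward_extend(self):\n"
        "    prefill_mla_write_cache(...)\n"
    ),
}
print(find_stale_references(example))
# [('mla_attention_backend.py', 3, 'prefill_mla_write_cache')]
```

In practice, `sources` could be filled by reading the `.py` files under `fastdeploy/model_executor/layers/attention/`. A plain `grep -rnE 'prefill_mla_write_cache|decode_mla_write_cache'` over that directory would serve the same purpose.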
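📝 PR Convention Check
The title uses the unofficial tag [Refactor], and the PR description is missing the full template that FastDeploy requires.
Suggested title (ready to copy):
[OP] Remove redundant code in MLA cache management
Suggested description (ready to copy):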
## Motivation
Remove redundant metadata parameter passing in MLA cache management and derive the prefill cache shape metadata directly from the input tensors.
## Modifications
- Update `custom_ops/gpu_ops/append_attn/mla_cache_kernel.cu` so `PrefillMLAWriteCache` derives `max_blocks_per_seq`, `num_tokens`, `block_size`, `kv_num_heads`, and head dimensions from `block_tables`, `kv_nope`, and `kv_cache`.
- Remove the `meta_data` argument from the BF16/FP16 prefill write-cache dispatch calls.
## Usage or Command
N/A
## Accuracy Tests
N/A (no changes to model accuracy logic)
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Overall Assessment
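In this round, the review checked, in order of risk, the custom op registration, the dispatch on the Python call side, the MLA prefill/decode write-cache paths, and the status of previous findings. No new issues were found, but the previous finding is still not fixed: `forward_extend` / `forward_decode` still reference the old op names, while this PR keeps and imports only `mla_write_cache`.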
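The suggested Modifications entry says `PrefillMLAWriteCache` now derives `max_blocks_per_seq`, `num_tokens`, `block_size`, `kv_num_heads`, and the head dimensions from `block_tables`, `kv_nope`, and `kv_cache` instead of taking a `meta_data` argument. The Python sketch below illustrates that idea on plain shape tuples. It is not the kernel code (which is CUDA/C++), and the tensor layouts in the docstring, the `MLAWriteCacheMeta` class, and `derive_prefill_meta` are hypothetical assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MLAWriteCacheMeta:
    """Hypothetical container for the sizes a prefill cache write needs."""
    max_blocks_per_seq: int
    num_tokens: int
    block_size: int
    kv_num_heads: int
    nope_head_dim: int
    all_head_dim: int


def derive_prefill_meta(block_tables_shape, kv_nope_shape, kv_cache_shape):
    """Derive write-cache sizes from tensor shapes instead of a meta_data argument.

    Assumed layouts (not confirmed by the PR):
      block_tables: [batch_size, max_blocks_per_seq]
      kv_nope:      [num_tokens, kv_num_heads, nope_head_dim]
      kv_cache:     [max_num_blocks, kv_num_heads, block_size, all_head_dim]
    """
    _, max_blocks_per_seq = block_tables_shape
    num_tokens, nope_heads, nope_head_dim = kv_nope_shape
    _, kv_num_heads, block_size, all_head_dim = kv_cache_shape

    # Cross-check the inputs so a layout mismatch raises an error
    # instead of silently producing inconsistent sizes.
    if nope_heads != kv_num_heads:
        raise ValueError(f"head count mismatch: kv_nope={nope_heads}, kv_cache={kv_num_heads}")
    if nope_head_dim > all_head_dim:
        raise ValueError(f"nope dim {nope_head_dim} exceeds cache head dim {all_head_dim}")

    return MLAWriteCacheMeta(
        max_blocks_per_seq=max_blocks_per_seq,
        num_tokens=num_tokens,
        block_size=block_size,
        kv_num_heads=kv_num_heads,
        nope_head_dim=nope_head_dim,
        all_head_dim=all_head_dim,
    )


# Illustrative shapes only. Under the assumed layout, the remaining
# all_head_dim - nope_head_dim channels would hold the rotary (rope) part.
meta = derive_prefill_meta((4, 128), (1024, 1, 512), (2048, 1, 64, 576))
print(meta.block_size, meta.all_head_dim - meta.nope_head_dim)  # 64 64
```

The design benefit of deriving sizes from the tensors is that the metadata cannot drift out of sync with the buffers actually passed in. Cross-checking the shapes keeps a caller-side layout mismatch from going unnoticed.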
No description provided.