fp8 triton moe config. by xuanyuanminzheng · Pull Request #8036 · PaddlePaddle/FastDeploy

xuanyuanminzheng · 2026-06-10T08:02:08Z

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

codecov-commenter · 2026-06-10T08:38:45Z

Codecov Report

❌ Patch coverage is 20.00000% with 4 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@9431c4f). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
...el_executor/layers/moe/fused_moe_triton_backend.py	20.00%	3 Missing and 1 partial ⚠️
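The 20% figure can be reproduced with a little arithmetic. The sketch below assumes the usual hits / (hits + misses + partials) ratio, in which partial lines do not count as covered, and infers a single covered line from "20% with 4 lines missing"; the function name is illustrative and is not part of Codecov.

```python
def patch_coverage_percent(hits: int, misses: int, partials: int) -> float:
    """Return patch coverage as a percentage.

    Assumption: partially covered lines are counted as not covered.
    """
    total = hits + misses + partials
    if total == 0:
        raise ValueError("no changed lines to measure")
    return 100.0 * hits / total


# 3 missing and 1 partial come from the report; 1 hit is inferred from the 20% figure.
pct = patch_coverage_percent(hits=1, misses=3, partials=1)
print(f"{pct:.2f}%")  # prints 20.00%
```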

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #8036   +/-   ##
==========================================
  Coverage           ?   67.66%           
==========================================
  Files              ?      470           
  Lines              ?    66115           
  Branches           ?    10189           
==========================================
  Hits               ?    44740           
  Misses             ?    18523           
  Partials           ?     2852

Flag	Coverage Δ
GPU	`77.76% <20.00%> (?)`
XPU	`7.01% <0.00%> (?)`


PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-06-11 00:03:39

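📋 Review Summary

PR overview: Adjusts the tiling configuration of the FP8 Triton MoE kernel for small-token workloads.
Scope of changes: fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py
Impact tags: [OP] [Quantization] [Optimization]

Issues

No blocking issues found. PR convention issues are reported in the section below and are not repeated here.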

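📝 PR Convention Check

The title lacks an official tag, and the PR description is still the template placeholder: Motivation, Modifications, Usage or Command, Accuracy Tests, and the Checklist status have not been filled in. Replacing them with the following is recommended.

Suggested title (can be copied directly):

[Optimization] Tune FP8 Triton MoE small-token config

Suggested PR description (can be copied directly):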

## Motivation
Tune the Triton MoE FP8 kernel configuration for small-token workloads to use smaller M/N tiles and adjusted pipeline stages.

## Modifications
- In `fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py`, refine the `Wfp8Afp8MoEMethod.apply` config when `token_num <= E`:
  - `token_num <= 16`: use `BLOCK_SIZE_M=16`, `BLOCK_SIZE_N=64`, `num_stages=4`.
  - `token_num <= 32`: use `BLOCK_SIZE_M=32`, `BLOCK_SIZE_N=64`, `num_stages=3`.
  - `token_num > 32` and `token_num <= E`: keep `BLOCK_SIZE_M=64`, `BLOCK_SIZE_N=128`, and set `num_stages=3`.

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

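Overall Assessment

This round reviewed, in risk-priority order, the FP8 Triton MoE small-token config branch and how that config is passed to preprocess and to the two fused MoE GEMM kernel calls. No blocking issue that would change routing, shape, scale, or write-back semantics was identified. It is recommended to complete the required information in the PR title and description, especially the performance/accuracy validation results or the reason they are not provided.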

PaddlePaddle-bot · 2026-06-11T15:07:59Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-17 19:20:58

The CI report is generated from the following code (updated every 30 minutes):
PR commit: c083797 | Merge base: 9431c4f (branch: develop)

1 Required jobs: 9/10 passed

Total runs (reruns)	Total jobs	✅ Passed	❌ Failed	⏳ Running	⏸️ Pending	Skipped
41(0)	41	37	4	0	0	0

Job	Error type	Confidence	Log
`Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage`	PR issue: diff coverage below 80%	High	Job

2 Failure details

🔴 Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage - PR issue: diff coverage below 80% (confidence: high)

Analyzer: ci_analyze_unittest_fastdeploy

Failed cases: no evidence of failing unit test cases; the failure point is the PR diff coverage threshold check

Case	Error summary
`fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py`	Diff coverage of the newly added FP8 MoE Triton config branches is below 80%; CI fails with exit code 9

Key evidence:

Process completed with exit code 9.
.github/workflows/_unit_test_coverage.yml:254 diff-cover python_coverage_all.xml --diff-file=diff.txt --fail-under=80 --json-report diff_coverage.json || COVERAGE_EXIT_CODE=9
.github/workflows/_unit_test_coverage.yml:387-404 when COVERAGE_EXIT_CODE=9, prints Coverage generation failed and runs exit 9

Root cause summary: the newly added FP8 MoE branches lack diff coverage
The PR only modifies the config selection under token_num <= E in Wfp8Afp8MoEMethod.apply, adding a three-tier configuration for token_num <= 16, token_num <= 32, and larger decode token counts. The existing Wfp8Afp8 apply case in tests/layers/test_fused_moe_triton_backend.py exercises only the 1-token path and does not cover the new 17 <= token_num <= 32 and 33 <= token_num <= E branches, so diff-cover reports below 80%. The deep log for this run returned no unittest_details/log_file_path, and no evidence of a specific pytest assertion failure was found.
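As a rough illustration of the three-tier selection described above, here is a minimal sketch that uses only the values listed in the suggested PR description. The function name and the `None` return for the `token_num > E` path are assumptions made for illustration. The actual code in `Wfp8Afp8MoEMethod.apply` is not shown on this page and likely sets further keys that are omitted here.

```python
from typing import Dict, Optional


def select_small_token_config(token_num: int, num_experts: int) -> Optional[Dict[str, int]]:
    """Hypothetical helper mirroring the tiers named in the review.

    Returns None when token_num > num_experts (E), i.e. when the default
    config applies; that config's values are not given in this report.
    """
    if token_num > num_experts:
        return None
    if token_num <= 16:
        return {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 64, "num_stages": 4}
    if token_num <= 32:
        return {"BLOCK_SIZE_M": 32, "BLOCK_SIZE_N": 64, "num_stages": 3}
    # 33 <= token_num <= E
    return {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 128, "num_stages": 3}


# With E = 64, the 1-token path exercised by the existing test hits only the first tier.
print(select_small_token_config(1, 64))
print(select_small_token_config(24, 64))
print(select_small_token_config(48, 64))
```

Seen this way, a single 1-token test leaves the second and third `return` statements unexecuted, which is consistent with the low diff coverage reported.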

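Suggested fixes:

In tests/layers/test_fused_moe_triton_backend.py, add parameterized tests for Wfp8Afp8MoEMethod.apply: construct num_local_experts >= 64, run with 16, 32, and 64 tokens, and assert the BLOCK_SIZE_M, BLOCK_SIZE_N, num_stages, and other config values of the first kernel call.
Add a test confirming that the default configuration is unchanged for token_num > E, so that later config tuning does not accidentally affect the prefill/default path.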
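The shape of such a test might look like the sketch below. It is not the repository's test: `apply_stub` is a hypothetical stand-in for the config branch in `Wfp8Afp8MoEMethod.apply`, `DEFAULT_CONFIG` is a placeholder because the default values are not given here, and the kernel is replaced by a `unittest.mock.Mock` so the first call's keyword arguments can be inspected. A real test would patch the Triton kernel launch in the backend module and would probably use `pytest.mark.parametrize` rather than `subTest`.

```python
import unittest
from unittest import mock

# Placeholder only: the real default/prefill config values are not stated in this report.
DEFAULT_CONFIG = {"config_name": "default"}


def apply_stub(token_num, num_experts, kernel):
    """Hypothetical stand-in that picks a config and launches two GEMM kernels."""
    if token_num > num_experts:
        config = DEFAULT_CONFIG
    elif token_num <= 16:
        config = {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 64, "num_stages": 4}
    elif token_num <= 32:
        config = {"BLOCK_SIZE_M": 32, "BLOCK_SIZE_N": 64, "num_stages": 3}
    else:
        config = {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 128, "num_stages": 3}
    kernel(**config)  # first fused MoE GEMM
    kernel(**config)  # second fused MoE GEMM


class SmallTokenConfigTest(unittest.TestCase):
    NUM_EXPERTS = 64
    # (token_num, expected BLOCK_SIZE_M, BLOCK_SIZE_N, num_stages)
    CASES = [(16, 16, 64, 4), (32, 32, 64, 3), (64, 64, 128, 3)]

    def test_small_token_tiers(self):
        for token_num, block_m, block_n, stages in self.CASES:
            with self.subTest(token_num=token_num):
                kernel = mock.Mock()
                apply_stub(token_num, self.NUM_EXPERTS, kernel)
                first_call = kernel.call_args_list[0].kwargs
                self.assertEqual(first_call["BLOCK_SIZE_M"], block_m)
                self.assertEqual(first_call["BLOCK_SIZE_N"], block_n)
                self.assertEqual(first_call["num_stages"], stages)

    def test_default_path_unchanged(self):
        kernel = mock.Mock()
        apply_stub(self.NUM_EXPERTS + 1, self.NUM_EXPERTS, kernel)
        self.assertEqual(kernel.call_args_list[0].kwargs, DEFAULT_CONFIG)
```

Saved to a file, the sketch runs with `python -m unittest <file>`. The three token counts are chosen so that each tier is entered at its upper boundary.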

Related changes: fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py:749-776

fp8 triton moe config.

c083797

xuanyuanminzheng had a problem deploying to Metax_ci June 10, 2026 08:02 — with GitHub Actions Failure

PaddlePaddle-bot reviewed Jun 10, 2026

xuanyuanminzheng wants to merge 1 commit into PaddlePaddle:develop from xuanyuanminzheng:fp8_triton_moe_config
