Fix moe acc by BingooYang · Pull Request #7988 · PaddlePaddle/FastDeploy

BingooYang · 2026-06-03T03:41:56Z

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

…thout a slot(PaddlePaddle#7141) (PaddlePaddle#7181) * [BugFix] Set MC_MAX_MR_SIZE to avoid register hang (PaddlePaddle#7163) * Set MC_MAX_MR_SIZE to avoid register hang * up * [fix] prevent requests from entering running state without a slot * [fix] count abort set * [fix] count preempted task in waiting list --------- Co-authored-by: jc <52520497+juncaipeng@users.noreply.github.com>

…ddlePaddle#7186) (PaddlePaddle#7195)

… (PaddlePaddle#7192) * fix MTP bugs in TP and overlap * fix

Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com> Co-authored-by: liuruian <liuruian@MacBook-Pro.local>

* [Feature]whl version * [Feature]whl version,set root_is_pure = false * [Feature]code style Co-authored-by: ChowMingSing <610208940@qq.com>

…7218 (PaddlePaddle#7256) * support moe-topk use topk_reduce_func * fix ep error * fix ut * fix ut

…s in SM90 flash_mask_attn (PaddlePaddle#7216)

…addle#7266) * Remove duplicate NICs from environment variables * Update version for xvllm in download_dependencies.sh Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>

…addlePaddle#7191) * merge matmul and add * modify format * using paddle.nn.functional.linear * using _C_ops.linear * using paddle.nn.functional.linear * add FLAGS_use_legacy_linear env var in test case * fix format * add assert and remove env * modify format * using matmul for no bias * modify accurate baseline

…7277) * Update docs for release/2.5 * Update English docs for release/2.5 - Update README_EN.md: add v2.5 news entry, reformat v2.4 entry with release link - Update docs/get_started/installation/nvidia_gpu.md: - Docker image: 2.4.0 -> 2.5.0, notice now shows SM80/86/89/90 support - paddlepaddle-gpu: 3.3.0 -> 3.3.1, add CUDA 12.9 alternatives - fastdeploy-gpu: 2.4.0 -> 2.5.0, unified arch install with CUDA 12.9 option - Update docs/zh/get_started/installation/nvidia_gpu.md: - Fix remaining paddlepaddle-gpu==3.3.0 refs in sections 4&5 -> 3.3.1 Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/fa0be381-324e-4b0d-b7a6-e2c1fa12174f * Clarify --extra-index-url usage in installation docs Add note explaining that --extra-index-url is only for downloading fastdeploy-gpu dependencies; fastdeploy-gpu itself must be installed from the Paddle source specified by -i. Applied to both Chinese and English nvidia_gpu.md installation guides. Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/9fa8b3c9-7555-4eae-b9b9-026cddd7e74c * Update nvidia_gpu.md --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: jiang-jia-jun <jiangjiajun@baidu.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>

… (PaddlePaddle#7279)

…nd bug (PaddlePaddle#7221) (PaddlePaddle#7296) Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com>

…#7276) * fix * refine code * refine code * refine code * refine code * refine code

…ion Params + CUDAGraph Validation (PaddlePaddle#7215,PaddlePaddle#7281) (PaddlePaddle#7301) * refactor cudagraph args * refactor quant cli param * fix * fix * tmp skip xpu * fix

…e#7320) (PaddlePaddle#7322) Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>

…dle#7341)

…addlePaddle#7318) * change glm rope_emb calculation * glm without EnforceFmulRN * fix ci

) (PaddlePaddle#7339) * moe bf16 ep support paddle batch_gemm

…addlePaddle#7343)

…#7308) (PaddlePaddle#7310) * support quant use pow2scale * fix * fix

…ePaddle#7159) (PaddlePaddle#7351) * [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1 * [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1 * fix

…_stop_value kernels (PaddlePaddle#7370) - speculate_limit_thinking_content_length: update current_base_step to step_idx+1 (step_idx now records history count before current round); remove incorrect step_idx decrement on accept_num truncation; mark step_idx param as const. - speculate_set_stop_value_multi_seqs: fix can_stop gate to use step_idx_now+accept_num>=min_token_limit; fix skip check and pre_ids_idx formula (remove stale -accept_num offset); use <= condition so accept_idx maps directly to the accepted token that ends the stop sequence; fix accept_tokens index (remove -1). - Update unit tests for speculate_set_stop_value_multi_seqs kernel.

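…it scenario (PaddlePaddle#7364) (PaddlePaddle#7387) ## Motivation In the PD disaggregation scenario, after receiving a request forwarded by the prefill node, the decode node did not promptly update the hit information of its cache blocks, which lowered the prefix cache hit rate and hurt inference performance. ## Modifications 1. In the `_free_blocks_when_stop` method, additionally exclude cache block updates on the prefill node (`splitwise_role == "prefill"`), so that the prefill node does not update the cache a second time and corrupt its state. 2. After the decode node successfully allocates a request (`_alloc_requests_with_cache`), explicitly call `update_cache_blocks` with `need_prefill_tokens` to update the cache block information, so that the decode node correctly sees the prefix cache that was already hit. Co-authored-by: kevin <chengyf112@gmail.com>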

…addlePaddle#7843 (PaddlePaddle#7845) * [Feature]console metrics log for pd disaggregation * [Feature]console metrics log for pd disaggregation fix test

…ePaddle#7881) (PaddlePaddle#7831) * Add inner benchmark metrics component * Add window_mode * remove temp scripts * fix ut * increase coverage lines

…ddlePaddle#7906) (PaddlePaddle#7909)

* Update _xpu_4cards_case_test.yml * Update _xpu_8cards_case_test.yml

Co-authored-by: kevin <chengyf112@gmail.com>

…e threashold for prefill instance (PaddlePaddle#7871)

…ePaddle#7688) (PaddlePaddle#7729) * support c8 decode attention * support c16 attention && backend * opt kernel * fix * opt larger batch * inplace out * fix input_batch && remove fast_math * fix xpu * fix bug * fix ci * opt and fix mtp * fix merge * clean code * fix merge * update * update test * fix test * fix test * opt buffer * fix conflict --------- Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>

…dlePaddle#7883) (PaddlePaddle#7884) * opt mtp logprob * fix * fix test and log * fix bits * Adapt logprobs baseline update in test_ernie_21b_mtp_multistep.py --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>

…ng CUDAGraph recapture(PaddlePaddle#7934) (PaddlePaddle#7933) * fix clear bug in rl * fix: use self.max_chunk_tokens instead of fd_config.get_max_chunk_tokens() for buffer recreation fd_config.get_max_chunk_tokens() without mm_max_tokens_per_item arg may return a smaller value than the actual initial buffer size when enable_mm and mm_max_tokens_per_item is None. Use self.max_chunk_tokens which is already computed during __init__ and consistent with first CUDAGraph capture.

…addle#7839) * PD send cache via storage & Refine swap_cache_layout op * skip messager * up * consider write cache error * fix ci * up

…ddlePaddle#7936) (PaddlePaddle#7917) * support fused noauxtc kernel on ep mode * fix unit test

…dle#7892) and Triton SamplerBackend (PaddlePaddle#7639) (PaddlePaddle#7910) * [CP][Feature] support new sampler backend with triton (PaddlePaddle#7639) * [Optimization] TopP=1.0 using _random_sample (PaddlePaddle#7892) * code check * add env FD_ENABLE_TOP_P_ONE_OPT control top_p=1 opt * defalut FD_ENABLE_TOP_P_ONE_OPT=0 * change FD_ENABLE_TOP_P_ONE_OPT=1 * fix mtp triton seed * change triton seed int64 * fix triton sampler * add seed for mtp triton sampler --------- Co-authored-by: Zero Rains <linjunlu@zerorains.top> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>

…ddle#7923) (PaddlePaddle#7922) * fix accurate issue * fix acc issue in ep + tp mode --------- Co-authored-by: root <root@tjzj-inf-sci-k8s-bzz2-0271.tjzj.baidu.com>

…addlePaddle#7958)

…in accuracy (PaddlePaddle#7960) * Reset buffer size of R3 * refine code * R3 fix Eos bug * pre-commit * fix r3 ci and support dsa * refine code * refine code * reset ci dir * refine code * fix dsv3

* Reset buffer size of R3 * refine code * R3 fix Eos bug * pre-commit * fix r3 ci and support dsa * refine code * refine code * reset ci dir * refine code * fix dsv3 * fix ernie5 mm bug

…lePaddle#7951) (PaddlePaddle#7971) * Add GDR streaming weight update path * [RL] Unify GDR and IPC weight update

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-06-03 17:32:34

📋 Review Summary

PR overview: Fixes an inference accuracy issue in MoE models and also updates the CI build configuration (pins the PaddlePaddle wheel version, improves container cleanup, and migrates runners to the APPROVAL group).

⚠️ This PR is large (319 files); consider splitting it to make review easier and reduce merge risk.

Suggested split:

PR 1: [CI] CI infrastructure updates (.github/workflows/**, scripts/**)
PR 2: [BugFix] MoE accuracy fix (custom_ops/gpu_ops/moe/, fastdeploy/model_executor/layers/moe/, custom_ops/gpu_ops/grouped_topk_kernels.cu)
PR 3: [Models] Model forward changes (fastdeploy/model_executor/models/**)
PR 4: [OP] Attention / Quantization kernel changes (custom_ops/gpu_ops/append_attn/, custom_ops/gpu_ops/decode_unified_attention/, fastdeploy/model_executor/layers/quantization/)

Scope of changes: CI workflows, MoE kernels, Models, Attention backends, Quantization

Impact tags: [CI] [Models] [OP] [BugFix]

Issues

Level	File	Summary
🟡 Suggestion	`custom_ops/gpu_ops/moe/tritonmoe_preprocess.cu`	`topk_ids_numel` stores `topk_ids.numel()` (int64_t) in an `int`, which risks int32 truncation for large batches
🟡 Suggestion	`fastdeploy/model_executor/layers/moe/triton_moe_kernels.py`	The new kernel `fused_moe_kernel_bf16` adds `offs_token.to(tl.int64)` to fix the stride overflow, but the old kernel `fused_moe_kernel_paddle` has not received the same fix
❓ Question	`.github/workflows/_accuracy_test.yml`	Removing `--ipc=host --pid=host` may affect IPC between distributed multi-process workers inside the container
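
The stride overflow named in the second finding can be reproduced outside Triton. The following is a minimal NumPy sketch, not the kernel itself: the helper `row_offsets` and the sizes (400,000 token rows, a row stride of 8,192) are hypothetical and chosen only so that the product exceeds the int32 range.

```python
import numpy as np


def row_offsets(offs_token, stride_cm, dtype):
    """Compute per-row element offsets (row * stride) entirely in `dtype`."""
    rows = np.asarray(offs_token, dtype=dtype)
    # Build the stride as an array of the same dtype so the product
    # is not silently promoted to a wider integer type.
    stride = np.full_like(rows, stride_cm)
    return rows * stride


# Hypothetical sizes: the last row index times the stride exceeds 2**31 - 1.
offs_token = [0, 1, 399_999]
stride_cm = 8192

wrapped = row_offsets(offs_token, stride_cm, np.int32)  # wraps around for the last row
exact = row_offsets(offs_token, stride_cm, np.int64)    # keeps the true offset

print("int32 offsets:", wrapped)
print("int64 offsets:", exact)
```

With 32-bit arithmetic the last offset wraps to a negative number, so a store through it would land on the wrong address; promoting the index to 64 bits before the multiplication, as `offs_token.to(tl.int64)` does in the new kernel, avoids the wrap.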

No blocking issues found. PR convention issues are reported in the section below.

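Status of Previous Findings

Finding	Issue	Status
F1	`check-bypass.yml` `per_page=100` misses pagination (this PR touches 319 files, but only the first 100 were actually checked)	⚠️ Still present
F2	Hard-coded internal bcebos wheel URL, a long-term maintenance risk	⚠️ Still present (now extended to more workflows such as cu129/cu130/RL)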
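
Finding F1 comes down to reading only the first page of a paginated listing. The sketch below shows the usual loop in a self-contained form. `list_all_files` and `make_fake_fetcher` are hypothetical names, and the fetcher is an in-memory stand-in for whatever API call `check-bypass.yml` makes, so no real endpoint or signature is implied.

```python
from typing import Callable, List


def list_all_files(fetch_page: Callable[[int, int], List[str]],
                   per_page: int = 100) -> List[str]:
    """Collect every item by requesting pages until a short page is returned."""
    files: List[str] = []
    page = 1
    while True:
        batch = fetch_page(page, per_page)
        files.extend(batch)
        if len(batch) < per_page:  # a short (or empty) page means this was the last one
            return files
        page += 1


def make_fake_fetcher(total: int) -> Callable[[int, int], List[str]]:
    """In-memory stand-in for a paginated API that serves `total` file names."""
    names = [f"file_{i}.py" for i in range(total)]

    def fetch_page(page: int, per_page: int) -> List[str]:
        start = (page - 1) * per_page
        return names[start:start + per_page]

    return fetch_page


fetch = make_fake_fetcher(319)       # same file count as this PR
first_page_only = fetch(1, 100)      # what a single per_page=100 request sees
all_files = list_all_files(fetch)    # what the paginated loop sees

print(len(first_page_only), len(all_files))
```

A single request covers 100 of the 319 entries, while the loop needs four requests to see all of them.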

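📝 PR Convention Check

The title "Fix moe acc" lacks an official tag, and every description section is empty (template placeholders only). This is unchanged since the previous review.

Suggested title (ready to copy):

[BugFix] Fix MoE accuracy regression

Suggested PR description (click to expand, ready to copy)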

## Motivation

Fix an MoE model accuracy issue (an int32 overflow of `stride_cm * offs_token` in the Triton kernel produced incorrect results), and update the CI build configuration to improve stability.

## Modifications

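- Add the `fused_moe_kernel_bf16` Triton kernel, which promotes `offs_token`, `off_experts`, and `offs_bn` to `tl.int64` before any index computation, fixing the stride multiplication overflow for large batches
- Pin the PaddlePaddle GPU wheel in CI to version 3.3.1.post20260420 (covering cu126/cu129/cu130/RL/XPU), replacing the previous nightly pre-release builds
- Add a "Terminate and delete the container" step (`if: always()`) to every build/test workflow so that containers are cleaned up even after an abnormal exit
- Improve the workspace cleanup logic by adding a `find`-based force cleanup fallback, so that leftover directories do not stall CI
- Add the `--no-same-owner` option to all `tar` commands to avoid permission problems during extraction
- Migrate the runners of several workflows from `ubuntu-latest` to the `APPROVAL` group for a more consistent runner environment
- Remove the `--privileged` flag from the docker build containers to improve CI security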

## Usage or Command

N/A

## Accuracy Tests

N/A (please add accuracy comparison data from before and after the MoE fix)

## Checklist

- [ ] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

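Overall Assessment

The CI infrastructure improvements are reasonable, and the MoE accuracy fix resolves the int32 stride overflow with a new Triton kernel, which is the right direction. The main concerns are that the old kernel fused_moe_kernel_paddle has not received the int64 fix, that tritonmoe_preprocess.cu carries an int truncation risk, and that removing --ipc=host may affect distributed tests.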
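
The int truncation risk can be illustrated with a short sketch. It assumes a 32-bit C `int` and mimics the narrowing with `ctypes`; `narrow_unchecked` and `narrow_checked` are hypothetical helpers and not code from `tritonmoe_preprocess.cu`.

```python
import ctypes

INT32_MAX = 2**31 - 1


def narrow_unchecked(numel: int) -> int:
    """Mimic storing a 64-bit element count in a 32-bit int: the high bits are dropped."""
    return ctypes.c_int32(numel).value


def narrow_checked(numel: int) -> int:
    """Reject counts that do not fit in int32 instead of truncating them."""
    if not 0 <= numel <= INT32_MAX:
        raise OverflowError(f"element count {numel} does not fit in int32")
    return numel


# Hypothetical element count just above the int32 range.
big = 2**31 + 5
truncated = narrow_unchecked(big)

try:
    narrow_checked(big)
    rejected = False
except OverflowError:
    rejected = True

print("truncated count:", truncated, "| checked narrowing rejected it:", rejected)
```

Either keeping the count in a 64-bit variable or adding an explicit range check before narrowing would remove the silent truncation.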

PaddlePaddle-bot · 2026-06-06T04:18:37Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-06 12:17:05

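The CI report is generated from the following code (updated every 30 minutes):
PR commit: b5ec4fa | Merge base: e3aed6d (branch: develop)

1 Required jobs: 9/10 passed

Total runs (reruns)	Total jobs	✅ Passed	❌ Failed	⏳ Running	⏸️ Pending	Skipped
41(0)	41	36	5	0	0	0

Job	Error type	Confidence	Log
`Approval`	Approval required	High	Job

2 Failure details

🔴 Approval: approval required (confidence: high)

Root cause summary

This job requires manual approval, and CI continues only after the approval is granted. The Approval workflow runs scripts/check_approval.sh; when the PR touches a protected scope and lacks an approval from the designated reviewers, the script fails with exit code 6.

Suggested fix summary

Obtain manual approval: follow the hints in the Approval job log to invite the corresponding reviewers, then re-trigger CI or wait for it to run again once the approval is complete.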

Jiang-Jia-Jun and others added 30 commits April 3, 2026 11:29

Update setup.py

b24765a

[Cherry-Pick][CI] Use GPU-Build-RL runner for _build_linux_rl.yml (Pa…

7ab48c4

…ddlePaddle#7186) (PaddlePaddle#7195)

[Cherry-Pick][BugFix] fix MTP bugs in TP and overlap(PaddlePaddle#7172)…

36909bf

… (PaddlePaddle#7192) * fix MTP bugs in TP and overlap * fix

remove arctic_inference deps (PaddlePaddle#7236)

403ce13

Split enable_mm (PaddlePaddle#7183) (PaddlePaddle#7233)

6b78981

Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com> Co-authored-by: liuruian <liuruian@MacBook-Pro.local>

[Feature]distinguish whl version (PaddlePaddle#7204) (PaddlePaddle#7224)

84d6271

* [Feature]whl version * [Feature]whl version,set root_is_pure = false * [Feature]code style Co-authored-by: ChowMingSing <610208940@qq.com>

support moe for sm103 (PaddlePaddle#7240)

0181884

[Cherry-Pick][RL] support moe-topk use topk_reduce_func PaddlePaddle#…

9c65655

…7218 (PaddlePaddle#7256) * support moe-topk use topk_reduce_func * fix ep error * fix ut * fix ut

[Cherry-Pick][BugFix] Fix batch_size derivation and relax shape check…

5fd8020

…s in SM90 flash_mask_attn (PaddlePaddle#7216)

[XPU][CI] lock xvllm version for fix bug (PaddlePaddle#7264) (PaddleP…

098dd2c

…addle#7266) * Remove duplicate NICs from environment variables * Update version for xvllm in download_dependencies.sh Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>

Update ci_metax.yml (PaddlePaddle#7286)

6fcc25f

[OP]Unify MoE op with moe_permute path for bf16 GLM (PaddlePaddle#7164)…

dea9d35

… (PaddlePaddle#7279)

[BugFix] Fix Async D2H copy bug & flash mash atten cache V out of bou…

dd0863b

…nd bug (PaddlePaddle#7221) (PaddlePaddle#7296) Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com>

[Cherry-Pick] change rms norm for glm PaddlePaddle#7269 (PaddlePaddle…

4f36346

…#7276) * fix * refine code * refine code * refine code * refine code * refine code

[Cherry-Pick][FDConfig] Auto-scale CUDA Graph Capture & CLI Quantizat…

c756038

…ion Params + CUDAGraph Validation (PaddlePaddle#7215,PaddlePaddle#7281) (PaddlePaddle#7301) * refactor cudagraph args * refactor quant cli param * fix * fix * tmp skip xpu * fix

[XPU][CI]Update xtdk version in download_dependencies.sh (PaddlePaddl…

2ac9b89

…e#7320) (PaddlePaddle#7322) Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>

[Cherry-Pick][Docs] Update Release Note(PaddlePaddle#7302) (PaddlePad…

65c6e72

…dle#7341)

[Cherry-Pick][RL] change glm rope_emb calculation PaddlePaddle#7316 (P…

42b0f59

…addlePaddle#7318) * change glm rope_emb calculation * glm without EnforceFmulRN * fix ci

[Cherry-Pick][RL]moe bf16 ep support paddle batch_gemm(PaddlePaddle#7337

7446665

) (PaddlePaddle#7339) * moe bf16 ep support paddle batch_gemm

[Cherry-Pick][CI] Sync dev optimizations to 2.6(PaddlePaddle#7335) (P…

9e8ea7d

…addlePaddle#7343)

[Cherry-Pick][TI-consistent] support quant use pow2scale(PaddlePaddle…

9cb82d7

…#7308) (PaddlePaddle#7310) * support quant use pow2scale * fix * fix

fix overlap mtp empty run (PaddlePaddle#7314)

b2997f3

[Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1 (Paddl…

d9a008f

…ePaddle#7159) (PaddlePaddle#7351) * [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1 * [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1 * fix

remove fa4 requirements (PaddlePaddle#7354)

9823d63

update attn_mask_q 2 (PaddlePaddle#7373)

144dc17

liuruyan and others added 21 commits May 21, 2026 14:14

fix ce bug (PaddlePaddle#7874)

b562b8d

[Cherry-Pick][Feature][Log]console metrics log for pd disaggregation P…

485f6c2

…addlePaddle#7843 (PaddlePaddle#7845) * [Feature]console metrics log for pd disaggregation * [Feature]console metrics log for pd disaggregation fix test

[Cherry-Pick][Benchmark] Add inner benchmark metrics component (Paddl…

e7815be

…ePaddle#7881) (PaddlePaddle#7831) * Add inner benchmark metrics component * Add window_mode * remove temp scripts * fix ut * increase coverage lines

fix(kvcache): buffer early layer0 signals (PaddlePaddle#7896)

5d18984

[Cherry-Pick][CI] Restore self-hosted runners for GitHub workflows(Pa…

3ffeb44

…ddlePaddle#7906) (PaddlePaddle#7909)

[Cherry-pick][XPU][CI] fix logs update bug (PaddlePaddle#7915)

85399db

* Update _xpu_4cards_case_test.yml * Update _xpu_8cards_case_test.yml

supoort glm yarn rope (PaddlePaddle#7894)

e7a02e2

[bugfix] AS block leaks (PaddlePaddle#7895)

0a5d4b6

Co-authored-by: kevin <chengyf112@gmail.com>

[Scheduler] Increase sleep interval in fetch loops and cancel schedul…

bf0dace

…e threashold for prefill instance (PaddlePaddle#7871)

[PD] PD send cache via storage & Refine swap_cache_layout op (PaddleP…

8a1e71d

…addle#7839) * PD send cache via storage & Refine swap_cache_layout op * skip messager * up * consider write cache error * fix ci * up

[Cherry-Pick][Optimization]support fused noauxtc kernel on ep mode(Pa…

2b0fd53

…ddlePaddle#7936) (PaddlePaddle#7917) * support fused noauxtc kernel on ep mode * fix unit test

[Cherry-Pick] [BugFix] fix all reduce fusion accurate issue (PaddlePa…

fefbcff

…ddle#7923) (PaddlePaddle#7922) * fix accurate issue * fix acc issue in ep + tp mode --------- Co-authored-by: root <root@tjzj-inf-sci-k8s-bzz2-0271.tjzj.baidu.com>

[Cherry-Pick][BugFix] fix mtp reset bugs in rl (PaddlePaddle#7957) (P…

ac24fcc

…addlePaddle#7958)

[RL] Fix the incorrect routing of EOS tokens, which leads to changes …

7198b58

…in accuracy (PaddlePaddle#7960) * Reset buffer size of R3 * refine code * R3 fix Eos bug * pre-commit * fix r3 ci and support dsa * refine code * refine code * reset ci dir * refine code * fix dsv3

[RL] Fix Ernie mm bug (PaddlePaddle#7966)

eeed8a3

* Reset buffer size of R3 * refine code * R3 fix Eos bug * pre-commit * fix r3 ci and support dsa * refine code * refine code * reset ci dir * refine code * fix dsv3 * fix ernie5 mm bug

[Cherry-Pick][RL][Feature] Add GDR streaming weight update path (Padd…

780c000

…lePaddle#7951) (PaddlePaddle#7971) * Add GDR streaming weight update path * [RL] Unify GDR and IPC weight update

fix moe accurate issue

99c7df1

BingooYang had a problem deploying to Metax_ci June 3, 2026 03:42 — with GitHub Actions Failure

fix bug

f232ed9

BingooYang had a problem deploying to Metax_ci June 3, 2026 04:56 — with GitHub Actions Failure

add test

b5ec4fa

BingooYang had a problem deploying to Metax_ci June 3, 2026 09:15 — with GitHub Actions Failure

PaddlePaddle-bot reviewed Jun 3, 2026

BingooYang wants to merge 143 commits into PaddlePaddle:develop from BingooYang:fix_moe_acc

20 participants
