Add shard exp on fsdp custom mesh by NuojCheng · Pull Request #3988 · AI-Hypercomputer/maxtext

NuojCheng · 2026-05-27T02:38:47Z

Description

This PR introduces a new custom mesh and rule that enable FSDP sharding of the exp dimension for weights in the MoE component. The custom mesh shows greater benefit when a non-elementwise optimizer, e.g. Muon, is enabled.

Compared with the previous implementation of shard_exp_on_fsdp, this custom mesh supports a mixture of FSDP and EP.

Use custom_mesh_and_rule=shard-exp-on-fsdp to enable.
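
One plausible reading of why a non-elementwise optimizer such as Muon benefits from sharding the exp dimension: Muon orthogonalizes each 2-D weight matrix as a whole (Newton-Schulz iteration), so a device that owns complete expert matrices can compute the update locally, whereas a device that owns only a slice of every matrix cannot. The sketch below illustrates this with NumPy only. The shapes, the function name newton_schulz, and the two-way split are assumptions for illustration and are not taken from this PR.

```python
import numpy as np


def newton_schulz(g, steps=5, eps=1e-7):
  """Approximately orthogonalizes one 2-D matrix (quintic Newton-Schulz, as used by Muon)."""
  a, b, c = 3.4445, -4.7750, 2.0315
  x = g / (np.linalg.norm(g) + eps)
  transposed = x.shape[0] > x.shape[1]
  if transposed:
    x = x.T
  for _ in range(steps):
    s = x @ x.T
    x = a * x + (b * s + c * s @ s) @ x
  return x.T if transposed else x


def per_expert(stack):
  """Applies newton_schulz independently to every matrix in a [exp, rows, cols] stack."""
  return np.stack([newton_schulz(g) for g in stack])


rng = np.random.default_rng(0)
grads = rng.standard_normal((4, 8, 16))  # hypothetical [exp, embed, mlp] gradient

# Reference: every expert matrix is orthogonalized with all of its rows present.
full = per_expert(grads)

# Split along exp: each of two "devices" holds two complete expert matrices.
by_expert = np.concatenate([per_expert(shard) for shard in np.split(grads, 2, axis=0)], axis=0)

# Split along embed: each "device" holds half the rows of every expert matrix.
by_rows = np.concatenate([per_expert(shard) for shard in np.split(grads, 2, axis=1)], axis=1)

matches_when_split_by_expert = np.allclose(full, by_expert)
matches_when_split_by_rows = np.allclose(full, by_rows)
```

Splitting by expert reproduces the reference result without any communication, while splitting by rows does not, which is why a row-sharded layout would need to gather the full matrix before the optimizer step.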

Tests

Performance Regression

On Tpu7x-8 with DSv2-16b, the losses over 10 steps match exactly and performance is similar; see https://diff.googleplex.com/#key=r2vyDl7870kN.

Support on FSDP + EP

Support FSDP=4 and EP=2 (a2a): https://paste.googleplex.com/4617065838280704
Support FSDP=4 and EP=2 (AG-RS): https://paste.googleplex.com/5176088446763008

Support on explicit sharding

Pure FSDP=8: https://paste.googleplex.com/4860389375475712
FSDP=4 and EP=2: https://paste.googleplex.com/4890472467267584

Performance improvement on Muon optimizer

FSDP+EP MFU improved 120%: https://diff.googleplex.com/#key=ieiMesgH7AMi
Pure FSDP same performance: https://diff.googleplex.com/#key=FdgAlFcTiTHF

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov · 2026-05-27T02:44:09Z

Codecov Report

❌ Patch coverage is 70.00000% with 9 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/maxtext/utils/sharding.py	62.50%	3 Missing and 3 partials ⚠️
src/maxtext/layers/moe.py	76.92%	1 Missing and 2 partials ⚠️

Shuwen-Fang · 2026-05-28T00:25:05Z

      return lhs_quantize_dtype, rhs_quantize_dtype

-    def gmm(inputs, kernel, tiling, group_sizes, expert_assignments, weight_gather_axes):
+    def gmm(inputs, kernel, tiling, group_sizes, expert_assignments, weight_gather_axes=None):


Can we do weight_gather_axes = []?

I tried that first, but got a pylint warning.
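
The pylint warning mentioned above is most likely W0102 (dangerous-default-value), which pylint raises for mutable defaults such as []. A default list is created once, when the function is defined, and is then shared by every call. The sketch below shows the pitfall and the None-default pattern that the diff adopts. Both functions are hypothetical stand-ins, not the PR's gmm.

```python
def collect_axes_mutable_default(axis, weight_gather_axes=[]):  # pylint: disable=dangerous-default-value
  """Hypothetical helper: the default list is shared across calls."""
  weight_gather_axes.append(axis)
  return weight_gather_axes


def collect_axes_none_default(axis, weight_gather_axes=None):
  """Hypothetical helper: a fresh list is created on every call that omits the argument."""
  if weight_gather_axes is None:
    weight_gather_axes = []
  weight_gather_axes.append(axis)
  return weight_gather_axes


# The mutable default leaks state from the first call into the second.
leaky_first = collect_axes_mutable_default("fsdp")
leaky_second = collect_axes_mutable_default("expert")

# The None default keeps the calls independent.
safe_first = collect_axes_none_default("fsdp")
safe_second = collect_axes_none_default("expert")
```

If the function never mutates the argument, an immutable default such as an empty tuple is another option that avoids the warning.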

Shuwen-Fang · 2026-05-28T00:26:42Z


+def remove_fsdp_pspec(pspec):
+  """Removes 'fsdp' and 'fsdp_transpose' from a PartitionSpec."""
+  if isinstance(pspec, jax.sharding.PartitionSpec):


In what scenario is it not a jax.sharding.PartitionSpec type?

In some cases it might be None. Since it is a shared function in sharding.py, I tried to make it as general-purpose as possible.

nit, can we change it to:

  if pspec is None:
    return pspec
  new_spec = []
  ...
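
A minimal sketch of how the suggested early-return shape could look when written out in full. Only the signature, docstring, and first line of remove_fsdp_pspec are visible in the diff above, so the loop body here is an assumption, and the name remove_fsdp_pspec_sketch is used to avoid presenting it as the PR's implementation.

```python
from jax.sharding import PartitionSpec

# Assumption: the mesh axis names to strip, taken from the docstring in the diff.
FSDP_AXES = ("fsdp", "fsdp_transpose")


def remove_fsdp_pspec_sketch(pspec):
  """Returns a copy of `pspec` without the FSDP mesh axes; None passes through."""
  if pspec is None:
    return pspec
  new_spec = []
  for entry in pspec:
    if entry is None:
      # This dimension is already replicated.
      new_spec.append(None)
    elif isinstance(entry, str):
      # This dimension is sharded over a single mesh axis.
      new_spec.append(None if entry in FSDP_AXES else entry)
    elif isinstance(entry, (tuple, list)):
      # This dimension is sharded over several mesh axes: keep the non-FSDP ones.
      kept = tuple(axis for axis in entry if axis not in FSDP_AXES)
      if not kept:
        new_spec.append(None)
      elif len(kept) == 1:
        new_spec.append(kept[0])
      else:
        new_spec.append(kept)
    else:
      # Anything else (e.g. PartitionSpec.UNCONSTRAINED) is left untouched.
      new_spec.append(entry)
  return PartitionSpec(*new_spec)
```

A dimension whose only mesh axes are FSDP axes becomes None (replicated), which matches the intent of gathering the weights over FSDP before they are used.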

Shuwen-Fang · 2026-05-28T00:30:12Z

+      w1_pspec = self._logical_to_mesh_axes(("exp", "embed_tensor_transpose", "mlp_no_fsdp"))
+      wo_pspec = self._logical_to_mesh_axes(("exp", "mlp_no_fsdp", "embed_tensor_transpose"))
+      # Update kernel pspec for FSDP AG
+      w0_pspec = remove_fsdp_pspec(w0_pspec)


For my own understanding, why remove fsdp from pspec?

For the sparse matmul wrapper function, we all-gather the weights over FSDP before entering the shard map. I think it is simply a decision made earlier.

github-actions · 2026-05-28T01:25:10Z

🤖 Hi @NuojCheng, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

github-actions · 2026-05-28T01:27:15Z

🤖 I'm sorry @NuojCheng, but I was unable to process your request. Please see the logs for more details.

Shuwen-Fang · 2026-05-28T18:47:39Z


+def remove_fsdp_pspec(pspec):
+  """Removes 'fsdp' and 'fsdp_transpose' from a PartitionSpec."""
+  if isinstance(pspec, jax.sharding.PartitionSpec):


nit, can we change it to:

  if pspec is None:
    return pspec
  new_spec = []
  ...

NuojCheng force-pushed the chengnuojin-fsdp-exp branch 6 times, most recently from 0331b14 to 11f85f7 · May 27, 2026 21:32

add shard exp on fsdp custom mesh

7385158

NuojCheng force-pushed the chengnuojin-fsdp-exp branch from 11f85f7 to 7385158 · May 27, 2026 23:21

NuojCheng marked this pull request as ready for review May 27, 2026 23:46

NuojCheng requested review from dipannita08 and igorts-git as code owners May 27, 2026 23:46

Shuwen-Fang reviewed May 28, 2026

NuojCheng added the gemini-review label May 28, 2026

Shuwen-Fang approved these changes May 28, 2026

NuojCheng wants to merge 1 commit into main from chengnuojin-fsdp-exp
