
Support torchrun-style InfiniTrain multi-process launch #184

Open: chen2021673 wants to merge 1 commit into master from 8_proc

chen2021673 commented Jul 3, 2026

Summary

This PR adds torchrun-style multi-process launch support for InfiniTrain while preserving the existing single-process multi-thread workflow.

Background

InfiniTrain previously relied mainly on --nthread_per_process for multi-GPU runs, which creates multiple C++ threads inside one process. PyTorch's torchrun --nproc_per_node=N instead launches N processes and injects rank-related environment variables such as RANK, LOCAL_RANK, WORLD_SIZE, and LOCAL_WORLD_SIZE into each of them.

Profiling scripts need InfiniTrain runs to follow the same process model as PyTorch.
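As a rough illustration of the torchrun process model described above, the sketch below computes the environment a single-node launcher would inject into each child process. It assumes one node, so the global rank equals the local rank. TorchrunEnvForChild is a hypothetical helper, not the actual infini_run code, and the InfiniTrain-specific variables that infini_run also sets are not named in this PR, so they are left out.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper: the torchrun-compatible environment a single-node
// launcher would give child process `local_rank` out of `nproc_per_node`.
std::map<std::string, std::string> TorchrunEnvForChild(int local_rank, int nproc_per_node) {
    if (nproc_per_node <= 0 || local_rank < 0 || local_rank >= nproc_per_node) {
        throw std::invalid_argument("local_rank must be in [0, nproc_per_node)");
    }
    const int node_rank = 0;  // single-node assumption
    const int global_rank = node_rank * nproc_per_node + local_rank;
    return {
        {"RANK", std::to_string(global_rank)},
        {"LOCAL_RANK", std::to_string(local_rank)},
        {"WORLD_SIZE", std::to_string(nproc_per_node)},        // one node: all processes are local
        {"LOCAL_WORLD_SIZE", std::to_string(nproc_per_node)},
    };
}
```

For --nproc_per_node=8, child 3 would see RANK=3, LOCAL_RANK=3, WORLD_SIZE=8 and LOCAL_WORLD_SIZE=8.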

Changes

  • Update infini_run to:

    • accept -- as the separator between launcher arguments and training arguments
    • launch nproc_per_node child processes
    • inject both InfiniTrain and torchrun-compatible rank env vars
    • propagate child process failures through the launcher's exit code
  • Update parallel runtime to:

    • read torchrun-compatible env vars as fallback
    • validate process topology and rank bounds
    • map local process/thread rank to CUDA device index
  • Update GPT-2/Llama3 examples and parallel helpers to use local-device mapping.

  • Update scripts/run_models_and_profile.bash to:

    • always launch model commands through infini_run
    • treat nproc_per_node as a launcher-only setting
    • keep nthread_per_process as the per-process thread count
  • Update scripts/test_config.json to use multi-process configs:

    • 8-thread cases become nproc_per_node=8, nthread_per_process=1
    • original 4-rank VPP cases become nproc_per_node=4, nthread_per_process=1
  • Add documentation describing behavior, compatibility, and example usage.
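The runtime-side changes (env fallback, topology validation, local device mapping) can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: ProcessTopology, ReadIntEnv, ValidateTopology and LocalDeviceIndex are hypothetical names. The InfiniTrain-specific environment variable names are not given here, so the caller passes them ahead of the torchrun name. The device mapping assumes each process owns a contiguous block of nthread_per_process GPUs on its node.

```cpp
#include <cstdlib>
#include <initializer_list>
#include <stdexcept>
#include <string>

// Hypothetical view of the launch topology as seen by one process.
struct ProcessTopology {
    int rank = 0;              // global process rank (RANK)
    int local_rank = 0;        // process rank on this node (LOCAL_RANK)
    int world_size = 1;        // total processes (WORLD_SIZE)
    int local_world_size = 1;  // processes on this node (LOCAL_WORLD_SIZE)
};

// Returns the first variable in `names` that is set, parsed as an int, or
// `fallback` if none is set. Put the native name first and the torchrun
// name after it so the torchrun variable acts as the fallback.
int ReadIntEnv(std::initializer_list<const char*> names, int fallback) {
    for (const char* name : names) {
        if (const char* value = std::getenv(name)) {
            return std::stoi(value);
        }
    }
    return fallback;
}

// Rejects inconsistent topologies before any device is selected.
void ValidateTopology(const ProcessTopology& t, int nthread_per_process) {
    if (t.world_size <= 0 || t.local_world_size <= 0 || nthread_per_process <= 0) {
        throw std::invalid_argument("sizes must be positive");
    }
    if (t.local_world_size > t.world_size) {
        throw std::invalid_argument("LOCAL_WORLD_SIZE exceeds WORLD_SIZE");
    }
    if (t.rank < 0 || t.rank >= t.world_size) {
        throw std::out_of_range("RANK out of range");
    }
    if (t.local_rank < 0 || t.local_rank >= t.local_world_size) {
        throw std::out_of_range("LOCAL_RANK out of range");
    }
}

// Assumed mapping: process `local_rank` owns devices
// [local_rank * nthread_per_process, (local_rank + 1) * nthread_per_process),
// one per thread.
int LocalDeviceIndex(const ProcessTopology& t, int nthread_per_process, int thread_rank) {
    if (thread_rank < 0 || thread_rank >= nthread_per_process) {
        throw std::out_of_range("thread_rank out of range");
    }
    return t.local_rank * nthread_per_process + thread_rank;
}
```

Under that assumption, --nproc_per_node=8 with --nthread_per_process 1 puts LOCAL_RANK=3 on device 3, while a single process with --nthread_per_process 8 maps thread 5 to device 5. Both layouts therefore cover devices 0 through 7.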

Compatibility

Existing direct runs remain supported:

./llama3 ... --nthread_per_process 8

The launcher can also preserve the old single-process multi-thread behavior:

./infini_run --nproc_per_node=1 ./llama3 ... --nthread_per_process 8

The recommended single-node 8-GPU multi-process usage is:

./infini_run --nproc_per_node=8 ./llama3 ... --nthread_per_process 1
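These examples rely on the launcher telling its own options apart from the training command. One plausible way to do that is sketched below. It assumes parsing stops at an explicit -- or at the first token that does not start with '-', and that everything after that point is forwarded untouched. SplitLaunchArgs and LaunchArgs are hypothetical, and only the --nproc_per_node=N spelling is handled.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result of splitting an infini_run command line.
struct LaunchArgs {
    int nproc_per_node = 1;
    std::vector<std::string> training_cmd;  // program and its own flags, forwarded as-is
};

// Launcher options come first; parsing stops at "--" or at the training binary.
LaunchArgs SplitLaunchArgs(const std::vector<std::string>& argv) {
    LaunchArgs out;
    const std::string kNproc = "--nproc_per_node=";
    std::size_t i = 1;  // skip argv[0], the launcher itself
    for (; i < argv.size(); ++i) {
        const std::string& arg = argv[i];
        if (arg == "--") {  // explicit separator: the next token starts the training command
            ++i;
            break;
        }
        if (arg.rfind("-", 0) != 0) break;  // first non-option token: training program
        if (arg.rfind(kNproc, 0) == 0) {
            out.nproc_per_node = std::stoi(arg.substr(kNproc.size()));
        } else {
            throw std::invalid_argument("unknown launcher option: " + arg);
        }
    }
    out.training_cmd.assign(argv.begin() + static_cast<std::ptrdiff_t>(i), argv.end());
    if (out.training_cmd.empty()) throw std::invalid_argument("no training command given");
    if (out.nproc_per_node <= 0) throw std::invalid_argument("nproc_per_node must be positive");
    return out;
}
```

With this approach the launcher never interprets --nthread_per_process; it reaches ./llama3 unchanged. In both the --nproc_per_node=1 and --nproc_per_node=8 examples the job has nproc_per_node × nthread_per_process = 8 ranks in total.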

Test

Test results were attached as two screenshots.