Skip to content

fix(ci): skip oclif shell probe in egg-bin#6014

Merged
killagu merged 1 commit into
nextfrom
codex/egg-bin-ci-diagnostics
Jun 28, 2026
Merged

fix(ci): skip oclif shell probe in egg-bin#6014
killagu merged 1 commit into
nextfrom
codex/egg-bin-ci-diagnostics

Conversation

@killagu

@killagu killagu commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Summary

  • preset SHELL before loading oclif in the main egg-bin entrypoint when Windows children start without it
  • apply the same preset to the custom my-egg-bin fixture entrypoint
  • move the investigation details into the wiki and keep temporary diagnostic code out of the final PR

Root Cause

On GitHub-hosted Windows runners, spawned egg-bin children can start without SHELL. oclif then synchronously probes the parent process shell through PowerShell/CIM during startup. tools/egg-bin/test/commands/test.test.ts starts many child CLI processes, so that startup probe dominated the file runtime.

Validation

  • git diff --check origin/next...HEAD
  • node --check tools/egg-bin/bin/run.js
  • node --check tools/egg-bin/test/fixtures/my-egg-bin/bin/run.js
  • local egg-bin smoke after build, with CI=1 and SHELL unset:
    • test/commands/test.test.ts -t "should only test files specified by TESTS argv"
    • test/my-egg-bin.test.ts -t "should my-egg-bin nsp success"
  • CI Test bin (windows-latest, 24) passed in about 3m06s after diagnostic code was removed

Evidence

The diagnostic code used to isolate the hotspot was removed from this PR. The durable notes and CI evidence are recorded in wiki/workflows/egg-bin-windows-shell-probe.md.

  • before fix: test/commands/test.test.ts on Windows took about 1,077s
  • after main entrypoint fix: test/commands/test.test.ts dropped to about 42-49s
  • after fixture entrypoint fix: test/my-egg-bin.test.ts dropped from about 299s to about 8.5s
  • latest final-code Windows bin run: test/commands/test.test.ts 48.8s, test/my-egg-bin.test.ts 8.5s, test/commands/cov.test.ts 30.0s

Summary by CodeRabbit

  • Bug Fixes

    • Improved Windows startup behavior for the CLI by ensuring a shell is available when one isn’t already set.
    • Reduced unnecessary startup overhead in Windows environments, making related commands and tests run faster and more reliably.
  • Documentation

    • Added and updated wiki entries explaining the Windows startup investigation, timing impact, and final resolution.

@coderabbitai

coderabbitai Bot commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4df61f82-e028-41d8-bc88-6787b3895a0c

📥 Commits

Reviewing files that changed from the base of the PR and between 92c47ea and c8e3851.

📒 Files selected for processing (5)
  • tools/egg-bin/bin/run.js
  • tools/egg-bin/test/fixtures/my-egg-bin/bin/run.js
  • wiki/index.md
  • wiki/log.md
  • wiki/workflows/egg-bin-windows-shell-probe.md

📝 Walkthrough

Walkthrough

Adds a presetWindowsShell() helper to both tools/egg-bin/bin/run.js and the test fixture run.js. On Windows, when SHELL is unset, it derives the value from ComSpec/COMSPEC or defaults to cmd.exe. The static @oclif/core import is replaced with a dynamic await import called after the preset. Wiki documentation is added covering the investigation, fix, and CI timing evidence.

Changes

Windows SHELL preset and dynamic import

Layer / File(s) Summary
presetWindowsShell helper and dynamic import
tools/egg-bin/bin/run.js, tools/egg-bin/test/fixtures/my-egg-bin/bin/run.js
Adds presetWindowsShell() that sets process.env.SHELL from COMSPEC/ComSpec or cmd.exe on win32 when unset; replaces static @oclif/core import with dynamic await import('@oclif/core') called after the preset in both entry points.
Wiki investigation log and workflow page
wiki/workflows/egg-bin-windows-shell-probe.md, wiki/log.md, wiki/index.md
Adds a new workflow page documenting the finding (oclif shell probing without SHELL), the fix, and CI timing evidence; adds a dated log entry and an index link.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • fengmk2

Poem

🐰 On Windows the shell was a mystery deep,
COMSPEC to the rescue, no more sluggish sleep!
A dynamic import waits for the preset to land,
So oclif sees SHELL right from the start as planned.
Hop hop, CI times drop — all is well in egg-land! 🥚

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/egg-bin-ci-diagnostics

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 28, 2026

Copy link
Copy Markdown

Deploying egg with  Cloudflare Pages  Cloudflare Pages

Latest commit: c8e3851
Status: ✅  Deploy successful!
Preview URL: https://253ceb83.egg-cci.pages.dev
Branch Preview URL: https://codex-egg-bin-ci-diagnostics.egg-cci.pages.dev

View logs

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces CI runner diagnostics and improves the test benchmarking tool to support sharding and long-tail test reporting. Specifically, it adds a new script scripts/ci-runner-diagnostics.js to measure hardware, process startup, and I/O baselines, and updates the benchmarking tool to capture --shard parameters and report individual slow tests. Feedback was provided on scripts/ci-runner-diagnostics.js to wrap the temporary directory I/O operations in a try...finally block to guarantee cleanup and prevent resource leaks if an error occurs.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread scripts/ci-runner-diagnostics.js Outdated
Comment on lines +216 to +232
async function measureTempFileIo(bytes) {
const dir = await fs.mkdtemp(path.join(os.tmpdir(), 'egg-ci-diagnostics-'));
const file = path.join(dir, 'io.bin');
const buffer = Buffer.alloc(bytes, 7);
const writeStart = performance.now();
await fs.writeFile(file, buffer);
const writeMs = Math.round(performance.now() - writeStart);
const readStart = performance.now();
const readBuffer = await fs.readFile(file);
const readMs = Math.round(performance.now() - readStart);
await fs.rm(dir, { force: true, recursive: true });
return {
bytes: readBuffer.length,
readMs,
writeMs,
};
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The temporary directory created by fs.mkdtemp is not guaranteed to be cleaned up if an error occurs during fs.writeFile or fs.readFile (for example, due to disk space or permission issues). Wrapping the I/O operations in a try...finally block ensures that the temporary directory is always deleted, preventing resource leaks on the CI runner.

async function measureTempFileIo(bytes) {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), 'egg-ci-diagnostics-'));
  try {
    const file = path.join(dir, 'io.bin');
    const buffer = Buffer.alloc(bytes, 7);
    const writeStart = performance.now();
    await fs.writeFile(file, buffer);
    const writeMs = Math.round(performance.now() - writeStart);
    const readStart = performance.now();
    const readBuffer = await fs.readFile(file);
    const readMs = Math.round(performance.now() - readStart);
    return {
      bytes: readBuffer.length,
      readMs,
      writeMs,
    };
  } finally {
    await fs.rm(dir, { force: true, recursive: true });
  }
}

@codecov

codecov Bot commented Jun 28, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.94%. Comparing base (92c47ea) to head (c8e3851).
⚠️ Report is 2 commits behind head on next.

Additional details and impacted files
@@           Coverage Diff           @@
##             next    #6014   +/-   ##
=======================================
  Coverage   81.94%   81.94%           
=======================================
  Files         677      677           
  Lines       20651    20651           
  Branches     4099     4099           
=======================================
  Hits        16922    16922           
  Misses       3215     3215           
  Partials      514      514           

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 28, 2026

Copy link
Copy Markdown

Deploying egg-v3 with  Cloudflare Pages  Cloudflare Pages

Latest commit: c8e3851
Status: ✅  Deploy successful!
Preview URL: https://031223d2.egg-v3.pages.dev
Branch Preview URL: https://codex-egg-bin-ci-diagnostics.egg-v3.pages.dev

View logs

@socket-security

socket-security Bot commented Jun 28, 2026

Copy link
Copy Markdown

No dependency changes detected. Learn more about Socket for GitHub.

👍 No dependency changes detected in pull request

@killagu killagu force-pushed the codex/egg-bin-ci-diagnostics branch 2 times, most recently from 36b1341 to 8060729 Compare June 28, 2026 09:12
@killagu killagu changed the title [codex] Diagnose and shard Windows egg-bin CI fix(ci): skip oclif shell probe in egg-bin Jun 28, 2026

killagu commented Jun 28, 2026

Copy link
Copy Markdown
Contributor Author

排查结论如下,方便集中看;排查过程中加过的 runner/timing/benchmark 诊断代码已经从最终 PR 移除了,过程和数据保留在 wiki/workflows/egg-bin-windows-shell-probe.md

  • 真正热点不是 Vitest shard 数量,也不是 globby / config 构建。test/commands/test.test.ts 慢在 Windows 上反复启动子 egg-bin 进程,每个子进程在进入 BaseCommand.init 前被 oclif 启动成本拖住。
  • 进一步打开 oclif performance 后,热点收敛到 Windows 缺少 SHELL 环境变量时的 shell detection。oclif 会同步走 PowerShell/CIM 探测父进程 shell;这个成本在 test.test.ts 这种大量 spawn 的文件里被放大。
  • 最终修复方式很小:在 tools/egg-bin/bin/run.js 和 custom CLI fixture 的 bin/run.js 入口里,Windows 且 SHELL 缺失时预设为 cmd.exe(优先取 COMSPEC basename),再动态 import @oclif/core,从而绕开 oclif 的同步 shell probe。
  • CI 数据:test/commands/test.test.ts 从约 1,077s 降到约 42-49s;test/my-egg-bin.test.ts 在 fixture 入口同样修复后从约 299s 降到约 8.5s。
  • 最终代码(无诊断代码)的单 Windows test-egg-bin leg 已通过:job 09:37:29 → 09:40:35,约 3m06s;其中 test/commands/test.test.ts 48.8s,test/my-egg-bin.test.ts 8.5s,test/commands/cov.test.ts 30.0s。
  • 因为主因已修掉,Windows test-egg-bin 不需要拆 shard;最终 PR 只保留 shell preset 修复和 wiki 记录。

@killagu killagu force-pushed the codex/egg-bin-ci-diagnostics branch from 8060729 to 051f007 Compare June 28, 2026 09:37
@killagu killagu force-pushed the codex/egg-bin-ci-diagnostics branch from 051f007 to c8e3851 Compare June 28, 2026 09:43

- Before the fix, `test/commands/test.test.ts` on Windows took about 1,077s.
- After the main entrypoint preset, `test/commands/test.test.ts` dropped to about
42-48s.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个差距也太大了吧。。。

@killagu killagu marked this pull request as ready for review June 28, 2026 11:18
Copilot AI review requested due to automatic review settings June 28, 2026 11:18
@killagu killagu merged commit 9080894 into next Jun 28, 2026
27 checks passed
@killagu killagu deleted the codex/egg-bin-ci-diagnostics branch June 28, 2026 11:18

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants