perf(docs-tests): shard route walk to drop CI wall-clock #189

Merged
PatrickRitchie merged 9 commits into TrakHound:master from ottobolyos:perf/route-check-sharding on Jun 6, 2026

@ottobolyos (Contributor)

Summary

Implements a four-way shard of the RouteCheckTests Playwright route walk so the build-test-coverage Ubuntu leg drops from a single sequential walk of every markdown-backed route to four parallel walks of a quarter each. The longest single shard becomes the critical-path wall-clock floor rather than the whole walk.

  • Add RouteCheckHelpers.ShardRoutes(routes, index, total) implementing round-robin distribution: route at position i lands in shard (i % total) + 1. Round-robin keeps per-shard wall-clock balanced even when expensive routes cluster in the source markdown tree (e.g. the generated reference pages under docs/reference/).
  • Add RouteCheckHelpers.ReadShardEnv(), which reads ROUTE_SHARD_INDEX / ROUTE_SHARD_TOTAL from the process environment and falls back to (1, 1) when they are unset or unparseable. Local dotnet test invocations without matrix-injected env vars therefore run as the identity shard (1 of 1) and walk every route.
  • Wire Every_Markdown_Backed_Route_Resolves_Without_A_404 to slice the collected route list through ShardRoutes before walking it. A surplus shard (more shards than routes) walks zero routes and short-circuits to success.
  • Add shard / shardTotal matrix dimensions to .github/workflows/dotnet.yml. shard is [1, 2, 3, 4] on ubuntu-latest; windows-latest is capped at shard 1 of 4 via exclude rules because the Windows leg filters out Category=E2E and has no shardable work. Each leg env-injects ROUTE_SHARD_INDEX / ROUTE_SHARD_TOTAL from the static matrix integers (never from github.event.* untrusted input), names its TRX with the shard suffix to avoid collisions, and uploads its artifacts under a shard-qualified name.
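
The round-robin rule in the first bullet can be sketched as follows. This is a minimal illustration, not the code merged in this PR: the class name `RouteShardingSketch`, the return type, and the exception types are assumptions, since the description only names the method, its three parameters, and the failure modes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for RouteCheckHelpers; names and exception types are assumed.
public static class RouteShardingSketch
{
    // Returns the routes owned by the 1-based shard `index` out of `total` shards.
    public static IReadOnlyList<string> ShardRoutes(IReadOnlyList<string> routes, int index, int total)
    {
        if (routes is null) throw new ArgumentNullException(nameof(routes));
        if (total < 1) throw new ArgumentOutOfRangeException(nameof(total), "total must be positive");
        if (index < 1 || index > total)
            throw new ArgumentOutOfRangeException(nameof(index), "index must be in [1, total]");

        var shard = new List<string>();
        for (var i = 0; i < routes.Count; i++)
        {
            // The route at position i lands in shard (i % total) + 1.
            if ((i % total) + 1 == index) shard.Add(routes[i]);
        }
        return shard; // empty for a surplus shard (more shards than routes)
    }
}
```

With five routes and four shards, shard 1 receives positions 0 and 4 and every other shard receives one route. With fewer routes than shards, the surplus shards receive an empty list, which is the case the test treats as a success.
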

ottobolyos force-pushed the perf/route-check-sharding branch from b4985b1 to 967d967 on June 3, 2026 15:12
ottobolyos force-pushed the perf/route-check-sharding branch 2 times, most recently from 8253d41 to ab34927, on June 5, 2026 02:41

Add ShardRoutes contract tests pinning round-robin distribution
across N shards, the union+disjointness partition invariants, the
within-shard order preservation, and the argument-validation
failure modes (null routes, non-positive total, out-of-range
index). The implementation arrives in the follow-up commit; this
commit is the RED step and intentionally does not compile.
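
The union and disjointness invariants named in this commit message could be checked along these lines. This is a sketch only: `PartitionInvariantSketch`, `Shard`, and `IsPartition` are hypothetical names, `Shard` is a stand-in for the helper under test, and the check assumes every route string is distinct.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartitionInvariantSketch
{
    // Stand-in for the helper under test: 1-based shard `index` of `total`, round-robin.
    public static List<string> Shard(IReadOnlyList<string> routes, int index, int total)
    {
        return routes.Where((_, i) => (i % total) + 1 == index).ToList();
    }

    // True when the shards together cover every route exactly once.
    public static bool IsPartition(IReadOnlyList<string> routes, int total)
    {
        var seen = new HashSet<string>();
        var count = 0;
        for (var index = 1; index <= total; index++)
        {
            foreach (var route in Shard(routes, index, total))
            {
                if (!seen.Add(route)) return false; // two shards share a route
                count++;
            }
        }
        // Union check: nothing dropped and nothing invented.
        return count == routes.Count && seen.SetEquals(routes);
    }
}
```

A contract test would call `IsPartition` for several shard counts, including counts larger than the route list.
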

Drop the build-test-coverage Ubuntu wall-clock by partitioning the
RouteCheckTests Playwright route walk across four parallel matrix
shards. The longest single shard becomes the critical-path floor
rather than the full sequential walk of every markdown-backed route.

- Add ShardRoutes(routes, index, total) to RouteCheckHelpers,
  implementing round-robin distribution (route at position i lands
  in shard (i % total) + 1) so per-shard wall-clock stays balanced
  even when expensive routes cluster in the source tree.
- Add ReadShardEnv() reading ROUTE_SHARD_INDEX and ROUTE_SHARD_TOTAL
  from the process environment, defaulting to (1, 1) on unset or
  unparseable values so local dotnet test invocations walk every
  route by default. Clamps a half-set matrix where
  ROUTE_SHARD_INDEX exceeds ROUTE_SHARD_TOTAL to the largest valid
  shard so the caller never receives an out-of-range index.
- Wire Every_Markdown_Backed_Route_Resolves_Without_A_404 to slice
  the collected route list through ShardRoutes before walking it.
  A surplus shard (more shards than routes) legitimately walks zero
  routes and short-circuits to success.
- Add shard / shardTotal matrix dimensions to the build-test-coverage
  workflow. shard is [1, 2, 3, 4] on ubuntu-latest; windows-latest
  is capped at shard 1 of 4 via exclude rules because the Windows
  leg filters out Category=E2E and has no shardable work. Each leg
  env-injects ROUTE_SHARD_INDEX and ROUTE_SHARD_TOTAL into the test
  runner, names its TRX with the shard suffix to avoid collisions,
  and uploads its artifacts under a shard-qualified name.
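
The environment parsing and clamping described in the second bullet might look roughly like this. The sketch makes several assumptions that the commit message does not confirm: each variable defaults to 1 independently, non-positive values count as unparseable, and the optional `getEnv` parameter exists only so the example can be exercised without touching the real process environment.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in for RouteCheckHelpers.ReadShardEnv.
public static class ShardEnvSketch
{
    public static (int Index, int Total) ReadShardEnv(Func<string, string> getEnv = null)
    {
        // Default to the real process environment when no lookup is supplied.
        if (getEnv == null) getEnv = name => Environment.GetEnvironmentVariable(name);

        var total = ParseOrOne(getEnv("ROUTE_SHARD_TOTAL"));
        var index = ParseOrOne(getEnv("ROUTE_SHARD_INDEX"));

        // Clamp a half-set matrix so the caller never sees index > total.
        if (index > total) index = total;
        return (index, total);
    }

    // Unset, unparseable, or non-positive input falls back to 1.
    private static int ParseOrOne(string raw)
    {
        return int.TryParse(raw, NumberStyles.Integer, CultureInfo.InvariantCulture, out var value) && value >= 1
            ? value
            : 1;
    }
}
```

Under these assumptions, a run with only ROUTE_SHARD_INDEX=3 set resolves to (1, 1) and therefore walks every route.
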

IReadOnlyList<string> has no IndexOf extension on the target
framework; materialise the sample routes to List<string> before
looking up positions in the order-preservation assertion.
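
A small illustration of the workaround: `IReadOnlyList<T>` exposes `Count` and an indexer but no `IndexOf`, while `List<T>` does. The class and method names are hypothetical, and the check assumes distinct route strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderPreservationSketch
{
    // True when `shard` lists its routes in the same relative order as `source`.
    public static bool PreservesOrder(IReadOnlyList<string> source, IReadOnlyList<string> shard)
    {
        // Materialise to List<string> so that IndexOf is available.
        List<string> positions = source.ToList();

        var previous = -1;
        foreach (var route in shard)
        {
            var current = positions.IndexOf(route);
            if (current <= previous) return false; // missing (-1) or out of order
            previous = current;
        }
        return true;
    }
}
```
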

ottobolyos force-pushed the perf/route-check-sharding branch from ab34927 to d329c9d on June 6, 2026 13:24
ottobolyos added a commit to ottobolyos/mtconnect.net that referenced this pull request on Jun 6, 2026
ottobolyos marked this pull request as ready for review on June 6, 2026 13:26
ottobolyos marked this pull request as draft on June 6, 2026 13:55

The shard jobs download docs/.vitepress/dist from the docs-prepare
artefact and do not install docfx, but the route-check OneTimeSetUp
re-runs `npm run build` unconditionally. The prebuild hook then
shells out to `docfx metadata` and fails with "docfx not found on
PATH", taking shards 1, 3, and 4 red. Honour the existing
dist/index.html sentinel so a prebuilt tree short-circuits the
build, matching the workflow's documented contract that docs-prepare
is the single docfx-owning producer. Local clean clones still
trigger a full build.
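
The sentinel check described here can be sketched as a single predicate. `DistSentinelSketch` and `NeedsBuild` are hypothetical names, and the real fixture may apply further conditions before deciding to build; only the `index.html` sentinel inside the dist directory comes from the commit message.

```csharp
using System;
using System.IO;

public static class DistSentinelSketch
{
    // Returns true when the fixture still has to run the docs build itself,
    // i.e. when no prebuilt dist tree is present.
    public static bool NeedsBuild(string distDirectory)
    {
        var sentinel = Path.Combine(distDirectory, "index.html");
        return !File.Exists(sentinel);
    }
}
```

A OneTimeSetUp method would run `npm run build` only when `NeedsBuild` returns true for `docs/.vitepress/dist`, so a shard that downloaded the docs-prepare artefact never reaches the docfx prebuild hook.
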

ottobolyos marked this pull request as ready for review on June 6, 2026 13:58
ottobolyos marked this pull request as draft on June 6, 2026 14:17

The OneTimeSetUp_Rebuilds_Dist_Even_When_Index_Exists assertion
unconditionally requires the in-fixture build to write a fresh
dist/index.html, which collides with the sharded CI contract where
the docs-prepare job is the dist producer and each shard is a pure
consumer (shards do not install docfx, so the prebuild hook would
fail). Mark the assertion inconclusive when ROUTE_SHARD_TOTAL > 1
so it still pins the stale-dist guard on local + unsharded runs
without false-failing the shard fan-out. Rename the test to reflect
the producer-mode scope.
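
The guard condition can be isolated as a pure predicate, sketched below. `ProducerModeSketch` and `IsShardedConsumer` are hypothetical names, and treating an unset or unparseable value as "not sharded" is an assumption.

```csharp
using System;
using System.Globalization;

public static class ProducerModeSketch
{
    // True when the raw ROUTE_SHARD_TOTAL value indicates a sharded CI leg,
    // where the fixture consumes a prebuilt dist instead of producing one.
    public static bool IsShardedConsumer(string rawShardTotal)
    {
        return int.TryParse(rawShardTotal, NumberStyles.Integer, CultureInfo.InvariantCulture, out var total)
            && total > 1;
    }
}
```

Assuming the fixture uses NUnit (the OneTimeSetUp naming suggests so), the test body would start by reading ROUTE_SHARD_TOTAL and calling `Assert.Inconclusive` with an explanatory message when this predicate is true, then fall through to the rebuild assertion otherwise.
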

ottobolyos added a commit to ottobolyos/mtconnect.net that referenced this pull request on Jun 6, 2026
ottobolyos marked this pull request as ready for review on June 6, 2026 14:18
PatrickRitchie moved this from In Progress to Reviewing in MTConnect.NET-Development on Jun 6, 2026
PatrickRitchie moved this from Reviewing to Ready to Merge in MTConnect.NET-Development on Jun 6, 2026
PatrickRitchie merged commit be0f19f into TrakHound:master on Jun 6, 2026
14 checks passed
github-project-automation (bot) moved this from Ready to Merge to Done in MTConnect.NET-Development on Jun 6, 2026
ottobolyos deleted the perf/route-check-sharding branch on June 6, 2026 16:53