FE-872: Install-failure classification — infra-vs-test split by kostandinang · Pull Request #218 · hashintel/brunch

kostandinang · 2026-06-15T16:39:30Z

Stacks on FE-871. Third Arc-1 frontier -- the FE-843-deferred fail/infra test-outcome split. No install verb: the install action stays agent-native (bash + FE-843 testConventions, A98).

What?

Make a broken toolchain distinguishable from a failing test, report it honestly, pin the promoted tree as reproducible, and make that classification visible everywhere tests run.

Slice 1 -- classify (TestResult.failureKind, classifyTestFailure): a failed run carries failureKind 'infra' | 'test'. Classification is conservative: only an unambiguous "runner isn't there" signal counts as infra (spawn ENOENT, shell command not found). Everything else, including a missing module, stays test, because a missing module is ambiguous with a legitimate TDD red and mislabeling it as infra would silently skip a real failure. The tests-run report surfaces an aggregate failureKind.
Slice 2 -- react (net-compiler.ts): when a slice exhausts retries because tests never ran, the halt reads "toolchain/install failure (tests never ran in N attempts)" instead of the misdirecting "retry exhaustion".
Slice 3 -- greenfield dep capture (promote-run.test.ts): the agent's manifest + lockfile are pinned as a promotion invariant via git ls-files -- turning promoteGreenfieldRun's incidental copy into an asserted reproducible-tree guarantee.
Slice 4 -- unify the seam (runVerification): collapse the three diverged test-execution paths onto one TestRunner + one verdict helper, so failureKind is visible at every site, not just the net path.
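
As a rough illustration of the conservative rule in slice 1, the sketch below classifies a failed run. Only the names classifyTestFailure and failureKind come from this PR; the FailedRun shape, its field names, and the exit-status-127 check are assumptions made for the example and may not match test-runner.ts.

```typescript
type FailureKind = "infra" | "test";

// Hypothetical input shape: the PR does not show what classifyTestFailure receives.
interface FailedRun {
  spawnErrorCode?: string; // e.g. "ENOENT" when the runner binary could not be spawned
  exitCode?: number | null; // exit status, if the process started at all
  stderr?: string;
}

function classifyTestFailure(run: FailedRun): FailureKind {
  // The runner binary itself is missing: spawn failed with ENOENT.
  if (run.spawnErrorCode === "ENOENT") return "infra";

  // POSIX shells exit with status 127 when a command is not found.
  // Requiring the stderr text as well keeps the rule narrow (assumption).
  if (run.exitCode === 127 && /not found/i.test(run.stderr ?? "")) return "infra";

  // Everything else, including "Cannot find module", stays a test failure,
  // because it is ambiguous with a legitimate TDD red.
  return "test";
}
```

Defaulting to 'test' means an unrecognized failure is still routed to the code-writer rather than being skipped as an environment problem.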

Why?

TestResult was a single { passed } boolean, so a failed npm install looked identical to a logic bug -- sending the code-writer to "fix the code" while the toolchain never installed, then halting with a cause that named the wrong thing. And without a pinned capture invariant, a promoted tree could silently drop the lockfile. app-runtime-probe / integration-oracle depend on both: infra separated from test, and deps reproducible in the promoted tree.

Slice 4 closes the gap that slices 1-2 left: classification was only half-wired. evaluate-done and verify-epic ran tests through a private, spawn-based runTest that returned a bare boolean, so the new failureKind never reached them -- only the net run-tests path saw it. The duplicate execution path had also drifted (spawn vs spawnSync). Now there is one execution seam (TestRunner.run) and one verification seam (runVerification) owning the >=1-and-all-pass verdict rule and the infra-dominates aggregate; a runner that throws is treated as infra. pi-actions.runTest and evaluateVerificationTargets are deleted.
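
The verdict rule described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in test-runner.ts: the runner is synchronous for brevity, the fields of VerificationOutcome are guesses, and leaving failureKind unset when there are zero targets is a choice made for the example.

```typescript
type FailureKind = "infra" | "test";

interface TestResult {
  passed: boolean;
  failureKind?: FailureKind;
}

// Assumed synchronous signature; the real TestRunner.run may differ.
interface TestRunner {
  run(target: string): TestResult;
}

// Field names are hypothetical.
interface VerificationOutcome {
  passed: boolean;
  failureKind?: FailureKind;
  results: TestResult[];
}

function runVerification(runner: TestRunner, targets: string[]): VerificationOutcome {
  const results: TestResult[] = targets.map((target) => {
    try {
      return runner.run(target);
    } catch {
      // A runner that throws is treated as an infra failure.
      return { passed: false, failureKind: "infra" as const };
    }
  });

  // Verdict rule: at least one target, and every target passes.
  const passed = results.length >= 1 && results.every((r) => r.passed);

  // Aggregate: infra dominates test among the failed results.
  const failed = results.filter((r) => !r.passed);
  const failureKind: FailureKind | undefined =
    failed.length === 0
      ? undefined
      : failed.some((r) => r.failureKind === "infra")
        ? "infra"
        : "test";

  return { passed, failureKind, results };
}
```

Keeping both the verdict and the aggregate in one helper is what lets evaluate-done, verify-epic, and the net run-tests path report the same failureKind.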

Scope discipline

Not a bespoke re-install arc -- the loop already retries and the agent re-installs via bash on its next turn. The harness owns only what bash install can't give: the classification, the honest terminal cause, the capture invariant, and one path that carries them.

Deferred

Brownfield dep-delta capture over the CoW baseline is blocked on brownfield-promotion (no brownfield promote path yet).

Co-authored-by: Amp amp@ampcode.com

kostandinang · 2026-06-15T16:39:48Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.

cursor · 2026-06-16T14:18:10Z

PR Summary

Medium Risk
Changes core cook verification, retry halt semantics, and shared test execution paths used by evaluate-done, verify-epic, and the Petri net; misclassification could still misroute retries, but behavior is conservative and well covered by new unit/contract tests.

Overview
FE-872 splits toolchain/install failures from test assertion failures so the cook loop and halt messages stop blaming code when the runner never ran.

Failed runs now carry optional failureKind: 'infra' | 'test' on TestResult, set by classifyTestFailure (only missing-runner signals count as infra; missing modules stay test for TDD reds). ToolchainTestRunner stamps this on failed spawns; runVerification is the single seam for the “≥1 target, all pass” verdict and infra-dominates aggregate, used by evaluate-done, verify-epic, and the net run-tests transition. The old pi-actions spawn/evaluateVerificationTargets path is removed; createPiActions takes an optional testRunner (wired from cook-cli).

When retries are exhausted with failureKind === 'infra', the slice halt reason becomes toolchain/install failure during verification instead of generic retry exhaustion. Reports (eval-done, epic-verified, tests-run) include failureKind.
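
A minimal sketch of that halt-reason choice, assuming the compiler has the aggregate failureKind of the final attempt at hand. The function name haltReason, its parameters, and the exact wording of the non-infra message are hypothetical; only the "toolchain/install failure during verification" text is taken from the summary above.

```typescript
type FailureKind = "infra" | "test";

// Hypothetical helper: picks the terminal cause once the retry budget is spent.
function haltReason(attempts: number, failureKind?: FailureKind): string {
  if (failureKind === "infra") {
    // Tests never ran, so blaming the code would misdirect the reader.
    return "toolchain/install failure during verification";
  }
  // Assumed wording for the generic case.
  return `retry exhaustion after ${attempts} attempts`;
}
```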

Greenfield promotion gains a test that package.json and lockfile are tracked in git after promote (reproducible tree invariant). memory/PLAN.md and SPEC.md (A98) are updated for FE-872 / dogfood-spike status.
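
One way such an invariant could be checked is sketched below. It is not the assertion in promote-run.test.ts: the helper names are hypothetical, and package-lock.json stands in for whichever lockfile the agent actually produced.

```typescript
import { execFileSync } from "node:child_process";

// Pure helper: given `git ls-files` output, return the required paths that are not tracked.
function untrackedRequired(lsFilesOutput: string, required: string[]): string[] {
  const tracked = new Set(
    lsFilesOutput
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.length > 0),
  );
  return required.filter((file) => !tracked.has(file));
}

// Hypothetical wrapper: asks git which of the required files are tracked in repoDir
// and throws if any are missing.
function assertDepsTracked(repoDir: string, required: string[]): void {
  const output = execFileSync("git", ["ls-files", "--", ...required], {
    cwd: repoDir,
    encoding: "utf8",
  });
  const missing = untrackedRequired(output, required);
  if (missing.length > 0) {
    throw new Error(`not tracked in promoted tree: ${missing.join(", ")}`);
  }
}
```

A test could call assertDepsTracked(promotedDir, ["package.json", "package-lock.json"]) after promotion, so a dropped lockfile fails loudly instead of silently.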

Reviewed by Cursor Bugbot for commit 371b6e4.

TestResult gains a failureKind?: 'infra' | 'test' discriminant so a broken toolchain (missing runner binary / deps never installed) is no longer indistinguishable from a logic failure that should send the code-writer to fix the code. ToolchainTestRunner.run classifies a failed run via classifyTestFailure, deliberately conservative: only an unambiguous "the runner itself isn't there" signal (spawn ENOENT, or a shell command-not-found) is infra; everything else is test, because a missing module is ambiguous with a legitimate TDD red and mislabeling a real failure as infra would silently skip it. The tests-run net report surfaces an aggregate failureKind (infra dominates) so consumers don't rescan results.

Amp-Thread-ID: https://ampcode.com/threads/T-019ecb9a-9a08-733b-833d-76885fc8243a
Co-authored-by: Amp <amp@ampcode.com>

When a slice exhausts its retry budget because the tests never executed (toolchain missing / deps not installed), the halt reason now reads "toolchain/install failure" instead of the misdirecting "retry exhaustion", using the failureKind classified in slice 1. Deliberately not a bespoke re-install net arc: the loop already loops back and the agent re-installs natively via bash on its next turn, so the harness only owns the honest terminal cause. Completes acceptance 1 (classify + react).

Amp-Thread-ID: https://ampcode.com/threads/T-019ecb9a-9a08-733b-833d-76885fc8243a
Co-authored-by: Amp <amp@ampcode.com>

…ice 3)

promoteGreenfieldRun's blanket copy already lands the manifest + lockfile the cook agent produced; this turns that incidental behavior into an asserted promotion invariant (git ls-files), guaranteeing a reproducible promoted tree. Closes acceptance 2 for greenfield; brownfield dep-delta capture stays blocked on the brownfield-promotion frontier.

Amp-Thread-ID: https://ampcode.com/threads/T-019ecb9a-9a08-733b-833d-76885fc8243a
Co-authored-by: Amp <amp@ampcode.com>

Ran a real brownfield cook (hand-authored 2-slice plan + node:http app, throwaway repo) to de-risk app-runtime-probe / integration-oracle before building them.

Verdict: the chain works end-to-end (CoW worktree, clean-tree gate, per-slice->__epic__ merge composed the wiring, TDD red/green, working branch untouched). The agent wired the feature reachable and self-authored a genuine boot-and-probe integration test; the orphan did not reproduce. But reachability was at the agent's discretion, not enforced, which confirms the value of an independent integration-oracle.

Two refinements: app-runtime-probe should own the boot mechanism (the agent had to invent a .js->.ts resolve hook); dep-install was unexercised (zero-dep app). Bonus: the 'Cannot find module' TDD red was handled as a test-red, not infra, which validates FE-872 slice 1 live.

Marks dogfood-spike done; folds findings into app-runtime-probe and integration-oracle; updates SPEC A98 to partially-validated.

Amp-Thread-ID: https://ampcode.com/threads/T-019ecb9a-9a08-733b-833d-76885fc8243a
Co-authored-by: Amp <amp@ampcode.com>

Collapse three diverged test-execution paths onto a single TestRunner seam and one runVerification verdict helper (Design C). evaluate-done and verify-epic previously used a private spawn-based runTest that returned a bare boolean, so FE-872's infra-vs-test failureKind was only visible on the net run-tests path.

- add VerificationOutcome/VerificationResult to types
- add runVerification (test-runner.ts): the one place the >=1-and-all-pass verdict rule and the infra-dominates aggregate live; a throwing runner is an infra failure
- delete pi-actions.runTest + evaluateVerificationTargets; thread TestRunner (and an injectable session factory for tests) through createPiActions
- net-compiler run-tests now calls runVerification, dropping its inline loop + aggregate
- evaluate-done and verify-epic now surface failureKind in their reports

Amp-Thread-ID: https://ampcode.com/threads/T-019ecb9a-9a08-733b-833d-76885fc8243a
Co-authored-by: Amp <amp@ampcode.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


This was referenced Jun 15, 2026

FE-843: Toolchain profile expansion — TS runtimes + live profile selection #198

Open

FE-864: Plan orchestrator brownfield enhancements #212

Open

FE-867: Agent extension host — dual-mode pi-harness contract #213

Open

kostandinang mentioned this pull request Jun 15, 2026

FE-871: Brunch toolchain detection — detect + plan-emitter wiring #214

Open

kostandinang changed the title ~~FE-872: classify test-run failures as infra vs test (slice 1)~~ FE-872: Install-failure classification — infra-vs-test split + honest halt reason Jun 15, 2026

kostandinang changed the title ~~FE-872: Install-failure classification — infra-vs-test split + honest halt reason~~ FE-872: Install-failure classification — infra-vs-test split, honest halt, unified runner seam Jun 16, 2026

kostandinang force-pushed the ka/fe-872-dep-install-classification branch from a87e942 to ff041c6 Compare June 16, 2026 10:00

This was referenced Jun 16, 2026

FE-875: App runtime probe — boot, probe, classify #219

Open

FE-876: Integration oracle — reachability gate + grounding seam #220

Open

kostandinang changed the title ~~FE-872: Install-failure classification — infra-vs-test split, honest halt, unified runner seam~~ FE-872: Install-failure classification — infra-vs-test split Jun 16, 2026

This was referenced Jun 16, 2026

FE-877: Brownfield promotion — commit the cook result onto cook/<runId> #221

Open

FE-878: Brunch serve — one-shot plan-then-cook capstone #222

Open

kostandinang force-pushed the ka/fe-871-brunch-detect branch from c949e53 to a1afc98 Compare June 16, 2026 12:53

kostandinang force-pushed the ka/fe-872-dep-install-classification branch 3 times, most recently from 2cec280 to fecc5c6 Compare June 16, 2026 13:38

kostandinang mentioned this pull request Jun 16, 2026

FE-879: Lazy per-slice cook worktrees and shared node_modules for brownfield #223

Open

kostandinang marked this pull request as ready for review June 16, 2026 14:18

kostandinang mentioned this pull request Jun 16, 2026

FE-864: Orchestrator improvements #224

Open

cursor Bot reviewed Jun 16, 2026


Comment thread src/orchestrator/src/test-runner.ts Outdated

kostandinang force-pushed the ka/fe-872-dep-install-classification branch from 4ef65e5 to c902500 Compare June 16, 2026 23:45

kostandinang force-pushed the ka/fe-871-brunch-detect branch from f60b7ea to 625b1b5 Compare June 16, 2026 23:45

cursor Bot reviewed Jun 16, 2026


Comment thread src/orchestrator/src/net-compiler.ts

kostandinang force-pushed the ka/fe-871-brunch-detect branch from 625b1b5 to f0c931a Compare June 16, 2026 23:55

kostandinang force-pushed the ka/fe-872-dep-install-classification branch from c902500 to ba9c819 Compare June 16, 2026 23:55

kostandinang and others added 3 commits June 17, 2026 09:36

kostandinang and others added 4 commits June 17, 2026 09:36

FE-872: classify only missing runner spawn errors as infra

a2a916c

Co-authored-by: Cursor <cursoragent@cursor.com>

FE-872: avoid overclaiming infra halt details

371b6e4

Co-authored-by: Cursor <cursoragent@cursor.com>

kostandinang force-pushed the ka/fe-871-brunch-detect branch from f0c931a to a5eb05c Compare June 17, 2026 08:51

kostandinang force-pushed the ka/fe-872-dep-install-classification branch from ba9c819 to 371b6e4 Compare June 17, 2026 08:51

cursor Bot reviewed Jun 17, 2026


Comment thread src/orchestrator/src/test-runner.ts

kostandinang mentioned this pull request Jun 17, 2026

FE-881: Cook agent loads the target repo's sandbox-scoped skills #227

Open


kostandinang wants to merge 7 commits into ka/fe-871-brunch-detect from ka/fe-872-dep-install-classification
