Skip to content

FE-872: Install-failure classification — infra-vs-test split#218

Open
kostandinang wants to merge 7 commits into
ka/fe-871-brunch-detectfrom
ka/fe-872-dep-install-classification
Open

FE-872: Install-failure classification — infra-vs-test split#218
kostandinang wants to merge 7 commits into
ka/fe-871-brunch-detectfrom
ka/fe-872-dep-install-classification

Conversation

@kostandinang

@kostandinang kostandinang commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Stacks on FE-871. Third Arc-1 frontier -- the FE-843-deferred fail/infra test-outcome split. No install verb: the install action stays agent-native (bash + FE-843 testConventions, A98).

What?

Make a broken toolchain distinguishable from a failing test, report it honestly, pin the promoted tree as reproducible, and make that classification visible everywhere tests run.

  • Slice 1 -- classify (TestResult.failureKind, classifyTestFailure): a failed run carries 'infra' | 'test'. Classified conservatively -- only an unambiguous "runner isn't there" signal is infra (spawn ENOENT, shell command not found). Everything else, incl. a missing module, stays test (ambiguous with a legitimate TDD red -- mislabeling would silently skip a real failure). The tests-run report surfaces an aggregate failureKind.
  • Slice 2 -- react (net-compiler.ts): when a slice exhausts retries because tests never ran, the halt reads toolchain/install failure (tests never ran in N attempts) instead of the misdirecting retry exhaustion.
  • Slice 3 -- greenfield dep capture (promote-run.test.ts): the agent's manifest + lockfile are pinned as a promotion invariant via git ls-files -- turning promoteGreenfieldRun's incidental copy into an asserted reproducible-tree guarantee.
  • Slice 4 -- unify the seam (runVerification): collapse the three diverged test-execution paths onto one TestRunner + one verdict helper, so failureKind is visible at every site, not just the net path.

Why?

TestResult was a single { passed } boolean, so a failed npm install looked identical to a logic bug -- sending the code-writer to "fix the code" while the toolchain never installed, then halting with a cause that named the wrong thing. And without a pinned capture invariant, a promoted tree could silently drop the lockfile. app-runtime-probe / integration-oracle depend on both: infra separated from test, and deps reproducible in the promoted tree.

Slice 4 closes the gap that slices 1-2 left: classification was only half-wired. evaluate-done and verify-epic ran tests through a private, spawn-based runTest that returned a bare boolean, so the new failureKind never reached them -- only the net run-tests path saw it. The duplicate execution path had also drifted (spawn vs spawnSync). Now there is one execution seam (TestRunner.run) and one verification seam (runVerification) owning the >=1-and-all-pass verdict rule and the infra-dominates aggregate; a runner that throws is treated as infra. pi-actions.runTest and evaluateVerificationTargets are deleted.

Scope discipline

Not a bespoke re-install arc -- the loop already retries and the agent re-installs via bash on its next turn. The harness owns only what bash install can't give: the classification, the honest terminal cause, the capture invariant, and one path that carries them.

Deferred

Brownfield dep-delta capture over the CoW baseline is blocked on brownfield-promotion (no brownfield promote path yet).

Co-authored-by: Amp amp@ampcode.com

@kostandinang kostandinang changed the title FE-872: classify test-run failures as infra vs test (slice 1) FE-872: Install-failure classification — infra-vs-test split + honest halt reason Jun 15, 2026
@kostandinang kostandinang changed the title FE-872: Install-failure classification — infra-vs-test split + honest halt reason FE-872: Install-failure classification — infra-vs-test split, honest halt, unified runner seam Jun 16, 2026
@kostandinang kostandinang force-pushed the ka/fe-872-dep-install-classification branch from a87e942 to ff041c6 Compare June 16, 2026 10:00
@kostandinang kostandinang changed the title FE-872: Install-failure classification — infra-vs-test split, honest halt, unified runner seam FE-872: Install-failure classification — infra-vs-test split Jun 16, 2026
@kostandinang kostandinang force-pushed the ka/fe-871-brunch-detect branch from c949e53 to a1afc98 Compare June 16, 2026 12:53
@kostandinang kostandinang force-pushed the ka/fe-872-dep-install-classification branch 3 times, most recently from 2cec280 to fecc5c6 Compare June 16, 2026 13:38
@kostandinang kostandinang marked this pull request as ready for review June 16, 2026 14:18
@cursor

cursor Bot commented Jun 16, 2026

Copy link
Copy Markdown

PR Summary

Medium Risk
Changes core cook verification, retry halt semantics, and shared test execution paths used by evaluate-done, verify-epic, and the Petri net; misclassification could still misroute retries, but behavior is conservative and well covered by new unit/contract tests.

Overview
FE-872 splits toolchain/install failures from test assertion failures so the cook loop and halt messages stop blaming code when the runner never ran.

Failed runs now carry optional failureKind: 'infra' | 'test' on TestResult, set by classifyTestFailure (only missing-runner signals count as infra; missing modules stay test for TDD reds). ToolchainTestRunner stamps this on failed spawns; runVerification is the single seam for the “≥1 target, all pass” verdict and infra-dominates aggregate, used by evaluate-done, verify-epic, and the net run-tests transition. The old pi-actions spawn/evaluateVerificationTargets path is removed; createPiActions takes an optional testRunner (wired from cook-cli).

When retries are exhausted with failureKind === 'infra', the slice halt reason becomes toolchain/install failure during verification instead of generic retry exhaustion. Reports (eval-done, epic-verified, tests-run) include failureKind.

Greenfield promotion gains a test that package.json and lockfile are tracked in git after promote (reproducible tree invariant). memory/PLAN.md and SPEC.md (A98) are updated for FE-872 / dogfood-spike status.

Reviewed by Cursor Bugbot for commit 371b6e4. Bugbot is set up for automated code reviews on this repo. Configure here.

Comment thread src/orchestrator/src/test-runner.ts Outdated
@kostandinang kostandinang force-pushed the ka/fe-872-dep-install-classification branch from 4ef65e5 to c902500 Compare June 16, 2026 23:45
@kostandinang kostandinang force-pushed the ka/fe-871-brunch-detect branch from f60b7ea to 625b1b5 Compare June 16, 2026 23:45
Comment thread src/orchestrator/src/net-compiler.ts
@kostandinang kostandinang force-pushed the ka/fe-871-brunch-detect branch from 625b1b5 to f0c931a Compare June 16, 2026 23:55
@kostandinang kostandinang force-pushed the ka/fe-872-dep-install-classification branch from c902500 to ba9c819 Compare June 16, 2026 23:55
kostandinang and others added 3 commits June 17, 2026 09:36
TestResult gains a failureKind?: 'infra' | 'test' discriminant so a
broken toolchain (missing runner binary / deps never installed) is no
longer indistinguishable from a logic failure that should send the
code-writer to fix the code.

ToolchainTestRunner.run classifies a failed run via classifyTestFailure,
deliberately conservative: only an unambiguous "the runner itself isn't
there" signal (spawn ENOENT, or a shell command-not-found) is infra;
everything else is test, because a missing module is ambiguous with a
legitimate TDD red and mislabeling a real failure as infra would
silently skip it. The tests-run net report surfaces an aggregate
failureKind (infra dominates) so consumers don't rescan results.

Amp-Thread-ID: https://ampcode.com/threads/T-019ecb9a-9a08-733b-833d-76885fc8243a
Co-authored-by: Amp <amp@ampcode.com>
When a slice exhausts its retry budget because the tests never executed
(toolchain missing / deps not installed), the halt reason now reads
"toolchain/install failure" instead of the misdirecting "retry
exhaustion", using the failureKind classified in slice 1.

Deliberately not a bespoke re-install net arc: the loop already loops
back and the agent re-installs natively via bash on its next turn, so the
harness only owns the honest terminal cause. Completes acceptance 1
(classify + react).

Amp-Thread-ID: https://ampcode.com/threads/T-019ecb9a-9a08-733b-833d-76885fc8243a
Co-authored-by: Amp <amp@ampcode.com>
…ice 3)

promoteGreenfieldRun's blanket copy already lands the manifest + lockfile
the cook agent produced; this turns that incidental behavior into an
asserted promotion invariant (git ls-files), guaranteeing a reproducible
promoted tree. Closes acceptance 2 for greenfield; brownfield dep-delta
capture stays blocked on the brownfield-promotion frontier.

Amp-Thread-ID: https://ampcode.com/threads/T-019ecb9a-9a08-733b-833d-76885fc8243a
Co-authored-by: Amp <amp@ampcode.com>
kostandinang and others added 4 commits June 17, 2026 09:36
Ran a real brownfield cook (hand-authored 2-slice plan + node:http app,
throwaway repo) to de-risk app-runtime-probe / integration-oracle before
building them.

Verdict: the chain works end-to-end (CoW worktree, clean-tree gate,
per-slice->__epic__ merge composed the wiring, TDD red/green, working
branch untouched). The agent wired the feature reachable and
self-authored a genuine boot-and-probe integration test — the orphan did
not reproduce. But reachability was agent-discretion, not enforced,
confirming the value of an independent integration-oracle. Two
refinements: app-runtime-probe should own the boot mechanism (the agent
had to invent a .js->.ts resolve hook); dep-install was unexercised
(zero-dep app). Bonus: the 'Cannot find module' TDD red was handled as a
test-red, not infra — validates FE-872 slice 1 live.

Marks dogfood-spike done; folds findings into app-runtime-probe and
integration-oracle; updates SPEC A98 to partially-validated.

Amp-Thread-ID: https://ampcode.com/threads/T-019ecb9a-9a08-733b-833d-76885fc8243a
Co-authored-by: Amp <amp@ampcode.com>
Collapse three diverged test-execution paths onto a single TestRunner seam
and one runVerification verdict helper (Design C). evaluate-done and
verify-epic previously used a private spawn-based runTest that returned a
bare boolean, so FE-872's infra-vs-test failureKind was only visible on the
net run-tests path.

- add VerificationOutcome/VerificationResult to types
- add runVerification (test-runner.ts): the one place the >=1-and-all-pass
  verdict rule and the infra-dominates aggregate live; a throwing runner is
  an infra failure
- delete pi-actions.runTest + evaluateVerificationTargets; thread TestRunner
  (and an injectable session factory for tests) through createPiActions
- net-compiler run-tests now calls runVerification, dropping its inline
  loop + aggregate
- evaluate-done and verify-epic now surface failureKind in their reports

Amp-Thread-ID: https://ampcode.com/threads/T-019ecb9a-9a08-733b-833d-76885fc8243a
Co-authored-by: Amp <amp@ampcode.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@kostandinang kostandinang force-pushed the ka/fe-871-brunch-detect branch from f0c931a to a5eb05c Compare June 17, 2026 08:51
@kostandinang kostandinang force-pushed the ka/fe-872-dep-install-classification branch from ba9c819 to 371b6e4 Compare June 17, 2026 08:51

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 371b6e4. Configure here.

Comment thread src/orchestrator/src/test-runner.ts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant