C++ search #210
Draft
ms609 wants to merge 790 commits into
ms609
added a commit
that referenced
this pull request
Mar 28, 2026
F-008: Fix constrained drift constraint staleness (T-279)
F-T-245: TBR 4-wide candidate batching (EW flat path)
Stage 4 results analysed (G-001): syab07205/206t starvation at 60s from full-TBR polish per PR cycle (~7s x 5 = 35s overhead). Agent E implemented pruneReinsertNni fix on feature/tbr-batch; Stage 5 scripts uploaded and submitted to Hamilton (SLURM 16622224, ~4-6h).
…f Stage 5 running
…nt-G state updated
…atch deleted after PR #238 merge
…ignores constraints P3)
Hamilton SLURM 16622421 (7h, EPYC 7702). 5 large-tree datasets (131-206t), 20 seeds, 60/120s budgets, EW scoring. pr_nni: wins 7/10 expected-best conditions. Huge benefit on project3701 (146t, -178 median at 60s). Modest at 173-180t. Slight regression at 206t (+12-34 EB). pr_tbr: harmful (1/9 wins; total starvation at 206t/60s). Decision: not enabled in large preset. Available via SearchControl().
Tier guidance: 5 (smoke), 10 (screening), 20 (comparison), 30 (definitive). Calibrated from T-289f Stage 5 empirical significance results. Cross-reference added to AGENTS.md.
Add ClipOrder enum, TBRPassRecord struct, and per-pass diagnostic counters to tbr_search() (guarded behind TBRParams::diagnostics=true). Add ts_tbr_diagnostics() Rcpp bridge returning per-pass data frame. Add order_clips() helper implementing RANDOM/INV_WEIGHT/TIPS_FIRST/BUCKET strategies (Phase 2 infrastructure, disabled by default). Add diag_clip_ordering.R to characterise baseline behaviour.

Diagnostic results (10 seeds × 4 datasets, random Wagner starts):
- Tip-clip enrichment in productive passes: 0.43–0.76×
- Tip clips (~51% of all clips) account for only 22–38% of accepted moves.
- Medium-small clips (2..sqrt(n)) appear most productive.

CONCLUSION (Phase 4): the small/tip-first hypothesis is FALSIFIED. All three proposed variants (INV_WEIGHT, TIPS_FIRST, BUCKET) favour tip clips, which are the LEAST productive clip type. Phase 2–3 skipped. Branch will be closed after coordination notes are updated.
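The enrichment figures above read most naturally as a ratio of two shares: the fraction of accepted moves that came from tip clips, divided by the fraction of all clips that are tip clips. A minimal sketch under that assumption (the function name `clip_enrichment` is hypothetical and is not taken from ts_tbr.cpp):

```cpp
#include <cassert>

// Enrichment of a clip class among accepted moves, relative to its share of
// all clips tried. Values below 1 mean the class is under-represented.
// Assumed definition for illustration only; not the project's own code.
double clip_enrichment(double share_of_accepted, double share_of_clips) {
    assert(share_of_clips > 0.0);
    return share_of_accepted / share_of_clips;
}
```

With the rounded figures quoted in the commit, `clip_enrichment(0.22, 0.51)` is about 0.43 and `clip_enrichment(0.38, 0.51)` is about 0.75, in line with the reported 0.43–0.76× range; the small gap at the upper end is presumably rounding in the quoted percentages.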
Phase 1 diagnostic completed 2026-03-29. Hypothesis falsified: tip clips are UNDER-represented in TBR acceptances (0.43-0.76x enrichment across 4 datasets). Medium-small clips most productive. All three ordering variants (inv-weight, tips-first, bucket) favour tips — counterproductive. Branch feature/weighted-clip-order closed. See completed-tasks.md entry PA-001 and AGENTS.md item 12.
5 datasets (62-180t), 20 seeds, EW/IW10/IW3. IW hypothesis weak signal (closed). Real finding: XSS benefit scales with tree size. At 180t: TAEB delta -6.8 to -9.8 EW steps (12-19% overhead). At ≤88t: zero TAEB benefit. No preset change needed.
Stage 5 benchmark (SLURM 16622483, EPYC 7702, 5 datasets 131-206t, 10 seeds, 60s+120s) showed pr_nni (NNI full-tree polish) fixes the Stage 4 showstopper (0 reps at 206t/60s) while improving 131-180t:
- project3701 (146t): -178 steps at 60s, -128 at 120s
- project804 (173t): -9 / -2 steps
- mbank_X30754 (180t): -4 / -7 steps
- syab07205 (206t): +17.5 at 60s, neutral at 120s

Enable in large preset: pruneReinsertCycles=5L, pruneReinsertNni=TRUE. Update AGENTS.md and completed-tasks.md. Results in dev/benchmarks/t289f_pr_nni_polish.csv.
…_search When params.nni_full is true but a ConstraintData is active, guard falls through to TBR (which enforces constraints). One-line change mirroring the nni_wagner guard in ts_driven.cpp. Only affects users who combine pruneReinsertNni=TRUE with topological constraints; no preset does this. Also: S-COORD round 46 (task queue, PR status), to-do cleanup.
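The guard described above can be sketched as a single condition: NNI polish is chosen only when it is requested and no constraint is active, and everything else falls through to TBR. This is an illustrative sketch, not the code in the repository; `Params`, `Polish`, `choose_polish` and the `active` field are hypothetical stand-ins, and only the names `nni_full` and `ConstraintData` come from the commit message.

```cpp
// Hypothetical stand-ins for the real types; only the member name nni_full
// and the type name ConstraintData appear in the commit message.
struct ConstraintData {
    bool active = false;
};

struct Params {
    bool nni_full = false;
};

enum class Polish { NNI, TBR };

// NNI is used only when requested AND unconstrained; a constrained search
// falls through to TBR, which (per the commit) enforces constraints.
Polish choose_polish(const Params& params, const ConstraintData* cd) {
    const bool constrained = (cd != nullptr) && cd->active;
    if (params.nni_full && !constrained) {
        return Polish::NNI;
    }
    return Polish::TBR;
}
```

Expressing the fall-through as one predicate keeps the constrained case on the path that already enforces constraints, rather than teaching the NNI path about them.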
Agents now check remote-jobs.md at /assign time (new step 4) for retrievable results before claiming tasks. Prevents SLURM results from being silently lost across conversation boundaries.
C++ instrumentation of tbr_search() with post-acceptance sector-masked TBR on clip subtree. Hit rate ~35% regardless of scoring mode (no IW-specific benefit), but NET HARMFUL: disrupts global TBR trajectory. mbank_X30754 EW: +17 to +34 steps TAEB at 30-120s. Validates existing pipeline design (XSS as separate post-convergence phase). Closed.
Phase 1 (a159311) added diagnostic instrumentation and the TIPS_FIRST, INV_WEIGHT, BUCKET, ANTI_TIP, LARGE_FIRST ordering variants to ts_tbr.cpp. Phase 2 completes the implementation.

Bug fix: clip_order was only propagated to the initial TBR and final TBR polish (~10% of replicate time). The ratchet and all sectorial TBR calls defaulted to RANDOM, making the ordering variants effectively inert for the dominant phase (ratchet ~76%). Fix: add clip_order field to RatchetParams and SectorParams, propagate from SearchControl through ts_driven.cpp into every TBR call site in ts_ratchet.cpp and ts_sector.cpp (6 sites + search_sector signature).

Empirical validation (5 seeds, 30s, default config):
- Agnarsson2004 (62t, default preset): TIPS_FIRST -2%, INV_WEIGHT neutral
- Zhu2013 (75t, thorough preset): TIPS_FIRST +13%, INV_WEIGHT +9%
- Dikow2009 (88t, thorough preset): TIPS_FIRST +8%, INV_WEIGHT +3%

Theoretical model (Poisson bucket, corrected): TIPS_FIRST saves ~48% per productive TBR pass at 88t; practical throughput gain is ~8-13% because null passes (ordering-invariant, exhaust all clips) dilute savings.

Benefit is dataset-size dependent:
- < ~65t: tip enrichment is low (Agnarsson2004: 0.43); TIPS_FIRST neutral
- 65-120t (thorough): tip enrichment moderate; TIPS_FIRST +8-13%

No preset defaults changed yet; pending GHA 10-seed validation. bench_clip_ordering.R contains the full benchmark driver.
The SearchControl.Rd usage section was generated from an old installed
build (missing clipOrder and many parameters added since). The codoc
check correctly flagged the mismatch.
- Added @param clipOrder documentation in R/SearchControl.R
- Regenerated man/SearchControl.Rd with correct \usage and \item{clipOrder}
TBR clip-ordering strategy (SearchControl clipOrder)
…253) The parallel NA search (nThreads>=2) intermittently aborted with STATUS_HEAP_CORRUPTION.

Root cause: the per-thread scratch in the TBR kernel (ts_tbr.cpp, ts_fitch.cpp) and exact_verify_sweep's optimum cache were function-local `static thread_local`. On MinGW these resolve via emutls, whose thread_local teardown across std::thread spawn/exit corrupts the heap. EW is unaffected (light TLS); the NA path trips it because exact_verify adds a thread_local unordered_set plus more scratch.

Fix: convert all worker-reachable scratch to plain function-locals (each worker owns its call frame -> per-thread-safe; per-clip realloc measured <=1.6% on 88-tip data, ~0% typical). Move exact_verify_sweep's optimum memoization to mutable members on DataSet so it keeps the same per-worker, cross-replicate persistence the thread_local had, without emutls.

Verified on clean builds (rm src/*.o; CCACHE_DISABLE=1; --preclean): parallel NA survives 120/120 (was iter ~4-8), EW 200/200, serial scores bit-identical, NA perf 4.15s (cache intact, vs 5.81s cache-disabled).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
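The shape of the fix can be illustrated with a small sketch: scratch that used to be `static thread_local` becomes an ordinary local, so it lives in the calling worker's own frame and needs no TLS teardown. The function `score_clip` and its contents are hypothetical and stand in for the real kernel code.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Before (the pattern the commit removes), shown only as a comment:
//   static thread_local std::vector<int> scratch;
// After: scratch is a plain function-local. Each call, and therefore each
// worker thread, owns its own buffer; the cost is one allocation per call.
long score_clip(std::size_t n) {
    std::vector<int> scratch(n);                   // per-call scratch
    std::iota(scratch.begin(), scratch.end(), 1);  // fill with 1..n
    return std::accumulate(scratch.begin(), scratch.end(), 0L);
}
```

The trade is a per-call allocation in exchange for no thread-local state at all, which matches the commit's note that the measured realloc overhead was small.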
…ot loops

Bank the validated micro-lever sweep from branch claude/tbr-microlevers (task #48). All changes are BYTE-IDENTICAL: score + candidates_evaluated unchanged on Wortley2006/Zhu2013/Zanol2014 x seed{1,2} (verify_l1.R, 6/6).

THE WIN: a diagnostic std::getenv("TS_REVERT_CHECK") left in the per-clip teardown (~100k+ calls/search) was costing 13-19% of EW MaximizeParsimony wall on Windows/ucrt, where getenv is us-scale (locked env-block scan), not sub-ns. Hoisted to a per-call bool. Quiet-machine same-seed paired A/B: Zanol -13.2% (20/20, p=0), Zhu -19.1% (12/12, p=0); 3-way attribution proves the getenv hoist alone is the entire win.

Also folded in (both byte-identical, both ~0 measured effect, kept as exact cleanups):
- cutoff hoist: maintain the EW/NA bail cutoff across the clip, recompute only on improvement (+0.00%, attribution-proven).
- kept_ei: precompute sub_edge-invariant reroot skip predicates once per clip (marginal/wash even at Zanol-1261; droppable).

Caveat: getenv magnitude is env-size + platform dependent (Windows/ucrt large; Linux cheaper); Hamilton/Linux confirmation owed. Byte-identical and strictly removes ~100k getenv/search regardless.

Detail: dev/profiling/findings.md T-P5n + dev/profiling/tbr-microlever-sweep.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
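The hoist itself is a small transformation: read the environment variable once at function entry and pass a `bool` into the hot loop. A minimal sketch, in which `teardown_clip` and `search_pass` are hypothetical stand-ins for the real per-clip teardown and its caller (only the variable name TS_REVERT_CHECK comes from the commit):

```cpp
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for the per-clip teardown. It receives the flag as a
// parameter instead of calling std::getenv itself.
static int teardown_clip(int clip, bool revert_check) {
    return revert_check ? clip + 1 : clip;
}

long search_pass(const std::vector<int>& clips) {
    // One environment lookup per call, not one per clip.
    const bool revert_check = std::getenv("TS_REVERT_CHECK") != nullptr;
    long total = 0;
    for (int clip : clips) {
        total += teardown_clip(clip, revert_check);
    }
    return total;
}
```

Because the variable is assumed not to change while the search runs, the result is identical to reading it on every iteration; only the number of lookups changes.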
Lands the sectorial component-isolation round (gate 1) onto cpp-search.
ts_sector.cpp: 3 byte-identical micro-levers (search_sector ras_starts==1
fast path; compute_from_above new_from_above alloc->std::swap; TS_FREE_HTU_PROBE
/ TS_SECT_DEBUG getenv -> cached static — the per-sector getenvs T-P5n flagged
"for the sectorial agent"). ~2.8% isolated sectorial; verified score +
n_sectors byte-identical across {Zanol,Zhu,Wortley}x{wagner,tbr}xseed{1,2} and
the mission A/B; 8 search test files pass.
Harness: drivers/sector-rss.R (isolated rss), drivers/mission-getenv-ab.R
(full-search getenv A/B), run_sector_tests.R, microbench/bench_getenv.cpp
(std::getenv = 2398 ns/call on ucrt), sector-levers.patch, PRODUCTION-LEVERS.md.
focus-areas.md: RSS #3 / CSS-XSS #6 -> AT-LIMIT (throughput at-limit by
inheritance; ~96% is the at-limit inner+global tbr_search).
findings.md T-S6a-e left uncommitted to land with the in-flight T-P5n/o rows.
The mission-wide getenv finding (T-S6d) converges with T-P5n (commit 3a50537e).
Measurement-only instrumentation NOT included (env-gated, stayed on the
scratch worktree branch).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
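The "alloc->std::swap" lever mentioned for compute_from_above is a common pattern: allocate the output buffer once and swap it with the input after each pass, instead of constructing a fresh vector every time. The sketch below shows the general idea only; `propagate` and its update rule are invented for illustration and do not reflect what compute_from_above actually computes.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Each pass writes into `next`, then swaps buffers. `next` is allocated once
// up front, so the loop performs no further heap allocation.
std::vector<int> propagate(std::vector<int> current, int passes) {
    assert(passes >= 0);
    std::vector<int> next(current.size());
    for (int p = 0; p < passes; ++p) {
        for (std::size_t i = 0; i < current.size(); ++i) {
            const int left = (i > 0) ? current[i - 1] : 0;
            next[i] = current[i] + left;  // illustrative update rule
        }
        std::swap(current, next);  // O(1) exchange of the two buffers
    }
    return current;
}
```

For example, `propagate({1, 1, 1}, 1)` yields `{1, 2, 2}`, and a second pass yields `{1, 3, 4}`.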
Completes beb5213's getenv-hoist work. TS_PHYS_REROOT was the last diagnostic getenv read inside tbr_search's hot/warm path -- the outer reroot-control loop (>=1x/call) at ts_tbr.cpp:2133. Hoist it to a per-call const bool at function entry (1277), matching revert_check / iw_scanchk from beb5213. It selects the legacy physical-reroot reference path and is constant for the process, so this is byte-identical. std::getenv is ~2.4us/call on Windows/ucrt (linear env-block scan) and its cost hides in VTune's ucrtbase self-time. The headline per-clip TS_REVERT_CHECK win (~13-22% EW wall) was ALREADY merged in beb5213 (findings T-P5n) and confirmed gone by the post-getenv re-survey (T-P5o); this only clears the small residual per-reroot read. Verify: ts-tbr-search + ts-ratchet-search 45/45 pass, CustomSearch clean (isolated install). findings.md: add T-P5l/m cross-ref notes so future readers trusting "TBR kernel at-limit / closed" don't miss that a profiler-invisible getenv cost sat next to the at-limit kernel. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…etenv out of tbr_search + findings T-P5l/m at-limit≠no-overhead correction
tree_fuse segfaulted on >64-tip data (wps>=2) with intraFuse=TRUE: reroot_at_tip0 ran once before the round loop, but the round-end TBR moves tip 0 off the root, so round >=2 split-matching matched a clade against its complement and replace_subtree corrupted the tree. Fix = re-root every round (early-returns when already rooted, so round 1 is byte-identical) + a defensive replace_subtree size guard. Ported from TreeSearch-nonclade; 22/22 fuse tests pass incl. an 80-tip regression test. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> (cherry picked from commit da21f5dce652ee85d12e1e50d1cba349c74824b4)
…n plan
- dev/red-team/proofs/lever-c-bound-then-verify.md: 531-line durable proof settling lever-c (bound-then-verify) as dead-by-proof-plus-magnitude (no-forced-step lemma + net-overhead inequality + origin-recovery cap).
- dev/plans/2026-06-20-fuse-drift-isolation.md: fuse/drift component isolation plan + progress log (gate-1 AT-LIMIT complete across all components; fuse >64t crash fix; SCOREAPPROX finding T-F1).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The union-of-finals Y = final(A)|final(D) is an APPROXIMATION that UNDER-counts insertion cost (it is a superset of the true directional edge set), not an exact non-additive method. The exact cost is the directional edge_set[D] = combine(prelim[D], up[D]) via compute_insertion_edge_sets + fitch_indirect_length_cached. Comment now matches ts_fitch.h and the validated directional-fix finding. Doc-only; no behaviour change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
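Why a superset under-counts can be seen in a simplified Fitch-style model: if inserting a clip costs one step for each character whose clip state set shares no state with the edge's state set, then enlarging the edge set can only remove steps, never add them. The sketch below assumes that simplified cost rule and one bitmask per character; `insertion_cost` is a hypothetical helper, not fitch_indirect_length_cached.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One step per character whose clip-root state set and edge state set are
// disjoint. State sets are bitmasks, one per character (simplified model).
int insertion_cost(const std::vector<std::uint8_t>& clip_root,
                   const std::vector<std::uint8_t>& edge_set) {
    assert(clip_root.size() == edge_set.size());
    int steps = 0;
    for (std::size_t i = 0; i < clip_root.size(); ++i) {
        if ((clip_root[i] & edge_set[i]) == 0) {
            ++steps;
        }
    }
    return steps;
}
```

With clip sets `{0x1, 0x2}`, a directional edge set `{0x2, 0x2}` costs 1 step, while a superset such as `{0x3, 0x6}` costs 0: the extra states in the union hide the step that the directional set exposes.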
Backwards comment fixed+landed (8671fda); Δ-probe showed ~48% greedy-regret share in expand_and_reinsert on Zanol; exact-scorer port prepared+validated on worktree (41b0d237); heavy A/B path-killed (no mission dataset >=120t, so prune_reinsert never auto-enables) → land+A/B deferred to composition #40. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ts_search.cpp spr_search uses the bounded scorer but is off-default (sprFirst=FALSE everywhere), exact-verify-gated (never false-accepts), and a warmup washed by the subsequent exact tbr_search → silent-miss mooted, no action. All remaining union-of-finals sites accounted for and benign. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Read of ts_driven.cpp orchestration: per-phase score_tree prints are verbosity>=2-gated (off at default v=1L); only un-gated full rescores are the 2 per-outer-cycle convergence checks (~µs each, ~0.001% wall, one redundant but sub-floor) + 1 final/replicate. Step-switching minimal (each phase owns its state). R/C marshalling already T-P5o'd as amortizable. Last undone non-gated aspect of the isolation plan; addressable wall now lives in composition #40. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…I blocker
- ns=9 representation/bit-packing reopen CLOSED analytically: transposed bitset already bit-dense (0.14 op/pattern); states-per-word packing serializes patterns -> strictly worse; scalar reopen is ns<=4 only.
- Cherry-pick build-check PASSED (HEAD: fuse 22/0, tbr 28/0, prune 44/0).
- Hamilton mission-KPI re-measure BLOCKED: ratchet 12->6 flip is uncommitted shared WIP; cannot define clean reproducible code-state unattended. Flagged for user.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… (closure holds) Settles the unverified literature pillar of the T-P5p TBR closure. Primary- source check (one full-text chapter + Goloboff 1996 abstract): TNT/Goloboff builds equivalent two-pass down+up state sets and scores reinsertion by a root-to-root comparison — same structure as TreeSearch's edge_set[D]. Disambiguates two amortization levels: Level-1 (per-candidate, within one clip) TS already matches (full-text confirmed); Level-2 (per-clip incremental view derivation) = the already-deferred lever-b, supported by the 1996 abstract only (unread full text) → revisit at large-N, not via literature. TBR closure HOLDS, now on stronger evidence. Minor: Goloboff's up-aware approximate "check one node" screen differs from lever-c's up-ignoring admissible bounds (flagged if lever-c ever reopens; it screens, doesn't bound). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Profiling (T-P5d, 2026-06-19) found the ratchet over-provisioned: halving cycles saved 20-38% wall on the mid-size EW benchmarks (Wills/Zanol/Zhu/ Giles) at zero quality loss (gapB unchanged at full budget). Flips the formal SearchControl default and the `default` strategy preset; updates the vignette. The `large` preset deliberately keeps 12 (large-tree tradeoff, T-179). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…n race (#39 closed)

Post-flip cpp-search KPI (Hamilton, freshness-asserted ratchetCycles==6L):
- QUALITY CLOSED: TS reaches the optimum on every dataset/seed; on Zanol (ns=9) TS is the ONLY reliably-1261 config (TNT fast configs miss +1).
- Wall gap is NOT algorithmic: candidate-efficiency ~1.2-1.9x near-parity (count-based), throughput ~2x at-limit; the 8-110x is a default-budget mismatch (TS default heavy / TNT default light), corrected from an initial overreach (advisor).
- #39 CLOSED: ratchet isolated race = cycle-quality PARITY (TNT does NOT reach the optimum in fewer reweight cycles) + ~2x at-limit throughput, no lever.
- Component-isolation program now COMPLETE; only composition #40 (gated, modest + reliability-bounded) remains.

Adds the ratchet-race driver, KPI CSVs, and the component-isolation plan STATUS.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…te before #40 All components closed on both gates (ratchet cycle-parity race 2026-06-21). Resolve the stale Next-task/TBD sections; add the pre-composition fresh-eyes re-audit gate. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…kernels stand Adversarial re-audit (27 agents, 8 lenses): 18 candidates -> 3 survived -> 15 killed. Core kernel/TBR throughput verdicts STAND; no second getenv-class hotspot. Survivors: fuse value stale post-reroot-fix (#55, re-measuring), sectorial column-axis reduction (#56), x4 reroot wasted-block (#57, weak). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…lock counter (#57) #55: capture FuseResult + verbosity>=2 'Fuse attempt' print (pool size + n_exchanges) to distinguish fires-but-useless from never-fires (pool-collapse). #57: TS_AUDIT_PROBE-gated counter in fitch_indirect_cached_flat_x4 measuring blocks scanned past each member's individual bail (the x4 'deepest-bailing member' ceiling). Default build unaffected (counter fully #ifdef'd; print is verbosity>=2 only). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…UDIT_PROBE) Measures the realized informative-within-sector fraction on the actual sectors: a char is droppable iff some state is shared by ALL sector tips (incl HTU) -> 0 Fitch steps -> ranking-preserving. fp/tot_blocks = the no-bail precompute saving (compute_insertion_edge_sets scans all n_blocks/node). Inert in production. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
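The droppable test stated above ("some state is shared by ALL sector tips") reduces to a bitwise AND across the tips' state sets. A minimal sketch for a single character, assuming one bitmask per tip; the real code works on a transposed bitset layout, and `is_droppable` is a hypothetical name.

```cpp
#include <cstdint>
#include <vector>

// A character is droppable within a sector iff at least one state is present
// in every tip's state set (HTU included), i.e. the AND of all masks is
// non-zero. Such a character contributes 0 Fitch steps inside the sector.
bool is_droppable(const std::vector<std::uint32_t>& tip_states) {
    if (tip_states.empty()) {
        return false;  // nothing to intersect; treat as not droppable
    }
    std::uint32_t shared = ~std::uint32_t{0};
    for (std::uint32_t states : tip_states) {
        shared &= states;
    }
    return shared != 0;
}
```

For instance, tips with state sets `{0x3, 0x1, 0x5}` all contain state `0x1`, so the character is droppable, whereas `{0x1, 0x2}` share nothing and the character must be kept.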
…s (TS_AUDIT_PROBE) RAII timer to measure the no-bail precompute's share of SECTOR wall (the load-bearing multiplier in #56's saving estimate). Inert in production.
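An RAII timer of the kind described accumulates elapsed time into a counter when it goes out of scope, so the timed region needs only one declaration. The sketch below is a generic version using std::chrono::steady_clock; `ScopedTimer` and `timed_sum` are hypothetical names, and the TS_AUDIT_PROBE gating is omitted.

```cpp
#include <chrono>

// Adds the lifetime of the object, in seconds, to `total_seconds` on
// destruction.
class ScopedTimer {
public:
    explicit ScopedTimer(double& total_seconds)
        : total_(total_seconds), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start_;
        total_ += elapsed.count();
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    double& total_;
    std::chrono::steady_clock::time_point start_;
};

// Example timed region: the timer stops when the function returns.
double timed_sum(int n, double& precompute_seconds) {
    ScopedTimer timer(precompute_seconds);
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += i;
    }
    return sum;
}
```

Dividing the accumulated total by the enclosing phase's wall time gives the share that the commit is trying to measure.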
…e namespace ts The precompute timer's mid-namespace #include <chrono> made std::ratio parse as ts::std::ratio (compile error under -DTS_AUDIT_PROBE). Move includes to global scope. Default build unaffected (all ifdef'd).
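The failure mode is easiest to see by contrast with the corrected layout: a header included inside `namespace ts { ... }` has its declarations, including `namespace std`, parsed as members of `ts`. A minimal sketch of the fixed arrangement, with a hypothetical helper `elapsed_ms`:

```cpp
#include <chrono>  // at global scope, so the names land in ::std
#include <ratio>

namespace ts {

// An #include <chrono> placed here, if it were the first inclusion in the
// translation unit, would declare the header's contents inside ts (that is,
// ts::std::...), which is the kind of error the commit describes.
inline double elapsed_ms(std::chrono::steady_clock::time_point start,
                         std::chrono::steady_clock::time_point end) {
    return std::chrono::duration<double, std::milli>(end - start).count();
}

}  // namespace ts
```

Keeping every `#include` at the top of the file, outside any namespace, avoids the problem regardless of which build flags enable the instrumented code.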
…OLREDUCE)
Drops characters constant-within-{sector tips + HTU} (0 Fitch steps -> scores
stay exact) and re-packs informative survivors into fewer blocks, shrinking the
per-node block scan in the inner-sector TBR (esp. the no-bail precompute
compute_insertion_edge_sets). EW only (weight 1, no upweight, no inapplicable).
Off by default.
Validated (Hamilton 17533059): dScore=0 on 9/9 full searches, valgrind clean,
adversarial review verified the 0-step invariance + bit arithmetic. The review
also caught (and this fixes) a stale rd.subtree stride that would OOB. rss-
isolated saving: Giles 17%, Zhu 9%, Zanol ~0% (uniform ns=9 = least reduction =
the load-bearing case). Changes the search trajectory on mixed-n_states data
(dCand!=0, equally-optimal path) => OPT-IN, NOT a default flip. Before any
default-on: run a sector-score oracle (reduced vs full, same topology, mixed
state); an accept-gated search cannot discriminate a masked packing bug.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Every MaximizeParsimony/SearchControl switch + preset + the opt-in env levers (TS_SECT_COLREDUCE), each with a when-relevant assessment grounded in this session's findings (ratchet 6, fuse=dead-weight, col-reduce mixed-state-only, rasStarts, prune-reinsert >=120t, clipOrder, the 3x trailing-TBR consolidation). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…5-6% slower) Force-scalar A/B (17533065): GATE dScore=0 & dCand=0 9/9; speedup x4/scalar Giles 0.939 / Zhu 0.945 / Zanol 1.001. ~1.9% waste ceiling not realizable (x4 ILP covers it). All 3 audit survivors now resolved. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ot global Jobs 17533071 (20-rep) + 17541277 (40-rep), 3 seeds EW. clipOrder=2 (tips-first) ~1.25x faster / ~26% fewer candidates, but biases the trajectory: clean ~1.5x win on Zanol (uniform ns=9, 3/3 optima); +1 quality tradeoff on Zhu that 2x budget does NOT recover; wall-unstable on Giles. Complements TS_SECT_COLREDUCE. Default stays 0L. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Manual testing underway; shiny app in particular has some usability issues.