
C++ search #210

Draft
ms609 wants to merge 790 commits into main from cpp-search

ms609 commented Mar 19, 2026

  • other optimizations + features

Manual testing underway; shiny app in particular has some usability issues.

ms609 marked this pull request as draft on March 25, 2026, 14:21
ms609 added a commit that referenced this pull request Mar 28, 2026
ms609 added 27 commits March 28, 2026 17:44
F-008: Fix constrained drift constraint staleness (T-279)
F-T-245: TBR 4-wide candidate batching (EW flat path)
Stage 4 results analysed (G-001): syab07205/206t starvation at 60s from
full-TBR polish per PR cycle (~7s x 5 = 35s overhead). Agent E implemented
pruneReinsertNni fix on feature/tbr-batch; Stage 5 scripts uploaded and
submitted to Hamilton (SLURM 16622224, ~4-6h).
Hamilton SLURM 16622421 (7h, EPYC 7702). 5 large-tree datasets
(131-206t), 20 seeds, 60/120s budgets, EW scoring.

pr_nni: wins 7/10 expected-best conditions. Huge benefit on
project3701 (146t, -178 median at 60s). Modest at 173-180t.
Slight regression at 206t (+12-34 EB).

pr_tbr: harmful (1/9 wins; total starvation at 206t/60s).

Decision: not enabled in large preset. Available via SearchControl().
Tier guidance: 5 (smoke), 10 (screening), 20 (comparison), 30 (definitive).
Calibrated from T-289f Stage 5 empirical significance results.
Cross-reference added to AGENTS.md.
Add ClipOrder enum, TBRPassRecord struct, and per-pass diagnostic
counters to tbr_search() (guarded behind TBRParams::diagnostics=true).
Add ts_tbr_diagnostics() Rcpp bridge returning per-pass data frame.
Add order_clips() helper implementing RANDOM/INV_WEIGHT/TIPS_FIRST/BUCKET
strategies (Phase 2 infrastructure, disabled by default).
Add diag_clip_ordering.R to characterise baseline behaviour.
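
A minimal sketch of what a tips-first ordering could look like, assuming TIPS_FIRST means "visit single-tip clips before larger ones". The names ClipOrderSketch, ClipSketch and order_clips_sketch are hypothetical stand-ins; the real ClipOrder enum and order_clips() helper in ts_tbr.cpp are not shown in this PR text and may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Hypothetical stand-ins for illustration only.
enum class ClipOrderSketch { RANDOM, TIPS_FIRST };

struct ClipSketch {
  int node;          // node at the base of the clipped subtree
  int subtree_tips;  // number of tips in the clipped subtree
};

// Shuffle all clips, then (for TIPS_FIRST) stably move single-tip clips
// to the front so that the random order is preserved within each group.
inline void order_clips_sketch(std::vector<ClipSketch>& clips,
                               ClipOrderSketch order, std::mt19937& rng) {
  std::shuffle(clips.begin(), clips.end(), rng);
  if (order == ClipOrderSketch::TIPS_FIRST) {
    std::stable_partition(
        clips.begin(), clips.end(),
        [](const ClipSketch& c) { return c.subtree_tips == 1; });
  }
}
```

Shuffling first keeps the pass randomised within each size class, so the only thing the variant changes is which class is tried first.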

Diagnostic results (10 seeds × 4 datasets, random Wagner starts):
  Tip-clip enrichment in productive passes: 0.43–0.76×
  Tip clips (~51% of all clips) account for only 22–38% of accepted moves.
  Medium-small clips (2..sqrt(n)) appear most productive.
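
A rough arithmetic check of these figures, assuming "enrichment" means the tip clips' share of accepted moves divided by their share of all clips (the commit does not state the definition):

```latex
E = \frac{\text{share of accepted moves}}{\text{share of clips}}, \qquad
\frac{0.22}{0.51} \approx 0.43, \qquad
\frac{0.38}{0.51} \approx 0.75
```

Under that assumption the quoted shares reproduce the reported 0.43–0.76× range to within rounding of the approximate 51% clip share.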

CONCLUSION (Phase 4): the small/tip-first hypothesis is FALSIFIED.
All three proposed variants (INV_WEIGHT, TIPS_FIRST, BUCKET) favour
tip clips, which are the LEAST productive clip type. Phase 2–3 skipped.
Branch will be closed after coordination notes are updated.
Phase 1 diagnostic completed 2026-03-29. Hypothesis falsified:
tip clips are UNDER-represented in TBR acceptances (0.43-0.76x
enrichment across 4 datasets). Medium-small clips most productive.
All three ordering variants (inv-weight, tips-first, bucket) favour
tips — counterproductive. Branch feature/weighted-clip-order closed.
See completed-tasks.md entry PA-001 and AGENTS.md item 12.
5 datasets (62-180t), 20 seeds, EW/IW10/IW3. IW hypothesis weak signal
(closed). Real finding: XSS benefit scales with tree size. At 180t:
TAEB delta -6.8 to -9.8 EW steps (12-19% overhead). At ≤88t: zero
TAEB benefit. No preset change needed.
Stage 5 benchmark (SLURM 16622483, EPYC 7702, 5 datasets 131-206t,
10 seeds, 60s+120s) showed pr_nni (NNI full-tree polish) fixes the
Stage 4 showstopper (0 reps at 206t/60s) while improving 131-180t:

  project3701 (146t): -178 steps at 60s, -128 at 120s
  project804  (173t): -9 / -2 steps
  mbank_X30754(180t): -4 / -7 steps
  syab07205   (206t): +17.5 at 60s, neutral at 120s

Enable in large preset: pruneReinsertCycles=5L, pruneReinsertNni=TRUE.
Update AGENTS.md and completed-tasks.md. Results in
dev/benchmarks/t289f_pr_nni_polish.csv.
…_search

When params.nni_full is true but a ConstraintData is active, guard
falls through to TBR (which enforces constraints). One-line change
mirroring the nni_wagner guard in ts_driven.cpp. Only affects users
who combine pruneReinsertNni=TRUE with topological constraints; no
preset does this.
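
The guard described above can be sketched as a single predicate. This is an illustration only: PolishSketch and choose_polish_sketch are hypothetical names, and the sketch assumes the unconstrained nni_full path selects NNI polish.

```cpp
#include <cassert>

// Hypothetical names; the real guard lives in the search driver.
enum class PolishSketch { NNI, TBR };

// NNI polish is used only when requested AND no constraint is active;
// otherwise fall through to TBR, which enforces constraints.
inline PolishSketch choose_polish_sketch(bool nni_full, bool has_constraint) {
  return (nni_full && !has_constraint) ? PolishSketch::NNI
                                       : PolishSketch::TBR;
}
```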

Also: S-COORD round 46 (task queue, PR status), to-do cleanup.
Agents now check remote-jobs.md at /assign time (new step 4) for
retrievable results before claiming tasks. Prevents SLURM results
from being silently lost across conversation boundaries.
C++ instrumentation of tbr_search() with post-acceptance sector-masked
TBR on clip subtree. Hit rate ~35% regardless of scoring mode (no
IW-specific benefit), but NET HARMFUL: disrupts global TBR trajectory.
mbank_X30754 EW: +17 to +34 steps TAEB at 30-120s. Validates existing
pipeline design (XSS as separate post-convergence phase). Closed.
Phase 1 (a159311) added diagnostic instrumentation and the TIPS_FIRST,
INV_WEIGHT, BUCKET, ANTI_TIP, LARGE_FIRST ordering variants to ts_tbr.cpp.
Phase 2 completes the implementation:

Bug fix: clip_order was only propagated to the initial TBR and final TBR
polish (~10% of replicate time). The ratchet and all sectorial TBR calls
defaulted to RANDOM, making the ordering variants effectively inert for
the dominant phase (ratchet ~76%).

Fix: add clip_order field to RatchetParams and SectorParams, propagate
from SearchControl through ts_driven.cpp into every TBR call site in
ts_ratchet.cpp and ts_sector.cpp (6 sites + search_sector signature).

Empirical validation (5 seeds, 30s, default config):
  Agnarsson2004 (62t, default preset): TIPS_FIRST -2%, INV_WEIGHT neutral
  Zhu2013       (75t, thorough preset): TIPS_FIRST +13%, INV_WEIGHT +9%
  Dikow2009     (88t, thorough preset): TIPS_FIRST +8%, INV_WEIGHT +3%

Theoretical model (Poisson bucket, corrected): TIPS_FIRST saves ~48%
per productive TBR pass at 88t; practical throughput gain is ~8-13%
because null passes (ordering-invariant, exhaust all clips) dilute savings.

Benefit is dataset-size dependent:
  < ~65t: tip enrichment is low (Agnarsson2004: 0.43); TIPS_FIRST neutral
  65-120t (thorough): tip enrichment moderate; TIPS_FIRST +8-13%

No preset defaults changed yet — pending GHA 10-seed validation.
bench_clip_ordering.R contains the full benchmark driver.
The SearchControl.Rd usage section was generated from an old installed
build (missing clipOrder and many parameters added since). The codoc
check correctly flagged the mismatch.

- Added @param clipOrder documentation in R/SearchControl.R
- Regenerated man/SearchControl.Rd with correct \usage and \item{clipOrder}
 TBR clip-ordering strategy (SearchControl clipOrder)
ms609 and others added 30 commits June 20, 2026 11:45
…253)

The parallel NA search (nThreads>=2) intermittently aborted with
STATUS_HEAP_CORRUPTION.  Root cause: the per-thread scratch in the TBR
kernel (ts_tbr.cpp, ts_fitch.cpp) and exact_verify_sweep's optimum cache
were function-local `static thread_local`.  On MinGW these resolve via
emutls, whose thread_local teardown across std::thread spawn/exit corrupts
the heap.  EW is unaffected (light TLS); the NA path trips it because
exact_verify adds a thread_local unordered_set plus more scratch.

Fix: convert all worker-reachable scratch to plain function-locals (each
worker owns its call frame -> per-thread-safe; per-clip realloc measured
<=1.6% on 88-tip data, ~0% typical).  Move exact_verify_sweep's optimum
memoization to mutable members on DataSet so it keeps the same per-worker,
cross-replicate persistence the thread_local had, without emutls.
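
A minimal sketch of the pattern change, under stated assumptions: score_clip_sketch and run_workers_sketch are hypothetical, and the real kernels in ts_tbr.cpp and ts_fitch.cpp are not shown here. The point is only that a plain local buffer lives in each worker's call frame, so no thread_local storage (and hence no emutls teardown) is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Before (pattern described in the commit; actual declarations not shown):
//   static thread_local std::vector<std::uint64_t> scratch;
// After: a plain local, so each call (and thus each worker) owns its buffer.
inline std::uint64_t score_clip_sketch(
    const std::vector<std::uint64_t>& blocks) {
  std::vector<std::uint64_t> scratch(blocks.size());  // allocated per call
  std::uint64_t acc = 0;
  for (std::size_t i = 0; i < blocks.size(); ++i) {
    scratch[i] = blocks[i] & 0xFFu;  // placeholder for real per-block work
    acc += scratch[i];
  }
  return acc;
}

// Run the kernel on n_threads workers; each writes to its own output slot.
inline std::vector<std::uint64_t> run_workers_sketch(
    unsigned n_threads, const std::vector<std::uint64_t>& blocks) {
  std::vector<std::uint64_t> out(n_threads, 0);
  std::vector<std::thread> pool;
  for (unsigned t = 0; t < n_threads; ++t) {
    pool.emplace_back(
        [&out, &blocks, t] { out[t] = score_clip_sketch(blocks); });
  }
  for (auto& th : pool) th.join();
  return out;
}
```

The trade-off is a reallocation per call, which the commit reports as at most 1.6% on 88-tip data.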

Verified on clean builds (rm src/*.o; CCACHE_DISABLE=1; --preclean):
parallel NA survives 120/120 (was iter ~4-8), EW 200/200, serial scores
bit-identical, NA perf 4.15s (cache intact, vs 5.81s cache-disabled).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…ot loops

Bank the validated micro-lever sweep from branch claude/tbr-microlevers
(task #48). All changes are BYTE-IDENTICAL: score + candidates_evaluated
unchanged on Wortley2006/Zhu2013/Zanol2014 x seed{1,2} (verify_l1.R, 6/6).

THE WIN: a diagnostic std::getenv("TS_REVERT_CHECK") left in the per-clip
teardown (~100k+ calls/search) was costing 13-19% of EW MaximizeParsimony
wall time on Windows/ucrt, where getenv is µs-scale (locked env-block scan),
not sub-ns. Hoisted to a per-call bool. Quiet-machine same-seed paired A/B:
Zanol -13.2% (20/20, p=0), Zhu -19.1% (12/12, p=0); 3-way attribution proves
the getenv hoist alone is the entire win.
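
The hoist can be sketched as follows. Only the TS_REVERT_CHECK variable and the revert_check name come from the commit messages; process_clips_sketch and its loop body are hypothetical placeholders for the real per-clip work.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical kernel illustrating the hoist.
inline long process_clips_sketch(const std::vector<int>& clips) {
  // Read the environment once per call rather than once per clip.
  const bool revert_check = std::getenv("TS_REVERT_CHECK") != nullptr;
  long checked = 0;
  for (int clip : clips) {
    (void)clip;                   // real per-clip work would go here
    if (revert_check) ++checked;  // diagnostic path, now a plain bool test
  }
  return checked;
}
```

Because the variable is constant for the lifetime of the process, reading it once per call cannot change results, which is why the change is byte-identical.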

Also folded in (both byte-identical, both ~0 measured effect, kept as exact
cleanups):
  - cutoff hoist: maintain the EW/NA bail cutoff across the clip, recompute
    only on improvement (+0.00%, attribution-proven).
  - kept_ei: precompute sub_edge-invariant reroot skip predicates once per
    clip (marginal/wash even at Zanol-1261; droppable).

Caveat: getenv magnitude is env-size + platform dependent (Windows/ucrt
large; Linux cheaper) — Hamilton/Linux confirmation owed. Byte-identical and
strictly removes ~100k getenv/search regardless.

Detail: dev/profiling/findings.md T-P5n + dev/profiling/tbr-microlever-sweep.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Lands the sectorial component-isolation round (gate 1) onto cpp-search.

ts_sector.cpp: 3 byte-identical micro-levers (search_sector ras_starts==1
fast path; compute_from_above new_from_above alloc->std::swap; TS_FREE_HTU_PROBE
/ TS_SECT_DEBUG getenv -> cached static — the per-sector getenvs T-P5n flagged
"for the sectorial agent"). ~2.8% isolated sectorial; verified score +
n_sectors byte-identical across {Zanol,Zhu,Wortley}x{wagner,tbr}xseed{1,2} and
the mission A/B; 8 search test files pass.

Harness: drivers/sector-rss.R (isolated rss), drivers/mission-getenv-ab.R
(full-search getenv A/B), run_sector_tests.R, microbench/bench_getenv.cpp
(std::getenv = 2398 ns/call on ucrt), sector-levers.patch, PRODUCTION-LEVERS.md.

focus-areas.md: RSS #3 / CSS-XSS #6 -> AT-LIMIT (throughput at-limit by
inheritance; ~96% is the at-limit inner+global tbr_search).

findings.md T-S6a-e left uncommitted to land with the in-flight T-P5n/o rows.
The mission-wide getenv finding (T-S6d) converges with T-P5n (commit 3a50537e).
Measurement-only instrumentation NOT included (env-gated, stayed on the
scratch worktree branch).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Completes beb5213's getenv-hoist work. TS_PHYS_REROOT was the last
diagnostic getenv read inside tbr_search's hot/warm path -- the outer
reroot-control loop (>=1x/call) at ts_tbr.cpp:2133. Hoist it to a
per-call const bool at function entry (1277), matching revert_check /
iw_scanchk from beb5213. It selects the legacy physical-reroot
reference path and is constant for the process, so this is byte-identical.

std::getenv is ~2.4 µs/call on Windows/ucrt (linear env-block scan) and
its cost hides in VTune's ucrtbase self-time. The headline per-clip
TS_REVERT_CHECK win (~13-22% EW wall) was ALREADY merged in beb5213
(findings T-P5n) and confirmed gone by the post-getenv re-survey
(T-P5o); this only clears the small residual per-reroot read.

Verify: ts-tbr-search + ts-ratchet-search 45/45 pass, CustomSearch
clean (isolated install).

findings.md: add T-P5l/m cross-ref notes so future readers trusting
"TBR kernel at-limit / closed" don't miss that a profiler-invisible
getenv cost sat next to the at-limit kernel.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…etenv out of tbr_search + findings T-P5l/m at-limit≠no-overhead correction
tree_fuse segfaulted on >64-tip data (wps>=2) with intraFuse=TRUE: reroot_at_tip0
ran once before the round loop, but the round-end TBR moves tip 0 off the root, so
round >=2 split-matching matched a clade against its complement and replace_subtree
corrupted the tree. Fix = re-root every round (early-returns when already rooted, so
round 1 is byte-identical) + a defensive replace_subtree size guard. Ported from
TreeSearch-nonclade; 22/22 fuse tests pass incl. an 80-tip regression test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit da21f5dce652ee85d12e1e50d1cba349c74824b4)
…n plan

- dev/red-team/proofs/lever-c-bound-then-verify.md: 531-line durable proof
  settling lever-c (bound-then-verify) as dead-by-proof-plus-magnitude
  (no-forced-step lemma + net-overhead inequality + origin-recovery cap).
- dev/plans/2026-06-20-fuse-drift-isolation.md: fuse/drift component
  isolation plan + progress log (gate-1 AT-LIMIT complete across all
  components; fuse >64t crash fix; SCOREAPPROX finding T-F1).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The union-of-finals Y = final(A)|final(D) is an APPROXIMATION that
UNDER-counts insertion cost (it is a superset of the true directional
edge set), not an exact non-additive method. The exact cost is the
directional edge_set[D] = combine(prelim[D], up[D]) via
compute_insertion_edge_sets + fitch_indirect_length_cached. Comment now
matches ts_fitch.h and the validated directional-fix finding. Doc-only;
no behaviour change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Backwards comment fixed+landed (8671fda); Δ-probe showed ~48% greedy-regret
share in expand_and_reinsert on Zanol; exact-scorer port prepared+validated on
worktree (41b0d237); heavy A/B path-killed (no mission dataset >=120t, so
prune_reinsert never auto-enables) → land+A/B deferred to composition #40.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ts_search.cpp spr_search uses the bounded scorer but is off-default
(sprFirst=FALSE everywhere), exact-verify-gated (never false-accepts), and a
warmup washed by the subsequent exact tbr_search → silent-miss mooted, no
action. All remaining union-of-finals sites accounted for and benign.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Read of ts_driven.cpp orchestration: per-phase score_tree prints are
verbosity>=2-gated (off at default v=1L); only un-gated full rescores are the
2 per-outer-cycle convergence checks (~µs each, ~0.001% wall, one redundant but
sub-floor) + 1 final/replicate. Step-switching minimal (each phase owns its
state). R/C marshalling already T-P5o'd as amortizable. Last undone non-gated
aspect of the isolation plan; addressable wall now lives in composition #40.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…I blocker

- ns=9 representation/bit-packing reopen CLOSED analytically: transposed
  bitset already bit-dense (0.14 op/pattern); states-per-word packing
  serializes patterns -> strictly worse; scalar reopen is ns<=4 only.
- Cherry-pick build-check PASSED (HEAD: fuse 22/0, tbr 28/0, prune 44/0).
- Hamilton mission-KPI re-measure BLOCKED: ratchet 12->6 flip is uncommitted
  shared WIP; cannot define clean reproducible code-state unattended.
  Flagged for user.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… (closure holds)

Settles the unverified literature pillar of the T-P5p TBR closure. Primary-
source check (one full-text chapter + Goloboff 1996 abstract): TNT/Goloboff
builds equivalent two-pass down+up state sets and scores reinsertion by a
root-to-root comparison — same structure as TreeSearch's edge_set[D].

Disambiguates two amortization levels: Level-1 (per-candidate, within one clip)
TS already matches (full-text confirmed); Level-2 (per-clip incremental view
derivation) = the already-deferred lever-b, supported by the 1996 abstract only
(unread full text) → revisit at large-N, not via literature. TBR closure HOLDS,
now on stronger evidence. Minor: Goloboff's up-aware approximate "check one node"
screen differs from lever-c's up-ignoring admissible bounds (flagged if lever-c
ever reopens; it screens, doesn't bound).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Profiling (T-P5d, 2026-06-19) found the ratchet over-provisioned: halving
cycles saved 20-38% wall on the mid-size EW benchmarks (Wills/Zanol/Zhu/
Giles) at zero quality loss (gapB unchanged at full budget). Flips the
formal SearchControl default and the `default` strategy preset; updates
the vignette. The `large` preset deliberately keeps 12 (large-tree
tradeoff, T-179).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…n race (#39 closed)

Post-flip cpp-search KPI (Hamilton, freshness-asserted ratchetCycles==6L):
- QUALITY CLOSED: TS reaches the optimum on every dataset/seed; on Zanol (ns=9)
  TS is the ONLY reliably-1261 config (TNT fast configs miss +1).
- Wall gap is NOT algorithmic: candidate-efficiency ~1.2-1.9x near-parity
  (count-based), throughput ~2x at-limit; the 8-110x is a default-budget
  mismatch (TS default heavy / TNT default light), corrected from an initial
  overreach (advisor).
- #39 CLOSED: ratchet isolated race = cycle-quality PARITY (TNT does NOT reach
  the optimum in fewer reweight cycles) + ~2x at-limit throughput, no lever.
- Component-isolation program now COMPLETE; only composition #40 (gated,
  modest + reliability-bounded) remains.

Adds the ratchet-race driver, KPI CSVs, and the component-isolation plan STATUS.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…te before #40

All components closed on both gates (ratchet cycle-parity race 2026-06-21).
Resolve the stale Next-task/TBD sections; add the pre-composition fresh-eyes
re-audit gate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…kernels stand

Adversarial re-audit (27 agents, 8 lenses): 18 candidates, of which 3 survived
and 15 were killed. Core kernel/TBR throughput verdicts STAND; no second
getenv-class hotspot.
Survivors: fuse value stale post-reroot-fix (#55, re-measuring), sectorial
column-axis reduction (#56), x4 reroot wasted-block (#57, weak).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…lock counter (#57)

#55: capture FuseResult + verbosity>=2 'Fuse attempt' print (pool size + n_exchanges)
to distinguish fires-but-useless from never-fires (pool-collapse).
#57: TS_AUDIT_PROBE-gated counter in fitch_indirect_cached_flat_x4 measuring blocks
scanned past each member's individual bail (the x4 'deepest-bailing member' ceiling).
Default build unaffected (counter fully #ifdef'd; print is verbosity>=2 only).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…UDIT_PROBE)

Measures the realized informative-within-sector fraction on the actual sectors:
a char is droppable iff some state is shared by ALL sector tips (incl HTU) -> 0
Fitch steps -> ranking-preserving. fp/tot_blocks = the no-bail precompute saving
(compute_insertion_edge_sets scans all n_blocks/node). Inert in production.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…s (TS_AUDIT_PROBE)

RAII timer to measure the no-bail precompute's share of SECTOR wall (the
load-bearing multiplier in #56's saving estimate). Inert in production.
…e namespace ts

The precompute timer's mid-namespace #include <chrono> made std::ratio parse as
ts::std::ratio (compile error under -DTS_AUDIT_PROBE). Move includes to global
scope. Default build unaffected (all ifdef'd).
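
A sketch of the corrected layout, with the standard header included at global scope before the namespace opens. ts_sketch, ScopedTimer and time_loop_sketch are hypothetical names; the actual precompute timer is not shown in this PR text.

```cpp
#include <cassert>
#include <chrono>  // at global scope, before any namespace is opened

namespace ts_sketch {

// Hypothetical RAII timer: on destruction, adds the elapsed nanoseconds
// to a caller-owned running total.
class ScopedTimer {
 public:
  explicit ScopedTimer(long long& total_ns)
      : total_ns_(total_ns), start_(std::chrono::steady_clock::now()) {}
  ~ScopedTimer() {
    const auto end = std::chrono::steady_clock::now();
    total_ns_ += std::chrono::duration_cast<std::chrono::nanoseconds>(
                     end - start_).count();
  }
  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

 private:
  long long& total_ns_;
  std::chrono::steady_clock::time_point start_;
};

// Time a trivial loop; returns the accumulated nanoseconds.
inline long long time_loop_sketch(int n) {
  long long total = 0;
  {
    ScopedTimer timer(total);
    volatile long long sink = 0;
    for (int i = 0; i < n; ++i) sink = sink + i;
  }  // timer destructor runs here and updates total
  return total;
}

}  // namespace ts_sketch
```

Had the include sat inside the namespace, the header's own `namespace std` would have been declared as a nested `ts_sketch::std`, which is the class of error the commit describes.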
…OLREDUCE)

Drops characters constant-within-{sector tips + HTU} (0 Fitch steps -> scores
stay exact) and re-packs informative survivors into fewer blocks, shrinking the
per-node block scan in the inner-sector TBR (esp. the no-bail precompute
compute_insertion_edge_sets). EW only (weight 1, no upweight, no inapplicable).
Off by default.
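
The droppable test can be sketched on plain per-tip state bitmasks: a character is droppable when the bitwise AND of every tip's state set is non-zero, since a state shared by all tips costs zero Fitch steps. The layout and the name informative_chars_sketch are assumptions for illustration; the real code works on a transposed, block-packed bitset and also re-packs the survivors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// tip_states[t][c] is a bitmask of the states tip t may take for
// character c (hypothetical layout). Returns the indices of characters
// that must be kept, i.e. those with no state shared by every tip.
inline std::vector<std::size_t> informative_chars_sketch(
    const std::vector<std::vector<std::uint32_t>>& tip_states) {
  std::vector<std::size_t> keep;
  if (tip_states.empty()) return keep;
  const std::size_t n_chars = tip_states[0].size();
  for (std::size_t c = 0; c < n_chars; ++c) {
    std::uint32_t shared = ~std::uint32_t{0};
    for (const auto& tip : tip_states) {
      assert(tip.size() == n_chars);
      shared &= tip[c];  // states common to all tips seen so far
    }
    if (shared == 0) keep.push_back(c);  // no common state: may cost steps
  }
  return keep;
}
```

With three tips holding masks {1, 1, 3}, {1, 2, 2} and {3, 1, 2}, characters 0 and 2 each have a state common to all tips and would be dropped, leaving only character 1.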

Validated (Hamilton 17533059): dScore=0 on 9/9 full searches, valgrind clean,
adversarial review verified the 0-step invariance + bit arithmetic. The review
also caught (and this fixes) a stale rd.subtree stride that would OOB. rss-
isolated saving: Giles 17%, Zhu 9%, Zanol ~0% (uniform ns=9 = least reduction =
the load-bearing case). Changes the search trajectory on mixed-n_states data
(dCand!=0, equally-optimal path) => OPT-IN, NOT a default flip. Before any
default-on: run a sector-score oracle (reduced vs full, same topology, mixed
state); an accept-gated search cannot discriminate a masked packing bug.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
#55 fuse=dead-weight (drop); #56 col-reduction shipped opt-in 830b8cc (Giles 17%
Zhu 9% Zanol ~0%, dCand!=0 mixed-state -> opt-in not default, oracle before any
default-on); #57 ~1.9% EW sub-floor deferred.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Every MaximizeParsimony/SearchControl switch + preset + the opt-in env levers
(TS_SECT_COLREDUCE), each with a when-relevant assessment grounded in this
session's findings (ratchet 6, fuse=dead-weight, col-reduce mixed-state-only,
rasStarts, prune-reinsert >=120t, clipOrder, the 3x trailing-TBR consolidation).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…5-6% slower)

Force-scalar A/B (17533065): GATE dScore=0 & dCand=0 9/9; speedup x4/scalar
Giles 0.939 / Zhu 0.945 / Zanol 1.001. ~1.9% waste ceiling not realizable
(x4 ILP covers it). All 3 audit survivors now resolved.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ot global

Jobs 17533071 (20-rep) + 17541277 (40-rep), 3 seeds EW. clipOrder=2 (tips-first)
~1.25x faster / ~26% fewer candidates, but biases the trajectory: clean ~1.5x win
on Zanol (uniform ns=9, 3/3 optima); +1 quality tradeoff on Zhu that 2x budget does
NOT recover; wall-unstable on Giles. Complements TS_SECT_COLREDUCE. Default stays 0L.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>