[IP+NF] modular SSA optimizer + split Thumb backend + RP2350 fixes #7
Merged
… scratch
Two independent self-host miscompile root causes:
1. ssa_opt_reassoc.c: reassoc_binary removed the still-live inner
instruction's operand use record when folding (x OP c1) OP c2.
GVN could then CSE the outer back onto the live inner and drop the
operand's use count to zero, letting SSA-DCE delete the operand's
def (e.g. a SELECT) out from under live code. Broke
tcc_yaff_write_data_relocations (section=CODE instead of DATA) →
bench_strlen_scan, bug_const_ptr_got_deref, mibench_qsort,
mibench_stringsearch.
2. arm-thumb-gen.c: mla/umull/smull _mop handlers did not pre-exclude
the pre-allocated destination register from scratch allocation, so a
source load could pick it as a saved (push/pop) scratch and the
restoring pop clobbered the just-computed result. Broke
find_nested_func_by_sym (returned sym instead of &nested_funcs[i])
→ trampoline never emitted → nested_funcptr* HardFault.
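The scratch-register half of this fix can be illustrated with a small sketch. It is not the code in arm-thumb-gen.c; the register pool and the names demo_pick_scratch and demo_pick_scratch_for_mop are hypothetical. It only shows why the destination has to be in the exclusion mask before a scratch is chosen.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NUM_REGS 13 /* r0-r12: hypothetical pool for this sketch */

/* Return the lowest-numbered register whose bit is clear in exclude_mask,
 * or -1 when every register is excluded. */
int demo_pick_scratch(uint32_t exclude_mask) {
    for (int r = 0; r < DEMO_NUM_REGS; r++) {
        if (!(exclude_mask & (UINT32_C(1) << r)))
            return r;
    }
    return -1;
}

/* Pick a scratch for a multiply-style op. The pre-allocated destination is
 * excluded together with the sources, so a save/restore of the scratch can
 * never overwrite the result. */
int demo_pick_scratch_for_mop(int dest, int src1, int src2) {
    assert(dest >= 0 && dest < DEMO_NUM_REGS);
    assert(src1 >= 0 && src1 < DEMO_NUM_REGS);
    assert(src2 >= 0 && src2 < DEMO_NUM_REGS);
    uint32_t mask = (UINT32_C(1) << dest) |
                    (UINT32_C(1) << src1) |
                    (UINT32_C(1) << src2);
    return demo_pick_scratch(mask);
}
```

With dest=r0 and sources r1/r2, excluding only the sources would hand back r0 itself as the scratch, which is the situation described above; with the destination in the mask the picker moves on to r3.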
Full QEMU smoke suite: 424 passed, 0 failed.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
fix four gcc-torture failure classes (892 -> ~14 failures)
1. ir/opt_dce.c: dead_var_store_elim's read scan missed the MLA
accumulator operand, so after mla-fusion it deleted the def of a VAR
still read as an accumulator. In the self-hosted compiler this killed
`ar_index = data + entrysize` in create_archive_sym_cache, making the
archive symbol table load from the saved-r4 stack slot — every link
touching armv8m-libtcc1.a failed with "invalid archive" (850 tests).
2. tccgen.c: two late frame-shrink passes ran after the variadic
`loc = -28` guards and re-shrank the frame (the prologue-managed
va_area never appears as an IR STACKOFF). The va descriptor then sat
below SP and the va_start helper's own push{r3,r4} clobbered it with
the caller's 4th argument register. Re-clamp after both shrinks.
3. arm-thumb-gen.c + ir/codegen.c + tcc.h: the nested-call R9/arg-reg
save area was addressed SP-relative; once a VLA/alloca moved SP the
slots landed inside the user's buffer and the callee's writes
corrupted the saved GOT base. New per-function func_dynamic_sp flag
(set on VLA_ALLOC) switches the save area to FP-relative addressing.
4. tccgen.c: __builtin_setjmp/__builtin_longjmp now emit the NL
(non-local-goto) IR ops. The 3-word variant restored only FP/SP/PC,
so code resumed after longjmp with the longjmp caller's r4-r11
(including register-allocated locals and the r9 GOT-base protocol).
The NL buffer (40 bytes) restores the full callee-saved file.
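As an illustration of item 1, here is a minimal sketch of a read scan that also checks a third (accumulator) operand. The DemoInsn layout and every demo_* name are assumptions made for this example and do not reflect the real IR structures in ir/opt_dce.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-operand IR instruction; acc is only meaningful for
 * DEMO_OP_MLA (dest = src1 * src2 + acc) and is -1 otherwise. */
typedef enum { DEMO_OP_ADD, DEMO_OP_MUL, DEMO_OP_MLA } DemoOpcode;

typedef struct {
    DemoOpcode op;
    int dest, src1, src2, acc;
} DemoInsn;

/* Does this instruction read variable `var`? The accumulator check is the
 * part a two-operand scan would miss. */
int demo_insn_reads_var(const DemoInsn *in, int var) {
    if (in->src1 == var || in->src2 == var)
        return 1;
    if (in->op == DEMO_OP_MLA && in->acc == var)
        return 1;
    return 0;
}

/* A store to `var` is only dead if no instruction reads it. */
int demo_var_is_read(const DemoInsn *code, size_t n, int var) {
    assert(code != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (demo_insn_reads_var(&code[i], var))
            return 1;
    }
    return 0;
}
```

In this sketch, an MLA whose accumulator is variable 3 keeps variable 3 live even though it appears in neither src1 nor src2.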
QEMU smoke incl. gcc torture: 4153 passed / 14 failed (from 892 failed).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
fix alloca routing, __builtin_setjmp 5-word ABI, and soft dadd rounding
Three gcc-torture failure classes:
- tccgen.c: '#ifdef TOK_alloca' was always false (TOK_* are enum
constants, not macros), so plain alloca() calls bypassed the
VLA_ALLOC builtin and called lib/alloca.S, which moves SP behind the
backend's back; the SP-relative per-call R9 save area then reloaded
garbage from inside the alloca'd buffer. Use the real target guard so
alloca routes through unary_builtin_alloca (fixes 20020314-1,
941202-1, pr22061-1).
- __builtin_setjmp/__builtin_longjmp: GCC's ABI gives the builtin a
5-WORD buffer and pr84521 passes exactly void *buf[5]; the previous
NL_SETJMP routing wrote 40 bytes and smashed the caller's stack.
TCCIR_OP_SETJMP now saves r4-r11 into a hidden 32-byte frame area
(src2, FRAME_ADDR operand) and stores only FP/resume/SP/&area in the
buffer; LONGJMP restores the register file via buf[3]. NL_* keeps its
layout for the nested-function non-local-goto path (fixes pr84521 at
-O0 and -O1; built-in-setjmp, pr86528, 20021113-1, 20020412-1 still
pass).
- lib/fp/soft/dadd.c: __aeabi_dadd had no guard/round/sticky bits, so
aligning the smaller operand just truncated it (1 + -2^53 returned
-2^53 instead of the exact -(2^53-1)). Add 3-bit GRS alignment and
round-to-nearest-even; verified bit-exact against the host FPU on 2M
randomized normal cases (fixes the dadd half of ieee/pr28634).
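The guard/round/sticky idea from the dadd item can be sketched on plain integer significands. This is not the code in lib/fp/soft/dadd.c; the helper names are hypothetical and the sketch covers only the alignment shift and the final rounding step, not a full addition.

```c
#include <assert.h>
#include <stdint.h>

/* Shift v right by n bits, ORing any shifted-out 1 bits into bit 0 so the
 * lost information survives as a "sticky" bit. */
uint64_t demo_shift_right_sticky(uint64_t v, unsigned n) {
    if (n == 0)
        return v;
    if (n >= 64)
        return v != 0;
    uint64_t lost = v & ((UINT64_C(1) << n) - 1);
    return (v >> n) | (uint64_t)(lost != 0);
}

/* Drop the three low guard/round/sticky bits, rounding to nearest and
 * breaking exact ties towards an even result. */
uint64_t demo_round_grs(uint64_t v) {
    uint64_t q = v >> 3;
    unsigned grs = (unsigned)(v & 7);
    if (grs > 4 || (grs == 4 && (q & 1)))
        q++;
    return q;
}

/* Worked check for the 1 + -2^53 case: the operand 1 has significand 2^52
 * and must be shifted right by 53 places to align with 2^53. */
void demo_grs_check(void) {
    uint64_t sig_one = UINT64_C(1) << 52;
    assert((sig_one >> 53) == 0);                          /* truncated away */
    assert(demo_shift_right_sticky(sig_one << 3, 53) == 4); /* kept in GRS */
}
```

Without the three extra bits the smaller operand vanishes during alignment; with them it survives as 4 units below the kept significand, so the subtraction can still produce the exact result.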
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
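The '#ifdef TOK_alloca' pitfall from the first item is easy to reproduce in isolation. The names below are hypothetical stand-ins, not the identifiers used in tccgen.c; the sketch only shows that the preprocessor cannot see enum constants.

```c
#include <assert.h>

/* Hypothetical token enum, standing in for the compiler's TOK_* constants. */
enum { TOK_alloca_demo = 300 };

/* A macro, by contrast, exists at preprocessing time. */
#define DEMO_TARGET_HAS_ALLOCA_BUILTIN 1

/* Returns 1 if #ifdef can see the enum constant, 0 otherwise. */
int demo_enum_visible_to_ifdef(void) {
#ifdef TOK_alloca_demo
    return 1;
#else
    return 0; /* always taken: enum constants are not macros */
#endif
}

/* Returns 1 if #ifdef can see the macro guard, 0 otherwise. */
int demo_macro_visible_to_ifdef(void) {
#ifdef DEMO_TARGET_HAS_ALLOCA_BUILTIN
    return 1;
#else
    return 0;
#endif
}

void demo_ifdef_check(void) {
    assert(demo_enum_visible_to_ifdef() == 0);
    assert(demo_macro_visible_to_ifdef() == 1);
}
```

The enum-guarded branch compiles without any warning and is silently dropped, which is why a guard of this kind can stay dead for a long time.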
more fixes, tests are passing on hardware now
…d STRD
Two fixes for ARMv8-M self-hosted compiles that crashed the entire
gcc-torture suite at -O1/-O2 (all passing at -O0):
1. tcc_ir_opt_vrp had two VRPRange[VRP_MAX_POS*3] (~18 KB each) stack
arrays -> a 74,364-byte prologue frame, which alone overflows the
32 KB process stack (CFSR=0x00100000 STKOF on the prologue `sub sp`;
fault dump r12=0x1227C confirms the frame size). VRP only runs at
-O1+, hence -O0 was unaffected. Move both arrays to the heap
(tcc_mallocz) and free them at the single return; replace sizeof(ranges)
in the memset/memcpy with the explicit byte count. Frame 74364 -> 732 B.
2. try_rotate_loop zero-initialised its packed IROperand (sizeof==9)
scratch slots with `= (IROperand){0}`, which the codegen STRD-pairing
peephole lowered to an 8-byte STRD. With a stride of 9, &arr[b] is not
4-aligned for odd b, and STRD requires >=4-byte alignment on ARMv8-M ->
UNALIGNED fault (CFSR=0x01000000). The backing buffer is tcc_mallocz'd
(already zeroed), so the inits were redundant; drop them.
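The alignment arithmetic behind item 2 can be checked with a small sketch. DemoOperand is a hypothetical 9-byte packed struct, not the real IROperand, and the sketch assumes a GCC/Clang-style __attribute__((packed)) and a 4-byte-aligned array base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed operand: 8 payload bytes plus 1 tag byte, no padding. */
typedef struct __attribute__((packed)) {
    uint64_t payload;
    uint8_t  tag;
} DemoOperand;

/* Byte offset of element i from the start of a DemoOperand array. */
size_t demo_elem_offset(size_t i) {
    assert(sizeof(DemoOperand) == 9);
    return i * sizeof(DemoOperand);
}

/* True when element i starts on a 4-byte boundary, assuming the array
 * base itself is 4-byte aligned. */
int demo_elem_is_word_aligned(size_t i) {
    return demo_elem_offset(i) % 4 == 0;
}
```

With a stride of 9 the offsets are 0, 9, 18, 27, 36, and so on, so in this sketch only indices that are multiples of 4 start on a word boundary; every odd index is misaligned for a doubleword store.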
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
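The sizeof(ranges) replacement in item 1 matters because sizeof changes meaning once an array becomes a heap pointer. A minimal sketch, using a hypothetical DemoRange type and element count and plain calloc rather than the tcc_mallocz wrapper:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { long lo, hi; } DemoRange; /* hypothetical element type */
#define DEMO_COUNT 96                       /* hypothetical element count */

/* Stack array: sizeof yields the size of the whole array. */
size_t demo_stack_bytes(void) {
    DemoRange ranges[DEMO_COUNT];
    memset(ranges, 0, sizeof(ranges));
    return sizeof(ranges);
}

/* Heap pointer: the same expression now yields the size of a pointer. */
size_t demo_heap_sizeof_bytes(void) {
    DemoRange *ranges = calloc(DEMO_COUNT, sizeof *ranges);
    assert(ranges != NULL);
    size_t n = sizeof(ranges); /* pointer size, not the buffer size */
    free(ranges);
    return n;
}

/* Heap pointer with an explicit byte count, as the fix describes. */
size_t demo_heap_explicit_bytes(void) {
    size_t bytes = (size_t)DEMO_COUNT * sizeof(DemoRange);
    DemoRange *ranges = calloc(DEMO_COUNT, sizeof *ranges);
    assert(ranges != NULL);
    memset(ranges, 0, bytes); /* clears the whole buffer */
    free(ranges);
    return bytes;
}
```

Leaving sizeof(ranges) in a memset or memcpy after the move to the heap would compile cleanly but touch only the first pointer-sized chunk of the buffer.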
The yasos-native double-zero removal assumed malloc() always returns
zeroed memory, but that is false for bump allocations served from a
RECYCLED pool: mk_pool() resets the bump pointer without re-zeroing the
pool body, so those bytes hold stale data. The device tcc then read
non-zero where it expected zero and crashed self-host -O2 compiles (e.g.
builtin-bitops-1: free() of a -1 sentinel). Host builds memset
unconditionally, so the bug only manifested on device.
Also includes in-progress IR optimization work.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
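The recycled-pool behaviour can be reproduced with a toy bump allocator. This is only a sketch: DemoPool and the demo_* functions are hypothetical and do not mirror the real mk_pool() or the yasos allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_POOL_SIZE 64

typedef struct {
    unsigned char body[DEMO_POOL_SIZE];
    size_t used; /* bump pointer, as an offset into body */
} DemoPool;

/* Recycle the pool: only the bump pointer is reset, the body keeps its
 * old contents. */
void demo_pool_reset(DemoPool *p) {
    assert(p != NULL);
    p->used = 0;
}

/* Plain bump allocation: returns whatever bytes are already there. */
void *demo_bump_alloc(DemoPool *p, size_t n) {
    assert(p != NULL);
    if (n > DEMO_POOL_SIZE - p->used)
        return NULL;
    void *out = p->body + p->used;
    p->used += n;
    return out;
}

/* Zeroing variant: clears the block explicitly on every allocation. */
void *demo_bump_allocz(DemoPool *p, size_t n) {
    void *out = demo_bump_alloc(p, n);
    if (out)
        memset(out, 0, n);
    return out;
}

/* Allocate, dirty, recycle, allocate again; report the first byte seen. */
unsigned char demo_first_byte_after_recycle(int zeroing) {
    DemoPool pool;
    memset(&pool, 0, sizeof pool);
    unsigned char *a = demo_bump_alloc(&pool, 8);
    memset(a, 0xFF, 8);
    demo_pool_reset(&pool);
    unsigned char *b = zeroing ? demo_bump_allocz(&pool, 8)
                               : demo_bump_alloc(&pool, 8);
    return b[0];
}
```

In this sketch the non-zeroing path hands back the stale 0xFF bytes after a reset, while the explicit memset path does not depend on the pool's history.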
Rewrite ir/opt into modular SSA passes, split arm-thumb gen into per-insn thop_* units, fix -O1/-O2 on-target crashes, add unit/asm tests: