Notes from chasing "apps look garbled in rterm but the harness says everything's fine." The headline lesson: some bugs live in places the deterministic harness structurally cannot see — and there's a concrete technique to drag them back into reproducibility.
Live-only display corruption in full-screen, diff-rendering apps
(htop, emacs, Claude Code, vim, less, tmux). These apps render to their
own virtual screen and emit only the minimal escape sequences needed
to morph the terminal into that screen. Consequence: any divergence
between the app's model of the screen and the terminal's actual state
becomes permanent — the app never repaints cells it believes are
already correct, so a momentary glitch sticks until a forced full redraw
(C-l, resize, etc.). This is why the corruption looks like stale/merged
content rather than a flicker.
The byte→grid oracle (tmux) and the redraw/markers checks run the core single-threaded and synchronously, replaying a fixed byte stream. That's great for parser/state bugs and incremental-render bugs, but it is blind to three things:
- Concurrency — the live terminal runs several unsynchronized threads (input processing, blink, 30fps flush, X events). The harness has none of that.
- Size reporting —
winsize/TIOCSWINSZis not byte interpretation, so no byte→grid check can see it. - Unexercised ops — a recorded
.recmay simply not contain the buggy sequences. The optimization path an app takes depends onTERM/terminfo and the reported size, so a recording made under different conditions replays perfectly clean even though the live app corrupts. (Our emacs.recwas all full repaints; htop used scroll regions — only htop exposed the bug.)
When a live bug won't reproduce from a .rec, capture what the live
terminal actually received, with the real read boundaries:
- Instrument
RubyTerm#writeto log every pty chunk as it arrives (base64, one line perwritecall = oneread_nonblock(128)worth, preserving the real boundary). A thin launcher thatrequiresrubyterm+ the debug server and prepends a write-logger is enough; no production changes. - Run the app live in the instrumented terminal. htop is the
ideal target: it self-redraws on a timer, so you need no
keypresses (avoids all xdotool/focus flakiness).
SIGSTOPit to freeze the captured stream, thenrender_barrier+dump_stateover the debug socket. - Replay headless with the exact boundaries:
session.feed(chunk, chunk: chunk.bytesize)for each logged chunk. - Three-way compare live
dump_statevs headless replay vs tmux:- live == headless, headless != tmux → deterministic parser/state bug, now reproducible headless → minimize and fix normally.
- live == headless == tmux → not a byte-interpretation bug; the bytes are correct. Suspect size reporting or concurrency.
- live != headless → concurrency/timing in the live path only.
If headless diverges from tmux, replay the same bytes at geometry ±1
row/col. If it matches at a different size, the byte stream was
correct for that other size — i.e. the app was told the wrong size.
That's a winsize/report_size bug, not a parser bug.
This is exactly how the root cause this session was found: the htop
stream was 203 cells wrong at 50×14 but 0 diffs at 50×15, and the
live app's TTY was confirmed 15×50 while the terminal rendered 14.
Controller#report_size reported [h + 1, w] to the child
(TIOCSWINSZ) — one row too many. Every full-screen app drew an extra
row, so its minimal diff-updates landed one row off and shifted/
overflowed. One-character fix (h + 1 → h); fixes all full-screen
apps at once. Guarded by a unit test (winsize == [rows, cols]), since
the byte→grid oracle can't see size reporting.
Separately, the live terminal mutated the shared buffer from multiple
threads with no locking: the input processor (feed), the blink
timer (redraw_blink), the 30fps flush, and X events
(Expose/ConfigureNotify → resize → buffer mutation). Routed blink,
flush, and resize through the single input queue so only the processing
thread touches the buffer. This was not the cause of the size bug, but
it removes genuine data races. (Caveat: one flaky resize-not-applying
was observed during testing — worth a closer look before relying on it.)
- Xephyr gives an isolated, WM-free display you can size freely and
screenshot without touching the user's real terminals:
Xephyr :1 -screen 1500x900 -ac -noreset, then run the terminal withDISPLAY=:1. It also sidesteps window-identification entirely. - The real WM on
:0force-sizes the window (tiling), soxdotool windowsizewon't stick. Under Xephyr (no WM) it works; iterate size →dump_stateuntil you hit the target cols/rows. - Identify your window by before/after diff of
xdotool search --class rterm— the user runs rterm as a daily driver (many windows) and the terminal doesn't export_NET_WM_PID. pkill -f "live.rb"matches its own command line and kills your shell. Kill by PID excluding$$instead.- The shell snapshot runs with
set -e; guard fallible commands (pkill ... || true) or the whole step aborts silently. -c <instance>(to set a unique WM_CLASS) crashes startup via a separate X property-length bug — don't use it for identification.- Read PNG screenshots directly to confirm corruption;
dump_statefor the buffer. Translucent windows show the desktop/other windows behind — don't mistake that faint layer for corruption; the bug is in the solid foreground text.
Resize bugs live entirely in the live X layer (the harness never resizes
a real window), so they need live driving + xwininfo/screenshots.
Three distinct bugs, all found this way:
- Flash (whole window blanks on resize). The window was created with
the default bit gravity = ForgetGravity, so on every resize the X
server discards the window contents and fills with the background
before we repaint.
xwininfo -id <win> -all | grep Gravityshows it. Fix: create the window withCWBitGravity => 1(NorthWestGravity) so the server retains the existing pixels (anchored top-left). - Resize doesn't track the window / freezes. The event mask had
SubstructureNotifyMask(events about child windows) but notStructureNotifyMask(events about this window) -- so the window never receivedConfigureNotifyfor its own resize. Resize had been limping along on Expose alone. Fix: addStructureNotifyMask. - Resize shrinks to a thin strip.
ExposeandConfigureNotifywere handled identically (resize(pkt.width, pkt.height)), but Expose's width/height are the damage rectangle, not the window size -- a partial Expose resized the terminal to a few rows. Fix:ConfigureNotifyresizes;Exposeonly repaints.
General rule: in X, ConfigureNotify = size changed (resize),
Expose = pixels lost (repaint). Never resize on Expose, and make
sure the window actually selects StructureNotifyMask or its own
ConfigureNotify never arrives.
- Resize is slow / "hangs until it catches up". A drag fires a
flood of ConfigureNotify + Expose; repainting on each one builds a
backlog the renderer then grinds through. A single redraw is cheap
(a few ms) -- the cost is volume. Fix: coalesce. Keep at most one
redraw request queued (a
@redraw_pendingflag guarded by a mutex) and always repaint to the latest pending size; further events while one is in flight just update the target. The event thread stays responsive (requesting a redraw is cheap) and the renderer does one repaint per slot at the current size instead of one per event, so it catches up within a single redraw of the drag stopping. Coalescing must be mutex-serialised with the apply step or the final size can be lost to a race.
- Reproduce live; screenshot and
dump_state. Clean buffer + dirty pixels → render layer. Dirty buffer → state/size. - Capture the pty stream with real boundaries; replay headless; do the three-way compare above.
- headless == tmux but live differs → concurrency or size, not the parser.
- Size-sweep ±1 → size-reporting bug.
- Full-screen app + a one-row/col shift → check
report_size/winsizefirst.