fix: render Thai combining vowels and tone marks correctly#64
fix: render Thai combining vowels and tone marks correctly#64noomzopendream wants to merge 5 commits into
Conversation
The draw loop built each cell's Text from the base char alone, so combining characters stored in Cell::zerowidth() (e.g. Thai upper/lower vowels and tone marks) were silently dropped. Append them to the shaped content so cosmic-text positions the full grapheme cluster, and draw cells whose base is a space when they carry combining marks. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Both tests hardcoded /bin/zsh as the new shell value; on machines where $SHELL is already /bin/zsh the value matched the default draft, no dirty flag was set, and the assertions failed. Derive the target value from the live default so it always differs. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Gate resize-grip and window imports behind cfg(not(macos)) since they are only used by the non-macOS resize path, split the ResizeWindow match arm per platform instead of binding an unused direction on macOS, and apply pending rustfmt reordering in otty-libterm re-exports. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ement THAI SARA AM (U+0E33) and LAO AM (U+0EB3) have display width 1, so they were written to their own cell and shaped standalone, leaving the nikhahit/niggahita ring detached from the preceding consonant. Split them on print into the zero-width mark (attached to the previous cell) plus the spacing AA vowel so the cluster shapes correctly. Copied text now yields the decomposed sequence (U+0E4D U+0E32) instead of the original single codepoint; it remains canonically equivalent under NFC normalization. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The re-export grouping applied by a newer local nightly rustfmt disagrees with the nightly-2026-03-10 toolchain pinned in the lint workflow; format with the pinned version instead. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
| /// so without decomposition they are shaped as standalone glyphs. Splitting | ||
| /// them lets the nikhahit/niggahita ring render over the preceding | ||
| /// consonant, matching how the character is actually drawn. | ||
| fn decompose_am(ch: char) -> Option<(char, char)> { |
There was a problem hiding this comment.
This is a valid fix for Thai/Lao SARA AM, but I’m not sure surface is the right layer for this kind of logic.
otty-surface should model and manage the terminal grid: cells, widths, zero-width marks, cursor movement, wrapping, insert/delete behavior, etc. Text shaping is a renderer responsibility, because it depends on Unicode shaping rules, adjacent characters, fonts, OpenType positioning, bidi, and other rendering context.
Adding script-specific decomposition rules to surface can fix this case, but it does not scale well. Each new Unicode/script-specific rendering issue would need another exception in grid construction, and some of those exceptions can also change the logical text used by copy/search. For example, SARA AM decomposition is a compatibility decomposition, so the grid no longer stores exactly the same text emitted by the application.
I prepared the another PR with render aproach impl that could solving your case.
Type of change
User readable description
Thai text rendered with combining vowels and tone marks in the wrong position (or dropped entirely). Words such as
ที่นี่,น้ำ, orกำลังlost their upper/lower vowels and tone marks.Two root causes, fixed here:
Renderer dropped zero-width combining marks (
otty-ui-term). The grid correctly stores combining characters per cell viaCell::zerowidth(), but the draw loop built each cell'sTextfrom the base character alone, so the marks never reached the shaper. The draw loop now appends the stored marks to the shaped content, letting cosmic-text position the full grapheme cluster over the base glyph. Cells whose base is a space but which carry combining marks are now drawn as well.THAI SARA AM (U+0E33) shaped standalone (
otty-surface). SARA AM has display width 1, so it landed in its own cell and its nikhahit ring rendered detached from the preceding consonant.print()now decomposes it into NIKHAHIT (zero-width, attached to the previous cell) + SARA AA (spacing), matching how the character is actually drawn. The Lao analog (U+0EB3 → U+0ECD + U+0EB2) is handled the same way. Note: copied text containing ำ now yields the decomposed pair, which is canonically equivalent under NFC normalization.Also included:
$SHELL(they hardcoded/bin/zshand failed on zsh machines).otty-libtermre-exports.Before / after
Before: upper/lower vowels and tone marks missing or floating (e.g.
ที่นี่มีน้ำrendered as bare consonants).After: all marks shape correctly over their base consonants — verified visually with a test matrix covering upper vowels (กิ กี กึ กื กั), sara am (กำ น้ำ ทำไม), lower vowels (กุ กู ปู่), and tone marks (ก่ ก้ ก๊ ก๋).
Verification
cargo test --workspace --all-features: 445 passed, 0 failed (includes 3 new decomposition unit tests)cargo clippy --workspace --all-targets --all-features -- -D warnings: cleancargo +nightly fmt --check: clean🤖 Generated with Claude Code