fix: render Thai combining vowels and tone marks correctly by noomzopendream · Pull Request #64 · otty-shell/otty

noomzopendream · 2026-07-02T16:51:11Z

Type of change

User readable description

Thai text rendered with combining vowels and tone marks in the wrong position (or dropped entirely). Words such as ที่นี่, น้ำ, or กำลัง lost their upper/lower vowels and tone marks.

Two root causes, fixed here:

Renderer dropped zero-width combining marks (otty-ui-term). The grid correctly stores combining characters per cell via Cell::zerowidth(), but the draw loop built each cell's Text from the base character alone, so the marks never reached the shaper. The draw loop now appends the stored marks to the shaped content, letting cosmic-text position the full grapheme cluster over the base glyph. Cells whose base is a space but which carry combining marks are now drawn as well.
THAI SARA AM (U+0E33) shaped standalone (otty-surface). SARA AM has display width 1, so it landed in its own cell and its nikhahit ring rendered detached from the preceding consonant. print() now decomposes it into NIKHAHIT (U+0E4D, zero-width, attached to the previous cell) + SARA AA (U+0E32, spacing), matching how the character is actually drawn. The Lao analog (U+0EB3 → U+0ECD + U+0EB2) is handled the same way. Note: copied text containing ำ now yields the decomposed pair. This is a compatibility decomposition, so the pair is equivalent to the original under NFKC/NFKD but not under NFC, and NFC normalization does not restore the single codepoint.
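A minimal sketch of the decomposition step described above. Only the `decompose_am` signature appears in this PR; its body, the `Cell` struct, and `print_char` are hypothetical stand-ins for the real otty-surface types, and the fallback for a row with no previous cell is an assumption.

```rust
/// Hypothetical, simplified grid cell: one base char plus zero-width marks.
#[derive(Debug, Clone, PartialEq)]
struct Cell {
    base: char,
    zerowidth: Vec<char>,
}

impl Cell {
    fn new(base: char) -> Self {
        Cell { base, zerowidth: Vec::new() }
    }
}

/// Split SARA AM / LAO AM into (zero-width mark, spacing AA vowel).
fn decompose_am(ch: char) -> Option<(char, char)> {
    match ch {
        '\u{0E33}' => Some(('\u{0E4D}', '\u{0E32}')), // Thai: NIKHAHIT + SARA AA
        '\u{0EB3}' => Some(('\u{0ECD}', '\u{0EB2}')), // Lao: U+0ECD + U+0EB2
        _ => None,
    }
}

/// Hypothetical print path that appends `ch` to a row of cells.
fn print_char(row: &mut Vec<Cell>, ch: char) {
    match decompose_am(ch) {
        // Attach the mark to the previous cell, then write the spacing vowel.
        Some((mark, spacing)) if !row.is_empty() => {
            let last = row.len() - 1;
            row[last].zerowidth.push(mark);
            row.push(Cell::new(spacing));
        }
        // Assumption: with no previous cell, store the char undecomposed.
        _ => row.push(Cell::new(ch)),
    }
}
```

Printing ก followed by U+0E33 through this sketch leaves two cells: ก carrying U+0E4D as a zero-width mark, then U+0E32 in its own cell.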

Also included:

Settings shell dirty-tracking tests no longer depend on the machine's $SHELL (they hardcoded /bin/zsh as the new value and failed on machines where $SHELL is already /bin/zsh).
Fixes for macOS clippy warnings (imports/bindings used only by the non-macOS resize path) and pending rustfmt drift in otty-libterm re-exports.

Before / after

Before: upper/lower vowels and tone marks missing or floating (e.g. ที่นี่มีน้ำ rendered as bare consonants).
After: all marks shape correctly over their base consonants — verified visually with a test matrix covering upper vowels (กิ กี กึ กื กั), sara am (กำ น้ำ ทำไม), lower vowels (กุ กู ปู่), and tone marks (ก่ ก้ ก๊ ก๋).

Verification

cargo test --workspace --all-features: 445 passed, 0 failed (includes 3 new decomposition unit tests)
cargo clippy --workspace --all-targets --all-features -- -D warnings: clean
cargo +nightly fmt --check: clean

🤖 Generated with Claude Code

The draw loop built each cell's Text from the base char alone, so combining characters stored in Cell::zerowidth() (e.g. Thai upper/lower vowels and tone marks) were silently dropped. Append them to the shaped content so cosmic-text positions the full grapheme cluster, and draw cells whose base is a space when they carry combining marks. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
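The change to the draw loop can be sketched roughly as follows. This is an illustration, not the code from otty-ui-term: the `Cell` struct, the return type of `zerowidth()`, and the `cell_text` helper are assumptions, as is the idea that blank cells were previously skipped.

```rust
/// Hypothetical cell; the real grid exposes marks via Cell::zerowidth().
struct Cell {
    base: char,
    marks: Vec<char>,
}

impl Cell {
    /// Stand-in for the accessor named in the PR (return type assumed).
    fn zerowidth(&self) -> &[char] {
        &self.marks
    }
}

/// Build the string handed to the shaper for one cell.
/// Returns None when there is nothing to draw.
fn cell_text(cell: &Cell) -> Option<String> {
    let marks = cell.zerowidth();
    // A bare space is skipped, but a space carrying marks is still drawn.
    if cell.base == ' ' && marks.is_empty() {
        return None;
    }
    let mut text = String::new();
    text.push(cell.base);
    // Append the stored combining marks after the base character so the
    // shaper sees the whole cluster instead of the base alone.
    text.extend(marks.iter());
    Some(text)
}
```

Passing the whole cluster as one string is what lets the shaper position the marks relative to the base glyph.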

Both tests hardcoded /bin/zsh as the new shell value; on machines where $SHELL is already /bin/zsh the value matched the default draft, no dirty flag was set, and the assertions failed. Derive the target value from the live default so it always differs. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
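One way to make such a test independent of the environment is sketched below. `ShellDraft`, `live_default_shell`, and `shell_different_from` are hypothetical names, not the ones used in the otty settings code, and the `/bin/sh` fallback is an assumption.

```rust
use std::env;

/// Hypothetical settings draft that tracks whether the shell was edited.
struct ShellDraft {
    saved: String,
    draft: String,
}

impl ShellDraft {
    fn new(default: &str) -> Self {
        ShellDraft { saved: default.to_string(), draft: default.to_string() }
    }

    fn set_shell(&mut self, value: &str) {
        self.draft = value.to_string();
    }

    /// Dirty only when the draft differs from the saved value.
    fn is_dirty(&self) -> bool {
        self.saved != self.draft
    }
}

/// Read the live default the same way on every machine (fallback assumed).
fn live_default_shell() -> String {
    env::var("SHELL").unwrap_or_else(|_| String::from("/bin/sh"))
}

/// Pick a target value that can never equal the default.
fn shell_different_from(default: &str) -> &'static str {
    if default == "/bin/zsh" { "/bin/bash" } else { "/bin/zsh" }
}
```

A test would then call `set_shell(shell_different_from(&live_default_shell()))` and assert `is_dirty()`, which holds whatever `$SHELL` is set to.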

Gate resize-grip and window imports behind cfg(not(macos)) since they are only used by the non-macOS resize path, split the ResizeWindow match arm per platform instead of binding an unused direction on macOS, and apply pending rustfmt reordering in otty-libterm re-exports. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
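The per-platform split of the match arm might look like the sketch below. The `Event`, `Direction`, `start_resize`, and `handle` names are illustrative only, and the predicate is assumed to be `target_os = "macos"` (the commit message abbreviates it as `cfg(not(macos))`).

```rust
// Illustrative types; dead_code is allowed because the field is unused on macOS.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Direction {
    North,
    South,
}

enum Event {
    ResizeWindow(Direction),
    Close,
}

// Only compiled where the resize path exists, so macOS sees no unused item.
#[cfg(not(target_os = "macos"))]
fn start_resize(direction: Direction) -> String {
    format!("resize {:?}", direction)
}

fn handle(event: Event) -> String {
    match event {
        // Non-macOS: bind the direction and use it.
        #[cfg(not(target_os = "macos"))]
        Event::ResizeWindow(direction) => start_resize(direction),
        // macOS: ignore the payload instead of binding an unused variable.
        #[cfg(target_os = "macos")]
        Event::ResizeWindow(_) => String::from("ignored"),
        Event::Close => String::from("close"),
    }
}
```

Splitting the arm keeps both platforms free of unused-binding warnings without resorting to a blanket `#[allow(unused_variables)]`.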

…ement THAI SARA AM (U+0E33) and LAO AM (U+0EB3) have display width 1, so they were written to their own cell and shaped standalone, leaving the nikhahit/niggahita ring detached from the preceding consonant. Split them on print into the zero-width mark (attached to the previous cell) plus the spacing AA vowel so the cluster shapes correctly. Copied text now yields the decomposed sequence (U+0E4D U+0E32) instead of the original single codepoint; the two are compatibility-equivalent (NFKC/NFKD), not canonically equivalent, so NFC does not recompose them. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The re-export grouping applied by a newer local nightly rustfmt disagrees with the nightly-2026-03-10 toolchain pinned in the lint workflow; format with the pinned version instead. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Harzu · 2026-07-03T23:04:51Z

+/// so without decomposition they are shaped as standalone glyphs. Splitting
+/// them lets the nikhahit/niggahita ring render over the preceding
+/// consonant, matching how the character is actually drawn.
+fn decompose_am(ch: char) -> Option<(char, char)> {


This is a valid fix for Thai/Lao SARA AM, but I’m not sure surface is the right layer for this kind of logic.

otty-surface should model and manage the terminal grid: cells, widths, zero-width marks, cursor movement, wrapping, insert/delete behavior, etc. Text shaping is a renderer responsibility, because it depends on Unicode shaping rules, adjacent characters, fonts, OpenType positioning, bidi, and other rendering context.

Adding script-specific decomposition rules to surface can fix this case, but it does not scale well. Each new Unicode/script-specific rendering issue would need another exception in grid construction, and some of those exceptions can also change the logical text used by copy/search. For example, SARA AM decomposition is a compatibility decomposition, so the grid no longer stores exactly the same text emitted by the application.
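The copy/search concern can be illustrated with a small std-only sketch. `stored_text` is a hypothetical helper that applies the same split to a whole string; it is not part of either PR.

```rust
/// Hypothetical helper: the text the grid would hold after the AM split.
fn stored_text(emitted: &str) -> String {
    let mut out = String::new();
    for ch in emitted.chars() {
        match ch {
            '\u{0E33}' => out.push_str("\u{0E4D}\u{0E32}"), // Thai SARA AM
            '\u{0EB3}' => out.push_str("\u{0ECD}\u{0EB2}"), // Lao AM
            other => out.push(other),
        }
    }
    out
}

/// True when a plain substring search for the emitted text still matches.
fn search_still_matches(emitted: &str) -> bool {
    stored_text(emitted).contains(emitted)
}
```

For น้ำ (U+0E19 U+0E49 U+0E33) the stored text is U+0E19 U+0E49 U+0E4D U+0E32, so `search_still_matches` returns false. NFC does not recompose a compatibility decomposition, so normalizing the copied text would not recover the original either.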

I prepared another PR with a renderer-side implementation that should solve your case.

noomzopendream and others added 5 commits July 2, 2026 23:08

Harzu mentioned this pull request Jul 3, 2026

feat: render chars shaping #66

Open

11 tasks

Harzu reviewed Jul 3, 2026

noomzopendream wants to merge 5 commits into otty-shell:main from noomzopendream:fix/thai-combining-marks

2 participants
