Skip to content

fix: render Thai combining vowels and tone marks correctly#64

Open
noomzopendream wants to merge 5 commits into
otty-shell:mainfrom
noomzopendream:fix/thai-combining-marks
Open

fix: render Thai combining vowels and tone marks correctly#64
noomzopendream wants to merge 5 commits into
otty-shell:mainfrom
noomzopendream:fix/thai-combining-marks

Conversation

@noomzopendream

Copy link
Copy Markdown

Type of change

  • New Feature
  • Experimental Feature
  • Improvement
  • Performance Improvement
  • Backward Incompatible Change
  • Build/Testing/Packaging Improvement
  • Documentation (changelog entry is not required)
  • Critical Bug Fix (crash, data loss, RBAC) or LOGICAL_ERROR
  • Bug Fix (user-visible misbehavior in an official stable release)
  • CI Fix or Improvement (changelog entry is not required)
  • Not for changelog (changelog entry is not required)

User readable description

Thai text rendered with combining vowels and tone marks in the wrong position (or dropped entirely). Words such as ที่นี่, น้ำ, or กำลัง lost their upper/lower vowels and tone marks.

Two root causes, fixed here:

  1. Renderer dropped zero-width combining marks (otty-ui-term). The grid correctly stores combining characters per cell via Cell::zerowidth(), but the draw loop built each cell's Text from the base character alone, so the marks never reached the shaper. The draw loop now appends the stored marks to the shaped content, letting cosmic-text position the full grapheme cluster over the base glyph. Cells whose base is a space but which carry combining marks are now drawn as well.

  2. THAI SARA AM (U+0E33) shaped standalone (otty-surface). SARA AM has display width 1, so it landed in its own cell and its nikhahit ring rendered detached from the preceding consonant. print() now decomposes it into NIKHAHIT (zero-width, attached to the previous cell) + SARA AA (spacing), matching how the character is actually drawn. The Lao analog (U+0EB3 → U+0ECD + U+0EB2) is handled the same way. Note: copied text containing ำ now yields the decomposed pair, which is canonically equivalent under NFC normalization.

Also included:

  • Settings shell dirty-tracking tests no longer depend on the machine's $SHELL (they hardcoded /bin/zsh and failed on zsh machines).
  • macOS clippy warnings (imports/bindings used only by the non-macOS resize path) and pending rustfmt drift in otty-libterm re-exports.

Before / after

Before: upper/lower vowels and tone marks missing or floating (e.g. ที่นี่มีน้ำ rendered as bare consonants).
After: all marks shape correctly over their base consonants — verified visually with a test matrix covering upper vowels (กิ กี กึ กื กั), sara am (กำ น้ำ ทำไม), lower vowels (กุ กู ปู่), and tone marks (ก่ ก้ ก๊ ก๋).

Verification

  • cargo test --workspace --all-features: 445 passed, 0 failed (includes 3 new decomposition unit tests)
  • cargo clippy --workspace --all-targets --all-features -- -D warnings: clean
  • cargo +nightly fmt --check: clean

🤖 Generated with Claude Code

noomzopendream and others added 5 commits July 2, 2026 23:08
The draw loop built each cell's Text from the base char alone, so
combining characters stored in Cell::zerowidth() (e.g. Thai upper/lower
vowels and tone marks) were silently dropped. Append them to the shaped
content so cosmic-text positions the full grapheme cluster, and draw
cells whose base is a space when they carry combining marks.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Both tests hardcoded /bin/zsh as the new shell value; on machines where
$SHELL is already /bin/zsh the value matched the default draft, no dirty
flag was set, and the assertions failed. Derive the target value from
the live default so it always differs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Gate resize-grip and window imports behind cfg(not(macos)) since they
are only used by the non-macOS resize path, split the ResizeWindow
match arm per platform instead of binding an unused direction on macOS,
and apply pending rustfmt reordering in otty-libterm re-exports.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ement

THAI SARA AM (U+0E33) and LAO AM (U+0EB3) have display width 1, so they
were written to their own cell and shaped standalone, leaving the
nikhahit/niggahita ring detached from the preceding consonant. Split
them on print into the zero-width mark (attached to the previous cell)
plus the spacing AA vowel so the cluster shapes correctly.

Copied text now yields the decomposed sequence (U+0E4D U+0E32) instead
of the original single codepoint; it remains canonically equivalent
under NFC normalization.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The re-export grouping applied by a newer local nightly rustfmt
disagrees with the nightly-2026-03-10 toolchain pinned in the lint
workflow; format with the pinned version instead.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@Harzu Harzu mentioned this pull request Jul 3, 2026
11 tasks
/// so without decomposition they are shaped as standalone glyphs. Splitting
/// them lets the nikhahit/niggahita ring render over the preceding
/// consonant, matching how the character is actually drawn.
fn decompose_am(ch: char) -> Option<(char, char)> {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a valid fix for Thai/Lao SARA AM, but I’m not sure surface is the right layer for this kind of logic.

otty-surface should model and manage the terminal grid: cells, widths, zero-width marks, cursor movement, wrapping, insert/delete behavior, etc. Text shaping is a renderer responsibility, because it depends on Unicode shaping rules, adjacent characters, fonts, OpenType positioning, bidi, and other rendering context.

Adding script-specific decomposition rules to surface can fix this case, but it does not scale well. Each new Unicode/script-specific rendering issue would need another exception in grid construction, and some of those exceptions can also change the logical text used by copy/search. For example, SARA AM decomposition is a compatibility decomposition, so the grid no longer stores exactly the same text emitted by the application.

I prepared the another PR with render aproach impl that could solving your case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants