PDF stage 2.2: glyph advances & metrics #539
Parse font glyph widths and advance the text matrix per glyph, on top of 2.1's placed-text emission, so segments, TJ kerning and lines land in the right place.

- Font metrics (pdf_document_parser, Font): /FirstChar + /Widths + /FontDescriptor /MissingWidth (simple), /W + /DW (descendant CIDFont, both `c [w...]` and `c_first c_last w` forms). Font::advance_width(code) returns the advance in text-space units with the MissingWidth/DW fallbacks; code_byte_width() is 1 (simple) / 2 (composite).
- Advance application (extract_text, GraphicsState::advance_text): emit one TextElement per shown segment (one Tj/'/", or one string of a TJ array); after each, advance Tm by sum(width*Tfs + Tc [+ Tw for single-byte 0x20]) * Th, and translate Tm by -n/1000*Tfs*Th for a TJ number. The element carries its total advance; per-glyph placement stays re-derivable from font->advance_width, keeping the run-vs-glyph choice in the renderer.

Out of scope (later): intra-segment glyph shaping (stage 3), AFM widths for non-embedded standard-14 fonts (stage 3), vertical writing advances (2.6).

Tests: composite /W+/DW and simple /Widths+/MissingWidth parsing asserted through advance_width; extract_text advance coverage (simple widths, TJ adjustment, char/word spacing, composite /DW, advance_width fallbacks).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 256b988f1b
const double height = page_box[3].as_real() - box_y0;

out.write_element_begin(
<<<<<<< HEAD
Resolve the leftover merge conflict
This leaves merge-conflict markers in the committed source, so any build that compiles the PDF HTML service will fail before tests can run because the compiler sees <<<<<<</=======/>>>>>>> inside write_document. Pick the intended write_element_begin version and remove the markers.
🤖 Generated with Claude Code
Stacked on #538 (stage 2.1): base is `pdf-text-transforms`; retarget to `main` once 2.1 merges.

Second slice of stage 2. Parses font glyph widths and advances the text matrix
per glyph on top of 2.1's placed-text emission, so segments, `TJ` kerning and
lines land in the right place. Per the agreed architecture, the run-vs-glyph
choice stays in the renderer.
What's in here
- Font metrics (`pdf_document_parser`, `Font`): `/FirstChar` + `/Widths` + `/FontDescriptor` `/MissingWidth` (simple); `/W` + `/DW` from the descendant CIDFont (both `c [w…]` and `c_first c_last w` forms, with a range guard). `Font::advance_width(code)` returns the advance in text-space units with the `/MissingWidth` / `/DW` fallbacks; `code_byte_width()` is 1 (simple) / 2 (composite, the Identity-H/V case).
- Advance application (`extract_text`, `GraphicsState::advance_text`): a `TextElement` is now emitted per shown segment (one `Tj`/`'`/`"`, or one string of a `TJ` array). After each segment `Tm` advances by `Σ(width × Tfs + Tc [+ Tw for single-byte 0x20]) × Th`, and a `TJ` number translates `Tm` by `−n/1000 × Tfs × Th`. The element carries its total advance; a renderer wanting per-glyph placement re-derives per-code advances from `font->advance_width`.

Out of scope (later)
Intra-segment glyph shaping (the browser lays a segment out in a fallback font
until the embedded font lands — stage 3), AFM widths for the non-embedded
standard-14 fonts (stage 3), and vertical writing-mode advances (stage 2.6).
Precise baseline placement (needs ascent metrics) also remains deferred.
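
For reviewers who want the width lookup from "What's in here" in concrete form, here is a minimal sketch. It is not the PR's code: `FontSketch`, its field names, `add_w_list` and `add_w_range` are hypothetical, and the exact shape of the range guard is an assumption. It also assumes the Identity-H/V case (code equals CID) and that the stored `/Widths` and `/W` values are in 1/1000 text-space units, so the lookup divides by 1000.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for the PR's Font class; all names are illustrative.
struct FontSketch {
    bool composite = false;                      // Type0 font with a descendant CIDFont
    int first_char = 0;                          // /FirstChar (simple fonts)
    std::vector<double> widths;                  // /Widths, in 1/1000 text-space units
    double missing_width = 0.0;                  // /MissingWidth (PDF default is 0)
    std::map<std::uint32_t, double> cid_widths;  // expanded /W entries
    double default_width = 1000.0;               // /DW (PDF default is 1000)

    // /W form "c [w1 w2 ...]": consecutive widths starting at CID c.
    void add_w_list(std::uint32_t c, const std::vector<double>& ws) {
        for (std::size_t i = 0; i < ws.size(); ++i)
            cid_widths[c + static_cast<std::uint32_t>(i)] = ws[i];
    }

    // /W form "c_first c_last w": one width for an inclusive CID range.
    // Assumed guard: ignore reversed ranges and CIDs above 0xFFFF.
    void add_w_range(std::uint32_t first, std::uint32_t last, double w) {
        if (last < first || last > 0xFFFFu) return;
        for (std::uint32_t c = first; c <= last; ++c) cid_widths[c] = w;
    }

    int code_byte_width() const { return composite ? 2 : 1; }

    // Advance in text-space units, falling back to /DW or /MissingWidth.
    double advance_width(std::uint32_t code) const {
        assert(code <= (composite ? 0xFFFFu : 0xFFu));
        double glyph_units = 0.0;
        if (composite) {
            const auto it = cid_widths.find(code);
            glyph_units = (it != cid_widths.end()) ? it->second : default_width;
        } else {
            const long index = static_cast<long>(code) - first_char;
            const bool in_table =
                index >= 0 && index < static_cast<long>(widths.size());
            glyph_units = in_table ? widths[static_cast<std::size_t>(index)]
                                   : missing_width;
        }
        return glyph_units / 1000.0;
    }
};
```

Expanding `/W` into a map keeps the lookup trivial at the cost of memory for wide ranges; a real parser might store ranges instead.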
Tests
- `pdf_document_parser.cpp`: composite `/W` + `/DW` and a simple `/FirstChar` / `/Widths` / `/MissingWidth` font, asserted through `advance_width`.
- `pdf_page_text.cpp`: simple `/Widths` advancing a following show, `TJ` emitting per string with the numeric adjustment applied, char spacing, word spacing on the single-byte space, the composite 2-byte `/DW` advance, and the `advance_width` fallbacks.

HtmlOutputTests.
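
The advance cases these tests cover can be checked by hand against the formulas in the description. The sketch below is one way to write that arithmetic down, not the PR's `GraphicsState::advance_text`: `TextStateSketch`, `segment_advance` and `tj_adjustment` are hypothetical names, the width callback stands in for `Font::advance_width`, and codes are assumed to be big-endian when two bytes wide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical subset of the text state; fields mirror the PDF parameters.
struct TextStateSketch {
    double tfs = 1.0;  // font size (Tf)
    double tc = 0.0;   // character spacing (Tc)
    double tw = 0.0;   // word spacing (Tw)
    double th = 1.0;   // horizontal scaling as a factor (Tz 100 -> 1.0)
};

// Total advance of one shown string:
//   sum(width * Tfs + Tc [+ Tw for single-byte 0x20]) * Th
double segment_advance(const std::string& bytes, int code_byte_width,
                       const TextStateSketch& ts,
                       const std::function<double(std::uint32_t)>& advance_width) {
    assert(code_byte_width == 1 || code_byte_width == 2);
    const std::size_t step = static_cast<std::size_t>(code_byte_width);
    double sum = 0.0;
    // A trailing partial code (odd byte count in a 2-byte font) is ignored.
    for (std::size_t i = 0; i + step <= bytes.size(); i += step) {
        std::uint32_t code = 0;
        for (std::size_t k = 0; k < step; ++k)
            code = (code << 8) | static_cast<unsigned char>(bytes[i + k]);
        sum += advance_width(code) * ts.tfs + ts.tc;
        // Word spacing applies only to the single-byte code 0x20.
        if (step == 1 && code == 0x20) sum += ts.tw;
    }
    return sum * ts.th;
}

// Translation applied to Tm for a number n inside a TJ array.
double tj_adjustment(double n, const TextStateSketch& ts) {
    return -n / 1000.0 * ts.tfs * ts.th;
}
```

With Tfs = 10, Tc = 1, Tw = 2, Th = 0.5 and every glyph 0.5 text-space units wide, the single-byte string "A B" advances (3 × 6 + 2) × 0.5 = 10 units, and a `TJ` number of −200 moves `Tm` by +1 unit.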