PDF: stage-1 roadmap wrap-up — summary + drop sub-stage tags from code#536

Merged: andiwand merged 2 commits into main from pdf-plan-stage1-wrapup on Jun 15, 2026

@andiwand

Stacked on #534 (stage 1.3 part B). Retarget to main as the stack merges.

Stage 1 (the code → Unicode extraction chain) is functionally done — 1.1
(ToUnicode multi-byte), 1.2 (simple-font /Encoding + AGL) and 1.3 (Type0 +
Identity-H/V + /ToUnicode + predefined Unicode CMaps) all landed and cover
the corpus. This PR is pure wrap-up; no behaviour change.
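For readers outside the PDF module, here is a minimal sketch of two links in that chain for a Type0 font, assuming Identity-H/V encoding and an already-parsed /ToUnicode table. Under Identity-H/V every character code is 2 bytes, big-endian, and the code value is used directly as the CID; /ToUnicode then maps character codes to Unicode text. The function names, the `ToUnicodeMap` type and the U+FFFD fallback for unmapped codes are hypothetical illustrations, not the odr API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not odr's API): split a string operand into 2-byte
// big-endian character codes, as Identity-H/V does. Under Identity the code
// value is also the CID.
std::vector<std::uint16_t> decode_identity_codes(const std::string &bytes) {
  std::vector<std::uint16_t> codes;
  // A trailing odd byte cannot form a complete code; this sketch drops it.
  for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
    const unsigned hi = static_cast<unsigned char>(bytes[i]);
    const unsigned lo = static_cast<unsigned char>(bytes[i + 1]);
    codes.push_back(static_cast<std::uint16_t>((hi << 8) | lo));
  }
  return codes;
}

// Hypothetical parsed /ToUnicode table: character code -> UTF-8 text.
// One code may map to several code points (e.g. a ligature).
using ToUnicodeMap = std::unordered_map<std::uint16_t, std::string>;

// Map each code through /ToUnicode; unmapped codes become U+FFFD here
// (the fallback choice is an assumption of this sketch).
std::string codes_to_text(const std::vector<std::uint16_t> &codes,
                          const ToUnicodeMap &to_unicode) {
  const std::string replacement = "\xEF\xBF\xBD"; // U+FFFD in UTF-8
  std::string text;
  for (const auto code : codes) {
    const auto it = to_unicode.find(code);
    if (it != to_unicode.end()) {
      text += it->second;
    } else {
      text += replacement;
    }
  }
  return text;
}
```

For example, the bytes `00 41 00 42` decode to codes 0x0041 and 0x0042; with a table that only maps 0x0041 to "A", the text comes out as "A" followed by U+FFFD.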

Roadmap (src/odr/internal/pdf/AGENTS.md)

  • Collapsed stage 1 from ~150 lines of sub-stage detail to a goal + achieved
    summary (the mechanics already live in What works → Fonts / text mapping).
  • Relocated the leftovers to the follow-up stages they belong to:
    • legacy CJK code→CID CMaps + CID→Unicode tables → Other known gaps (large
      data; generator scaffolding already landed);
    • embedded-font reverse map (was 1.4) → stage 3 (font reading);
    • "no Unicode" run marking + /ActualText (was 1.5) → stage 2.

Code/test comments

  • Dropped the historical sub-stage tags (stage 1.1/1.2/1.3, part A/part B)
    from comments across the PDF module and its tests; they only recorded which
    roadmap sub-stage implemented a line and added no value in the code. The
    substance and the live stage 2/stage 3 forward pointers are kept. Comment-only.

🤖 Generated with Claude Code

@andiwand andiwand marked this pull request as ready for review June 14, 2026 22:06
@andiwand andiwand force-pushed the pdf-composite-cid-fonts-cjk branch from bf136a1 to 1d9079a June 14, 2026 23:17
@andiwand andiwand force-pushed the pdf-plan-stage1-wrapup branch from 7ea4ab0 to ee4b884 June 14, 2026 23:17
Base automatically changed from pdf-composite-cid-fonts-cjk to main June 15, 2026 07:41
andiwand and others added 2 commits June 15, 2026 09:54
…vers

Stage 1 (code → Unicode extraction) is functionally done: 1.1 (ToUnicode
multi-byte), 1.2 (simple-font /Encoding + AGL) and 1.3 (Type0 + Identity +
/ToUnicode + predefined Unicode CMaps) all landed and cover the corpus. The
remainder, none of which is implementable yet, is moved out of stage 1:

- legacy CJK code→CID CMaps + CID→Unicode tables → tracked under Other known
  gaps (large data; generator scaffolding already landed),
- embedded-font reverse map (was 1.4) → folded into stage 3 (font reading),
- "no Unicode" run marking + /ActualText (was 1.5) → folded into stage 2.

Stage 1 now reads as a goal + achieved summary. Updated the few code comments
that referenced the relocated sub-stage numbers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The roadmap sub-stage numbers (stage 1.1/1.2/1.3, part A/part B) tagged onto
code and test comments only said which roadmap sub-stage implemented a line;
the technical explanation stands on its own, and the numbering now lives only in
the roadmap. Strip them, keeping each comment's substance and the live
"stage 2"/"stage 3" forward pointers. Comment-only.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@andiwand andiwand force-pushed the pdf-plan-stage1-wrapup branch from ee4b884 to 95520c8 June 15, 2026 07:54
@andiwand andiwand enabled auto-merge (squash) June 15, 2026 08:22
@andiwand andiwand merged commit f9147f5 into main Jun 15, 2026
11 checks passed
@andiwand andiwand deleted the pdf-plan-stage1-wrapup branch June 15, 2026 08:28