Skip to content

feat(deck): HTML slide-deck generator + universal skill loader#68

Open
KylinMountain wants to merge 24 commits into
mainfrom
feat/deck-target
Open

feat(deck): HTML slide-deck generator + universal skill loader#68
KylinMountain wants to merge 24 commits into
mainfrom
feat/deck-target

Conversation

@KylinMountain
Copy link
Copy Markdown
Collaborator

Summary

  • openkb deck new <name> "<intent>" [--critique] [--skill <name>] — new CLI command that generates a polished single-file HTML slide deck from the compiled wiki. Default skill is openkb-deck-editorial (warm cream + serif + brick-red, magazine style). --critique chains the openkb-html-critic skill to catch CSS specificity bugs / missing keyboard nav / failure-of-self-containment.
  • Universal skill loader in openkb chat — chat now auto-discovers Anthropic-style SKILL.md packages from <kb>/skills/, ~/.openkb/skills/, and ~/.claude/skills/. Users say "make me a deck about X" in natural language and the agent picks the matching skill via read_skill. New /critique <path> slash command runs html-critic on any existing HTML.
  • Unified architecture — every generator path (openkb deck new, /deck new, freeform chat skill invocation, /critique) routes through a single run_skill(skill_name, intent, ...) entry point. Skill frontmatter (od.mode, od.output_path_template, od.deck_grammar) drives output-path templating + post-run validation. New artifact types = drop a SKILL.md into ~/.openkb/skills/<name>/, zero core code change. Third-party deck skills (e.g. nexu-io/open-design's deck-guizang-editorial, deck-swiss-international) work out of the box.
  • Generalized deck validatorvalidate_deck() defaults to skill-agnostic checks (file exists, parses, ≥5 slides, self-contained, no external <link>/<script>/<img>). The Editorial Monocle skill declares its slide-grammar via its frontmatter; other skills inherit only the generic guarantees.

22 commits, +2200 / -33 LOC net across 19 files. Brainstorm + design spec live in docs/superpowers/specs/2026-05-22-openkb-deck-target-design.md (local-only per repo convention).

Test plan

  • pytest tests/ — full suite passes (currently 538 tests, +15 net on this branch including 13 new tests/test_read_kb_file.py covering the read-side allow-list)
  • openkb deck new transformers-pitch "Explain attention to engineers" produces output/decks/transformers-pitch/index.html that opens in browser, supports ← → / F / P keyboard nav
  • openkb deck new ... --critique chains html-critic and patches CSS specificity bugs in place (e.g. .divider{display:flex}.divider{})
  • openkb deck new ... --skill deck-guizang-editorial routes to a third-party skill after cp -r open-design/skills/deck-guizang-editorial ~/.openkb/skills/
  • openkb chat then "make me a magazine-style deck about X" — natural-language path discovers + invokes the matching skill
  • openkb chat then /critique output/decks/foo/index.html — critic skill reads, diagnoses, patches in one atomic write
  • Validator generic mode accepts third-party deck shapes (no data-type attrs); Editorial-grammar mode rejects them — both behaviors pinned by tests/test_deck_validator.py

Architecture notes

The branch went through several iterations as the design was refined:

  1. Initial spec built a deck-specific Generator target with hardcoded prompt + bespoke critic + snapshot/restore (commits 5c2d6ebdaacce5).
  2. User feedback (mid-branch) pointed out the right architecture: openkb chat as a skill loader, third-party SKILL.md packages as the unit of extension. Refactor on top: openkb/agent/skill_runner.py + skills/openkb-deck-editorial/SKILL.md + skills/openkb-html-critic/SKILL.md (commits 03c3bbb708baa3).
  3. Final architectural-review pass (4 parallel Opus auditors) found 6 high-confidence inconsistencies between the skill-runner core (generic) and the deck surface (still fused to Editorial Monocle). Closed in 63b62c2: --skill flag, validator generalized via DeckGrammar TypedDict + skill frontmatter, output-path + validation pushed down into run_skill, security-adjacent read_kb_file got 13 tests.

Out of scope (follow-ups)

Lower-confidence findings (score 60-75 in the same review):

  • Generator if/else → plugin registry refactor
  • Iteration backup in chat freeform path (currently CLI-only)
  • Direct unit tests for run_skill / scan_local_skills / _handle_slash_critique
  • Doc drift in 3 module docstrings
  • Inline bitmap image handling (currently SVG-only)

Mirrors `openkb skill new` structurally — name validation, KB lookup,
LLM key check, config load, overwrite handling, Generator.run dispatch,
validation surfacing. Adds `--critique` flag to opt into the deck
critic pass.

`openkb.skill.workspace.save_iteration` is hard-wired to skill paths
(uses skill_dir / skill_workspace_dir), so this commit inlines a
small `_save_deck_iteration` helper that mirrors the iteration-N
copy-then-rmtree behavior using `deck_workspace_dir`. Keeps deck
rollback history scoped to decks without coupling deck CLI to skill
internals or churning skill code.
* critic.py: narrow restore docstring — it fires only on validation
  errors, not on critic exceptions or MaxTurnsExceeded (those bubble
  out of Generator.run before the restore branch runs).
* creator.py: drop stale "(see Task 7 — until then, only critique=False
  is wired)" qualifier; critique=True is now wired.
* chat.py: chat handler comment claimed "same gate as CLI" but chat
  prints issues and continues while CLI exits 1. Reworded for both
  /skill new and /deck new handlers.
…pic-style skills

* openkb/agent/skills.py — scans <kb>/skills/, ~/.openkb/skills/,
  ~/.claude/skills/ for SKILL.md packages with YAML frontmatter.
* openkb/agent/safe_shell.py — subprocess ShellExecutor (left for
  reference; not currently wired because OpenAI Agents SDK's hosted
  ShellTool isn't compatible with LiteLLM's ChatCompletions routing).
* openkb/agent/query.py::build_chat_agent — injects list_skills() +
  read_skill(name) function tools when local skills exist, and
  appends a "## Available skills" block to the system prompt so the
  agent treats skill use as the default for matching requests.
* skills/openkb-deck-editorial/SKILL.md — first internal skill.
  Repackaged from openkb/prompts/deck_create.md with placeholders
  removed (since the agent now reads it as instructions, not as a
  format-string).

POC verified: 'openkb chat' + 'make me a magazine-style deck about X'
loads the skill, follows its working method, writes a valid
Editorial Monocle deck under output/decks/<slug>/.

openkb/prompts/deck_create.md is intentionally left in place — the
existing 'openkb deck new' CLI still uses it. Consolidation is a
follow-up: refactor deck.creator to load via the skill system, then
the prompts/ copy can be deleted.
…s /critique

Architecture: every generator path now goes through one entry point —
`openkb.agent.skill_runner.run_skill(skill_name, intent, ...)`. Loads
the named SKILL.md, builds an Agent with that body as instructions plus
the standard wiki-read tools and a path-scoped write_file. The CLI and
chat surfaces are both thin wrappers around it.

* openkb/agent/skill_runner.py — new. Generic loader.
* openkb/agent/tools.py — add `read_kb_file` (wiki/+output/+skills/
  read scope) to mirror write_kb_file's allow-list; skill_runner
  exposes it as the `read_output_or_skill_file` tool so the html-critic
  skill can inspect existing deck artifacts (the wiki-scoped read tool
  could not).
* openkb/deck/creator.py::run_deck_create — refactored to delegate to
  `run_skill("openkb-deck-editorial", ...)`. Pre-refactor implementation
  kept under `_legacy_run_deck_create_pre_skill_refactor` so existing
  agent-shape tests still pass while the rewrite is under review.
* openkb/agent/chat.py — `/critique <path>` slash command. Runs the
  openkb-html-critic skill on a target HTML file (typically
  `output/decks/<name>/index.html`).
* skills/openkb-html-critic/SKILL.md — new skill. Reads an HTML deck,
  diagnoses CSS specificity bugs (slide-modifier classes overriding
  `.slide{display:none}` — the most common LLM mistake), missing nav,
  failure-of-self-containment, then patches in one atomic write.
  Never touches slide content (numbers, names, quotes).

POC verified end-to-end against the bug-prone gpt-5.4-mini guizang
output: critic stripped `.divider{display:flex}` and
`.center{display:flex}` in one pass. Full regression: 539 tests pass.
Now that openkb deck new and openkb chat both route through
openkb.agent.skill_runner.run_skill loading SKILL.md files, the
pre-refactor pipeline is dead code. Removed:

* openkb/deck/critic.py — replaced by skills/openkb-html-critic/SKILL.md.
  snapshot_pre_critique / restore_pre_critique / cleanup_pre_critique
  all gone — html-critic patches in place, no snapshot needed.
* openkb/deck/tools.py — write_deck_file / read_deck_file replaced by
  the broader read_kb_file / write_kb_file in openkb/agent/tools.py
  (skill_runner exposes them as function tools).
* openkb/prompts/deck_create.md — repackaged as
  skills/openkb-deck-editorial/SKILL.md.
* openkb/prompts/deck_critique.md — repackaged into the html-critic
  skill's working method.
* openkb/agent/safe_shell.py — was for SDK ShellTool which is hosted-
  only (incompatible with LiteLLM routing). Function tools replaced
  this path; removing the dead reference.
* tests/test_deck_critic.py / test_deck_tools.py — covered deleted
  symbols.
* Snapshot/restore tests in tests/test_generator.py — covered the
  Generator branch that no longer exists.

Simplified:
* openkb/skill/generator.py::Generator.run deck branch — no more
  restore/cleanup hooks; just run + validate.
* openkb/deck/creator.py — pruned to the skill-runner wrapper plus
  the index.html existence check; the previous file was 250+ lines
  of agent wiring, now ~80 lines.

Rewrote:
* tests/test_deck_creator.py — 6 focused smoke tests against the
  new wrapper (mocks run_skill; verifies skill name, chaining,
  error paths).
* tests/test_deck_prompt.py — targets the new SKILL.md location and
  checks the same content anchors (palette, slide grammar,
  description triggers).

Regression: 523 tests pass (down from 539; the missing 16 are the
deleted-test count). Zero functional regressions; new architecture
end-to-end verified earlier on bug-prone gpt-5.4-mini guizang output.
Architectural review (4 parallel Opus auditors) found that the skill_runner
core was already generic, but the deck SURFACE was still fused to
Editorial Monocle. Fixed:

* validator: now takes optional `grammar` param (DeckGrammar TypedDict);
  skill-agnostic by default (only checks file present, parses, ≥5
  slides, self-contained). Third-party deck skills (guizang, swiss)
  now pass validation cleanly. Editorial-specific rules opt-in via
  `EDITORIAL_MONOCLE_GRAMMAR`. (finding #2)
* skills/openkb-deck-editorial/SKILL.md: declares its grammar +
  output_path_template under `od:` frontmatter — `run_skill` reads
  these and applies them post-run.
* run_skill: now honors frontmatter `od.mode`, `od.output_path_template`,
  `od.deck_grammar`. When mode=="deck" and template is set, the runner
  injects the path into intent, verifies the file exists post-run, and
  runs validate_deck with the skill's grammar. Validation result is
  returned via new SkillRunResult dataclass. (findings #4, #5)
* `openkb deck new --skill <name>`: CLI flag accepts any installed deck
  skill (default openkb-deck-editorial). guizang and swiss now usable
  from the scripted CLI, not only freeform chat. (finding #1)
* `/deck new --skill <name>` chat slash: same flag, parsed positionally
  alongside --critique. (finding #1)
* tests/test_read_kb_file.py: 13 new tests mirroring test_write_kb_file
  for the read-side allow-list. Pins refusal of `.openkb/config.yaml`,
  `.env`, `raw/`, `..` traversal, absolute paths. (finding #6)
* Generator deck branch: no longer calls validate_deck directly; just
  propagates run_deck_create's SkillRunResult.validation up. Validation
  is now a property of "this skill declared mode=deck", not of "this
  CLI path was taken".

Existing tests updated:
* tests/test_deck_validator.py: explicit grammar arg on Editorial-
  specific tests; added test_guizang_shape_passes_generic_mode +
  test_missing_cover_ignored_in_generic_mode to pin both modes.
* tests/test_deck_creator.py: mocks return SkillRunResult; new
  test_run_deck_create_honors_skill_name_override for --skill flag.
* tests/test_generator.py: deck dispatch test mocks SkillRunResult.

Below-threshold findings deferred:
* Generator if/else → registry (score 70) — works, just not extensible
  via plugin; future.
* Iteration backup in chat freeform path (score 75) — needs write_kb_file
  hook; separate change.
* run_skill / scan_local_skills / _handle_slash_critique direct tests
  (scores 60-70) — covered indirectly by integration; can add later.

Regression: 538 tests pass (was 523 pre-fix; net +15 = 13 new
read_kb_file tests + 2 new validator-mode tests).
Comment thread tests/test_deck_validator.py Fixed
Comment thread openkb/agent/skill_runner.py Fixed
Comment thread tests/test_deck_cli.py Fixed
PR review bot flagged 3 CodeQL alerts; smoke test caught a 4th:

* openkb/agent/skill_runner.py: drop unused `Agent` import (we only use
  Runner and function_tool).
* tests/test_deck_cli.py: drop unused `pytest` import.
* tests/test_deck_validator.py: drop the inline `import openkb.deck.validator
  as v` — the module is already imported via `from openkb.deck.validator
  import ...` at the top. Use `monkeypatch.setattr("openkb.deck.validator.
  MAX_FILE_BYTES", 100)` instead so there's one canonical reference.
* openkb/deck/creator.py: when chaining the critic skill,
  `result.output_path.relative_to(kb_dir)` blew up on macOS because
  `run_skill` returns a `.resolve()`-d path (e.g. `/private/tmp/.../`)
  while `kb_dir` is still the symlink form (`/tmp/.../`). Resolve
  kb_dir before the relative_to call. Caught by a manual e2e smoke
  test, NOT by the existing suite — pinned with a new
  `test_run_deck_create_critique_handles_symlinked_tmp` regression.

539 tests pass.
…h + doc fix

Three follow-up items from the architectural review (scores 60-70):

1. **openkb/skill/generator.py docstring** — claimed "future targets
   plug in by declaring an output dir and a validator" but no plug-in
   registry exists; current dispatch is a literal ``if target_type ==
   "skill"`` else-branch. Reworded to be honest about scope and to
   point at the deferred-followups entry for the registry refactor.

2. **tests/test_skill_runner.py** (NEW, 8 tests) — direct coverage of
   the function every CLI/chat path now funnels through:
   - SkillNotFoundError lists what IS available
   - body lands in agent.instructions with "## User intent" section
   - write_file + read_output_or_skill_file tools wired
   - output_path_template substitutes {slug} and enforces post-run
     existence
   - od.mode=="deck" triggers validate_deck with the skill's grammar
   - od.mode missing or non-deck skips validation
   - MaxTurnsExceeded → RuntimeError with skill-name + step-cap info
   - default max_turns matches MAX_TURNS constant

3. **tests/test_skills.py** (NEW, 16 tests) — direct coverage of the
   scanner & frontmatter parser:
   - empty when no roots exist
   - SDK shape (name/description/path, path absolute)
   - skips dirs without SKILL.md
   - skips skills missing description
   - falls back to dir name when frontmatter omits name (pinned as
     intentional sane-default, not a bug)
   - truncates description to 1024 chars
   - first-hit-wins precedence across roots
   - extra_roots appended after defaults (doesn't override)
   - DEFAULT_SKILL_ROOTS constant pinned
   - frontmatter: happy path, no-delim, unclosed, malformed YAML,
     non-dict YAML, body-containing-dashes preservation
   - autouse $HOME isolation so the user's real ~/.claude/skills
     doesn't pollute test results

4. **tests/test_critique_slash.py** (NEW, 6 tests) — direct coverage
   of /critique <path> chat slash:
   - no arg → usage print, no run_skill call
   - missing file → [ERROR], no run_skill call
   - happy path → run_skill called with openkb-html-critic + relative
     path in intent
   - absolute path inside KB → converted to relative form
   - SkillNotFoundError → [ERROR] (does not crash chat turn)
   - RuntimeError → [ERROR] (does not propagate)

Regression: 569 tests pass (was 539; net +30 new tests in this commit).

The fifth deferred follow-up — bitmap image inline handling — is out
of scope (no concrete user need yet). The Generator if/else → registry
refactor and the chat-freeform iteration-backup gap stay deferred per
the original PR description.
from __future__ import annotations

from pathlib import Path
from unittest.mock import AsyncMock, MagicMock, patch
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant