feat(deck): HTML slide-deck generator + universal skill loader#68
Open
KylinMountain wants to merge 24 commits into
Open
feat(deck): HTML slide-deck generator + universal skill loader#68KylinMountain wants to merge 24 commits into
KylinMountain wants to merge 24 commits into
Conversation
…xception, arc-count alignment
Mirrors `openkb skill new` structurally — name validation, KB lookup, LLM key check, config load, overwrite handling, Generator.run dispatch, validation surfacing. Adds `--critique` flag to opt into the deck critic pass. `openkb.skill.workspace.save_iteration` is hard-wired to skill paths (uses skill_dir / skill_workspace_dir), so this commit inlines a small `_save_deck_iteration` helper that mirrors the iteration-N copy-then-rmtree behavior using `deck_workspace_dir`. Keeps deck rollback history scoped to decks without coupling deck CLI to skill internals or churning skill code.
* critic.py: narrow restore docstring — it fires only on validation errors, not on critic exceptions or MaxTurnsExceeded (those bubble out of Generator.run before the restore branch runs). * creator.py: drop stale "(see Task 7 — until then, only critique=False is wired)" qualifier; critique=True is now wired. * chat.py: chat handler comment claimed "same gate as CLI" but chat prints issues and continues while CLI exits 1. Reworded for both /skill new and /deck new handlers.
…pic-style skills * openkb/agent/skills.py — scans <kb>/skills/, ~/.openkb/skills/, ~/.claude/skills/ for SKILL.md packages with YAML frontmatter. * openkb/agent/safe_shell.py — subprocess ShellExecutor (left for reference; not currently wired because OpenAI Agents SDK's hosted ShellTool isn't compatible with LiteLLM's ChatCompletions routing). * openkb/agent/query.py::build_chat_agent — injects list_skills() + read_skill(name) function tools when local skills exist, and appends a "## Available skills" block to the system prompt so the agent treats skill use as the default for matching requests. * skills/openkb-deck-editorial/SKILL.md — first internal skill. Repackaged from openkb/prompts/deck_create.md with placeholders removed (since the agent now reads it as instructions, not as a format-string). POC verified: 'openkb chat' + 'make me a magazine-style deck about X' loads the skill, follows its working method, writes a valid Editorial Monocle deck under output/decks/<slug>/. openkb/prompts/deck_create.md is intentionally left in place — the existing 'openkb deck new' CLI still uses it. Consolidation is a follow-up: refactor deck.creator to load via the skill system, then the prompts/ copy can be deleted.
…e, data centering)
…s /critique
Architecture: every generator path now goes through one entry point —
`openkb.agent.skill_runner.run_skill(skill_name, intent, ...)`. Loads
the named SKILL.md, builds an Agent with that body as instructions plus
the standard wiki-read tools and a path-scoped write_file. The CLI and
chat surfaces are both thin wrappers around it.
* openkb/agent/skill_runner.py — new. Generic loader.
* openkb/agent/tools.py — add `read_kb_file` (wiki/+output/+skills/
read scope) to mirror write_kb_file's allow-list; skill_runner
exposes it as the `read_output_or_skill_file` tool so the html-critic
skill can inspect existing deck artifacts (the wiki-scoped read tool
could not).
* openkb/deck/creator.py::run_deck_create — refactored to delegate to
`run_skill("openkb-deck-editorial", ...)`. Pre-refactor implementation
kept under `_legacy_run_deck_create_pre_skill_refactor` so existing
agent-shape tests still pass while the rewrite is under review.
* openkb/agent/chat.py — `/critique <path>` slash command. Runs the
openkb-html-critic skill on a target HTML file (typically
`output/decks/<name>/index.html`).
* skills/openkb-html-critic/SKILL.md — new skill. Reads an HTML deck,
diagnoses CSS specificity bugs (slide-modifier classes overriding
`.slide{display:none}` — the most common LLM mistake), missing nav,
failure-of-self-containment, then patches in one atomic write.
Never touches slide content (numbers, names, quotes).
POC verified end-to-end against the bug-prone gpt-5.4-mini guizang
output: critic stripped `.divider{display:flex}` and
`.center{display:flex}` in one pass. Full regression: 539 tests pass.
Now that openkb deck new and openkb chat both route through openkb.agent.skill_runner.run_skill loading SKILL.md files, the pre-refactor pipeline is dead code. Removed: * openkb/deck/critic.py — replaced by skills/openkb-html-critic/SKILL.md. snapshot_pre_critique / restore_pre_critique / cleanup_pre_critique all gone — html-critic patches in place, no snapshot needed. * openkb/deck/tools.py — write_deck_file / read_deck_file replaced by the broader read_kb_file / write_kb_file in openkb/agent/tools.py (skill_runner exposes them as function tools). * openkb/prompts/deck_create.md — repackaged as skills/openkb-deck-editorial/SKILL.md. * openkb/prompts/deck_critique.md — repackaged into the html-critic skill's working method. * openkb/agent/safe_shell.py — was for SDK ShellTool which is hosted- only (incompatible with LiteLLM routing). Function tools replaced this path; removing the dead reference. * tests/test_deck_critic.py / test_deck_tools.py — covered deleted symbols. * Snapshot/restore tests in tests/test_generator.py — covered the Generator branch that no longer exists. Simplified: * openkb/skill/generator.py::Generator.run deck branch — no more restore/cleanup hooks; just run + validate. * openkb/deck/creator.py — pruned to the skill-runner wrapper plus the index.html existence check; the previous file was 250+ lines of agent wiring, now ~80 lines. Rewrote: * tests/test_deck_creator.py — 6 focused smoke tests against the new wrapper (mocks run_skill; verifies skill name, chaining, error paths). * tests/test_deck_prompt.py — targets the new SKILL.md location and checks the same content anchors (palette, slide grammar, description triggers). Regression: 523 tests pass (down from 539; the missing 16 are the deleted-test count). Zero functional regressions; new architecture end-to-end verified earlier on bug-prone gpt-5.4-mini guizang output.
Architectural review (4 parallel Opus auditors) found that the skill_runner core was already generic, but the deck SURFACE was still fused to Editorial Monocle. Fixed: * validator: now takes optional `grammar` param (DeckGrammar TypedDict); skill-agnostic by default (only checks file present, parses, ≥5 slides, self-contained). Third-party deck skills (guizang, swiss) now pass validation cleanly. Editorial-specific rules opt-in via `EDITORIAL_MONOCLE_GRAMMAR`. (finding #2) * skills/openkb-deck-editorial/SKILL.md: declares its grammar + output_path_template under `od:` frontmatter — `run_skill` reads these and applies them post-run. * run_skill: now honors frontmatter `od.mode`, `od.output_path_template`, `od.deck_grammar`. When mode=="deck" and template is set, the runner injects the path into intent, verifies the file exists post-run, and runs validate_deck with the skill's grammar. Validation result is returned via new SkillRunResult dataclass. (findings #4, #5) * `openkb deck new --skill <name>`: CLI flag accepts any installed deck skill (default openkb-deck-editorial). guizang and swiss now usable from the scripted CLI, not only freeform chat. (finding #1) * `/deck new --skill <name>` chat slash: same flag, parsed positionally alongside --critique. (finding #1) * tests/test_read_kb_file.py: 13 new tests mirroring test_write_kb_file for the read-side allow-list. Pins refusal of `.openkb/config.yaml`, `.env`, `raw/`, `..` traversal, absolute paths. (finding #6) * Generator deck branch: no longer calls validate_deck directly; just propagates run_deck_create's SkillRunResult.validation up. Validation is now a property of "this skill declared mode=deck", not of "this CLI path was taken". Existing tests updated: * tests/test_deck_validator.py: explicit grammar arg on Editorial- specific tests; added test_guizang_shape_passes_generic_mode + test_missing_cover_ignored_in_generic_mode to pin both modes. * tests/test_deck_creator.py: mocks return SkillRunResult; new test_run_deck_create_honors_skill_name_override for --skill flag. * tests/test_generator.py: deck dispatch test mocks SkillRunResult. Below-threshold findings deferred: * Generator if/else → registry (score 70) — works, just not extensible via plugin; future. * Iteration backup in chat freeform path (score 75) — needs write_kb_file hook; separate change. * run_skill / scan_local_skills / _handle_slash_critique direct tests (scores 60-70) — covered indirectly by integration; can add later. Regression: 538 tests pass (was 523 pre-fix; net +15 = 13 new read_kb_file tests + 2 new validator-mode tests).
PR review bot flagged 3 CodeQL alerts; smoke test caught a 4th:
* openkb/agent/skill_runner.py: drop unused `Agent` import (we only use
Runner and function_tool).
* tests/test_deck_cli.py: drop unused `pytest` import.
* tests/test_deck_validator.py: drop the inline `import openkb.deck.validator
as v` — the module is already imported via `from openkb.deck.validator
import ...` at the top. Use `monkeypatch.setattr("openkb.deck.validator.
MAX_FILE_BYTES", 100)` instead so there's one canonical reference.
* openkb/deck/creator.py: when chaining the critic skill,
`result.output_path.relative_to(kb_dir)` blew up on macOS because
`run_skill` returns a `.resolve()`-d path (e.g. `/private/tmp/.../`)
while `kb_dir` is still the symlink form (`/tmp/.../`). Resolve
kb_dir before the relative_to call. Caught by a manual e2e smoke
test, NOT by the existing suite — pinned with a new
`test_run_deck_create_critique_handles_symlinked_tmp` regression.
539 tests pass.
…h + doc fix
Three follow-up items from the architectural review (scores 60-70):
1. **openkb/skill/generator.py docstring** — claimed "future targets
plug in by declaring an output dir and a validator" but no plug-in
registry exists; current dispatch is a literal ``if target_type ==
"skill"`` else-branch. Reworded to be honest about scope and to
point at the deferred-followups entry for the registry refactor.
2. **tests/test_skill_runner.py** (NEW, 8 tests) — direct coverage of
the function every CLI/chat path now funnels through:
- SkillNotFoundError lists what IS available
- body lands in agent.instructions with "## User intent" section
- write_file + read_output_or_skill_file tools wired
- output_path_template substitutes {slug} and enforces post-run
existence
- od.mode=="deck" triggers validate_deck with the skill's grammar
- od.mode missing or non-deck skips validation
- MaxTurnsExceeded → RuntimeError with skill-name + step-cap info
- default max_turns matches MAX_TURNS constant
3. **tests/test_skills.py** (NEW, 16 tests) — direct coverage of the
scanner & frontmatter parser:
- empty when no roots exist
- SDK shape (name/description/path, path absolute)
- skips dirs without SKILL.md
- skips skills missing description
- falls back to dir name when frontmatter omits name (pinned as
intentional sane-default, not a bug)
- truncates description to 1024 chars
- first-hit-wins precedence across roots
- extra_roots appended after defaults (doesn't override)
- DEFAULT_SKILL_ROOTS constant pinned
- frontmatter: happy path, no-delim, unclosed, malformed YAML,
non-dict YAML, body-containing-dashes preservation
- autouse $HOME isolation so the user's real ~/.claude/skills
doesn't pollute test results
4. **tests/test_critique_slash.py** (NEW, 6 tests) — direct coverage
of /critique <path> chat slash:
- no arg → usage print, no run_skill call
- missing file → [ERROR], no run_skill call
- happy path → run_skill called with openkb-html-critic + relative
path in intent
- absolute path inside KB → converted to relative form
- SkillNotFoundError → [ERROR] (does not crash chat turn)
- RuntimeError → [ERROR] (does not propagate)
Regression: 569 tests pass (was 539; net +30 new tests in this commit).
The fifth deferred follow-up — bitmap image inline handling — is out
of scope (no concrete user need yet). The Generator if/else → registry
refactor and the chat-freeform iteration-backup gap stay deferred per
the original PR description.
| from __future__ import annotations | ||
|
|
||
| from pathlib import Path | ||
| from unittest.mock import AsyncMock, MagicMock, patch |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
openkb deck new <name> "<intent>" [--critique] [--skill <name>]— new CLI command that generates a polished single-file HTML slide deck from the compiled wiki. Default skill isopenkb-deck-editorial(warm cream + serif + brick-red, magazine style).--critiquechains theopenkb-html-criticskill to catch CSS specificity bugs / missing keyboard nav / failure-of-self-containment.openkb chat— chat now auto-discovers Anthropic-styleSKILL.mdpackages from<kb>/skills/,~/.openkb/skills/, and~/.claude/skills/. Users say "make me a deck about X" in natural language and the agent picks the matching skill viaread_skill. New/critique <path>slash command runs html-critic on any existing HTML.openkb deck new,/deck new, freeform chat skill invocation,/critique) routes through a singlerun_skill(skill_name, intent, ...)entry point. Skill frontmatter (od.mode,od.output_path_template,od.deck_grammar) drives output-path templating + post-run validation. New artifact types = drop aSKILL.mdinto~/.openkb/skills/<name>/, zero core code change. Third-party deck skills (e.g.nexu-io/open-design'sdeck-guizang-editorial,deck-swiss-international) work out of the box.validate_deck()defaults to skill-agnostic checks (file exists, parses, ≥5 slides, self-contained, no external<link>/<script>/<img>). The Editorial Monocle skill declares its slide-grammar via its frontmatter; other skills inherit only the generic guarantees.22 commits, +2200 / -33 LOC net across 19 files. Brainstorm + design spec live in
docs/superpowers/specs/2026-05-22-openkb-deck-target-design.md(local-only per repo convention).Test plan
pytest tests/— full suite passes (currently 538 tests, +15 net on this branch including 13 newtests/test_read_kb_file.pycovering the read-side allow-list)openkb deck new transformers-pitch "Explain attention to engineers"producesoutput/decks/transformers-pitch/index.htmlthat opens in browser, supports ← → / F / P keyboard navopenkb deck new ... --critiquechains html-critic and patches CSS specificity bugs in place (e.g..divider{display:flex}→.divider{})openkb deck new ... --skill deck-guizang-editorialroutes to a third-party skill aftercp -r open-design/skills/deck-guizang-editorial ~/.openkb/skills/openkb chatthen "make me a magazine-style deck about X" — natural-language path discovers + invokes the matching skillopenkb chatthen/critique output/decks/foo/index.html— critic skill reads, diagnoses, patches in one atomic writedata-typeattrs); Editorial-grammar mode rejects them — both behaviors pinned bytests/test_deck_validator.pyArchitecture notes
The branch went through several iterations as the design was refined:
Generatortarget with hardcoded prompt + bespoke critic + snapshot/restore (commits5c2d6eb→daacce5).SKILL.mdpackages as the unit of extension. Refactor on top:openkb/agent/skill_runner.py+skills/openkb-deck-editorial/SKILL.md+skills/openkb-html-critic/SKILL.md(commits03c3bbb→708baa3).63b62c2:--skillflag, validator generalized viaDeckGrammarTypedDict + skill frontmatter, output-path + validation pushed down intorun_skill, security-adjacentread_kb_filegot 13 tests.Out of scope (follow-ups)
Lower-confidence findings (score 60-75 in the same review):
Generatorif/else → plugin registry refactorrun_skill/scan_local_skills/_handle_slash_critique