bbingz · Jun 14, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,9 +6,32 @@ Separate from `docs/release.md` (release-focused) and `docs/archive/session-memo
 
 ---
 
+## 2026-06-15 — Codex — Qwen audit checklist remediation
+
+- Consolidated the Qwen third-party review batch into `docs/audit/third-party-review-followup-2026-06-15.md`, then verified all 11 claims with independent subagents against the current worktree before editing. All 11 were confirmed still-present before remediation.
+- Fixed the three behavior/security issues with regressions: `writeFileAtomicSync` now removes its temp file on failed rename/write paths; no-diff review cleanup now runs in a `finally` so Gemini isolated tempdirs are removed; unsafe pid values (`<=1` / non-integers) are rejected before process-group termination.
+- Added the missing Claude health logged-out integration coverage by making the fake Claude auth fixture emit `loggedIn:false` via env, and asserting the companion reports Claude unhealthy with a populated probe error.
+- Closed the docs/parity findings: plugin README lists `debug` / `sessions` and terminal TUI ownership; root and translated READMEs describe Claude health as auth-only, add the terminal package badge, outcome diagnostics, and `minimax` (`mmx-cli`) alias; timing/runtime package READMEs document v1 `cold`/`retry` and `REVIEW_FLAG_EXPECTATIONS`.
+- Verification: focused regressions pass; `npm test` exit 0 (508/508); `npm run release:check` exit 0 including bundle, fixture, manifest, host-map, Codex adapter, review-drift, Claude plugin validation, and npm pack dry-runs.
+
+## 2026-06-15 — Codex — multi-review adjudication follow-up
+
+- Adjudicated the Minimax/Kimi/MiMo review batch against the current source after the Claude tmux TUI remediation. Kept the user-requested Claude tmux TUI default instead of reverting `ask`/`review` to `claude -p`; treated "restore synchronous LLM answer" findings as a product-semantics conflict, not a fix to apply.
+- Fixed two confirmed issues: Claude legacy `auth status` non-JSON success output is now parsed or marked inconclusive instead of treated as logout, and `session-lifecycle-hook.mjs` now removes session jobs through locked `updateState` rather than naked load/save.
+- Closed release-safety/doc drift found in the review batch: fixture freshness probes now cover the 11-provider runtime surface (`cmd`, `agy`, `grok` included); README capability notes, `docs/provider-paths.md`, `docs/polycli-v1-public-surface.md`, `CLAUDE.md`, and `docs/roadmap.md` describe Claude tmux TUI startup-only timing and the `tmuxSession`/`attachCommand` response shape.
+- Added draft `docs/release-notes-v0.6.21.md` for the current unreleased patch rather than rewriting the already-published v0.6.20 notes.
+
+## 2026-06-14 — Codex — Claude tmux TUI review remediation
+
+- Adjudicated the Claude/DeepSeek review findings against the current code and the user requirement that Claude subagent calls avoid the upcoming `claude -p` pay-as-you-go path. Confirmed the ask/review semantic drift, timing ambiguity, missing signal cleanup, tmux environment propagation gap, and auth-only health ambiguity; intentionally did **not** revert Claude ask/review defaults to `-p`.
+- Hardened Claude tmux TUI mode: `tmux new-session` now receives an explicit allowlist of Claude/Anthropic/proxy/cert env vars via `-e`; SIGINT/SIGTERM during orchestration kill the created tmux session before process shutdown; a missing tmux binary produces a direct install/config error; successful tmux launches return `detached:true`, `responseKind:"tmux_tui_session_started"`, `warnings`, and a `timingMeta` stating that timing covers only `tmux_startup`, with `llmCompletionObserved:false`.
+- Runtime timing now merges provider `timingMeta` and uses the run-level timing support for Claude tmux TUI, so `ttft/gen/tail` stay `unsupported`, `total` remains schema-valid `measured`, and the record explicitly marks `tmuxDetached:true` / startup-only timing. Claude health remains no-model-call/auth-only by design and now reports `probe.kind:"auth_status"` plus `authOnly:true` instead of looking like a sentinel LLM probe.
+- Tests added/updated for tmux env propagation, detached payload semantics, startup-only timing metadata, signal cleanup, Claude health auth-only reporting, and companion ask/review integration. Bundles regenerated for all host surfaces.
+- Verification: `npm test` exit 0 (500/500); `node --test packages/polycli-runtime/test/claude.test.js`; `node --test packages/polycli-runtime/test/registry.test.js`; `node --test plugins/polycli/scripts/tests/integration.test.mjs`; `npm run validate:bundles`; `npm run validate:manifests`; `npm run validate:host-map`.
+
 ## 2026-06-02 — Claude — repo cleanup: removed stale R8 worktrees + `release/v0.6.19` branch
 
-- After the v0.6.20 release, deleted the merged `release/v0.6.19` branch and the 3 abandoned `worktree-agent-*` git worktrees + their branches (all local-only — none on origin). Verified safe first: each branch had 0 commits not in `main` (so `git branch -d` succeeded, git-confirming they were merged); the worktrees' only uncommitted content was an identical, obsolete 2026-04-24 path-rewrite (`/home/user/…`→`/Users/bing/…`) on a snapshot ~41k lines behind `main`, locked by a dead pid (96484).
+- After the v0.6.20 release, deleted the merged `release/v0.6.19` branch and the 3 abandoned `worktree-agent-*` git worktrees + their branches (all local-only — none on origin). Verified safe first: each branch had 0 commits not in `main` (so `git branch -d` succeeded, git-confirming they were merged); the worktrees' only uncommitted content was an identical, obsolete 2026-04-24 path-rewrite (`/home/user/…`→`<local-home>/…`) on a snapshot ~41k lines behind `main`, locked by a dead pid (96484).
 - The single (identical across all 3) staged diff was saved to `/tmp/r8-worktree-staged-pathrewrite.patch` as insurance, but applying it is NOT advised: active files (README/docs) no longer carry those paths, and the remaining `/home/user/` references on `main` are historical records (CHANGELOG, `docs/archive/*`, `release-notes-v0.6.1`) that should not be rewritten.
 - Repo now has a single `main` branch, synced with origin, at v0.6.20.
 

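The pid guard described in the 2026-06-15 remediation entry (rejecting `<=1` and non-integer values before process-group termination) can be sketched roughly as below. This is an illustration under assumptions, not the project's code: `isSafePid` and `terminateProcessGroup` are hypothetical names, and the injectable `kill` parameter exists only so the sketch can be exercised without signalling real processes.

```javascript
// Hypothetical helper: the real guard in polycli may have a different name and shape.
function isSafePid(pid) {
  // Reject non-integers (strings, NaN, floats) and anything <= 1.
  // On POSIX, a pid of 0 or -1 addresses whole sets of processes, and pid 1 is init.
  return Number.isInteger(pid) && pid > 1;
}

// Hypothetical wrapper around process-group termination.
function terminateProcessGroup(pid, signal = "SIGTERM", kill = (p, s) => process.kill(p, s)) {
  if (!isSafePid(pid)) {
    return { killed: false, reason: "unsafe_pid" };
  }
  try {
    // A negative pid targets the whole process group on POSIX systems.
    kill(-pid, signal);
    return { killed: true };
  } catch (err) {
    // For example ESRCH when the group has already exited.
    return { killed: false, reason: err.code ?? "kill_failed" };
  }
}
```

Validating before negating the pid matters because `-pid` for an unsafe value would otherwise be passed straight to the kill call.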
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -22,13 +22,14 @@ Claude Code-specific patch. For the base rules see [AGENTS.md](AGENTS.md); this file lists only
 | Task type | Command |
 |---|---|
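 | Touching a single package | Run `node --test packages/<pkg>/test/*.test.js` first |
-| Changing the runtime or any host | Then run `npm test` (runs `build:plugins` first, then the full suite of 119+ tests) |
+| Changing the runtime or any host | Then run `npm test` (runs `build:plugins` first, then the full test suite) |
 | Pre-release validation | `npm run release:check` (depends on `claude plugin validate`) |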
 
 Note that `npm test` already includes `build:plugins`; do **not** run a separate manual build before testing.
 
 ## Claude-specific provider notes
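-- The `claude` runtime must pass `--verbose` whenever it uses `--output-format stream-json`; this is a CLI contract
+- On the print/headless path, the `claude` runtime must pass `--verbose` whenever it uses `--output-format stream-json`; this is a CLI contract. Do not apply this `-p`/stream-json rule to the tmux TUI path that ask/review use by default
+- `claude` ask/review start a detached tmux TUI session by default; the response includes `tmuxSession`/`attachCommand`, and timing covers only tmux startup and prompt submission, not LLM completion time
 - `claude` may report errors via `subtype: "error"` rather than `is_error`; error handling on the sync and streaming paths must stay aligned
 - `gemini` has no standalone auth-status subcommand, so its auth probe is inferential; do not regress a timeout/429 into `loggedIn=false`
 - `pi` may still call a tool on a trivial prompt; this is upstream behavior, not a local parsing problem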

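One way to keep the sync and streaming error handling aligned, as the provider note above requires, is to route both paths through a single predicate. The sketch below is an assumption-laden illustration: `isClaudeErrorEvent`, `parseSyncResult`, and `parseStream` are hypothetical names, and the event shapes are simplified to the two fields the note mentions (`subtype` and `is_error`).

```javascript
// Hypothetical shared predicate: treats either error signal as a failure.
function isClaudeErrorEvent(event) {
  if (event === null || typeof event !== "object") return false;
  return event.is_error === true || event.subtype === "error";
}

// Sync path (assumed shape): stdout holds one JSON result object.
function parseSyncResult(stdout) {
  const event = JSON.parse(stdout);
  return { ok: !isClaudeErrorEvent(event), event };
}

// Streaming path (assumed shape): newline-delimited JSON events.
function parseStream(ndjson) {
  const events = ndjson
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
  // Same predicate as the sync path, so the two cannot drift apart.
  const failed = events.find(isClaudeErrorEvent);
  return { ok: failed === undefined, events };
}
```

Because both parsers call the same function, a new error signal only has to be added in one place.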
diff --git a/README.ja.md b/README.ja.md
@@ -4,10 +4,11 @@
 
 # polycli
 
-**Operate 9 AI coding CLIs through one command surface, inside the AI host you already use.**
+**Operate 11 AI coding CLIs through one command surface, inside the AI host you already use.**
 
 [![GitHub release](https://img.shields.io/github/v/release/bbingz/polycli?label=release&color=111827)](https://github.com/bbingz/polycli/releases)
 [![CI](https://github.com/bbingz/polycli/actions/workflows/ci.yml/badge.svg)](https://github.com/bbingz/polycli/actions/workflows/ci.yml)
+[![npm: polycli](https://img.shields.io/npm/v/@bbingz/polycli?label=%40bbingz%2Fpolycli&color=cb3837)](https://www.npmjs.com/package/@bbingz/polycli)
 [![npm: polycli-opencode](https://img.shields.io/npm/v/@bbingz/polycli-opencode?label=%40bbingz%2Fpolycli-opencode&color=cb3837)](https://www.npmjs.com/package/@bbingz/polycli-opencode)
 [![npm: polycli-utils](https://img.shields.io/npm/v/@bbingz/polycli-utils?label=%40bbingz%2Fpolycli-utils&color=cb3837)](https://www.npmjs.com/package/@bbingz/polycli-utils)
 [![npm: polycli-timing](https://img.shields.io/npm/v/@bbingz/polycli-timing?label=%40bbingz%2Fpolycli-timing&color=cb3837)](https://www.npmjs.com/package/@bbingz/polycli-timing)
@@ -22,7 +23,7 @@
 
 ## What is polycli?
 
-`polycli` lets you operate 9 AI coding CLIs from any of the Claude Code, Codex, GitHub Copilot CLI, or OpenCode hosts, using a shared set of commands (`health`, `ask`, `review`, `rescue`, `timing`, and `debug`, plus background job control and a terminal inspector). The CLIs are **`claude`**, **`gemini`**, **`kimi`**, **`qwen`**, **`copilot`**, **`opencode`**, **`pi`**, **`cmd`** (Command Code), and **`mini-agent`** (MiniMax).
+`polycli` lets you operate 11 AI coding CLIs from any of the Claude Code, Codex, GitHub Copilot CLI, or OpenCode hosts, using a shared set of commands (`health`, `ask`, `review`, `rescue`, `timing`, and `debug`, plus background job control and a terminal inspector). The CLIs are **`claude`**, **`gemini`**, **`kimi`**, **`qwen`**, **`copilot`**, **`opencode`**, **`pi`**, **`cmd`** (Command Code), **`agy`** (Antigravity), **`grok`** (xAI Grok), and **`mmx-cli`** (MiniMax).
 
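 It is a **utility-only Path B monorepo**: it does not paper over provider differences with fake abstractions, and it does not invent a runtime base class. It composes the official upstream CLIs as subprocesses, exposes a single command surface, and expresses capability differences honestly in a four-state timing schema.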
 
@@ -39,7 +40,7 @@
 
 | Hosts (where polycli is installed) | Providers (what polycli can call) |
 |---|---|
-| Claude Code · Codex · GitHub Copilot CLI · OpenCode | `claude` · `copilot` · `gemini` · `kimi` · `qwen` · `opencode` · `pi` · `cmd` · `mini-agent` |
+| Claude Code · Codex · GitHub Copilot CLI · OpenCode | `claude` · `copilot` · `gemini` · `kimi` · `qwen` · `opencode` · `pi` · `cmd` · `agy` · `grok` · `minimax` (`mmx-cli`) |
 
 See the [Capability matrix](#capability-matrix) for what each provider supports.
 
@@ -95,7 +96,7 @@ polycli health
 polycli health
 ```
 
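-`health` runs an end-to-end probe against every authenticated provider and reports the live ones in `healthyProviders`. After that, day-to-day use is just a direct call:
+`health` runs a probe against authenticated providers and reports the live ones in `healthyProviders`. Claude is the exception: it uses only `claude auth status --json` and sends no health prompt. After that, day-to-day use is just a direct call: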
 
 ```text
 Choose Polycli with @, then ask it to run: ask --provider qwen "explain this stack trace ..."
@@ -112,7 +113,7 @@ Choose Polycli with @, then ask it to run: rescue --provider gemini --background
 | Command | What it does |
 |---|---|
 | `setup` | Check provider CLI install and auth status (no model call; lightweight) |
-| `health` | End-to-end check with a short prompt. Returns `healthyProviders` and records timing |
+| `health` | End-to-end check with a short prompt for every provider except Claude. Claude uses auth-only status. Returns `healthyProviders` and records timing where applicable |
 | `ask` | One-shot prompt |
 | `review` | Code review against the current `git diff` |
 | `rescue` | Longer triage / analysis task |
@@ -135,15 +136,18 @@ Choose Polycli with @, then ask it to run: rescue --provider gemini --background
 | `gemini` | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | — |
 | `kimi` | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | — |
 | `qwen` | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
-| `mini-agent` | ✓ | — | — | — | — | — | — |
+| `minimax` (`mmx-cli`) | ✓ | — | ✓ | — | — | — | — |
 | `opencode` | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | — |
 | `pi` | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | — |
 | `cmd` | ✓ | — | — | ✓ | ✓ | ✓ | — |
+| `agy` | ✓ | ✓ | — | ✓ | ✓ | ✓ | — |
+| `grok` | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | — |
 
 Notes:
 
 - `cold` and `retry` are `unsupported` for every provider. Upstream CLIs lack a stable signal, and polycli refuses to fake them. `total` is always `measured`.
-- `mini-agent` works by log replay and does not support session resume, structured output, or fine-grained streaming timing. `cmd` uses the official Command Code headless mode, so each invocation is a standalone session and stdout is the visible answer.
+- `claude` `ask` / `review` use detached tmux TUI mode by default to avoid the `claude -p` pay-as-you-go path. In that mode `ttft` / `gen` / `tail` are reported as `unsupported`, and `total` measures only tmux startup and prompt submission. The response includes `tmuxSession` + `attachCommand`.
+- `minimax` is a non-interactive JSON invocation of `mmx-cli` and does not support session resume or fine-grained streaming timing. `cmd` uses the official Command Code headless mode, so each invocation is a standalone session and stdout is the visible answer. `agy` uses Antigravity session mode, and `grok` uses the xAI Grok Build CLI.
 - Only `qwen` declares `tool: true`. When `qwen` does not invoke a tool it reports `missing` (observable, but it did not occur this time); the other providers report `unsupported` (not tracked at the capability level). The two states mean different things, so do not conflate them.
 
 ## Timing semantics
@@ -163,6 +167,7 @@ What polycli's timing contract unifies is the **representation of state**, not
 
 - `runtimePersistence` — `ephemeral | session | daemon`
 - `measurementScope` — `request | turn | job`
+- outcome diagnostics — `outcome`, `exitCode`, `terminationReason`, `responseMatched`, and `errorCode`
 
 ## Packages
 

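The distinction between `missing` and `unsupported` in the notes above can be illustrated with a small classifier. This is a sketch under assumptions, not the timing package's API: `toolTimingState` is a hypothetical name, the capability object carries only the `tool` flag named in the README, and only the three states mentioned in this diff are covered.

```javascript
// Hypothetical classifier for the `tool` timing field.
function toolTimingState(capabilities, observedToolMs) {
  // The provider does not declare tool tracking at all: capability-level absence.
  if (capabilities.tool !== true) return "unsupported";
  // The provider can report tool timing, but no tool ran in this invocation.
  if (typeof observedToolMs !== "number") return "missing";
  // A tool ran and its duration was observed.
  return "measured";
}

// Illustrative capability flags, mirroring the "only qwen declares tool: true" note.
const qwen = { tool: true };
const gemini = { tool: false };

console.log(toolTimingState(qwen, undefined)); // missing
console.log(toolTimingState(gemini, undefined)); // unsupported
console.log(toolTimingState(qwen, 120)); // measured
```

Checking the capability before the observation is what keeps the two states from collapsing into one: a provider without the capability never reaches the `missing` branch.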
diff --git a/README.md b/README.md
@@ -4,7 +4,7 @@
 
 # polycli
 
-**One command surface across 9 AI coding CLIs, inside the host you already use.**
+**One command surface across 11 AI coding CLIs, inside the host you already use.**
 
 [![GitHub release](https://img.shields.io/github/v/release/bbingz/polycli?label=release&color=111827)](https://github.com/bbingz/polycli/releases)
 [![CI](https://github.com/bbingz/polycli/actions/workflows/ci.yml/badge.svg)](https://github.com/bbingz/polycli/actions/workflows/ci.yml)
@@ -29,16 +29,16 @@
 
 It is a **utility-only Path B monorepo**: it does not unify provider differences behind fake abstractions, and it does not invent a runtime base class. It composes the official upstream CLIs as subprocesses, exposes one command surface, and surfaces honest capability differences in a four-state timing schema.
 
-## Latest release: v0.6.19
+## Latest release: v0.6.20
 
-The latest patch adds upstream session-pollution control and provider-drift maintenance hardening (spec-driven, gated by two Codex review rounds):
+The latest patch ships the grok provider, the kimi-code v0.6.0 migration, and the deep-review hardening set:
 
-- New `polycli sessions [list | purge --confirm]` command cleans up the session/history files upstream CLIs leave under `$HOME`. Dry-run by default; deletion is driven only by ledger-recorded, re-validated realpaths — never a path guess or glob.
-- Run-ledger events now record the upstream `sessionId` + a verified `sessionArtifactPath`, so polycli-created sessions are auditable and purgeable.
-- `npm run check:fixture-freshness` flags fixtures pinned to a stale CLI version; `REVIEW_FLAG_EXPECTATIONS` is now the single source of review-flag truth (consistency-tested); `check:review-drift` is wired into the release gate.
-- No provider behavior, host command grammar, or timing schema changed.
+- Added `grok` (xAI Grok Build CLI) as the 11th provider across runtime, host adapters, skills, docs, and release validation.
+- Migrated the kimi adapter and guidance to kimi-code v0.6.0 session semantics (`--session` / `-C`) and structured `session.resume_hint` parsing.
+- Kept the Path B flat-adapter architecture intact while tightening review/deep-review hardening and bundle drift checks.
+- Utility packages stay on their independent v1.x cadence.
 
-See [`docs/release-notes-v0.6.19.md`](./docs/release-notes-v0.6.19.md).
+See [`docs/release-notes-v0.6.20.md`](./docs/release-notes-v0.6.20.md).
 
 ## Why polycli?
 
@@ -141,7 +141,7 @@ polycli health
 # OpenCode (tool call — call polycli_run with ["health","--json"])
 ```
 
-`health` runs an end-to-end probe against every provider with valid auth and reports which ones are alive in `healthyProviders`. After that, daily use is direct. In Codex, either describe the task directly or type `@`, choose Polycli, and ask it to run the companion command:
+`health` probes every provider with valid auth and reports which ones are alive in `healthyProviders`. The probe is an end-to-end short prompt for every provider except Claude, which is checked with `claude auth status --json` only and is never sent a health prompt. After that, daily use is direct. In Codex, either describe the task directly or type `@`, choose Polycli, and ask it to run the companion command:
 
 ```text
 Choose Polycli with @, then ask it to run: ask --provider qwen "explain this stack trace ..."
@@ -173,7 +173,7 @@ All commands work identically across hosts:
 | Command | What it does |
 |---|---|
 | `setup` | Check provider CLI install + auth status (cheap; no model call) |
-| `health` | End-to-end short-prompt probe; returns `healthyProviders` and writes timing |
+| `health` | End-to-end short-prompt probe for providers except Claude; Claude uses auth-only status; returns `healthyProviders` and writes timing where applicable |
 | `ask` | One-shot prompt |
 | `review` | Code review against the current `git diff` |
 | `rescue` | Longer triage / analysis task |
@@ -206,6 +206,7 @@ Source of truth: [`packages/polycli-runtime/src/registry.js`](./packages/polycli
 Notes:
 
 - `cold` and `retry` are `unsupported` for every provider. Upstream CLIs lack a stable signal, and polycli refuses to fake them. `total` is always `measured`.
+- Claude `ask` and `review` run in detached tmux TUI mode by default to avoid the `claude -p` cost path. In that mode `ttft`, `gen`, and `tail` are reported as `unsupported`; `total` measures tmux startup/prompt submission only, and the response contains `tmuxSession` + `attachCommand`.
 - `minimax` uses official `mmx text chat --output json --non-interactive`; no session resume and no fine-grained streaming timing. `cmd` uses documented Command Code headless mode, where each invocation is a standalone session and stdout is the visible answer.
 - Only `qwen` declares `tool: true`. When no tool is invoked, `qwen` reports `missing` (observable but absent); the others report `unsupported` (capability-level not tracked). The two states are not interchangeable.
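Because Claude `ask` and `review` return a detached tmux session rather than an LLM answer, callers need to branch on the response shape. The sketch below shows one possible consumer. It is hypothetical: only `tmuxSession` and `attachCommand` come from the note above, while `describeAskResponse`, the `answer` field, and the sample values are assumptions made for illustration.

```javascript
// Hypothetical consumer of an ask/review response.
function describeAskResponse(response) {
  const detached =
    typeof response.tmuxSession === "string" &&
    typeof response.attachCommand === "string";
  if (detached) {
    // Detached tmux TUI launch: no LLM completion has been observed yet,
    // so point the user at the session instead of printing an answer.
    return `Claude is running in tmux session "${response.tmuxSession}". Attach with: ${response.attachCommand}`;
  }
  // Assumed field for providers that return a synchronous answer.
  return String(response.answer ?? "");
}

// Sample values are made up for the example.
console.log(
  describeAskResponse({
    tmuxSession: "polycli-demo",
    attachCommand: "tmux attach -t polycli-demo",
  })
);
```

Treating the detached case as a distinct branch avoids presenting a startup-only result as if it were a completed answer.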