25 changes: 24 additions & 1 deletion CHANGELOG.md
@@ -6,9 +6,32 @@ Separate from `docs/release.md` (release-focused) and `docs/archive/session-memo

---

## 2026-06-15 — Codex — Qwen audit checklist remediation

- Consolidated the Qwen third-party review batch into `docs/audit/third-party-review-followup-2026-06-15.md`, then verified all 11 claims with independent subagents against the current worktree before editing. All 11 were confirmed still-present before remediation.
- Fixed the three behavior/security issues with regressions: `writeFileAtomicSync` now removes its temp file on failed rename/write paths; no-diff review cleanup now runs in a `finally` so Gemini isolated tempdirs are removed; unsafe pid values (`<=1` / non-integers) are rejected before process-group termination.
- Added the missing Claude health logged-out integration coverage by making the fake Claude auth fixture emit `loggedIn:false` via env, and asserting the companion reports Claude unhealthy with a populated probe error.
- Closed the docs/parity findings: plugin README lists `debug` / `sessions` and terminal TUI ownership; root and translated READMEs describe Claude health as auth-only, add the terminal package badge, outcome diagnostics, and `minimax` (`mmx-cli`) alias; timing/runtime package READMEs document v1 `cold`/`retry` and `REVIEW_FLAG_EXPECTATIONS`.
- Verification: focused regressions pass; `npm test` exit 0 (508/508); `npm run release:check` exit 0 including bundle, fixture, manifest, host-map, Codex adapter, review-drift, Claude plugin validation, and npm pack dry-runs.
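
Two of the fixes above lend themselves to a small illustration. The following is a minimal sketch, not the repository's implementation: `writeFileAtomicSketch` and `isSafePid` are hypothetical names, and the temp-file naming scheme is an assumption. It shows the general pattern of removing the temp file when the write or rename fails, and of rejecting pids that are unsafe to negate for process-group signalling.

```javascript
// Hypothetical sketch; names and temp-file naming are assumptions,
// not the actual polycli code.
const fs = require("node:fs");
const path = require("node:path");

// Write to a sibling temp file, then rename it over the target.
// If the write or the rename throws, remove the temp file before rethrowing.
function writeFileAtomicSketch(targetPath, data) {
  const tempPath = path.join(
    path.dirname(targetPath),
    `.${path.basename(targetPath)}.${process.pid}.${Date.now()}.tmp`
  );
  try {
    fs.writeFileSync(tempPath, data);
    fs.renameSync(tempPath, targetPath);
  } catch (error) {
    // force: true makes this a no-op when the temp file was never created.
    fs.rmSync(tempPath, { force: true });
    throw error;
  }
}

// Accept only integer pids greater than 1. Signalling a process group uses
// the negated pid, and negating 0 or 1 would address the caller's own group
// or every process the caller is allowed to signal.
function isSafePid(pid) {
  return Number.isInteger(pid) && pid > 1;
}
```

A caller would check `isSafePid(pid)` before any call of the form `process.kill(-pid, signal)` and skip termination when the check fails.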

## 2026-06-15 — Codex — multi-review adjudication follow-up

- Adjudicated the Minimax/Kimi/MiMo review batch against the current source after the Claude tmux TUI remediation. Kept the user-requested Claude tmux TUI default instead of reverting `ask`/`review` to `claude -p`; treated "restore synchronous LLM answer" findings as a product-semantics conflict, not a fix to apply.
- Fixed two confirmed issues: Claude legacy `auth status` non-JSON success output is now parsed or marked inconclusive instead of treated as logout, and `session-lifecycle-hook.mjs` now removes session jobs through locked `updateState` rather than naked load/save.
- Closed release-safety/doc drift found in the review batch: fixture freshness probes now cover the 11-provider runtime surface (`cmd`, `agy`, `grok` included); README capability notes, `docs/provider-paths.md`, `docs/polycli-v1-public-surface.md`, `CLAUDE.md`, and `docs/roadmap.md` describe Claude tmux TUI startup-only timing and the `tmuxSession`/`attachCommand` response shape.
- Added draft `docs/release-notes-v0.6.21.md` for the current unreleased patch rather than rewriting the already-published v0.6.20 notes.
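
The auth-status fix above can be pictured as a three-way classification. This is a minimal sketch under assumptions: `classifySuccessfulAuthOutput` is a hypothetical name, the result shape is invented for illustration, and only the `loggedIn` field is taken from this changelog. The point it illustrates is that output from a successful run which cannot be parsed is reported as inconclusive rather than as a logout.

```javascript
// Hypothetical sketch: classify stdout from an `auth status` run that exited 0.
// The result shape ({ loggedIn, inconclusive }) is an assumption.
function classifySuccessfulAuthOutput(stdout) {
  try {
    const parsed = JSON.parse(stdout);
    if (parsed && typeof parsed.loggedIn === "boolean") {
      return { loggedIn: parsed.loggedIn, inconclusive: false };
    }
  } catch {
    // Not JSON: fall through to the inconclusive result below.
  }
  // Successful exit but no usable signal: do not report a logout.
  return { loggedIn: null, inconclusive: true };
}
```

Keeping `null` distinct from `false` lets a caller tell "the CLI said logged out" apart from "the CLI answered in a format we could not read".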

## 2026-06-14 — Codex — Claude tmux TUI review remediation

- Adjudicated the Claude/DeepSeek review findings against the current code and the user requirement that Claude subagent calls avoid the upcoming `claude -p` pay-as-you-go path. Confirmed the ask/review semantic drift, timing ambiguity, missing signal cleanup, tmux environment propagation gap, and auth-only health ambiguity; intentionally did **not** revert Claude ask/review defaults to `-p`.
- Hardened Claude tmux TUI mode: `tmux new-session` now receives an explicit allowlist of Claude/Anthropic/proxy/cert env vars via `-e`; SIGINT/SIGTERM during orchestration kill the created tmux session before process shutdown; missing tmux gets a direct install/config error; successful tmux launches return `detached:true`, `responseKind:"tmux_tui_session_started"`, `warnings`, and `timingMeta` that says timing covers only `tmux_startup` and `llmCompletionObserved:false`.
- Runtime timing now merges provider `timingMeta` and uses the run-level timing support for Claude tmux TUI, so `ttft/gen/tail` stay `unsupported`, `total` remains schema-valid `measured`, and the record explicitly marks `tmuxDetached:true` / startup-only timing. Claude health remains no-model-call/auth-only by design and now reports `probe.kind:"auth_status"` plus `authOnly:true` instead of looking like a sentinel LLM probe.
- Tests added/updated for tmux env propagation, detached payload semantics, startup-only timing metadata, signal cleanup, Claude health auth-only reporting, and companion ask/review integration. Bundles regenerated for all host surfaces.
- Verification: `npm test` exit 0 (500/500); `node --test packages/polycli-runtime/test/claude.test.js`; `node --test packages/polycli-runtime/test/registry.test.js`; `node --test plugins/polycli/scripts/tests/integration.test.mjs`; `npm run validate:bundles`; `npm run validate:manifests`; `npm run validate:host-map`.
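
The env allowlist described above could be sketched roughly as follows. This is an illustration only: `buildTmuxNewSessionArgs` is a hypothetical name and the three variable names in `ENV_ALLOWLIST` are placeholders, since the changelog does not list the real allowlist. It relies on the documented `tmux new-session` options `-d` (detached), `-s` (session name), and `-e VARIABLE=value` (repeatable, available in recent tmux releases).

```javascript
// Hypothetical allowlist; the actual variable names are not shown in this changelog.
const ENV_ALLOWLIST = ["ANTHROPIC_API_KEY", "HTTPS_PROXY", "NODE_EXTRA_CA_CERTS"];

// Build the argument vector for `tmux new-session`, forwarding only
// allowlisted variables that are set to a non-empty string.
function buildTmuxNewSessionArgs(sessionName, env, allowlist = ENV_ALLOWLIST) {
  const args = ["new-session", "-d", "-s", sessionName];
  for (const name of allowlist) {
    const value = env[name];
    if (typeof value === "string" && value !== "") {
      args.push("-e", `${name}=${value}`);
    }
  }
  return args;
}
```

Passing the result as an argument array to a process spawner, rather than joining it into a shell string, keeps values with spaces or quotes intact. The command to run inside the session would be appended after these options.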

## 2026-06-02 — Claude — repo cleanup: removed stale R8 worktrees + `release/v0.6.19` branch

- After the v0.6.20 release, deleted the merged `release/v0.6.19` branch and the 3 abandoned `worktree-agent-*` git worktrees + their branches (all local-only — none on origin). Verified safe first: each branch had 0 commits not in `main` (so `git branch -d` succeeded, git-confirming they were merged); the worktrees' only uncommitted content was an identical, obsolete 2026-04-24 path-rewrite (`/home/user/…`→`/Users/bing/…`) on a snapshot ~41k lines behind `main`, locked by a dead pid (96484).
- After the v0.6.20 release, deleted the merged `release/v0.6.19` branch and the 3 abandoned `worktree-agent-*` git worktrees + their branches (all local-only — none on origin). Verified safe first: each branch had 0 commits not in `main` (so `git branch -d` succeeded, git-confirming they were merged); the worktrees' only uncommitted content was an identical, obsolete 2026-04-24 path-rewrite (`/home/user/…`→`<local-home>/…`) on a snapshot ~41k lines behind `main`, locked by a dead pid (96484).
- The single (identical across all 3) staged diff was saved to `/tmp/r8-worktree-staged-pathrewrite.patch` as insurance, but applying it is NOT advised: active files (README/docs) no longer carry those paths, and the remaining `/home/user/` references on `main` are historical records (CHANGELOG, `docs/archive/*`, `release-notes-v0.6.1`) that should not be rewritten.
- Repo now has a single `main` branch, synced with origin, at v0.6.20.

5 changes: 3 additions & 2 deletions CLAUDE.md
@@ -22,13 +22,14 @@ Claude Code-specific patch. For the base rules see [AGENTS.md](AGENTS.md); this file lists only
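| Task type | Command |
|---|---|
| Touching a single package | Run `node --test packages/<pkg>/test/*.test.js` first |
| Changing the runtime or one of the hosts | Then run `npm test` (runs `build:plugins` first, then the full suite of 119+ tests) |
| Changing the runtime or one of the hosts | Then run `npm test` (runs `build:plugins` first, then the full test suite) |
| Pre-release verification | `npm run release:check` (depends on `claude plugin validate`) |

Note that `npm test` already includes `build:plugins`; do **not** run a separate manual build before testing.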

## Claude-specific provider notes
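- When the `claude` runtime uses `--output-format stream-json` it must also pass `--verbose`; this is a CLI contract
- When the `claude` runtime's print/headless path uses `--output-format stream-json` it must also pass `--verbose`; this is a CLI contract. Do not apply this `-p`/stream-json rule to the default ask/review tmux TUI path
- `claude` ask/review starts a detached tmux TUI session by default; the response includes `tmuxSession`/`attachCommand`, and timing covers only tmux startup and prompt submission, not LLM completion time
- `claude` may report errors via `subtype: "error"` instead of `is_error`; error handling on the sync and streaming paths must stay aligned
- `gemini` has no standalone auth-status subcommand, so its auth probe is inferential; do not regress timeout/429 back to `loggedIn=false`
- `pi` may still call a tool on a trivial prompt; this is upstream behavior, not a local parsing problem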
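
One way to keep the sync and streaming error paths aligned is to route both through a single predicate. The sketch below is an assumption about how that could look, not the repository's code: `isClaudeErrorEvent` is a hypothetical name, and only the `is_error` and `subtype: "error"` fields come from the note above.

```javascript
// Hypothetical shared predicate: treat an event as an error if either
// signal mentioned in the notes is present.
function isClaudeErrorEvent(event) {
  if (event === null || typeof event !== "object") {
    return false;
  }
  return event.is_error === true || event.subtype === "error";
}
```

If both the sync parser and the streaming parser call the same function, a future change to the error signals only needs to be made in one place.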
19 changes: 12 additions & 7 deletions README.ja.md
@@ -4,10 +4,11 @@

# polycli

**Operate 9 AI coding CLIs from a single command surface, inside the AI host you already use.**
**Operate 11 AI coding CLIs from a single command surface, inside the AI host you already use.**

[![GitHub release](https://img.shields.io/github/v/release/bbingz/polycli?label=release&color=111827)](https://github.com/bbingz/polycli/releases)
[![CI](https://github.com/bbingz/polycli/actions/workflows/ci.yml/badge.svg)](https://github.com/bbingz/polycli/actions/workflows/ci.yml)
[![npm: polycli](https://img.shields.io/npm/v/@bbingz/polycli?label=%40bbingz%2Fpolycli&color=cb3837)](https://www.npmjs.com/package/@bbingz/polycli)
[![npm: polycli-opencode](https://img.shields.io/npm/v/@bbingz/polycli-opencode?label=%40bbingz%2Fpolycli-opencode&color=cb3837)](https://www.npmjs.com/package/@bbingz/polycli-opencode)
[![npm: polycli-utils](https://img.shields.io/npm/v/@bbingz/polycli-utils?label=%40bbingz%2Fpolycli-utils&color=cb3837)](https://www.npmjs.com/package/@bbingz/polycli-utils)
[![npm: polycli-timing](https://img.shields.io/npm/v/@bbingz/polycli-timing?label=%40bbingz%2Fpolycli-timing&color=cb3837)](https://www.npmjs.com/package/@bbingz/polycli-timing)
@@ -22,7 +23,7 @@

## What is polycli?

`polycli` runs on any of the hosts Claude Code, Codex, GitHub Copilot CLI, or OpenCode and uses a shared set of commands (`health`, `ask`, `review`, `rescue`, `timing`, `debug`, plus background job control and a terminal inspector) to operate 9 AI coding CLIs: **`claude`**, **`gemini`**, **`kimi`**, **`qwen`**, **`copilot`**, **`opencode`**, **`pi`**, **`cmd`** (Command Code), and **`mini-agent`** (MiniMax).
`polycli` runs on any of the hosts Claude Code, Codex, GitHub Copilot CLI, or OpenCode and uses a shared set of commands (`health`, `ask`, `review`, `rescue`, `timing`, `debug`, plus background job control and a terminal inspector) to operate 11 AI coding CLIs: **`claude`**, **`gemini`**, **`kimi`**, **`qwen`**, **`copilot`**, **`opencode`**, **`pi`**, **`cmd`** (Command Code), **`agy`** (Antigravity), **`grok`** (xAI Grok), and **`mmx-cli`** (MiniMax).

This is a **utility-only Path B monorepo**. It does not hide provider differences behind fake abstractions or invent a runtime base class. It composes the official upstream CLIs as subprocesses, exposes a single command surface, and represents capability differences honestly through a four-state timing schema.

@@ -39,7 +40,7 @@

| Host (where polycli is installed) | Provider (what polycli can call) |
|---|---|
| Claude Code · Codex · GitHub Copilot CLI · OpenCode | `claude` · `copilot` · `gemini` · `kimi` · `qwen` · `opencode` · `pi` · `cmd` · `mini-agent` |
| Claude Code · Codex · GitHub Copilot CLI · OpenCode | `claude` · `copilot` · `gemini` · `kimi` · `qwen` · `opencode` · `pi` · `cmd` · `agy` · `grok` · `minimax` (`mmx-cli`) |

See the [Capability matrix](#capability-matrix) for each provider's supported capabilities.

@@ -95,7 +96,7 @@ polycli health
polycli health
```

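`health` runs an end-to-end probe against every authenticated provider and reports the live ones in `healthyProviders`. After that, daily use is just direct invocation:
`health` runs a probe against authenticated providers and reports the live ones in `healthyProviders`. Claude is the exception: it uses only `claude auth status --json` and sends no health prompt. After that, daily use is just direct invocation: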

```text
Choose Polycli with @, then ask it to run: ask --provider qwen "explain this stack trace ..."
@@ -112,7 +113,7 @@ Choose Polycli with @, then ask it to run: rescue --provider gemini --background
| Command | Behavior |
|---|---|
| `setup` | Check provider CLI install and auth status (no model call; lightweight) |
| `health` | End-to-end check with a short prompt. Returns `healthyProviders` and records timing |
| `health` | End-to-end check with a short prompt for every provider except Claude; Claude uses auth-only status. Returns `healthyProviders` and records timing where applicable |
| `ask` | One-shot prompt |
| `review` | Code review against the current `git diff` |
| `rescue` | Longer triage / analysis task |
@@ -135,15 +136,18 @@ Choose Polycli with @, then ask it to run: rescue --provider gemini --background
| `gemini` | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | — |
| `kimi` | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | — |
| `qwen` | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| `mini-agent` | ✓ | — | | — | — | — | — |
| `minimax` (`mmx-cli`) | ✓ | — | | — | — | — | — |
| `opencode` | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | — |
| `pi` | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | — |
| `cmd` | ✓ | — | — | ✓ | ✓ | ✓ | — |
| `agy` | ✓ | ✓ | — | ✓ | ✓ | ✓ | — |
| `grok` | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | — |

Notes:

- `cold` and `retry` are `unsupported` for every provider. Upstream CLIs have no stable signal, and polycli refuses to fake them. `total` is always `measured`.
- `mini-agent` works by log replay and does not support session resume, structured output, or fine-grained streaming timing. `cmd` uses Command Code's official headless mode, so each invocation is a standalone session and stdout is the visible answer.
- `claude` `ask` / `review` use detached tmux TUI mode by default to avoid the pay-as-you-go `claude -p` path. In this mode `ttft` / `gen` / `tail` are reported as `unsupported`, and `total` measures only tmux startup and prompt submission. The response includes `tmuxSession` + `attachCommand`.
- `minimax` is a non-interactive JSON invocation of `mmx-cli` and does not support session resume or fine-grained streaming timing. `cmd` uses Command Code's official headless mode, so each invocation is a standalone session and stdout is the visible answer. `agy` uses Antigravity session mode, and `grok` uses the xAI Grok Build CLI.
- Only `qwen` declares `tool: true`. When `qwen` does not invoke a tool it reports `missing` (observable, but it did not occur this time); other providers report `unsupported` (not tracked at the capability level). The two have different meanings, so do not conflate them.

## Timing semantics
@@ -163,6 +167,7 @@ What polycli's timing contract unifies is the **representation of state**,

- `runtimePersistence` — `ephemeral | session | daemon`
- `measurementScope` — `request | turn | job`
- outcome diagnostics — `outcome`, `exitCode`, `terminationReason`, `responseMatched`, and `errorCode`

## Packages

21 changes: 11 additions & 10 deletions README.md
@@ -4,7 +4,7 @@

# polycli

**One command surface across 9 AI coding CLIs, inside the host you already use.**
**One command surface across 11 AI coding CLIs, inside the host you already use.**

[![GitHub release](https://img.shields.io/github/v/release/bbingz/polycli?label=release&color=111827)](https://github.com/bbingz/polycli/releases)
[![CI](https://github.com/bbingz/polycli/actions/workflows/ci.yml/badge.svg)](https://github.com/bbingz/polycli/actions/workflows/ci.yml)
@@ -29,16 +29,16 @@

It is a **utility-only Path B monorepo**: it does not unify provider differences behind fake abstractions, and it does not invent a runtime base class. It composes the official upstream CLIs as subprocesses, exposes one command surface, and surfaces honest capability differences in a four-state timing schema.

## Latest release: v0.6.19
## Latest release: v0.6.20

The latest patch adds upstream session-pollution control and provider-drift maintenance hardening (spec-driven, gated by two Codex review rounds):
The latest patch ships the grok provider, the kimi-code v0.6.0 migration, and the deep-review hardening set:

- New `polycli sessions [list | purge --confirm]` command cleans up the session/history files upstream CLIs leave under `$HOME`. Dry-run by default; deletion is driven only by ledger-recorded, re-validated realpaths — never a path guess or glob.
- Run-ledger events now record the upstream `sessionId` + a verified `sessionArtifactPath`, so polycli-created sessions are auditable and purgeable.
- `npm run check:fixture-freshness` flags fixtures pinned to a stale CLI version; `REVIEW_FLAG_EXPECTATIONS` is now the single source of review-flag truth (consistency-tested); `check:review-drift` is wired into the release gate.
- No provider behavior, host command grammar, or timing schema changed.
- Added `grok` (xAI Grok Build CLI) as the 11th provider across runtime, host adapters, skills, docs, and release validation.
- Migrated the kimi adapter and guidance to kimi-code v0.6.0 session semantics (`--session` / `-C`) and structured `session.resume_hint` parsing.
- Kept the Path B flat-adapter architecture intact while tightening review/deep-review hardening and bundle drift checks.
- Utility packages stay on their independent v1.x cadence.

See [`docs/release-notes-v0.6.19.md`](./docs/release-notes-v0.6.19.md).
See [`docs/release-notes-v0.6.20.md`](./docs/release-notes-v0.6.20.md).

## Why polycli?

@@ -141,7 +141,7 @@ polycli health
# OpenCode (tool call — call polycli_run with ["health","--json"])
```

`health` runs an end-to-end probe against every provider with valid auth and reports which ones are alive in `healthyProviders`. After that, daily use is direct. In Codex, either describe the task directly or type `@`, choose Polycli, and ask it to run the companion command:
`health` runs an end-to-end probe against every provider with valid auth and reports which ones are alive in `healthyProviders`. Claude is the exception: it uses `claude auth status --json` only and does not send a health prompt. After that, daily use is direct. In Codex, either describe the task directly or type `@`, choose Polycli, and ask it to run the companion command:

```text
Choose Polycli with @, then ask it to run: ask --provider qwen "explain this stack trace ..."
@@ -173,7 +173,7 @@ All commands work identically across hosts:
| Command | What it does |
|---|---|
| `setup` | Check provider CLI install + auth status (cheap; no model call) |
| `health` | End-to-end short-prompt probe; returns `healthyProviders` and writes timing |
| `health` | End-to-end short-prompt probe for providers except Claude; Claude uses auth-only status; returns `healthyProviders` and writes timing where applicable |
| `ask` | One-shot prompt |
| `review` | Code review against the current `git diff` |
| `rescue` | Longer triage / analysis task |
@@ -206,6 +206,7 @@ Source of truth: [`packages/polycli-runtime/src/registry.js`](./packages/polycli
Notes:

- `cold` and `retry` are `unsupported` for every provider. Upstream CLIs lack a stable signal, and polycli refuses to fake them. `total` is always `measured`.
- Claude `ask` and `review` run in detached tmux TUI mode by default to avoid the `claude -p` cost path. In that mode `ttft`, `gen`, and `tail` are reported as `unsupported`; `total` measures tmux startup/prompt submission only, and the response contains `tmuxSession` + `attachCommand`.
- `minimax` uses official `mmx text chat --output json --non-interactive`; no session resume and no fine-grained streaming timing. `cmd` uses documented Command Code headless mode, where each invocation is a standalone session and stdout is the visible answer.
- Only `qwen` declares `tool: true`. When no tool is invoked, `qwen` reports `missing` (observable but absent); the others report `unsupported` (capability-level not tracked). The two states are not interchangeable.
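
The difference between `missing` and `unsupported` in the last note can be made concrete with a small sketch. It is illustrative only: `toolTimingState` is a hypothetical name, the capability object shape is assumed, and returning `measured` when a tool was invoked is an assumption, since the notes describe only the no-tool case.

```javascript
// Hypothetical sketch of how the `tool` timing state could be chosen.
function toolTimingState(capabilities, toolWasInvoked) {
  // Providers that do not declare `tool: true` are not tracked at all.
  if (capabilities.tool !== true) {
    return "unsupported";
  }
  // The provider can observe tool calls; report whether one happened.
  // "measured" for the invoked case is an assumption.
  return toolWasInvoked ? "measured" : "missing";
}
```

With this shape, a `qwen`-style provider that ran no tool yields `missing`, while a provider without the capability yields `unsupported` regardless of what happened during the run.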
