Z — The Agentic Coding Harness

Stop context-switching. Start creating.

The Ordinary Way

Every day, millions of developers sit down at their terminals knowing exactly what they want to build. They reach for an AI coding assistant — and immediately hit a wall.

What model do I use? What provider? What API key? Does it support tool calling? Does it have vision? Can it search my codebase? Will it stay within context limits? Will it lose track of what I asked it to do five minutes ago?

There are a hundred tools, a thousand models, and a million configuration options. But none of them do one simple thing: get out of your way and let you build.

Existing coding agents are fragile. They lose context mid-session. They can't handle interruptions. They forget what they were doing after a compaction. They lock you into one provider, one model, one way of working. They treat thinking/reasoning models as an afterthought. They're designed for demos, not for the relentless complexity of real-world engineering.

You've felt it. That moment when the assistant starts a long chain of actions, context fills up, and you know — you know — it's about to lose the plot. You sit there watching tokens stream, hoping it remembers the architecture decision you discussed three turns ago.

There has to be a better way.

The Z Way

Z was built from the ground up to be the agentic coding harness that doesn't break.

It's not another chat UI taped onto an API. It's a production-grade engineering partner that:

Survives context compaction — When token limits hit, Z preserves your todos, your goals, and your tool results. It leaves breadcrumbs. It knows what it was doing and picks right back up after truncation.
Speaks every provider fluently — Switch effortlessly between Z.AI, MiniMax, Anthropic, OpenAI, DeepSeek, Groq, Together, Ollama (local or cloud), Bedrock, GitHub Copilot OAuth, or any OpenAI-compatible API. 26 providers. One interface. Zero friction.
Handles thinking models correctly — DeepSeek-R1, QwQ, gpt-oss with reasoning/reasoning_content, these are first-class citizens, not afterthoughts. Z extracts thinking, reasoning_content, reasoning_details — whatever the model returns — and shows it to you.
Never drops a tool call — Every tool has error recovery. Every result is stored for later retrieval. Every background process is tracked. If Z needs to introspect, it has a dedicated deep-thinking tool that makes a separate API call with zero tool interference.
Checkpoints everything — Git-backed snapshots before every agent turn. /undo and /redo for both files AND conversation. Ctrl+Z on your AI. Yes.
Pays attention to cost — Real-time token and dollar tracking per model, per session. You're never surprised by the bill.

What Makes Z Different

One Terminal, Every Model

model: glm-4.7  ·  reasoning: high  ❯

Select from 26 providers in the interactive setup — or switch on the fly with /providers use. Z auto-detects which provider you're talking to and normalizes the model name so LiteLLM routes correctly every time. Local Ollama, cloud Ollama, Z.AI's GLM-4.7 with coding plan, Anthropic, DeepSeek — all accessible from the same prompt.

Context That Survives

Context compaction doesn't mean context loss. Z's SmartContextManager uses multiple strategies (half, quarter, lastTwo) to trim conversation while preserving your system prompt, your original request, your todo list, and any results you explicitly mark as important. After compaction, it injects a recovery notice so Z knows exactly what was trimmed and can re-read files or re-run commands if needed.

Tool Results That Don't Vanish

Compact away a read_file result? Z stores it. Later, the retrieve_tool_result tool pulls it back. No re-reads. No lost data. Tool results survive across compactions, across sessions, across model switches.

Todo Lists That Ground Every Action

○ [1] Implement payment webhook handler
◐ [2] Add idempotency key to Stripe events
● [3] Write integration tests

The manage_todos tool creates a persistent, structured plan that survives context compaction, model changes, and even crashes. Z checks it after every truncation to reorient itself. You see it at the top of every session. It's always there. Always accurate.

Deep Thinking, No Distractions

The introspect tool is not just another tool. It spawns a separate API call with no tools available — just the model and your thoughts. Pure reasoning without the temptation to reach for a tool. For complex architecture decisions, debugging mysteries, or planning multi-step refactors. The result comes back as part of the conversation.

Undo/Redo for AI Turns

Every agent turn snapshots your workspace via git. /undo reverts both files AND conversation to before the turn. /redo replays it. Entirely optional, entirely transparent, entirely free (binary files are excluded automatically).

Plugins That Extend Everything

def register(api):
    api.add_tool(
        name="deploy",
        description="Deploy the current service to staging",
        params={"env": {"required": True, "description": "staging or production"}},
        handler=lambda params: run_deploy(params["env"]),
    )
    api.on("pre_turn", my_hook)

Plugins can add tools, hook into lifecycle events (pre_turn, post_tool, on_compact), and extend the system prompt. Drop a .py file in ~/.z/plugins/ — Z discovers it automatically.

Vision That Works When the Model Can See

Z auto-detects whether your model supports vision (via LiteLLM's registry or models.dev metadata). If it does, you can paste images directly from clipboard or type image paths. If not, the analyze_image tool routes to Z.AI's multimodal model. Z never sends images to a model that can't handle them.

Web Search, MCP, and Beyond

Built-in web_search uses a configurable search API. mcp_search_tools, mcp_list_tools, mcp_call_tools give you access to the Model Context Protocol ecosystem — databases, APIs, documentation servers, anything with an MCP server.

Quick Start

# Install
cd harness
pip install -e ".[dev]"

# Configure
python -m harness.cli --install

# Or configure headlessly
python -m harness.cli --api-url http://localhost:11434/v1 --api-key ollama --model gpt-oss:20b

# Run
python -m harness.cli

Workspace & Sessions

# Work in a specific project
python -m harness.cli -w /path/to/project

# Name your session
python -m harness.cli -w . -s my-feature-branch

# Run one command and exit
python -m harness.cli -m "Refactor the auth module to use async/await"

# List and resume past sessions
python -m harness.cli --list

The Interactive Prompt

Command	What it does
`/help`	Show all commands
`/clear`	Clear conversation history
`/compact`	Force context compaction
`/undo`	Undo last agent turn (files + conversation)
`/redo`	Redo last undone turn
`/save [name]`	Save session snapshot
`/sessions`	List all sessions
`/session <name>`	Switch to a session
`/providers`	List/manage LLM providers
`/providers setup`	Add or edit a provider profile
`/providers use <name>`	Switch provider on the fly
`/model list`	List available models
`/model search <query>`	Search models across providers
`/cost`	Show token usage and cost summary
`/cost reset`	Reset cost tracking
`/plugins`	List loaded plugins and their tools
`/todo`	Show todo panel
`!command`	Execute a shell command directly
Ctrl+T	Cycle reasoning effort
Ctrl+Z	Undo
Ctrl+Y	Redo
Ctrl+V	Paste image from clipboard

Configuration

Z stores its configuration in ~/.z.json:

{
  "api_url": "https://api.z.ai/api/paas/v4/",
  "api_key": "your-api-key",
  "model": "glm-4.7",
  "max_tokens": 128000,
  "temperature": 0.7,
  "reasoning_effort": "high",
  "compaction_threshold": 0.85
}

Provider profiles are stored alongside — switch between them without editing files.

Provider Support

#	Provider	Model
1	Z.AI Coding	glm-4.7
2	Z.AI Standard	glm-4.7
3	MiniMax	MiniMax-M2.1
4	Amazon Bedrock	qwen.qwen3-32b-v1:0
5	Together AI	Llama-3.3-70B-Instruct-Turbo
6	Anthropic	claude-3-5-sonnet-latest
7	OpenRouter	anthropic/claude-3.5-sonnet
8	OpenAI	gpt-4o
9	Groq	llama-3.3-70b-versatile
10	DeepSeek	deepseek-chat
11	Mistral AI	mistral-large-latest
12	Cohere	command-r-plus
13	Fireworks AI	llama-v3p1-70b-instruct
14	Perplexity	llama-3.1-sonar-large-128k-online
15	AI21	jamba-1.5-large
16	xAI (Grok)	grok-2-latest
17	Google Gemini	gemini-1.5-pro-latest
18	Cerebras	llama3.1-70b
19	Databricks	(workspace URL)
20	Replicate	meta-llama-3-70b-instruct
21	Anyscale	Llama-3.1-70B-Instruct
22	Ollama Cloud	llama3.1
23	OpenAI Subscription (OAuth)	gpt-4o
24	GitHub Copilot (OAuth)	gpt-4o
25	Custom OpenAI-compatible API	(any)
26	Local Ollama	(auto-detected)

Local Ollama

Z detects Ollama running on localhost:11434 and auto-discovers your installed models. Thinking/reasoning models (like gpt-oss:20b) work correctly — Z routes through Ollama's OpenAI-compatible /v1/chat/completions endpoint, which properly separates content from reasoning.

# Before using Local Ollama, make sure it's running:
ollama run gpt-oss:20b

# Then in Z:
# Select option 26 in the provider setup, or use:
/providers setup ollama

How It Works

                    ┌─────────────────────────────┐
                    │      Your Terminal           │
                    │   (prompt_toolkit REPL)      │
                    └──────────┬──────────────────┘
                               │
                    ┌──────────▼──────────────────┐
                    │         ClineAgent           │
                    │  ┌──────────────────────┐   │
                    │  │   SmartContextManager │   │
                    │  │  • Compaction        │   │
                    │  │  • Tool result store │   │
                    │  │  • Recovery notices  │   │
                    │  └──────────────────────┘   │
                    │  ┌──────────────────────┐   │
                    │  │   TodoManager        │   │
                    │  │  • Persistent goals  │   │
                    │  │  • Progress tracking │   │
                    │  └──────────────────────┘   │
                    │  ┌──────────────────────┐   │
                    │  │   CheckpointManager  │   │
                    │  │  • Git snapshots     │   │
                    │  │  • Undo/Redo         │   │
                    │  └──────────────────────┘   │
                    │  ┌──────────────────────┐   │
                    │  │   PluginManager      │   │
                    │  │  • Custom tools      │   │
                    │  │  • Lifecycle hooks   │   │
                    │  └──────────────────────┘   │
                    │  ┌──────────────────────┐   │
                    │  │   CostTracker        │   │
                    │  │  • Per-model pricing │   │
                    │  │  • Session totals    │   │
                    │  └──────────────────────┘   │
                    └──────────┬──────────────────┘
                               │
                    ┌──────────▼──────────────────┐
                    │      StreamingClient        │
                    │   (LiteLLM unified API)     │
                    │                              │
                    │  ┌────────────────────────┐ │
                    │  │  Provider routing      │ │
                    │  │  • Model normalization │ │
                    │  │  • OAuth tokens        │ │
                    │  │  • Reasoning detection │ │
                    │  │  • Vision detection    │ │
                    │  └────────────────────────┘ │
                    └──────────┬──────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         │                     │                     │
    ┌────▼────┐          ┌─────▼─────┐         ┌────▼────┐
    │  Z.AI   │          │  OpenAI   │         │ Ollama  │
    │  GLM-4.7│          │  GPT-4o   │         │  Local  │
    └─────────┘          └───────────┘         └─────────┘
    ... 23 more providers ...

The Tool System

Z exposes 19 built-in tools through OpenAI function calling — the model sees them as native API capabilities:

Category	Tools
File operations	`read_file`, `write_to_file`, `replace_in_file`, `replace_between_anchors`
Shell/Process	`execute_command`, `list_background_processes`, `check_background_process`, `stop_background_process`
Search	`list_files`, `search_files`
External	`analyze_image`, `web_search`, `mcp_search_tools`, `mcp_list_tools`, `mcp_call_tool`
Context	`retrieve_tool_result`
Agent meta	`manage_todos`, `introspect`, `attempt_completion`

All tools share a single, unified schema in tool_registry.py — add one tool there and it propagates to parsing, detection, dispatch, and system prompts automatically.

Running Tests

# All unit tests
pytest

# With coverage
pytest --cov=harness

# Integration tests (requires API key)
pytest -m integration

# Specific test file
pytest tests/test_file_tools.py

Architecture File Map

harness/
├── harness.py                 # Entry point: CLI, REPL, install wizard, session management
├── pyproject.toml             # Project metadata and dependencies
├── README.md                  # You are here
└── src/harness/
    ├── __init__.py            # Package exports
    ├── __main__.py            # python -m harness.cli
    ├── cli.py                 # CLI entry point (loads harness.py)
    ├── cline_agent.py         # Agent core: conversation loop, tool dispatch
    ├── config.py              # Config loading (~/.z.json)
    ├── streaming_client.py    # LiteLLM unified streaming client
    ├── streaming_client_litellm.py  # Alternative LiteLLM client
    ├── tool_registry.py       # Single source of truth for all tool definitions
    ├── tool_handlers.py       # Tool execution implementations
    ├── prompts.py             # System prompt generator
    ├── smart_context.py       # Context compaction, tool result storage
    ├── context_management.py  # Token counting, conversation truncation
    ├── checkpoint.py          # Git-backed undo/redo snapshots
    ├── todo_manager.py        # Structured, persistent goal tracking
    ├── cost_tracker.py        # Per-model API cost tracking
    ├── plugin_manager.py      # Lightweight plugin system
    ├── model_capabilities.py  # Vision detection via LiteLLM + remote metadata
    ├── workspace_index.py     # Codebase indexing for search
    ├── logger.py              # Structured logging
    ├── prompts.py             # System prompt generation
    ├── image_utils.py         # Image processing utilities
    ├── instruction_loader.py  # Custom instruction loading
    ├── interrupt.py           # Graceful interrupt handling
    ├── status_line.py         # Terminal status line display
    ├── usage_report.py        # Session usage reporting
    ├── codex_models.py        # ChatGPT Codex model definitions
    ├── codex_oauth_client.py  # OAuth client for ChatGPT Plus/Pro
    ├── copilot_oauth_client.py# OAuth client for GitHub Copilot
    ├── oauth.py               # OAuth authentication manager
    ├── bedrock_provider.py    # Amazon Bedrock provider support
    ├── context_replay.py      # Context quality replay for evaluation
    ├── context_quality_benchmark.py  # Context quality benchmarking
    ├── context_benchmark_pack.py     # Benchmark data pack
    └── ...

The Philosophy

Z is built on three convictions:

The model is a tool, not the product. The product is what you ship. Z abstracts away provider complexity, context management, and cost tracking so you can focus on building.
Context should never be lost. Every compaction leaves breadcrumbs. Every tool result is stored for retrieval. Every todo survives truncation. Z treats context as a precious resource and manages it ruthlessly.
You should never be locked in. 26 providers, one interface. Switch models mid-session. Run the same task across multiple providers and compare results. Use local models for prototyping and cloud models for production. Z adapts to your workflow, not the other way around.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 175 Commits
.forge		.forge
cmd/cos		cmd/cos
examples/plugins		examples/plugins
src		src
tests		tests
tickets		tickets
torture_test		torture_test
.dockerignore		.dockerignore
.gitignore		.gitignore
.harness_context_20260223_030522.json		.harness_context_20260223_030522.json
.harness_context_20260321_155034.json		.harness_context_20260321_155034.json
.harness_debug.txt		.harness_debug.txt
CLAUDE.md		CLAUDE.md
CLAUDE_CODE_PARITY_REPORT.md		CLAUDE_CODE_PARITY_REPORT.md
COS_INSTRUCTIONS.md		COS_INSTRUCTIONS.md
Dockerfile.build		Dockerfile.build
Dockerfile.test		Dockerfile.test
README.md		README.md
build.py		build.py
claude-code-architecture.md		claude-code-architecture.md
compliment_for_cline.md		compliment_for_cline.md
context-management.md		context-management.md
dead_code_report.md		dead_code_report.md
debug_output.txt		debug_output.txt
harness.py		harness.py
harness.spec		harness.spec
integration_test.py		integration_test.py
parity_validation.py		parity_validation.py
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
python-logo.png		python-logo.png
refactor_prompt.txt		refactor_prompt.txt
requirements.txt		requirements.txt
sub_agent.md		sub_agent.md
system_prompt_instructions.md		system_prompt_instructions.md
test_copilot_reasoning.py		test_copilot_reasoning.py
test_input.txt		test_input.txt
test_output_1.txt		test_output_1.txt
test_think_flow.py		test_think_flow.py
test_think_tag.py		test_think_tag.py
test_think_token_headless.py		test_think_token_headless.py
test_xml_suppression.py		test_xml_suppression.py
think_token_test.txt		think_token_test.txt
torture_test.py		torture_test.py
vision_test_random.png		vision_test_random.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Z — The Agentic Coding Harness

The Ordinary Way

The Z Way

What Makes Z Different

One Terminal, Every Model

Context That Survives

Tool Results That Don't Vanish

Todo Lists That Ground Every Action

Deep Thinking, No Distractions

Undo/Redo for AI Turns

Plugins That Extend Everything

Vision That Works When the Model Can See

Web Search, MCP, and Beyond

Quick Start

Workspace & Sessions

The Interactive Prompt

Configuration

Provider Support

Local Ollama

How It Works

The Tool System

Running Tests

Architecture File Map

The Philosophy

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Z — The Agentic Coding Harness

The Ordinary Way

The Z Way

What Makes Z Different

One Terminal, Every Model

Context That Survives

Tool Results That Don't Vanish

Todo Lists That Ground Every Action

Deep Thinking, No Distractions

Undo/Redo for AI Turns

Plugins That Extend Everything

Vision That Works When the Model Can See

Web Search, MCP, and Beyond

Quick Start

Workspace & Sessions

The Interactive Prompt

Configuration

Provider Support

Local Ollama

How It Works

The Tool System

Running Tests

Architecture File Map

The Philosophy

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages