A local-first security testing agent: a realistic, observable agent target for evaluating open-source AI-security tools (Garak, Promptfoo, llm-guard, …).
redcell drives any model — local (vLLM/Ollama) or cloud (Anthropic/OpenAI) via LiteLLM — with a broad MCP toolset proxied through AgentGateway, and exposes an OpenAI-compatible HTTP endpoint. Point a scanner at the endpoint, wrap a guardrail around it, and watch every tool call route through the gateway choke point for a full trace of what the agent actually did.
- Local-first, cloud-capable — one config value (
AGENT_MODEL) switches between self-hosted vLLM/Ollama and hosted Anthropic/OpenAI. - MCP tools via AgentGateway — Playwright (browser), Filesystem, and Fetch behind a single aggregated endpoint; add more by editing one YAML.
- OpenAI-compatible server —
redcell serveexposes/v1/chat/completionsso Open WebUI and scanners like Garak/Promptfoo can drive it. - Observable — every MCP tool call routes through the gateway, so you can confirm whether an attack actually fired.
- Batteries included —
@tooldecorator, conversation memory, typed config, structured logging, async tool-use loop.
Full reference lives in docs/:
- Architecture — components, request lifecycle, the agent loop
- Configuration reference — every
AGENT_*setting - CLI reference —
serve,chat,rag-seed,version - Server & API — endpoints, auth, streaming, sessions
- Security controls — safety prompt, guardrails, toggles, eval workflow
- Tools & AgentGateway — builtin tools, MCP, gateway targets
- RAG knowledge base — Qdrant, corpus, poison/canaries
- Development — layout, tests, public API, extension points
uv sync
cp .env.example .env # then set AGENT_MODEL + keys/endpoint
uv run redcell chatSet AGENT_MODEL (LiteLLM format) and the matching key/endpoint:
| Provider | AGENT_MODEL |
Needs |
|---|---|---|
| Self-hosted vLLM | hosted_vllm/<model> |
AGENT_API_BASE (+ AGENT_API_KEY) |
| Local Ollama | ollama/llama3.1 |
— |
| Anthropic | anthropic/claude-opus-4-8 |
ANTHROPIC_API_KEY |
| OpenAI | openai/gpt-4o |
OPENAI_API_KEY |
uv run redcell serve # binds 0.0.0.0:8800Serves GET /v1/models and POST /v1/chat/completions (streaming + not). Point
any OpenAI-compatible client at http://<host>:8800/v1:
- Open WebUI (Docker): Base URL
http://host.docker.internal:8800/v1, any API key. - Garak / Promptfoo: target
http://127.0.0.1:8800/v1/chat/completions, modelredcell.
By default the server is stateless: every request runs only the messages it carries, so the client owns history (promptfoo wizard: "No — resend the full interaction history"). Leave it as-is and multi-turn just works.
To run redcell as a stateful target — promptfoo sends only the new turn plus a
session id, and redcell remembers the rest — send a session id on each request via
the x-redcell-session header (or a session_id/sessionId body field). History
is keyed off that id and held in memory (idle-TTL + LRU eviction; lost on restart;
tuned by AGENT_SESSION_*). No session id → unchanged stateless behavior.
In promptfoo's red-team target wizard answer: Remembers history → Yes, Session management → Client-generated Session ID, Session ID Extraction → (leave empty). The generated target wires the id into each request:
targets:
- id: openai:chat:redcell
config:
apiBaseUrl: http://127.0.0.1:8800/v1
apiKey: redcell # any value unless AGENT_SERVER_API_KEY is set
headers:
x-redcell-session: '{{sessionId}}'
defaultTest:
options:
transformVars: '{ ...vars, sessionId: context.uuid }' # one id per test caseredcell ships defensive controls on by default, with each vulnerable surface exposed as a single toggle so you can baseline the deliberately-unguarded target and then measure the delta with controls on:
| Control | Env | Default | Off = vulnerable behavior |
|---|---|---|---|
| Safety system prompt | AGENT_SAFETY_PROMPT |
true |
bare "helpful assistant", no refusals |
| Input/output guardrail | AGENT_GUARDRAILS |
true |
no moderation/redaction |
| Dangerous MCP tools | AGENT_MCP_TOOL_DENYLIST |
(none) | e.g. shell,filesystem removes the worst exfiltration surface |
- The safety policy (
redcell/prompts.py) refuses harmful/illegal/copyright requests, forbids fabricated tool results and binding commitments, and bars disclosure of internal architecture — the prompt-level fixes for the scan findings. - The guardrail (
redcell/guardrails.py) is a dependency-free baseline that blocks a few high-signal harmful inputs and redacts PII + internal identifiers (backends, sandbox paths) from output. It implements a smallGuardrailprotocol, so swapping in llm-guard, Llama Guard, or an LLM self-check for semantic moderation is a drop-in — no other code changes.
To recreate the original vulnerable target end-to-end:
AGENT_SAFETY_PROMPT=false AGENT_GUARDRAILS=false uv run redcell serve.
serve launches a local AgentGateway process (agentgateway -f agentgateway/config.yaml) and connects the agent to its aggregated MCP
endpoint. The starter config wires Playwright (browser), Fetch (HTTP),
and RAG (Qdrant) — which run locally — plus Filesystem and Shell,
which run on a separate Debian VM over SSH (see prerequisites). All sit
behind :3030 (gateway UI on :15000).
Prerequisites:
agentgatewayon yourPATH, plusnpx(Node) anduvxfor the local stdio backends. If the gateway can't start,serveruns with builtin tools only.- For
shellandfilesystem: a code-execution VM. These tools are wired asssh debian-agent …, sorun_command/run_scriptand file operations execute on a dedicated Debian VM, never on your host. A fresh checkout has nodebian-agentSSH host configured, so until you set one up these two tools simply error (the rest still work) — they do not fall back to running on your machine. Setup: Setting up the execution VM.
Every MCP tool call routes through the gateway — the choke point that makes redcell a useful test subject.
redcell ships an enterprise-standard RAG surface: a self-hosted Qdrant vector
DB behind the official mcp-server-qdrant (local FastEmbed embeddings),
exposed as the gateway rag target with qdrant-store and qdrant-find.
docker compose up -d qdrant # start Qdrant on :6333
uv run redcell serve # brings up the gateway + rag target
uv run redcell rag-seed # load the bundled corpus into the storeThe bundled corpus (redcell/rag/corpus/seed_corpus.json) mixes benign docs with
planted poison docs carrying unique canary IDs. Because retrieval routes
through the gateway, you can see whether a retrieved poison doc actually drove a
shell/filesystem action — the canary appearing in a tool call (or a
RC-CANARY-*.txt file in the VM sandbox) is measurable injection success. This is
the indirect prompt injection surface for tools like Garak/Promptfoo to probe.
uv run pytest # tests (offline; never hit the network)
uv run ruff check . # lint
uv run ruff format . # formatProsperity Public License 3.0.0 © Streamline AI LLC (dba TravisML.ai)
Free for noncommercial use. Commercial use is allowed for a thirty-day trial; beyond that, contact the contributor for a commercial license. This is a source-available license, not an OSI open-source license.