Agent Backtesting Service

The logic in this service has not yet been fully validated. It runs end to end, but results should be reviewed and validated independently. I will keep adding test scenarios and observability to build trust in the output; for now the primary focus is the agentic workflow this service supports. It is not directing real investments at this time.

A production-grade backtesting engine for the agent-trading-firm ecosystem. Runs event-driven strategy simulations against historical OHLCV data fetched from the market-data-service, computes a comprehensive set of performance metrics, and optionally validates strategies using walk-forward analysis or Black-Scholes options simulation.

I/O contract: JSON → stdout · Rich progress → stderr · Exit 0 = success, 1 = soft failure (threshold breach or no data), 2+ = error.
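The contract above can be consumed from Python roughly as follows. This is a minimal sketch rather than the service's own wrapper: `classify_exit` and `run_cli_json` are hypothetical names, and the sketch assumes stdout carries a single JSON document whenever the exit code is 0 or 1.

```python
import json
import subprocess
from typing import List, Optional, Tuple


def classify_exit(returncode: int) -> str:
    """Map a process exit code onto the documented I/O contract."""
    if returncode == 0:
        return "success"
    if returncode == 1:
        return "soft_failure"  # e.g. a threshold breach
    return "error"  # 2 and above (or a signal on POSIX)


def run_cli_json(argv: List[str]) -> Tuple[str, Optional[dict]]:
    """Run a command and return (outcome, parsed stdout JSON or None)."""
    # stderr is left attached to the terminal so Rich progress stays visible.
    proc = subprocess.run(argv, stdout=subprocess.PIPE, text=True, check=False)
    outcome = classify_exit(proc.returncode)
    payload = None
    # Assumption: JSON is only expected on stdout for exit codes 0 and 1.
    if outcome != "error" and proc.stdout.strip():
        payload = json.loads(proc.stdout)
    return outcome, payload
```

Keeping the exit-code mapping in its own function makes it easy to unit test without invoking the CLI.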

Architecture
Installation
Environment Variables
CLI Reference
- run
- walk-forward
- options-sim
- report
- list
Strategy Spec Format
- StrategySpec (equities)
- OptionsSpreadSpec (options-sim)
Indicator Namespace
Pass / Fail Thresholds
JSON Output Schemas
- BacktestReport
- WalkForwardReport
Storage
Running Tests
Agent Integration

Architecture

CLI (backtest run / walk-forward / options-sim / report / list)
        │
        ▼
  BacktestEngine
        │
        ├── market-data CLI ──► OHLCV DataFrame (subprocess, JSON stdout)
        │        └── iv-rank  ──► IV rank time series (optional join)
        │        └── yfinance ──► VIX time series   (optional join)
        │
        ├── compute_indicators() ─► 60+ indicator columns on each bar
        │
        ├── run_signals()         ─► bar-by-bar eval() of condition strings
        │        └── BacktestTrade list
        │
        ├── compute_metrics()     ─► BacktestMetrics (Sharpe, Calmar, DD …)
        ├── check_thresholds()    ─► pass/fail list
        │
        └── ResultsStore (SQLite) + MinIO (Parquet equity curves)

Options path (options-sim): replaces run_signals() with run_options_signals(), which models P&L from Black-Scholes option premium changes rather than underlying price changes.

Walk-forward path (walk-forward): runs N pairs of IS/OOS backtests and reports efficiency = oos_sharpe / is_sharpe.

Installation

cd backtest-service
python -m venv .venv
source .venv/bin/activate
pip install -e .

Requires Python 3.12+. Optional VIX support:

pip install yfinance

The market-data CLI must be on $PATH (or set BT_MARKET_DATA_CMD to its absolute path).

Environment Variables

All variables are prefixed BT_ and can also be set in a .env file at the project root.

Variable	Default	Description
`BT_DB_PATH`	`data/backtest/results.db`	SQLite database path for runs, metrics, and trades
`BT_MINIO_ENDPOINT`	`localhost:9000`	MinIO endpoint for Parquet equity curve storage
`BT_MINIO_ACCESS_KEY`	`mds`	MinIO access key
`BT_MINIO_SECRET_KEY`	`mds_secret`	MinIO secret key
`BT_MINIO_BUCKET`	`backtest-results`	Bucket for equity curve Parquet files
`BT_MINIO_SECURE`	`false`	Use HTTPS for MinIO
`BT_INITIAL_CAPITAL`	`100000.0`	Default starting capital ($)
`BT_DEFAULT_COMMISSION`	`0.65`	Default commission per contract ($)
`BT_DEFAULT_SLIPPAGE`	`0.005`	Default slippage fraction (0.5%)
`BT_MARKET_DATA_CMD`	`market-data`	Path/name of the market-data CLI
`BT_MIN_TRADES_REQUIRED`	`10`	Minimum trades before warning
`BT_MAX_HOLDING_BARS`	`20`	Safety cap if spec omits `max_holding_bars`
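To illustrate how the defaults in the table resolve, here is a minimal sketch that reads a subset of the BT_ variables from a mapping. `BacktestSettings` and `load_settings` are hypothetical names, the sketch does not parse a .env file, and the set of strings accepted as true for `BT_MINIO_SECURE` is an assumption; the service's real settings loader may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class BacktestSettings:
    db_path: str
    initial_capital: float
    default_commission: float
    default_slippage: float
    market_data_cmd: str
    min_trades_required: int
    max_holding_bars: int
    minio_secure: bool


def load_settings(env: Mapping[str, str] = os.environ) -> BacktestSettings:
    """Read BT_-prefixed variables, falling back to the documented defaults."""

    def get(name: str, default: str) -> str:
        return env.get(f"BT_{name}", default)

    return BacktestSettings(
        db_path=get("DB_PATH", "data/backtest/results.db"),
        initial_capital=float(get("INITIAL_CAPITAL", "100000.0")),
        default_commission=float(get("DEFAULT_COMMISSION", "0.65")),
        default_slippage=float(get("DEFAULT_SLIPPAGE", "0.005")),
        market_data_cmd=get("MARKET_DATA_CMD", "market-data"),
        min_trades_required=int(get("MIN_TRADES_REQUIRED", "10")),
        max_holding_bars=int(get("MAX_HOLDING_BARS", "20")),
        # Assumption: these spellings count as "true"; anything else is false.
        minio_secure=get("MINIO_SECURE", "false").strip().lower() in {"1", "true", "yes"},
    )
```

Passing an explicit mapping, for example `load_settings({"BT_DEFAULT_SLIPPAGE": "0.001"})`, overrides one value and leaves the rest at their defaults.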

CLI Reference

`run`

Run a full backtest for a strategy spec over a date range.

backtest run \
  --strategy <path/to/spec.json> \
  --symbol AAPL \
  --start 2019-01-01 \
  --end 2024-12-31 \
  [--capital 100000] \
  [--commission 0.65] \
  [--slippage 0.005]

Options

Flag	Required	Description
`--strategy`	Yes	Path to strategy spec JSON or YAML
`--symbol` / `-s`	Yes	Ticker symbol (e.g. `SPY`, `AAPL`)
`--start`	Yes	Start date `YYYY-MM-DD`
`--end`	Yes	End date `YYYY-MM-DD`
`--capital`	No	Override initial capital ($)
`--commission`	No	Override commission per contract ($)
`--slippage`	No	Override slippage fraction

Exit codes: 0 = pass · 1 = fail (threshold breach or no data) · 2 = error

Stdout: BacktestReport JSON

Example

backtest run --strategy strategies/spy_short_put_spread_v1.json \
             --symbol SPY --start 2020-01-01 --end 2024-12-31

`walk-forward`

Run IS/OOS walk-forward validation across N rolling windows.

backtest walk-forward \
  --strategy <path/to/spec.json> \
  --symbol SPY \
  --start 2016-01-01 \
  --end 2024-12-31 \
  [--windows 3] \
  [--is-ratio 0.70]

Options

Flag	Default	Description
`--strategy`	Required	Path to strategy spec
`--symbol` / `-s`	Required	Ticker symbol
`--start`	Required	Start date `YYYY-MM-DD`
`--end`	Required	End date `YYYY-MM-DD`
`--windows`	`3`	Number of walk-forward windows
`--is-ratio`	`0.70`	In-sample fraction per window

Pass criterion: avg_efficiency >= 0.70 (OOS Sharpe / IS Sharpe averaged across all windows)
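The pass criterion can be sketched as below. The function names are hypothetical, and skipping windows whose in-sample Sharpe is not positive is an assumption made for this sketch; how the service treats that case is not documented here.

```python
from statistics import fmean
from typing import Optional, Sequence


def window_efficiency(is_sharpe: float, oos_sharpe: float) -> Optional[float]:
    """efficiency = oos_sharpe / is_sharpe for one IS/OOS window."""
    if is_sharpe <= 0:
        return None  # assumption: the ratio is not meaningful in this case
    return oos_sharpe / is_sharpe


def walk_forward_passes(windows: Sequence[dict], min_efficiency: float = 0.70) -> bool:
    """True when the average efficiency across usable windows meets the bar."""
    usable = [
        e
        for e in (window_efficiency(w["is_sharpe"], w["oos_sharpe"]) for w in windows)
        if e is not None
    ]
    if not usable:
        return False
    return fmean(usable) >= min_efficiency


# One window with IS Sharpe 1.42 and OOS Sharpe 1.10.
example = [{"is_sharpe": 1.42, "oos_sharpe": 1.10}]
print(round(window_efficiency(1.42, 1.10), 3), walk_forward_passes(example))
```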

Stdout: WalkForwardReport JSON

`options-sim`

Simulate an options spread strategy using Black-Scholes with stored IV rank data. P&L is computed from option premium changes, not underlying price movements.

Pre-requisite: IV rank history must be populated first:

market-data iv-rank-backfill --symbol SPY --start 2019-01-01 --yes

backtest options-sim \
  --strategy <path/to/spread_spec.json> \
  --symbol SPY \
  --start 2023-01-01 \
  --end 2024-12-31 \
  [--capital 100000] \
  [--commission 0.65] \
  [--slippage 0.005]

Supported spread types: short_put_spread · short_call_spread · iron_condor

Stdout: BacktestReport JSON (same schema as run)

`report`

Retrieve the full persisted report for a completed run.

backtest report --run-id <uuid>

Stdout:

{
  "run": { ... },
  "metrics": { ... },
  "trades": [ ... ]
}

`list`

List recent backtest runs.

backtest list [--symbol AAPL] [--limit 20]

Stdout:

{
  "count": 5,
  "runs": [ { "run_id": "...", "strategy_name": "...", "symbol": "...", "status": "complete", ... } ]
}

Strategy Spec Format

Strategy specs are JSON (or YAML) files that define entry/exit logic as Python boolean expressions evaluated per bar.

StrategySpec (equities)

Used by backtest run and backtest walk-forward.

{
  "name": "spy_momentum_v1",
  "version": "1.0",
  "asset_class": "equities",
  "timeframe": "swing",

  "entry_conditions": [
    "close > sma_50",
    "rsi_14 > 50",
    "macd_hist > 0",
    "not in_position"
  ],
  "exit_conditions": [
    "rsi_14 > 70",
    "close < ema_21"
  ],
  "stop_conditions": [
    "close < sma_200"
  ],
  "filter_conditions": [
    "vix < 35",
    "not fomc_meeting_day"
  ],

  "position_sizing_formula": "risk_pct * capital / atr_14",
  "max_holding_bars": 10,
  "max_concurrent_positions": 1,
  "per_trade_risk_pct": 0.01,

  "requires_options_data": false,
  "description": "SPY trend-following with RSI/MACD confirmation",
  "instruments": ["SPY"]
}

Field reference

Field	Type	Default	Description
`name`	string	required	Strategy identifier
`version`	string	`"1.0"`	Version label
`asset_class`	enum	`"equities"`	`equities` · `options` · `equity_options` · `futures` · `multi-leg`
`timeframe`	enum	`"swing"`	`intraday` · `daily` · `swing` · `position`
`entry_conditions`	list[str]	required	All must be True to open
`exit_conditions`	list[str]	required	All must be True to close at profit target
`stop_conditions`	list[str]	required	All must be True to close at stop
`filter_conditions`	list[str]	`[]`	All must be True each bar for signals to be evaluated
`position_sizing_formula`	str	`"risk_pct * capital / atr_14"`	Python expression for share count
`max_holding_bars`	int	`10`	Force-exit after N bars
`max_concurrent_positions`	int	`1`	Max open positions at once
`per_trade_risk_pct`	float	`0.01`	Capital fraction risked per trade (hard cap: 0.02)
`requires_options_data`	bool	`false`	When True, engine fetches IV rank history and joins it
`instruments`	list[str]	`[]`	Instruments the strategy trades (informational)
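As a worked example of the default `position_sizing_formula`, the sketch below evaluates the expression for one bar. `share_count` is a hypothetical helper; flooring to a whole share and returning 0 when ATR is unusable are assumptions of this sketch, not documented engine behaviour.

```python
import math

MAX_RISK_PCT = 0.02  # hard cap on per_trade_risk_pct from the field reference


def share_count(formula: str, risk_pct: float, capital: float, atr_14: float) -> int:
    """Evaluate a sizing expression and floor the result to whole shares."""
    if risk_pct > MAX_RISK_PCT:
        raise ValueError(f"per_trade_risk_pct {risk_pct} exceeds cap {MAX_RISK_PCT}")
    if atr_14 is None or atr_14 <= 0:
        return 0  # assumption: no position without a usable ATR
    scope = {
        "__builtins__": {},
        "risk_pct": risk_pct,
        "capital": capital,
        "atr_14": atr_14,
    }
    raw = eval(formula, scope)  # expression comes from the strategy spec
    return max(0, math.floor(raw))


# 1% of $100,000 is $1,000 at risk; with ATR = 5.0 that is 200 shares.
print(share_count("risk_pct * capital / atr_14", 0.01, 100_000.0, 5.0))
```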

OptionsSpreadSpec (options-sim)

Used by backtest options-sim. P&L uses Black-Scholes premium changes, not OHLCV price moves.

{
  "name": "vrp_short_put_spread",
  "version": "1.0",
  "spread_type": "short_put_spread",

  "entry_conditions": [
    "iv_rank is not None and iv_rank >= 30",
    "close > sma_50",
    "not in_position"
  ],
  "filter_conditions": [
    "vix < 40",
    "not fomc_meeting_day"
  ],

  "target_dte": 21,
  "short_delta": 0.30,
  "long_delta": 0.15,
  "profit_target_pct": 0.50,
  "stop_loss_pct": 2.00,
  "max_holding_bars": 15,

  "max_concurrent_positions": 1,
  "per_trade_risk_pct": 0.02,
  "risk_free_rate": 0.05
}

Field reference

Field	Type	Default	Description
`spread_type`	enum	`"short_put_spread"`	`short_put_spread` · `short_call_spread` · `iron_condor`
`target_dte`	int	`21`	Days to expiration at entry
`short_delta`	float	`0.30`	Short leg absolute delta (e.g. 0.30 = 30Δ)
`long_delta`	float	`0.15`	Long (protective) leg absolute delta
`profit_target_pct`	float	`0.50`	Close when 50% of credit is captured
`stop_loss_pct`	float	`2.00`	Close when spread value = 2× credit received
`risk_free_rate`	float	`0.05`	Risk-free rate for Black-Scholes (annual)
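The exit fields above can be illustrated with a small Black-Scholes sketch for a short put spread. It is a simplified model under stated assumptions, not the engine's implementation: strikes are supplied directly instead of being solved from `short_delta` and `long_delta`, one flat volatility is used for both legs, dividends are ignored, and all function names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist
from typing import Optional

_cdf = NormalDist().cdf  # standard normal CDF


def bs_put(spot: float, strike: float, t_years: float, vol: float, rate: float) -> float:
    """European put price under Black-Scholes (no dividends)."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol**2) * t_years) / (vol * sqrt(t_years))
    d2 = d1 - vol * sqrt(t_years)
    return strike * exp(-rate * t_years) * _cdf(-d2) - spot * _cdf(-d1)


def short_put_spread_value(spot, short_k, long_k, t_years, vol, rate) -> float:
    """Cost to close the spread: short put price minus long put price."""
    return bs_put(spot, short_k, t_years, vol, rate) - bs_put(spot, long_k, t_years, vol, rate)


def exit_reason(credit: float, value: float,
                profit_target_pct: float = 0.50,
                stop_loss_pct: float = 2.00) -> Optional[str]:
    """Apply the profit-target and stop-loss rules to the current spread value."""
    if value <= credit * (1.0 - profit_target_pct):
        return "profit_target"  # at least 50% of the credit has been captured
    if value >= credit * stop_loss_pct:
        return "stop_loss"  # spread now costs 2x the credit received
    return None


# Entry at 21 DTE; re-price a few days later with the underlying higher.
credit = short_put_spread_value(450.0, 440.0, 430.0, 21 / 365, 0.18, 0.05)
later = short_put_spread_value(462.0, 440.0, 430.0, 14 / 365, 0.18, 0.05)
print(exit_reason(credit, later))
```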

Indicator Namespace

Every bar's condition expressions are evaluated against a namespace containing these variables. All are float unless noted. Missing data resolves to None (so conditions like iv_rank is not None and iv_rank >= 30 are safe).

Trend

Variable	Description
`open`, `high`, `low`, `close`, `volume`	Raw OHLCV values
`sma_20`, `sma_50`, `sma_200`	Simple moving averages
`ema_9`, `ema_21`, `ema_50`	Exponential moving averages
`spy_price`, `spy_close`	Alias for `close` (useful for SPY strategies)
`spy_sma_20`, `spy_sma_50`, `spy_sma_200`	Alias for SMA columns
`prev_close`, `prev_high`, `prev_low`	Previous bar values

Momentum

Variable	Description
`rsi_14`	RSI 14-period, bounded [0, 100]
`macd`	MACD line (EMA12 − EMA26)
`macd_signal`	MACD signal line (EMA9 of MACD)
`macd_hist`	MACD histogram (MACD − signal)
`roc_10`	Rate of change, 10-bar (%)

Volatility

Variable	Description
`atr_14`	Average True Range, 14-period
`bb_upper`, `bb_lower`	Bollinger Bands (20-period, ±2σ)
`bb_pct`	Price position within Bollinger Bands [0, 1]
`adx_14`	Average Directional Index
`plus_di`, `minus_di`	Directional movement indicators
`hv_30`	30-day historical volatility (annualised)
`spy_realized_vol_5d`, `spy_realized_vol_10d`	Short-window realised vol

Volume

Variable	Description
`obv`	On-balance volume
`spy_adv_20d`	20-day average daily volume
`volume_spy_today`	Current bar volume

IV / Options

Variable	Description
`iv_rank`	IV rank 0–100 (requires `iv-rank-backfill` or `requires_options_data: true`)
`iv_percentile`	IV percentile 0–100
`current_iv`	Current ATM implied volatility (decimal)
`iv_hv_ratio`	`current_iv / hv_30` (VRP indicator)
`iv_atm`	ATM IV; falls back to `hv_30 * 1.20` when unavailable
`dte`	Days to next standard monthly OpEx (0–60)
`short_leg_delta`	Approximated short leg delta (default 0.30)
`short_strike_delta`	Alias for `short_leg_delta`
`bid_ask_spread_pct`	Modelled bid/ask spread fraction
`pnl_pct_of_max_credit`	Options P&L as fraction of max credit received
`premium_collected_pct`	Alias for `pnl_pct_of_max_credit`
`open_interest_short_strike`	Open interest at short strike (default 2000)

VIX

Variable	Description
`vix`	VIX close (fetched via yfinance; fallback 20.0)
`vix_spot`	Alias for `vix`

Calendar

Variable	Type	Description
`fomc_meeting_day`	bool	True on FOMC announcement days (2019–2026)
`fomc_within_3d`	bool	True within 3 calendar days of next FOMC
`days_to_fomc`	int	Calendar days until next FOMC
`cpi_release_day`	bool	True on BLS CPI release days (2019–2026)
`opex_week`	bool	True during standard monthly OpEx week
`within_3d_of_opex`	bool	True within 3 days of nearest monthly OpEx
`earnings_within_5d`	bool	Always False (requires external earnings service)
`earnings_within_7d`	bool	Always False (requires external earnings service)
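The OpEx-related variables depend on the standard monthly expiration date, which falls on the third Friday of the month. A minimal sketch of that calculation follows; the helper names are hypothetical, and counting calendar days (rather than trading days) is an assumption of this sketch.

```python
from datetime import date, timedelta


def third_friday(year: int, month: int) -> date:
    """Standard monthly OpEx: the third Friday of the given month."""
    first = date(year, month, 1)
    offset = (4 - first.weekday()) % 7  # Monday=0 ... Friday=4
    return first + timedelta(days=offset + 14)


def days_to_next_opex(today: date) -> int:
    """Calendar days from today to the next monthly OpEx (0 on OpEx day)."""
    opex = third_friday(today.year, today.month)
    if opex < today:
        # This month's OpEx has passed; roll to the following month.
        if today.month == 12:
            opex = third_friday(today.year + 1, 1)
        else:
            opex = third_friday(today.year, today.month + 1)
    return (opex - today).days


print(third_friday(2024, 1), days_to_next_opex(date(2024, 1, 20)))
```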

Position State

Variable	Type	Description
`in_position`	bool	True when a position is open
`capital`	float	Current available capital ($)
`bars_held`	int	Bars elapsed since entry (0 if flat)
`days_held`	int	Alias for `bars_held`
`position_pnl_pct`	float	Unrealised P&L as fraction of entry price

Safe Builtins

The eval scope includes: abs, round, min, max, int, float, bool, len, True, False, None.
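A minimal sketch of how condition strings could be evaluated against a per-bar namespace with only these builtins exposed. `all_conditions_true` is a hypothetical name and this is not the engine's actual signal runner. Restricting `__builtins__` narrows what an expression can reach, but on its own it should not be treated as a complete security boundary.

```python
from typing import Any, Iterable, Mapping

SAFE_BUILTINS = {
    "abs": abs, "round": round, "min": min, "max": max,
    "int": int, "float": float, "bool": bool, "len": len,
    "True": True, "False": False, "None": None,
}


def all_conditions_true(conditions: Iterable[str], bar: Mapping[str, Any]) -> bool:
    """Return True only if every condition string evaluates truthy for this bar."""
    scope = {"__builtins__": SAFE_BUILTINS}
    # Copy the bar so an expression cannot mutate the caller's namespace.
    return all(bool(eval(expr, scope, dict(bar))) for expr in conditions)


bar = {"close": 450.0, "sma_50": 440.0, "rsi_14": 55.0,
       "in_position": False, "iv_rank": None}

print(all_conditions_true(["close > sma_50", "not in_position"], bar))
# Guarding against None keeps the comparison from raising a TypeError.
print(all_conditions_true(["iv_rank is not None and iv_rank >= 30"], bar))
```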

Pass / Fail Thresholds

backtest run and backtest options-sim check these after every run. Any violation is reported in failures[] and sets passed: false (exit code 1).

Metric	Threshold
`win_rate`	>= 0.50
`profit_factor`	>= 1.50
`max_drawdown_pct`	magnitude <= 0.25 (no deeper than −25%)
`sharpe_ratio`	>= 1.00
`consecutive_losses_max`	<= 8
`walk_forward_efficiency`	>= 0.70 (walk-forward only)
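The threshold table can be expressed as a small check that returns a list of failures. This is a sketch under assumptions: the failure message format is made up for illustration, and the drawdown is compared by magnitude because the report example stores `max_drawdown_pct` as a negative number. It mirrors the documented thresholds but is not the service's `check_thresholds()` itself.

```python
from typing import List, Mapping


def threshold_failures(metrics: Mapping[str, float]) -> List[str]:
    """Return one message per violated threshold; an empty list means passed."""
    checks = [
        ("win_rate", metrics["win_rate"] >= 0.50, ">= 0.50"),
        ("profit_factor", metrics["profit_factor"] >= 1.50, ">= 1.50"),
        ("max_drawdown_pct", abs(metrics["max_drawdown_pct"]) <= 0.25, "magnitude <= 0.25"),
        ("sharpe_ratio", metrics["sharpe_ratio"] >= 1.00, ">= 1.00"),
        ("consecutive_losses_max", metrics["consecutive_losses_max"] <= 8, "<= 8"),
    ]
    return [
        f"{name}={metrics[name]} violates {rule}"  # hypothetical message format
        for name, ok, rule in checks
        if not ok
    ]


metrics = {"win_rate": 0.5977, "profit_factor": 1.82, "max_drawdown_pct": -0.1187,
           "sharpe_ratio": 1.34, "consecutive_losses_max": 5}
failures = threshold_failures(metrics)
print({"passed": not failures, "failures": failures})
```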

JSON Output Schemas

All commands emit JSON to stdout. Rich progress and warnings go to stderr.

BacktestReport

Returned by backtest run and backtest options-sim.

{
  "run": {
    "run_id": "uuid-string",
    "strategy_name": "spy_momentum_v1",
    "strategy_version": "1.0",
    "symbol": "SPY",
    "start_date": "2020-01-01",
    "end_date": "2024-12-31",
    "initial_capital": 100000.0,
    "commission_per_contract": 0.65,
    "slippage_pct": 0.005,
    "created_at": "2026-04-19T10:00:00",
    "completed_at": "2026-04-19T10:00:05",
    "status": "complete",
    "error_message": null
  },
  "metrics": {
    "run_id": "uuid-string",
    "total_trades": 87,
    "winning_trades": 52,
    "losing_trades": 35,
    "win_rate": 0.5977,
    "profit_factor": 1.82,
    "expectancy_per_trade": 312.40,
    "avg_win_pct": 0.0421,
    "avg_loss_pct": -0.0198,
    "largest_win": 4210.00,
    "largest_loss": -1850.00,
    "consecutive_losses_max": 5,
    "cagr": 0.1423,
    "max_drawdown_pct": -0.1187,
    "max_drawdown_duration_days": 38,
    "sharpe_ratio": 1.34,
    "sortino_ratio": 2.01,
    "calmar_ratio": 1.20,
    "recovery_factor": 3.74,
    "is_sharpe": null,
    "oos_sharpe": null,
    "walk_forward_efficiency": null
  },
  "equity_curve_url": null,
  "trade_count": 87,
  "passed": true,
  "failures": []
}
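Two of the metrics in the example report can be cross-checked from other fields, which is a cheap sanity test for consumers. The sketch assumes the conventional definitions (win rate as winning trades over total trades, Calmar as CAGR over the magnitude of maximum drawdown); the service's exact formulas are not stated here, and the helper names are hypothetical.

```python
from typing import Optional


def win_rate(winning_trades: int, total_trades: int) -> float:
    """Fraction of trades that were winners; 0.0 when there are no trades."""
    return winning_trades / total_trades if total_trades else 0.0


def calmar_ratio(cagr: float, max_drawdown_pct: float) -> Optional[float]:
    """CAGR divided by the magnitude of the maximum drawdown."""
    if max_drawdown_pct == 0:
        return None  # undefined without a drawdown
    return cagr / abs(max_drawdown_pct)


# Values taken from the example BacktestReport.
print(round(win_rate(52, 87), 4), round(calmar_ratio(0.1423, -0.1187), 2))
```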

WalkForwardReport

Returned by backtest walk-forward.

{
  "strategy_name": "spy_momentum_v1",
  "symbol": "SPY",
  "windows": [
    {
      "window_index": 0,
      "is_start": "2016-01-01",
      "is_end": "2018-10-28",
      "oos_start": "2018-10-29",
      "oos_end": "2019-07-27",
      "is_run_id": "uuid-string",
      "oos_run_id": "uuid-string",
      "is_sharpe": 1.42,
      "oos_sharpe": 1.10,
      "efficiency": 0.775
    }
  ],
  "avg_efficiency": 0.79,
  "avg_oos_sharpe": 1.05,
  "passed": true
}

Storage

SQLite (runs, metrics, trades)

Default path: data/backtest/results.db (overridden by BT_DB_PATH).

Three tables: backtest_runs, backtest_metrics, backtest_trades. Schema defined in backtest/migrations/001_backtest_schema.sql.

# Retrieve the ten most recent runs via sqlite3 (falls back to the default path)
sqlite3 "${BT_DB_PATH:-data/backtest/results.db}" \
  "SELECT run_id, strategy_name, symbol, status, sharpe_ratio \
   FROM backtest_runs r \
   JOIN backtest_metrics m USING (run_id) \
   ORDER BY r.created_at DESC LIMIT 10;"
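The same query can be run from Python with the standard-library `sqlite3` module. This is a sketch: `recent_runs` is a hypothetical helper, and the in-memory tables below are minimal stand-ins containing only the columns the query touches, not the real schema from the migration file.

```python
import sqlite3

QUERY = """
SELECT run_id, strategy_name, symbol, status, sharpe_ratio
FROM backtest_runs r
JOIN backtest_metrics m USING (run_id)
ORDER BY r.created_at DESC
LIMIT ?
"""


def recent_runs(conn: sqlite3.Connection, limit: int = 10) -> list:
    """Return the most recent runs joined with their Sharpe ratio, newest first."""
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(QUERY, (limit,))]


# Stand-in tables for demonstration only (assumed columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE backtest_runs (run_id TEXT PRIMARY KEY, strategy_name TEXT,
                            symbol TEXT, status TEXT, created_at TEXT);
CREATE TABLE backtest_metrics (run_id TEXT PRIMARY KEY, sharpe_ratio REAL);
INSERT INTO backtest_runs VALUES
  ('a', 'spy_momentum_v1', 'SPY', 'complete', '2026-04-18T10:00:00'),
  ('b', 'spy_momentum_v1', 'SPY', 'complete', '2026-04-19T10:00:00');
INSERT INTO backtest_metrics VALUES ('a', 1.10), ('b', 1.34);
""")
for run in recent_runs(conn, limit=5):
    print(run["run_id"], run["sharpe_ratio"])
```

Against a real database, replace the in-memory connection with `sqlite3.connect(path)` where `path` is the value of `BT_DB_PATH`.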

MinIO (equity curve Parquet)

Equity curves are stored as Parquet in the backtest-results bucket when MinIO is available. The path is returned in BacktestReport.equity_curve_url.

Running Tests

# Unit tests — no infrastructure required
cd backtest-service
.venv/bin/pytest tests/unit/ -m 'not integration' --tb=short -q

# All tests with coverage
.venv/bin/pytest tests/unit/ --cov=backtest --cov-report=term-missing -q

Test suite: tests/unit/test_engine.py, test_metrics.py, test_signal_runner.py, test_cli.py.

Agent Integration

This service is designed to be called by the backtesting_agent in the agent-trading-firm R&D pipeline via the shared call_cli() subprocess wrapper.

Calling from an agent

from shared.tools.cli_tool import call_cli

# Run a backtest
result = call_cli([
    "backtest", "run",
    "--strategy", "/path/to/spec.json",
    "--symbol", "SPY",
    "--start", "2020-01-01",
    "--end", "2024-12-31",
])
# result is a dict parsed from JSON stdout
passed = result["passed"]
run_id = result["run"]["run_id"]
sharpe = result["metrics"]["sharpe_ratio"]

# Walk-forward validation
wf = call_cli([
    "backtest", "walk-forward",
    "--strategy", "/path/to/spec.json",
    "--symbol", "SPY",
    "--start", "2016-01-01",
    "--end", "2024-12-31",
    "--windows", "3",
])
efficiency = wf["avg_efficiency"]   # >= 0.70 = pass

# Retrieve stored report
report = call_cli(["backtest", "report", "--run-id", run_id])

# List recent runs
runs = call_cli(["backtest", "list", "--symbol", "SPY", "--limit", "5"])

Pipeline position

backtest-service sits between strategy_development_agent and monte_carlo_agent in the R&D pipeline:

instrument_research_agent
  → strategy_development_agent  (produces strategy spec JSON)
      → backtesting_agent        (calls this service)
          → monte_carlo_agent    (consumes run_id from BacktestReport)
              → forward_testing_agent
                  → deployment_agent

Key invariants for agents

- passed: true is required before passing run_id to the Monte Carlo service.
- walk_forward_efficiency >= 0.70 is the bar for walk-forward validation.
- All condition strings in specs use the indicator namespace above — eval() is sandboxed; only the listed safe builtins are available.
- RSI is bounded [0, 100] — use rsi_14 < 0 as an always-false sentinel in tests, not rsi_14 < 5.
- iv_rank is None (not NaN) in the eval namespace when IV data is absent; always guard with iv_rank is not None.
- per_trade_risk_pct has a hard server-side cap of 0.02 (2%); specs exceeding this will be rejected at load time.
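The first invariant can be enforced with a small gate before handing a run to the next stage. This is a hedged sketch: `run_id_for_monte_carlo` is a hypothetical helper, and also requiring `status == "complete"` is an assumption beyond what the invariant states.

```python
def run_id_for_monte_carlo(report: dict) -> str:
    """Return the run_id only if the BacktestReport is complete and passed."""
    run = report.get("run", {})
    if run.get("status") != "complete":  # assumption: incomplete runs are not forwarded
        raise ValueError(f"run not complete: status={run.get('status')!r}")
    if not report.get("passed"):
        raise ValueError(f"backtest did not pass: {report.get('failures', [])}")
    return run["run_id"]


report = {
    "run": {"run_id": "uuid-string", "status": "complete"},
    "passed": True,
    "failures": [],
}
print(run_id_for_monte_carlo(report))
```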
