The logic in this service has not yet been fully validated. It runs, but results should be reviewed and validated independently. I will continue to add test scenarios and observability to build trust in the output; for now, the primary focus is the agentic workflow this service supports. It is not directing real investments at this time.
A production-grade backtesting engine for the agent-trading-firm ecosystem. Runs event-driven strategy simulations against historical OHLCV data fetched from the market-data-service, computes a comprehensive set of performance metrics, and optionally validates strategies using walk-forward analysis or Black-Scholes options simulation.
I/O contract: JSON → stdout · Rich progress → stderr · Exit 0 = success, 1 = soft failure (threshold breach), 2+ = error.
- Architecture
- Installation
- Environment Variables
- CLI Reference
- Strategy Spec Format
- Indicator Namespace
- Pass / Fail Thresholds
- JSON Output Schemas
- Storage
- Running Tests
- Agent Integration
```
CLI (backtest run / walk-forward / options-sim / report / list)
    │
    ▼
BacktestEngine
    │
    ├── market-data CLI ──► OHLCV DataFrame (subprocess, JSON stdout)
    │       └── iv-rank  ──► IV rank time series (optional join)
    │       └── yfinance ──► VIX time series (optional join)
    │
    ├── compute_indicators() ─► 60+ indicator columns on each bar
    │
    ├── run_signals() ─► bar-by-bar eval() of condition strings
    │       └── BacktestTrade list
    │
    ├── compute_metrics()  ─► BacktestMetrics (Sharpe, Calmar, DD …)
    ├── check_thresholds() ─► pass/fail list
    │
    └── ResultsStore (SQLite) + MinIO (Parquet equity curves)
```
Options path (options-sim): replaces run_signals() with run_options_signals(), which models P&L from Black-Scholes option premium changes rather than underlying price changes.
Walk-forward path (walk-forward): runs N pairs of IS/OOS backtests and reports efficiency = oos_sharpe / is_sharpe.
```bash
cd backtest-service
python -m venv .venv
source .venv/bin/activate
pip install -e .
```

Requires Python 3.12+. Optional VIX support:

```bash
pip install yfinance
```

The `market-data` CLI must be on `$PATH` (or set `BT_MARKET_DATA_CMD` to its absolute path).
All variables are prefixed BT_. Can be set in a .env file at the project root.
| Variable | Default | Description |
|---|---|---|
| `BT_DB_PATH` | `data/backtest/results.db` | SQLite database path for runs, metrics, and trades |
| `BT_MINIO_ENDPOINT` | `localhost:9000` | MinIO endpoint for Parquet equity curve storage |
| `BT_MINIO_ACCESS_KEY` | `mds` | MinIO access key |
| `BT_MINIO_SECRET_KEY` | `mds_secret` | MinIO secret key |
| `BT_MINIO_BUCKET` | `backtest-results` | Bucket for equity curve Parquet files |
| `BT_MINIO_SECURE` | `false` | Use HTTPS for MinIO |
| `BT_INITIAL_CAPITAL` | `100000.0` | Default starting capital ($) |
| `BT_DEFAULT_COMMISSION` | `0.65` | Default commission per contract ($) |
| `BT_DEFAULT_SLIPPAGE` | `0.005` | Default slippage fraction (0.5%) |
| `BT_MARKET_DATA_CMD` | `market-data` | Path/name of the market-data CLI |
| `BT_MIN_TRADES_REQUIRED` | `10` | Minimum trades before warning |
| `BT_MAX_HOLDING_BARS` | `20` | Safety cap if spec omits `max_holding_bars` |
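As a rough illustration of how these variables and their defaults fit together, the sketch below reads a subset of them from the process environment. The `BacktestSettings` class and `load_settings` function are hypothetical names for this example, not the service's actual configuration code, and the sketch does not parse a `.env` file.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BacktestSettings:
    """Hypothetical container; fields mirror a subset of the BT_ variables."""
    db_path: str
    initial_capital: float
    default_commission: float
    default_slippage: float
    market_data_cmd: str
    min_trades_required: int
    max_holding_bars: int
    minio_secure: bool


def load_settings(env=None) -> BacktestSettings:
    """Read BT_-prefixed variables, falling back to the documented defaults."""
    env = os.environ if env is None else env

    def get(name: str, default: str) -> str:
        return env.get(f"BT_{name}", default)

    return BacktestSettings(
        db_path=get("DB_PATH", "data/backtest/results.db"),
        initial_capital=float(get("INITIAL_CAPITAL", "100000.0")),
        default_commission=float(get("DEFAULT_COMMISSION", "0.65")),
        default_slippage=float(get("DEFAULT_SLIPPAGE", "0.005")),
        market_data_cmd=get("MARKET_DATA_CMD", "market-data"),
        min_trades_required=int(get("MIN_TRADES_REQUIRED", "10")),
        max_holding_bars=int(get("MAX_HOLDING_BARS", "20")),
        # Assumption: common truthy spellings are accepted for the boolean flag.
        minio_secure=get("MINIO_SECURE", "false").strip().lower() in {"1", "true", "yes"},
    )
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check in isolation, for example `load_settings({"BT_MAX_HOLDING_BARS": "5"})`.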
Run a full backtest for a strategy spec over a date range.
```bash
backtest run \
  --strategy <path/to/spec.json> \
  --symbol AAPL \
  --start 2019-01-01 \
  --end 2024-12-31 \
  [--capital 100000] \
  [--commission 0.65] \
  [--slippage 0.005]
```

Options
| Flag | Required | Description |
|---|---|---|
| `--strategy` | Yes | Path to strategy spec JSON or YAML |
| `--symbol` / `-s` | Yes | Ticker symbol (e.g. SPY, AAPL) |
| `--start` | Yes | Start date YYYY-MM-DD |
| `--end` | Yes | End date YYYY-MM-DD |
| `--capital` | No | Override initial capital ($) |
| `--commission` | No | Override commission per contract ($) |
| `--slippage` | No | Override slippage fraction |
Exit codes: 0 = pass · 1 = fail (threshold breach or no data) · 2 = error
Stdout: BacktestReport JSON
Example
```bash
backtest run --strategy strategies/spy_short_put_spread_v1.json \
  --symbol SPY --start 2020-01-01 --end 2024-12-31
```

Run IS/OOS walk-forward validation across N rolling windows.

```bash
backtest walk-forward \
  --strategy <path/to/spec.json> \
  --symbol SPY \
  --start 2016-01-01 \
  --end 2024-12-31 \
  [--windows 3] \
  [--is-ratio 0.70]
```

Options
| Flag | Default | Description |
|---|---|---|
| `--strategy` | Required | Path to strategy spec |
| `--symbol` / `-s` | Required | Ticker symbol |
| `--start` | Required | Start date YYYY-MM-DD |
| `--end` | Required | End date YYYY-MM-DD |
| `--windows` | `3` | Number of walk-forward windows |
| `--is-ratio` | `0.70` | In-sample fraction per window |
Pass criterion: avg_efficiency >= 0.70 (OOS Sharpe / IS Sharpe averaged across all windows)
Stdout: WalkForwardReport JSON
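The pass criterion above can be sketched in a few lines. This is only an illustration of the documented ratio and average; the function names are hypothetical, and the handling of a non-positive in-sample Sharpe (skipping the window) is an assumption, since the source does not say what the service does in that case.

```python
from statistics import mean
from typing import Optional

PASS_THRESHOLD = 0.70  # documented pass criterion for avg_efficiency


def window_efficiency(is_sharpe: Optional[float], oos_sharpe: Optional[float]) -> Optional[float]:
    """efficiency = oos_sharpe / is_sharpe; None when the ratio is undefined."""
    if is_sharpe is None or oos_sharpe is None or is_sharpe <= 0:
        # Assumption: a non-positive in-sample Sharpe makes the ratio meaningless.
        return None
    return oos_sharpe / is_sharpe


def summarize(windows: list) -> dict:
    """Average the per-window efficiencies and apply the 0.70 bar."""
    efficiencies = [window_efficiency(w["is_sharpe"], w["oos_sharpe"]) for w in windows]
    valid = [e for e in efficiencies if e is not None]
    avg = mean(valid) if valid else 0.0
    return {"avg_efficiency": avg, "passed": avg >= PASS_THRESHOLD}
```

With the figures from the sample report (`is_sharpe` 1.42, `oos_sharpe` 1.10), the per-window efficiency rounds to 0.775.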
Simulate an options spread strategy using Black-Scholes with stored IV rank data. P&L is computed from option premium changes, not underlying price movements.
Pre-requisite: IV rank history must be populated first:
```bash
market-data iv-rank-backfill --symbol SPY --start 2019-01-01 --yes
```

```bash
backtest options-sim \
  --strategy <path/to/spread_spec.json> \
  --symbol SPY \
  --start 2023-01-01 \
  --end 2024-12-31 \
  [--capital 100000] \
  [--commission 0.65] \
  [--slippage 0.005]
```

Supported spread types: `short_put_spread` · `short_call_spread` · `iron_condor`
Stdout: BacktestReport JSON (same schema as run)
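To make "P&L from option premium changes" concrete, here is a minimal sketch of the textbook Black-Scholes European put price and the resulting value of a short put spread. It is not the service's pricing code: the function names are hypothetical, both legs share one flat volatility (no skew), and strikes are passed in directly rather than selected by delta.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_put(spot: float, strike: float, t_years: float, rate: float, vol: float) -> float:
    """Black-Scholes price of a European put on a non-dividend-paying asset."""
    if t_years <= 0 or vol <= 0:
        return max(strike - spot, 0.0)  # intrinsic value at or past expiry
    d1 = (log(spot / strike) + (rate + 0.5 * vol**2) * t_years) / (vol * sqrt(t_years))
    d2 = d1 - vol * sqrt(t_years)
    return strike * exp(-rate * t_years) * norm_cdf(-d2) - spot * norm_cdf(-d1)


def short_put_spread_value(spot, short_strike, long_strike, t_years, rate, vol):
    """Per-share cost to close: short put price minus long put price.

    Assumption: a single flat volatility is used for both strikes.
    """
    return (bs_put(spot, short_strike, t_years, rate, vol)
            - bs_put(spot, long_strike, t_years, rate, vol))
```

Under these assumptions, the P&L of an open short spread is the credit received at entry minus the current `short_put_spread_value`, before commission and slippage.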
Retrieve the full persisted report for a completed run.
```bash
backtest report --run-id <uuid>
```

Stdout:

```json
{
  "run": { ... },
  "metrics": { ... },
  "trades": [ ... ]
}
```

List recent backtest runs.

```bash
backtest list [--symbol AAPL] [--limit 20]
```

Stdout:

```json
{
  "count": 5,
  "runs": [ { "run_id": "...", "strategy_name": "...", "symbol": "...", "status": "complete", ... } ]
}
```

Strategy specs are JSON (or YAML) files that define entry/exit logic as Python boolean expressions evaluated per bar.
Used by backtest run and backtest walk-forward.
```json
{
  "name": "spy_momentum_v1",
  "version": "1.0",
  "asset_class": "equities",
  "timeframe": "swing",
  "entry_conditions": [
    "close > sma_50",
    "rsi_14 > 50",
    "macd_hist > 0",
    "not in_position"
  ],
  "exit_conditions": [
    "rsi_14 > 70",
    "close < ema_21"
  ],
  "stop_conditions": [
    "close < sma_200"
  ],
  "filter_conditions": [
    "vix < 35",
    "not fomc_meeting_day"
  ],
  "position_sizing_formula": "risk_pct * capital / atr_14",
  "max_holding_bars": 10,
  "max_concurrent_positions": 1,
  "per_trade_risk_pct": 0.01,
  "requires_options_data": false,
  "description": "SPY trend-following with RSI/MACD confirmation",
  "instruments": ["SPY"]
}
```

Field reference
| Field | Type | Default | Description |
|---|---|---|---|
| `name` | string | required | Strategy identifier |
| `version` | string | `"1.0"` | Version label |
| `asset_class` | enum | `"equities"` | `equities` · `options` · `equity_options` · `futures` · `multi-leg` |
| `timeframe` | enum | `"swing"` | `intraday` · `daily` · `swing` · `position` |
| `entry_conditions` | list[str] | required | All must be True to open |
| `exit_conditions` | list[str] | required | All must be True to close at profit target |
| `stop_conditions` | list[str] | required | All must be True to close at stop |
| `filter_conditions` | list[str] | `[]` | All must be True each bar for signals to be evaluated |
| `position_sizing_formula` | str | `"risk_pct * capital / atr_14"` | Python expression for share count |
| `max_holding_bars` | int | `10` | Force-exit after N bars |
| `max_concurrent_positions` | int | `1` | Max open positions at once |
| `per_trade_risk_pct` | float | `0.01` | Capital fraction risked per trade (hard cap: 0.02) |
| `requires_options_data` | bool | `false` | When True, engine fetches IV rank history and joins it |
| `instruments` | list[str] | `[]` | Instruments the strategy trades (informational) |
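As an illustration of how `position_sizing_formula` and the `per_trade_risk_pct` cap might interact, the sketch below evaluates the default formula for one bar. The `position_size` helper is hypothetical. It assumes that `risk_pct` in the formula is bound to the spec's `per_trade_risk_pct`, that the share count is truncated to a whole number, and that an unusable ATR yields zero shares; none of these details are confirmed by the source.

```python
MAX_RISK_PCT = 0.02  # documented hard cap on per_trade_risk_pct


def position_size(formula: str, capital: float, atr_14: float,
                  per_trade_risk_pct: float) -> int:
    """Evaluate a sizing formula and return a whole number of shares."""
    if per_trade_risk_pct > MAX_RISK_PCT:
        # The README states that specs above the cap are rejected at load time.
        raise ValueError(f"per_trade_risk_pct {per_trade_risk_pct} exceeds cap {MAX_RISK_PCT}")
    if atr_14 is None or atr_14 <= 0:
        return 0  # assumption: no usable ATR means no position
    namespace = {"risk_pct": per_trade_risk_pct, "capital": capital, "atr_14": atr_14}
    shares = eval(formula, {"__builtins__": {}}, namespace)
    return max(int(shares), 0)  # assumption: truncate toward zero, never negative
```

For example, with $100,000 of capital, 1% risk, and an ATR of 5.0, the default formula gives 0.01 × 100000 / 5.0 = 200 shares.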
Used by backtest options-sim. P&L uses Black-Scholes premium changes, not OHLCV price moves.
```json
{
  "name": "vrp_short_put_spread",
  "version": "1.0",
  "spread_type": "short_put_spread",
  "entry_conditions": [
    "iv_rank is not None and iv_rank >= 30",
    "close > sma_50",
    "not in_position"
  ],
  "filter_conditions": [
    "vix < 40",
    "not fomc_meeting_day"
  ],
  "target_dte": 21,
  "short_delta": 0.30,
  "long_delta": 0.15,
  "profit_target_pct": 0.50,
  "stop_loss_pct": 2.00,
  "max_holding_bars": 15,
  "max_concurrent_positions": 1,
  "per_trade_risk_pct": 0.02,
  "risk_free_rate": 0.05
}
```

Field reference
| Field | Type | Default | Description |
|---|---|---|---|
| `spread_type` | enum | `"short_put_spread"` | `short_put_spread` · `short_call_spread` · `iron_condor` |
| `target_dte` | int | `21` | Days to expiration at entry |
| `short_delta` | float | `0.30` | Short leg absolute delta (e.g. 0.30 = 30Δ) |
| `long_delta` | float | `0.15` | Long (protective) leg absolute delta |
| `profit_target_pct` | float | `0.50` | Close when 50% of credit is captured |
| `stop_loss_pct` | float | `2.00` | Close when spread value = 2× credit received |
| `risk_free_rate` | float | `0.05` | Risk-free rate for Black-Scholes (annual) |
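The exit fields above can be read as a small decision rule. The sketch below is one plausible interpretation, not the engine's actual logic: `options_exit_reason` is a hypothetical name, the order in which the checks run is an assumption, and `credit` and `spread_value` are per-share amounts for a credit spread.

```python
from typing import Optional


def options_exit_reason(credit: float, spread_value: float, bars_held: int,
                        profit_target_pct: float = 0.50,
                        stop_loss_pct: float = 2.00,
                        max_holding_bars: int = 15) -> Optional[str]:
    """Return why a short spread should be closed on this bar, or None to hold."""
    if credit <= 0:
        raise ValueError("credit must be positive for a credit spread")
    captured = (credit - spread_value) / credit  # fraction of credit kept so far
    if captured >= profit_target_pct:
        return "profit_target"
    if spread_value >= stop_loss_pct * credit:
        return "stop_loss"
    if bars_held >= max_holding_bars:
        return "max_holding_bars"
    return None
```

With a $1.00 credit, the spread is closed for profit once it can be bought back for $0.50 or less, and stopped out once it costs $2.00 or more to close.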
Every bar's condition expressions are evaluated against a namespace containing these variables. All are float unless noted. Missing data resolves to None (so conditions like iv_rank is not None and iv_rank >= 30 are safe).
| Variable | Description |
|---|---|
| `open`, `high`, `low`, `close`, `volume` | Raw OHLCV values |
| `sma_20`, `sma_50`, `sma_200` | Simple moving averages |
| `ema_9`, `ema_21`, `ema_50` | Exponential moving averages |
| `spy_price`, `spy_close` | Alias for `close` (useful for SPY strategies) |
| `spy_sma_20`, `spy_sma_50`, `spy_sma_200` | Alias for SMA columns |
| `prev_close`, `prev_high`, `prev_low` | Previous bar values |
| Variable | Description |
|---|---|
| `rsi_14` | RSI 14-period, bounded [0, 100] |
| `macd` | MACD line (EMA12 − EMA26) |
| `macd_signal` | MACD signal line (EMA9 of MACD) |
| `macd_hist` | MACD histogram (MACD − signal) |
| `roc_10` | Rate of change, 10-bar (%) |
| Variable | Description |
|---|---|
| `atr_14` | Average True Range, 14-period |
| `bb_upper`, `bb_lower` | Bollinger Bands (20-period, ±2σ) |
| `bb_pct` | Price position within Bollinger Bands [0, 1] |
| `adx_14` | Average Directional Index |
| `plus_di`, `minus_di` | Directional movement indicators |
| `hv_30` | 30-day historical volatility (annualised) |
| `spy_realized_vol_5d`, `spy_realized_vol_10d` | Short-window realised vol |
| Variable | Description |
|---|---|
| `obv` | On-balance volume |
| `spy_adv_20d` | 20-day average daily volume |
| `volume_spy_today` | Current bar volume |
| Variable | Description |
|---|---|
| `iv_rank` | IV rank 0–100 (requires `iv-rank-backfill` or `requires_options_data: true`) |
| `iv_percentile` | IV percentile 0–100 |
| `current_iv` | Current ATM implied volatility (decimal) |
| `iv_hv_ratio` | `current_iv / hv_30` (VRP indicator) |
| `iv_atm` | ATM IV; falls back to `hv_30 * 1.20` when unavailable |
| `dte` | Days to next standard monthly OpEx (0–60) |
| `short_leg_delta` | Approximated short leg delta (default 0.30) |
| `short_strike_delta` | Alias for `short_leg_delta` |
| `bid_ask_spread_pct` | Modelled bid/ask spread fraction |
| `pnl_pct_of_max_credit` | Options P&L as fraction of max credit received |
| `premium_collected_pct` | Alias for `pnl_pct_of_max_credit` |
| `open_interest_short_strike` | Open interest at short strike (default 2000) |
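For readers unfamiliar with "standard monthly OpEx", the sketch below shows one way a `dte`-style value could be derived, taking monthly expiration as the third Friday of the month. The function names are hypothetical, exchange-holiday adjustments are ignored, and counting the expiration day itself as 0 is an assumption.

```python
from datetime import date, timedelta


def third_friday(year: int, month: int) -> date:
    """Third Friday of the given month (standard monthly options expiration)."""
    first = date(year, month, 1)
    offset = (4 - first.weekday()) % 7  # weekday(): Monday=0 ... Friday=4
    return first + timedelta(days=offset + 14)


def days_to_monthly_opex(today: date) -> int:
    """Calendar days from `today` to the next third Friday (0 on the day itself)."""
    expiry = third_friday(today.year, today.month)
    if expiry < today:
        # This month's expiration has passed, so roll to next month.
        if today.month == 12:
            expiry = third_friday(today.year + 1, 1)
        else:
            expiry = third_friday(today.year, today.month + 1)
    return (expiry - today).days
```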
| Variable | Description |
|---|---|
| `vix` | VIX close (fetched via yfinance; fallback 20.0) |
| `vix_spot` | Alias for `vix` |
| Variable | Type | Description |
|---|---|---|
| `fomc_meeting_day` | bool | True on FOMC announcement days (2019–2026) |
| `fomc_within_3d` | bool | True within 3 calendar days of next FOMC |
| `days_to_fomc` | int | Calendar days until next FOMC |
| `cpi_release_day` | bool | True on BLS CPI release days (2019–2026) |
| `opex_week` | bool | True during standard monthly OpEx week |
| `within_3d_of_opex` | bool | True within 3 days of nearest monthly OpEx |
| `earnings_within_5d` | bool | Always False (requires external earnings service) |
| `earnings_within_7d` | bool | Always False (requires external earnings service) |
| Variable | Type | Description |
|---|---|---|
| `in_position` | bool | True when a position is open |
| `capital` | float | Current available capital ($) |
| `bars_held` | int | Bars elapsed since entry (0 if flat) |
| `days_held` | int | Alias for `bars_held` |
| `position_pnl_pct` | float | Unrealised P&L as fraction of entry price |
The eval scope includes: abs, round, min, max, int, float, bool, len, True, False, None.
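A minimal sketch of how condition strings might be evaluated against a per-bar namespace with only the listed builtins exposed. The helper names are hypothetical and this is not the engine's `run_signals()` implementation. Restricting `__builtins__` limits which names expressions can reach, but it is not a complete security boundary for untrusted input.

```python
# Builtins exposed to condition strings, per the list above.
# True, False and None are language constants, so they need no entry.
SAFE_BUILTINS = {
    "abs": abs, "round": round, "min": min, "max": max,
    "int": int, "float": float, "bool": bool, "len": len,
}


def all_true(conditions: list, bar: dict) -> bool:
    """True when every condition string evaluates truthy for this bar."""
    scope = {"__builtins__": SAFE_BUILTINS}
    return all(bool(eval(expr, scope, dict(bar))) for expr in conditions)


def should_enter(spec: dict, bar: dict) -> bool:
    """Filters gate signal evaluation; then all entry conditions must hold."""
    if not all_true(spec.get("filter_conditions", []), bar):
        return False
    return all_true(spec["entry_conditions"], bar)
```

Because missing data is `None` rather than `NaN`, a guard such as `iv_rank is not None and iv_rank >= 30` short-circuits before the comparison and evaluates to False instead of raising `TypeError`.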
backtest run and backtest options-sim check these after every run. Any violation is reported in failures[] and sets passed: false (exit code 1).
| Metric | Threshold |
|---|---|
| `win_rate` | `>= 0.50` |
| `profit_factor` | `>= 1.50` |
| `max_drawdown_pct` | `<= 0.25` (−25%) |
| `sharpe_ratio` | `>= 1.00` |
| `consecutive_losses_max` | `<= 8` |
| `walk_forward_efficiency` | `>= 0.70` (walk-forward only) |
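The thresholds above can be expressed as a small table-driven check. This is a sketch, not the service's `check_thresholds()`: `find_threshold_failures` is a hypothetical name, the failure message format is invented, and drawdown is compared by magnitude because the sample report shows `max_drawdown_pct` as a negative number. The walk-forward efficiency check is omitted because it applies only to walk-forward runs.

```python
THRESHOLDS = [
    ("win_rate", ">=", 0.50),
    ("profit_factor", ">=", 1.50),
    ("max_drawdown_pct", "<=", 0.25),
    ("sharpe_ratio", ">=", 1.00),
    ("consecutive_losses_max", "<=", 8),
]


def find_threshold_failures(metrics: dict) -> list:
    """Return a human-readable message for each violated threshold."""
    failures = []
    for name, op, limit in THRESHOLDS:
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing")
            continue
        if name == "max_drawdown_pct":
            value = abs(value)  # assumption: drawdown is stored as a negative fraction
        ok = value >= limit if op == ">=" else value <= limit
        if not ok:
            failures.append(f"{name}={value} violates {op} {limit}")
    return failures
```

An empty list corresponds to `passed: true`; any entry corresponds to `passed: false` and exit code 1.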
All commands emit JSON to stdout. Rich progress and warnings go to stderr.
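A caller that does not use the shared `call_cli()` wrapper could follow the I/O contract roughly as sketched below. `run_json_cli` is a hypothetical helper for illustration. It assumes that stdout still carries a JSON document on exit code 1, which may not hold for every soft-failure case (for example, "no data").

```python
import json
import subprocess


def run_json_cli(argv: list) -> tuple:
    """Run a CLI command and return (status, parsed JSON from stdout).

    Exit 0 -> "pass", exit 1 -> "soft_fail", exit 2+ -> RuntimeError.
    Progress output on stderr is captured and only surfaced on error.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode >= 2:
        raise RuntimeError(f"CLI error (exit {proc.returncode}): {proc.stderr.strip()}")
    payload = json.loads(proc.stdout)  # assumption: stdout is JSON on exit 0 and 1
    status = "pass" if proc.returncode == 0 else "soft_fail"
    return status, payload
```

A call such as `run_json_cli(["backtest", "list", "--limit", "5"])` would then return the status together with the parsed `runs` payload.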
Returned by backtest run and backtest options-sim.
```json
{
  "run": {
    "run_id": "uuid-string",
    "strategy_name": "spy_momentum_v1",
    "strategy_version": "1.0",
    "symbol": "SPY",
    "start_date": "2020-01-01",
    "end_date": "2024-12-31",
    "initial_capital": 100000.0,
    "commission_per_contract": 0.65,
    "slippage_pct": 0.005,
    "created_at": "2026-04-19T10:00:00",
    "completed_at": "2026-04-19T10:00:05",
    "status": "complete",
    "error_message": null
  },
  "metrics": {
    "run_id": "uuid-string",
    "total_trades": 87,
    "winning_trades": 52,
    "losing_trades": 35,
    "win_rate": 0.5977,
    "profit_factor": 1.82,
    "expectancy_per_trade": 312.40,
    "avg_win_pct": 0.0421,
    "avg_loss_pct": -0.0198,
    "largest_win": 4210.00,
    "largest_loss": -1850.00,
    "consecutive_losses_max": 5,
    "cagr": 0.1423,
    "max_drawdown_pct": -0.1187,
    "max_drawdown_duration_days": 38,
    "sharpe_ratio": 1.34,
    "sortino_ratio": 2.01,
    "calmar_ratio": 1.20,
    "recovery_factor": 3.74,
    "is_sharpe": null,
    "oos_sharpe": null,
    "walk_forward_efficiency": null
  },
  "equity_curve_url": null,
  "trade_count": 87,
  "passed": true,
  "failures": []
}
```

Returned by backtest walk-forward.
```json
{
  "strategy_name": "spy_momentum_v1",
  "symbol": "SPY",
  "windows": [
    {
      "window_index": 0,
      "is_start": "2016-01-01",
      "is_end": "2018-10-28",
      "oos_start": "2018-10-29",
      "oos_end": "2019-07-27",
      "is_run_id": "uuid-string",
      "oos_run_id": "uuid-string",
      "is_sharpe": 1.42,
      "oos_sharpe": 1.10,
      "efficiency": 0.775
    }
  ],
  "avg_efficiency": 0.79,
  "avg_oos_sharpe": 1.05,
  "passed": true
}
```

Default path: `data/backtest/results.db` (overridden by `BT_DB_PATH`).
Three tables: backtest_runs, backtest_metrics, backtest_trades. Schema defined in backtest/migrations/001_backtest_schema.sql.
```bash
# Retrieve a run via sqlite3
sqlite3 $BT_DB_PATH \
  "SELECT run_id, strategy_name, symbol, status, sharpe_ratio \
   FROM backtest_runs r \
   JOIN backtest_metrics m USING (run_id) \
   ORDER BY created_at DESC LIMIT 10;"
```

Equity curves are stored as Parquet in the `backtest-results` bucket when MinIO is available. The path is returned in `BacktestReport.equity_curve_url`.
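The SQLite query shown above can also be issued from Python with the standard-library `sqlite3` module. The sketch assumes only the table and column names that appear in that query; `recent_runs` is a hypothetical helper, and the full schema lives in the migration file.

```python
import sqlite3


def recent_runs(db_path: str, limit: int = 10) -> list:
    """Return the most recent runs joined with their Sharpe ratio, newest first."""
    query = """
        SELECT run_id, strategy_name, symbol, status, sharpe_ratio
        FROM backtest_runs r
        JOIN backtest_metrics m USING (run_id)
        ORDER BY created_at DESC
        LIMIT ?
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.row_factory = sqlite3.Row  # rows become addressable by column name
        return [dict(row) for row in conn.execute(query, (limit,))]
    finally:
        conn.close()
```

Binding `limit` as a parameter rather than formatting it into the SQL string keeps the query safe if the value ever comes from user input.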
```bash
# Unit tests: no infrastructure required
cd backtest-service
.venv/bin/pytest tests/unit/ -m 'not integration' --tb=short -q

# All tests with coverage
.venv/bin/pytest tests/unit/ --cov=backtest --cov-report=term-missing -q
```

Test suite: `tests/unit/test_engine.py`, `test_metrics.py`, `test_signal_runner.py`, `test_cli.py`.
This service is designed to be called by the backtesting_agent in the agent-trading-firm R&D pipeline via the shared call_cli() subprocess wrapper.
```python
from shared.tools.cli_tool import call_cli

# Run a backtest
result = call_cli([
    "backtest", "run",
    "--strategy", "/path/to/spec.json",
    "--symbol", "SPY",
    "--start", "2020-01-01",
    "--end", "2024-12-31",
])
# result is a dict parsed from JSON stdout
passed = result["passed"]
run_id = result["run"]["run_id"]
sharpe = result["metrics"]["sharpe_ratio"]

# Walk-forward validation
wf = call_cli([
    "backtest", "walk-forward",
    "--strategy", "/path/to/spec.json",
    "--symbol", "SPY",
    "--start", "2016-01-01",
    "--end", "2024-12-31",
    "--windows", "3",
])
efficiency = wf["avg_efficiency"]  # >= 0.70 = pass

# Retrieve stored report
report = call_cli(["backtest", "report", "--run-id", run_id])

# List recent runs
runs = call_cli(["backtest", "list", "--symbol", "SPY", "--limit", "5"])
```

backtest-service sits between `strategy_development_agent` and `monte_carlo_agent` in the R&D pipeline:
```
instrument_research_agent
  → strategy_development_agent   (produces strategy spec JSON)
  → backtesting_agent            (calls this service)
  → monte_carlo_agent            (consumes run_id from BacktestReport)
  → forward_testing_agent
  → deployment_agent
```
- `passed: true` is required before passing `run_id` to the Monte Carlo service.
- `walk_forward_efficiency >= 0.70` is the bar for walk-forward validation.
- All condition strings in specs use the indicator namespace above. `eval()` is sandboxed; only the listed safe builtins are available.
- RSI is bounded `[0, 100]`, so use `rsi_14 < 0` as an always-false sentinel in tests, not `rsi_14 < 5`.
- `iv_rank` is `None` (not `NaN`) in the eval namespace when IV data is absent; always guard with `iv_rank is not None`.
- `per_trade_risk_pct` has a hard server-side cap of `0.02` (2%); specs exceeding this will be rejected at load time.