Skip to content

apiprdt/PhysicsPaper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

134 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

ADCD β€” Anomaly-Driven Correction Discovery

Physics-constrained symbolic regression that discovers correction terms β€” not equations from scratch. The same logic that led from Newton to Einstein, from Rayleigh–Jeans to Planck.

ADCD discovery pipeline animation

β–Ά Download video (MP4)

DOI License: MIT Python Support

Science rarely discovers from a blank slate β€” it corrects. ADCD automates the step between anomaly and theory correction: given a classical law and data that disagrees with it, it searches for the minimal physically-valid correction term $\Delta$ β€” passing every candidate through dimensional, asymptotic, and complexity gates before a single parameter is ever fit.


⚑ Key Features

  • Correction-First Paradigm β€” Starts from a known classical law, not a blank slate. Focuses the search space on the discrepancy $\Delta$ between theory and experiment.
  • Cascaded Physics Gates β€” AST complexity, dimensional homogeneity, transcendental guardrails, and asymptotic consistency (ARC) gates screen out unphysical candidates before running parameter-fitting.
  • JAX-Traced L-BFGS-B Optimizer β€” Highly optimized parameter-scaled differentiable fitting with multi-restart log-uniform initialization.
  • BIC Model Selection β€” Employs the Bayesian Information Criterion (BIC) to rank models, favoring simpler physical theories over overly complex numerical fits.
  • Residual Feature Intelligence β€” Extracts mathematical features (monotonicity, curvature, oscillation, decay) from residuals to bias proposal templates.
  • Phase 2: Multivariable Discovery β€” Buckingham Ξ  group decomposition + per-variable Sequential ARC + variance-factorization separability detection for multi-input physical laws.
  • Real-World Validated β€” Successfully identifies correct structural classes on Mercury's perihelion (GR), Lamb Shift (QED), Muon g-2 (Schwinger), and Blackbody (Planck).

πŸ“¦ Installation

Install the stable package from PyPI:

pip install adcd

Or install from source:

git clone https://github.com/apiprdt/PhysicsPaper.git
cd PhysicsPaper
pip install -e ".[dev]"

Verify your installation:

pytest tests/

πŸ’» Quick Start

1. High-Level Scientific API

Running ADCD on predefined physics benchmarks is extremely simple:

import adcd

# 1. Load a pre-defined benchmark scenario (e.g. Relativistic Kinetic Energy)
scenarios = adcd.get_all_scenarios()
scenario = scenarios[0]

# 2. Run discovery in a single line!
result = adcd.discover_correction(scenario, max_iterations=5, proposer="mock")

# 3. View the best fit
print(f"Discovered correction: {result.best_expr}")       # ΞΈβ‚€ * (v/c)**2
print(f"LaTeX representation:  {result.export_latex()}")   # \theta_0 \left(\frac{v}{c}\right)^2
print(f"Parameters:            {result.best_theta}")
print(f"BIC Score:             {result.best_bic:.2f}")

# 4. Plot residuals
result.plot_residuals()

2. Custom Experimental Datasets

For custom datasets, use the adcd.fit function:

import numpy as np
import adcd

# Your custom data
x = np.linspace(1.0, 5.0, 100)
X = {"x": x}
y_classical = 2.0 * x
y_observed  = 2.0 * x + 0.5 * x**2   # True correction is 0.5 * x^2

# Run ADCD
result = adcd.fit(
    X=X,
    y_obs=y_observed,
    y_classical=y_classical,
    limit_variable="x",
    limit_direction="0",
    correction_mode="additive"
)

result.summary()

πŸ“Š Benchmark Results

Headline (primary claim): a mean structural recovery of 80.4% (Β±7.4%) across sixteen independent seeds (95% bootstrap CI [76.7%, 84.0%]). The reference seed=42 below is disclosed explicitly as the highest-performing seed (94.4%) β€” the mean, not the peak, is the claim. Full per-seed Γ— per-noise breakdown ships in results/seed_distribution.json.

1. Multi-Seed Distribution (primary result, Mock Proposer)

Noise level ADCD mean (16 seeds) ADCD worst seed ADCD best (seed=42)
0% 86.8% (Β±9.8%) 66.7% (6/9) 100% (9/9)
1% 81.2% (Β±14.6%) 44.4% (4/9) 100% (9/9)
5% 77.1% (Β±10.0%) 66.7% (6/9) 88.9% (8/9)
10% 76.4% (Β±12.3%) 55.6% (5/9) 88.9% (8/9)
Overall 80.4% (Β±7.4%) 69.4% (25/36) 94.4% (34/36)

2. Reference-Seed Detail (seed=42, Mock Proposer)

Scenario Tier 0% Noise 1% Noise 5% Noise 10% Noise
Relativistic KE Textbook βœ“ βœ“ βœ“ βœ“
Yukawa Gravity Textbook βœ“ βœ“ βœ“ βœ“
Anharmonic Spring Textbook βœ“ βœ“ βœ“ βœ“
Screened Coulomb Cross-Domain βœ“ βœ“ βœ— βœ—
Net Radiation Cross-Domain βœ“ βœ“ βœ“ βœ“
Nonlinear Drag Cross-Domain βœ“ βœ“ βœ“ βœ“
Mystery-A (tanhΒ²) Synthetic βœ“ βœ“ βœ“ βœ“
Mystery-B (sinc) Synthetic βœ“ βœ“ βœ“ βœ“
Mystery-C (log-quotient) Synthetic βœ“ βœ“ βœ“ βœ“
Overall 100% 100% 88.9% 88.9%

3. PySR Comparison (same residual, 5% noise β€” the structural-selection test)

The gap is seed-independent: even ADCD's worst of 16 seeds beats PySR fair, and PySR with doubled budget cannot reach ADCD's worst seed.

Method (5% noise) 0% 1% 5% 10%
ADCD (ours, seed=42) 9/9 (100%) 9/9 (100%) 8/9 (88.9%) 8/9 (88.9%)
ADCD multi-seed mean 86.8% 81.2% 77.1% 76.4%
ADCD worst of 16 seeds 66.7% 44.4% 66.7% 55.6%
PySR fair (100 iter, 60s) 4/9 (44.4%) 5/9 (55.6%) 1/9 (11.1%) 5/9 (55.6%)
PySR generous (2Γ— budget) 4/9 (44.4%) 4/9 (44.4%) 5/9 (55.6%) 2/9 (22.2%)

At 5% noise the gap is +66.0 points (ADCD multi-seed mean 77.1% vs PySR fair 11.1%). Doubling PySR's budget (generous, 55.6%) does not close it β€” and that doubled-budget figure still sits below ADCD's worst of 16 seeds (66.7%). PySR was run once per (scenario, noise); ADCD across 16 seeds. PySR non-monotonic under noise; ADCD stable.

3. Phase 2: Multivariable Benchmark

Scenario Variables ADCD Solved Notes
Yukawa Mass-Ratio m, M, r, rβ‚€ βœ“ Ξ  groups: m/M, r/rβ‚€
Turbulent Drag v, ρ, A, C_D βœ“ Separable multiplicative
Coupled Oscillator k, m, Ξ©, Ο‰β‚€ βœ— Mixed functional form
Van der Waals MV a, b, P, V, T βœ— Requires 3rd Ξ  group
Overall 2/4 (50%) Baseline: 0/4

4. Real-World Physical Constants

Validation on historical anomalies using physical constants from JPL DE440, NIST, and CODATA:

Physical Scenario Discovered Correction Converged Class Match NMSE
Mercury Perihelion (GR) ΞΈβ‚€Β·vcΒ² β€” βœ“ polynomial 1.11e-05
Hydrogen Lamb Shift (QED) ΞΈβ‚€(n/θ₁)^(-ΞΈβ‚‚) βœ“ βœ“ power_law 1.82e-18
Muon g-2 (Schwinger) ΞΈβ‚€(Ξ±/Ο€)^θ₁ βœ“ βœ“ polynomial 7.94e-07
Blackbody (Planck) -1 + e^(-f/θ₁) β€” βœ“ exponential 2.59e-02

πŸ“ Project Structure

adcd-v3.0.0/
β”œβ”€β”€ src/adcd/                       # Installable package
β”‚   β”œβ”€β”€ __init__.py                 # Public API (fit, discover_correction)
β”‚   β”œβ”€β”€ anomaly_scenarios.py        # 9 standard + 3 blind + 4 multivariable scenarios
β”‚   β”œβ”€β”€ arc_scorer.py               # Asymptotic consistency gate (ARC)
β”‚   β”œβ”€β”€ buckingham_pi.py            # [Phase 2] Buckingham Ξ  group engine
β”‚   β”œβ”€β”€ coarse_evaluator.py         # Coarse numerical pre-filter
β”‚   β”œβ”€β”€ correction_orchestrator.py  # Main multi-iteration discovery loop
β”‚   β”œβ”€β”€ dimensional_checker.py      # Dimensional homogeneity + transcendental gate
β”‚   β”œβ”€β”€ jax_optimizer.py            # JAX L-BFGS-B optimizer
β”‚   β”œβ”€β”€ llm_proposer.py             # Mock + Gemini + OpenAI proposers
β”‚   β”œβ”€β”€ metrics.py                  # NMSE, BIC, structural classification
β”‚   β”œβ”€β”€ multivar_orchestrator.py    # [Phase 2] Multivariable correction pipeline
β”‚   β”œβ”€β”€ pipeline.py                 # Stage 1 filter cascade
β”‚   β”œβ”€β”€ real_data_loader.py         # Real-world data loading (JPL, NIST, CODATA)
β”‚   β”œβ”€β”€ residual_factorizer_v2.py   # [Phase 2] Variance-decomposition separability
β”‚   β”œβ”€β”€ result.py                   # CorrectionResult object
β”‚   └── sequential_arc.py           # [Phase 2] Per-variable Sequential ARC checker
β”œβ”€β”€ tests/                          # Unit + integration tests
β”œβ”€β”€ paper/                          # LaTeX source (main.tex) + figures
β”œβ”€β”€ data/                           # Input datasets (SPARC, cosmic chronometers, growth rate)
β”œβ”€β”€ scripts/                         # Table generation and verification scripts
β”œβ”€β”€ run_correction_discovery.py     # Benchmark runner
└── README.md                       # This file

πŸ“– Citing This Work

If you use ADCD in your research, please cite:

@software{erdita2026adcd,
  author    = {Erdita, Muhammad Afif},
  title     = {{Anomaly-Driven Correction Discovery (ADCD): Physics-Constrained
                Symbolic Regression for Evolutionary Scientific Discovery}},
  year      = {2026},
  publisher = {Zenodo},
  version   = {3.0.0},
  doi       = {10.5281/zenodo.20534940},
  url       = {https://doi.org/10.5281/zenodo.20534940}
}

πŸ”¬ Reproducibility

Every quantitative claim in this project is reproducible from committed scripts. No number is hand-typed.

# Regenerate the 9-scenario benchmark (seed=42)
python run_correction_discovery.py

# Multi-seed study (16 seeds Γ— 9 scenarios Γ— 4 noise levels)
python run_reproducibility.py

# Build the per-seed Γ— per-noise anti-cherry-pick artifact
python scripts/generate_seed_distribution.py    # β†’ results/seed_distribution.json

# Guard: fails loudly if any headline number drifts
python scripts/verify_paper_claims.py

# SPARC MOND robustness study
python -m adcd.experiments.sparc_robustness

The full test suite must pass before any release:

pytest tests/ -q

πŸ“„ License

This project is licensed under the MIT License.