fix: retry engine_data CAS on InnoDB deadlocks instead of dropping writes by chubes4 · Pull Request #2786 · Extra-Chill/data-machine

chubes4 · 2026-06-25T01:24:46Z

Summary

Fixes #2785. InnoDB deadlocks (and lock-wait timeouts) during the engine_data compare-and-swap were returned as a generic db_error and treated as non-retryable, so the optimistic-concurrency loop bailed and silently dropped the write. This is exactly the case MySQL recommends retrying ("try restarting transaction").

Observed in production on events.extrachill.com — parallel scraper/Ticketmaster ingestion flows CAS the same engine_data rows at the same scheduled tick (~15:02–15:03 UTC) and step on each other's locks, dropping tool-run-state / step-progress writes.

Changes

Jobs::compare_and_swap_engine_data() — on $wpdb->update failure, classify $wpdb->last_error via a new is_retryable_db_error() helper (matches deadlock 1213 / lock-wait timeout 1205). Return a retryable flag and error: 'deadlock', and log at warning (not error) when transient.
EngineData::mutate() — treat retryable like a logical conflict: re-read the latest snapshot and retry within the existing max_attempts budget, with a small randomized backoff (5–25ms) to let the winning transaction commit. Genuinely fatal DB errors still fail fast.
RuntimeToolRunStateStore::mutate_engine_data() — mirror the same classification in the fallback CAS loop.
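
The classification step above could look roughly like the sketch below. This is not the PR's actual code: the function name `is_retryable_db_error_sketch` is hypothetical, and it assumes `$wpdb->last_error` carries the MySQL message text rather than the numeric code, so it matches the standard messages for errors 1213 and 1205.

```php
<?php
/**
 * Hypothetical sketch of a transient-error classifier.
 * Assumption: the input is the driver's error message (as in $wpdb->last_error),
 * so we match message text instead of numeric error codes.
 */
function is_retryable_db_error_sketch( string $last_error ): bool {
	if ( '' === $last_error ) {
		return false;
	}

	$needles = array(
		'Deadlock found when trying to get lock', // ER_LOCK_DEADLOCK (1213).
		'Lock wait timeout exceeded',             // ER_LOCK_WAIT_TIMEOUT (1205).
	);

	foreach ( $needles as $needle ) {
		// Case-insensitive substring match on the message text.
		if ( false !== stripos( $last_error, $needle ) ) {
			return true;
		}
	}

	return false;
}
```

Anything that does not match is treated as fatal, which keeps the default conservative: only errors where MySQL itself says "try restarting transaction" are retried.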

Behavior

A deadlock on the CAS write is now retried instead of dropping the write.
Non-retryable DB errors still fail fast (no infinite loop — bounded by max_attempts).
Logging distinguishes transient contention (warning, with reason: deadlock|conflict) from fatal failure (error).
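
The retry behavior described above can be illustrated with a minimal, self-contained sketch. It is an assumption-laden stand-in, not the code in `EngineData::mutate()`: the function name, the callable parameters, and the `success` / `conflict` result keys are hypothetical, while `retryable`, `error`, `max_attempts`, and the 5–25ms backoff come from the description.

```php
<?php
/**
 * Hypothetical bounded CAS retry loop.
 *
 * @param callable $read         Returns the latest snapshot.
 * @param callable $cas          fn( $old, $new ): array with 'success', 'conflict', 'retryable', 'error' keys (assumed shape).
 * @param callable $mutator      fn( $snapshot ): returns the new value to write.
 * @param int      $max_attempts Upper bound on attempts, so the loop cannot spin forever.
 */
function mutate_with_retry_sketch( callable $read, callable $cas, callable $mutator, int $max_attempts = 5 ): array {
	for ( $attempt = 1; $attempt <= $max_attempts; $attempt++ ) {
		// Re-read on every attempt so the write is based on the latest snapshot.
		$snapshot = $read();
		$result   = $cas( $snapshot, $mutator( $snapshot ) );

		if ( ! empty( $result['success'] ) ) {
			return array( 'success' => true, 'attempts' => $attempt );
		}

		// Logical conflicts and transient DB errors (deadlock, lock-wait timeout) are both retried.
		$transient = ! empty( $result['conflict'] ) || ! empty( $result['retryable'] );
		if ( ! $transient ) {
			// Genuinely fatal DB errors fail fast.
			return array( 'success' => false, 'attempts' => $attempt, 'error' => $result['error'] ?? 'db_error' );
		}

		if ( $attempt < $max_attempts ) {
			// Randomized 5-25ms backoff to let the winning transaction commit.
			usleep( random_int( 5000, 25000 ) );
		}
	}

	return array( 'success' => false, 'attempts' => $max_attempts, 'error' => 'max_attempts_exceeded' );
}
```

In this sketch a CAS callable that reports `retryable` twice and then succeeds finishes on the third attempt, while one that reports a non-retryable error returns after the first.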

Verification

php -l clean on all three files.
phpcs clean (no warnings/errors) on all three files.

…ites

InnoDB deadlocks and lock-wait timeouts were returned as a generic 'db_error' and treated as non-retryable, so the optimistic-concurrency loop bailed and silently dropped the engine_data write. Concurrent events ingestion flows (parallel scraper/Ticketmaster jobs CAS-ing the same rows) hit this daily.

Classify deadlock (1213) / lock-wait timeout (1205) as a transient, retryable condition and re-read the latest snapshot with a small randomized backoff, mirroring the existing logical-conflict retry path. Genuinely fatal DB errors still fail fast.

Closes #2785

homeboy-ci · 2026-06-25T01:26:28Z

Homeboy Results — `data-machine`

Lint

✅ lint — passed

ℹ️ Full options: homeboy docs commands/lint
Deep dive: homeboy lint data-machine --changed-since 8a413e6

Artifacts and drill-down

CI results artifact: homeboy-ci-results-data-machine-lint-quality-Linux-node24 contains immediate command JSON for this action invocation.
Observation artifact: homeboy-observations-data-machine-lint-quality-Linux-node24 contains exported Homeboy run history for deeper queries.
Drill-down: download the observation artifact, then run homeboy runs import <dir>, homeboy runs list, and homeboy runs findings <run-id>.
Artifacts are attached to the workflow run: https://github.com/Extra-Chill/data-machine/actions/runs/28140624728

Test

❌ test — failed

ℹ️ No tests ran — the runner failed before producing results. See raw_output.stderr_tail / raw_output.stdout_tail for the underlying error (bootstrap failure, missing deps, DB connection, etc.).
ℹ️ To run specific tests: homeboy test data-machine -- --filter=TestName
ℹ️ Auto-fix lint issues: homeboy refactor data-machine --from lint --write
ℹ️ Collect coverage: homeboy test data-machine --coverage
ℹ️ Analyze failures: homeboy test data-machine --analyze
ℹ️ Pass args to test runner: homeboy test -- [args]
ℹ️ Full options: homeboy docs commands/test
Deep dive: homeboy test data-machine --changed-since 8a413e6

Artifacts and drill-down

CI results artifact: homeboy-ci-results-data-machine-test-quality-Linux-node24 contains immediate command JSON for this action invocation.
Observation artifact: homeboy-observations-data-machine-test-quality-Linux-node24 contains exported Homeboy run history for deeper queries.
Drill-down: download the observation artifact, then run homeboy runs import <dir>, homeboy runs list, and homeboy runs findings <run-id>.
Artifacts are attached to the workflow run: https://github.com/Extra-Chill/data-machine/actions/runs/28140624728

Audit

✅ audit — passed

audit — 131 finding(s)
Total: 131 finding(s)

Deep dive: homeboy audit data-machine --changed-since 8a413e6

Artifacts and drill-down

CI results artifact: homeboy-ci-results-data-machine-audit-quality-Linux-node24 contains immediate command JSON for this action invocation.
Observation artifact: homeboy-observations-data-machine-audit-quality-Linux-node24 contains exported Homeboy run history for deeper queries.
Drill-down: download the observation artifact, then run homeboy runs import <dir>, homeboy runs list, and homeboy runs findings <run-id>.
Artifacts are attached to the workflow run: https://github.com/Extra-Chill/data-machine/actions/runs/28140624728

Tooling versions

Homeboy CLI: homeboy 0.260.0+ba82bac50654+18263261
Extension: wordpress from https://github.com/Extra-Chill/homeboy-extensions
Extension revision: 40d1495f
Action: unknown@unknown

chubes4 wants to merge 1 commit into main from cas-deadlock-retry
