Skip to content

fix(payments): atomic processing with idempotency key and recovery#29

Merged
oomokaro1 merged 1 commit into
OrbitStream:mainfrom
JuliobaCR:feat/issue-9-atomic-payment-processing
Jun 20, 2026
Merged

fix(payments): atomic processing with idempotency key and recovery#29
oomokaro1 merged 1 commit into
OrbitStream:mainfrom
JuliobaCR:feat/issue-9-atomic-payment-processing

Conversation

@JuliobaCR

Copy link
Copy Markdown
Contributor

Closes #9

Summary

#14/PR #26 (already merged into main) wrapped processPayment's session
update + payment insert in a db.transaction with SELECT ... FOR UPDATE
row locking, plus a tx_hash duplicate check and a 5-minute recovery cron.
That covers requirements #1 and #2, and most of #6.

This PR closes the gaps that are still open per the issue: the composite
idempotency key (#3), the processing status with its two-phase commit
(#4), the recovery path for sessions specifically stuck in processing
(#5), and an explicit graceful log for the already-paid duplicate case
(#6). Tests for all of the above are included (#7).

Requirements

1. Transactional Processing — already in place (PR #26)

processPayment already wraps the claim/finalize work in
db.transaction(async (tx) => { ... }). This PR splits it into two
transactions on purpose — see #4 below for why.

2. Optimistic Locking — already in place (PR #26)

SELECT ... FOR UPDATE locks the session row by memo before any write.
This PR adds a second, explicit affected-row check on top: the
pending -> processing update uses
.where(and(eq(id, session.id), eq(status, 'pending'))).returning(), and
if zero rows come back the payment is skipped with a
Lost claim race for session <id> warning — defense in depth on top of
the row lock, in case the lock window doesn't cover both steps.

3. Idempotency Key

  • src/db/schema.ts: payments.txHash's column-level .unique() is
    promoted to a composite unique('payment_idempotency_key').on(txHash, sessionId), since a tx hash can only ever confirm one session (Stellar
    memos are transaction-level) — the composite is the precise guarantee
    the issue asks for, and doubles as the ON CONFLICT target for the
    idempotent insert below.
  • claimSession() explicitly checks
    tx.query.payments.findFirst({ where: and(eq(txHash, ...), eq(sessionId, ...)) }) inside the transaction before claiming the session, and the
    finalize insert uses .onConflictDoNothing({ target: [txHash, sessionId] }) as a DB-level backstop.
  • Drizzle migration: src/db/migrations/0001_processing-status-and-payment-idempotency-key.sql
    (0000_baseline.sql is a one-time scaffolding migration capturing the
    schema as it already exists today — this repo had no migrations
    checked in yet, so a baseline was needed before a real delta could be
    generated).

4. Processing Status

  • sessionStatusEnum gains 'processing' between pending and paid.

  • processPayment now runs as two separate transactions:

    1. claimSession() — locks the row, validates amount/asset, checks the
      idempotency key, and atomically flips pending -> processing.
    2. the finalize block — inserts the payment and flips
      processing -> paid.

    Splitting these into two commits (instead of one transaction doing
    both) is what makes a crash between them observable: the session is
    left sitting in processing instead of silently rolling back to
    pending or jumping straight to paid, which is exactly the state
    the recovery job in [SEC] JWT Secret Hardening + CORS + Rate Limiting #5 is built to find and reconcile.

5. Recovery Job

Added recoverStuckProcessing() to the existing
PaymentRecoveryService (still on the same @Cron(EVERY_5_MINUTES) from
PR #26, now running it as the first of its four recovery passes):

  • Finds sessions in processing for >5 minutes.

  • Re-scans Horizon for the receiving account
    (StellarService.getPaymentsForAccount(account, undefined, 'desc')
    added the order param for this) for a payment matching the session's
    memo/amount/asset, then calls verifyTransaction() to confirm it
    actually succeeded.

  • Confirmed: inserts the missing payment record (idempotent via the same
    ON CONFLICT target) and marks the session paid, dispatching the
    webhook.

  • Not confirmed / no match: reverts the session to pending so the live
    poller picks it up again.

  • Every branch logs (warn for "found N stuck", warn for the
    revert/no-match case, log for a successful recovery), and one
    session's reconciliation failing doesn't stop the rest of the batch.

    Known limitation, noted for reviewers: the Horizon re-scan only looks
    at the 50 most recent payments for the account (getPaymentsPage's
    existing page size). If a session has been stuck for long enough that
    the confirming payment fell off that window, it reverts to pending
    rather than recovering — re-detection still happens once the live
    poller's cursor catches up, it just isn't covered by this specific
    Horizon re-scan. Solving that fully would need a persisted per-session
    search cursor, which felt out of scope for this issue.

6. Duplicate Payment Handling

claimSession() checks session.status explicitly before anything else:

  • 'paid' → logs Duplicate payment for already-paid session <id> — tx <hash> and returns null — no update, no insert, no throw.
  • anything else non-pending (i.e. processing) → logs
    Session <id> is not pending (status=...) and returns null the same
    way.

Both are graceful early returns; nothing throws and nothing double-writes.

7. Testing

  • payment-detector-transactions.spec.ts (rewritten): two-phase
    claim/finalize, processing-status transitions, idempotency-key
    short-circuit, already-paid / already-processing duplicate handling,
    amount/asset validation, transaction rollback on a forced finalize
    failure
    (asserts no metrics/webhook side effects), and a concurrent
    submissions
    test — two processPayment calls racing for the same
    session via Promise.all, asserting only one wins the claim and only
    one payment gets inserted.
  • payment-recovery.spec.ts (extended): recoverStuckProcessing for the
    confirmed/not-confirmed/no-memo/Horizon-error branches, plus the
    existing pending/paid-without-payment scenarios from PR feat: exactly-once payment processing with transaction wrapping and recovery #26.
  • payment-recovery-cron.spec.ts (new): boots a real
    ScheduleModule.forRoot() + PaymentRecoveryService and uses
    jest.useFakeTimers() to assert the @Cron(EVERY_5_MINUTES) job
    genuinely doesn't fire before 5 minutes and does fire (repeatedly) once
    the schedule elapses — this exercises the actual cron wiring, not just
    the handler called directly like the other specs do.

Verification

  • npm run build — clean
  • npm run lint — clean on all changed files (no new errors; remaining
    warnings are pre-existing no-explicit-any style already used
    throughout this codebase)
  • npm test — 26 suites / 226 tests passing
  • drizzle-kit check — migrations consistent with schema

Risks and Rollback

Same risk profile called out in PR #26 (the row lock spans the
transaction), plus: the new ALTER TYPE ... ADD VALUE migration is
additive and backward compatible (existing rows/statuses are untouched).
Rollback is a straight revert of this PR's commit; the 0001 migration
only adds an enum value and swaps one unique constraint for an equivalent
composite one, so reverting the migration (drop the composite, restore
the column-level unique) is safe if ever needed.

…essing recovery

Closes OrbitStream#9. Builds on the exactly-once work from OrbitStream#14/PR26 to close the
remaining gaps called out in the issue:

- Promote payments' tx_hash unique constraint to a composite
  UNIQUE(tx_hash, session_id) "payment_idempotency_key", checked inside
  the claim transaction before any insert (Drizzle migration included).
- Add a 'processing' status to sessionStatusEnum. processPayment now
  claims the session (pending -> processing) and finalizes it
  (insert + paid) as two separate transactions, so a crash between them
  leaves an observable 'processing' session instead of silently
  reverting or double-confirming.
- Add a recovery pass (every 5 minutes, via the existing
  PaymentRecoveryService cron) that finds sessions stuck in 'processing'
  for >5 minutes, re-checks Horizon by memo, and either completes the
  payment or releases the session back to 'pending'.
- Log-and-return gracefully (no throw, no double update/insert) for
  payments on an already-paid or already-processing session.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@oomokaro1 oomokaro1 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR implements all 7 requirements from issue #9 — two-phase claim/finalize with a processing status, composite idempotency key, Horizon-based recovery for stuck sessions, graceful duplicate handling, and thorough test coverage. The two-phase transaction split is well-motivated and correctly implemented.

Blocking Issues

None.

Code Issues

None.

Minor Nits

⚠️ The recoverStuckProcessing Horizon re-scan only looks at the 50 most recent payments (documented in the PR body). If a session has been stuck long enough that the confirming payment falls outside that window, it reverts to pending rather than recovering. This is a known limitation and acceptable for now — just noting it for future visibility.

What's Good

  • The two-phase claim/finalize pattern is clean and correctly makes crash-between-phases observable.
  • The composite unique('payment_idempotency_key').on(txHash, sessionId) constraint is the right level of granularity for Stellar's transaction-level memos.
  • claimSession() uses FOR UPDATE + conditional WHERE status = 'pending' with .returning() — solid defense in depth against concurrent claims.
  • recoverStuckProcessing() handles all branches (confirmed, not confirmed, no memo, Horizon error) with graceful reverts.
  • revertToPending() correctly guards with eq(status, 'processing') to avoid race conditions.
  • The order param added to getPaymentsForAccount is backward-compatible (defaults to 'asc').
  • Test coverage is excellent: two-phase flow, idempotency short-circuit, concurrent submissions race, transaction rollback, cron schedule wiring, and all recovery branches.
  • Baseline migration + proper delta migration is the right approach for a repo with no existing migration history.

@oomokaro1 oomokaro1 merged commit 5653a01 into OrbitStream:main Jun 20, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Atomic Payment Processing with Conflict Resolution

2 participants