fix(payments): atomic processing with idempotency key and recovery#29
Merged
oomokaro1 merged 1 commit intoJun 20, 2026
Merged
Conversation
…essing recovery Closes OrbitStream#9. Builds on the exactly-once work from OrbitStream#14/PR26 to close the remaining gaps called out in the issue: - Promote payments' tx_hash unique constraint to a composite UNIQUE(tx_hash, session_id) "payment_idempotency_key", checked inside the claim transaction before any insert (Drizzle migration included). - Add a 'processing' status to sessionStatusEnum. processPayment now claims the session (pending -> processing) and finalizes it (insert + paid) as two separate transactions, so a crash between them leaves an observable 'processing' session instead of silently reverting or double-confirming. - Add a recovery pass (every 5 minutes, via the existing PaymentRecoveryService cron) that finds sessions stuck in 'processing' for >5 minutes, re-checks Horizon by memo, and either completes the payment or releases the session back to 'pending'. - Log-and-return gracefully (no throw, no double update/insert) for payments on an already-paid or already-processing session. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
oomokaro1
approved these changes
Jun 20, 2026
oomokaro1
left a comment
Contributor
There was a problem hiding this comment.
This PR implements all 7 requirements from issue #9 — two-phase claim/finalize with a processing status, composite idempotency key, Horizon-based recovery for stuck sessions, graceful duplicate handling, and thorough test coverage. The two-phase transaction split is well-motivated and correctly implemented.
Blocking Issues
None.
Code Issues
None.
Minor Nits
recoverStuckProcessing Horizon re-scan only looks at the 50 most recent payments (documented in the PR body). If a session has been stuck long enough that the confirming payment falls outside that window, it reverts to pending rather than recovering. This is a known limitation and acceptable for now — just noting it for future visibility.
What's Good
- The two-phase claim/finalize pattern is clean and correctly makes crash-between-phases observable.
- The composite
unique('payment_idempotency_key').on(txHash, sessionId)constraint is the right level of granularity for Stellar's transaction-level memos. claimSession()usesFOR UPDATE+ conditionalWHERE status = 'pending'with.returning()— solid defense in depth against concurrent claims.recoverStuckProcessing()handles all branches (confirmed, not confirmed, no memo, Horizon error) with graceful reverts.revertToPending()correctly guards witheq(status, 'processing')to avoid race conditions.- The
orderparam added togetPaymentsForAccountis backward-compatible (defaults to'asc'). - Test coverage is excellent: two-phase flow, idempotency short-circuit, concurrent submissions race, transaction rollback, cron schedule wiring, and all recovery branches.
- Baseline migration + proper delta migration is the right approach for a repo with no existing migration history.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #9
Summary
#14/PR #26 (already merged intomain) wrappedprocessPayment's sessionupdate + payment insert in a
db.transactionwithSELECT ... FOR UPDATErow locking, plus a tx_hash duplicate check and a 5-minute recovery cron.
That covers requirements #1 and #2, and most of #6.
This PR closes the gaps that are still open per the issue: the composite
idempotency key (#3), the
processingstatus with its two-phase commit(#4), the recovery path for sessions specifically stuck in
processing(#5), and an explicit graceful log for the already-paid duplicate case
(#6). Tests for all of the above are included (#7).
Requirements
1. Transactional Processing — already in place (PR #26)
processPaymentalready wraps the claim/finalize work indb.transaction(async (tx) => { ... }). This PR splits it into twotransactions on purpose — see #4 below for why.
2. Optimistic Locking — already in place (PR #26)
SELECT ... FOR UPDATElocks the session row bymemobefore any write.This PR adds a second, explicit affected-row check on top: the
pending -> processingupdate uses.where(and(eq(id, session.id), eq(status, 'pending'))).returning(), andif zero rows come back the payment is skipped with a
Lost claim race for session <id>warning — defense in depth on top ofthe row lock, in case the lock window doesn't cover both steps.
3. Idempotency Key
src/db/schema.ts:payments.txHash's column-level.unique()ispromoted to a composite
unique('payment_idempotency_key').on(txHash, sessionId), since a tx hash can only ever confirm one session (Stellarmemos are transaction-level) — the composite is the precise guarantee
the issue asks for, and doubles as the
ON CONFLICTtarget for theidempotent insert below.
claimSession()explicitly checkstx.query.payments.findFirst({ where: and(eq(txHash, ...), eq(sessionId, ...)) })inside the transaction before claiming the session, and thefinalize insert uses
.onConflictDoNothing({ target: [txHash, sessionId] })as a DB-level backstop.src/db/migrations/0001_processing-status-and-payment-idempotency-key.sql(
0000_baseline.sqlis a one-time scaffolding migration capturing theschema as it already exists today — this repo had no migrations
checked in yet, so a baseline was needed before a real delta could be
generated).
4. Processing Status
sessionStatusEnumgains'processing'betweenpendingandpaid.processPaymentnow runs as two separate transactions:claimSession()— locks the row, validates amount/asset, checks theidempotency key, and atomically flips
pending -> processing.processing -> paid.Splitting these into two commits (instead of one transaction doing
both) is what makes a crash between them observable: the session is
left sitting in
processinginstead of silently rolling back topendingor jumping straight topaid, which is exactly the statethe recovery job in [SEC] JWT Secret Hardening + CORS + Rate Limiting #5 is built to find and reconcile.
5. Recovery Job
Added
recoverStuckProcessing()to the existingPaymentRecoveryService(still on the same@Cron(EVERY_5_MINUTES)fromPR #26, now running it as the first of its four recovery passes):
Finds sessions in
processingfor >5 minutes.Re-scans Horizon for the receiving account
(
StellarService.getPaymentsForAccount(account, undefined, 'desc')—added the
orderparam for this) for a payment matching the session'smemo/amount/asset, then calls
verifyTransaction()to confirm itactually succeeded.
Confirmed: inserts the missing payment record (idempotent via the same
ON CONFLICTtarget) and marks the sessionpaid, dispatching thewebhook.
Not confirmed / no match: reverts the session to
pendingso the livepoller picks it up again.
Every branch logs (warn for "found N stuck", warn for the
revert/no-match case, log for a successful recovery), and one
session's reconciliation failing doesn't stop the rest of the batch.
Known limitation, noted for reviewers: the Horizon re-scan only looks
at the 50 most recent payments for the account (
getPaymentsPage'sexisting page size). If a session has been stuck for long enough that
the confirming payment fell off that window, it reverts to
pendingrather than recovering — re-detection still happens once the live
poller's cursor catches up, it just isn't covered by this specific
Horizon re-scan. Solving that fully would need a persisted per-session
search cursor, which felt out of scope for this issue.
6. Duplicate Payment Handling
claimSession()checkssession.statusexplicitly before anything else:'paid'→ logsDuplicate payment for already-paid session <id> — tx <hash>and returnsnull— no update, no insert, no throw.pending(i.e.processing) → logsSession <id> is not pending (status=...)and returnsnullthe sameway.
Both are graceful early returns; nothing throws and nothing double-writes.
7. Testing
payment-detector-transactions.spec.ts(rewritten): two-phaseclaim/finalize, processing-status transitions, idempotency-key
short-circuit, already-paid / already-processing duplicate handling,
amount/asset validation, transaction rollback on a forced finalize
failure (asserts no metrics/webhook side effects), and a concurrent
submissions test — two
processPaymentcalls racing for the samesession via
Promise.all, asserting only one wins the claim and onlyone payment gets inserted.
payment-recovery.spec.ts(extended):recoverStuckProcessingfor theconfirmed/not-confirmed/no-memo/Horizon-error branches, plus the
existing pending/paid-without-payment scenarios from PR feat: exactly-once payment processing with transaction wrapping and recovery #26.
payment-recovery-cron.spec.ts(new): boots a realScheduleModule.forRoot()+PaymentRecoveryServiceand usesjest.useFakeTimers()to assert the@Cron(EVERY_5_MINUTES)jobgenuinely doesn't fire before 5 minutes and does fire (repeatedly) once
the schedule elapses — this exercises the actual cron wiring, not just
the handler called directly like the other specs do.
Verification
npm run build— cleannpm run lint— clean on all changed files (no new errors; remainingwarnings are pre-existing
no-explicit-anystyle already usedthroughout this codebase)
npm test— 26 suites / 226 tests passingdrizzle-kit check— migrations consistent with schemaRisks and Rollback
Same risk profile called out in PR #26 (the row lock spans the
transaction), plus: the new
ALTER TYPE ... ADD VALUEmigration isadditive and backward compatible (existing rows/statuses are untouched).
Rollback is a straight revert of this PR's commit; the
0001migrationonly adds an enum value and swaps one unique constraint for an equivalent
composite one, so reverting the migration (drop the composite, restore
the column-level unique) is safe if ever needed.