fix(triggers): rewrite user_ids = ANY(...) to @> so GIN indexes can be used#880

Merged
raymondjacobson merged 1 commit into
mainfrom
api/triggers-array-contains
May 29, 2026

@raymondjacobson
Member

Summary

This is the actual fix for the IndexChallengesJob wedge that #877 was supposed to enable. The partial GIN index from #877 was correct, but the trigger SQL could not reach it.

Root cause

The on_user_challenge and on_challenge_disbursement triggers both do per-row lookups against the 8 GB / ~23.5 M-row notification table to dedupe reward / cooldown notifications, e.g.:

SELECT id FROM notification
 WHERE type = 'reward_in_cooldown'
   AND new.user_id = ANY(user_ids)
   AND timestamp >= (new.completed_at - interval '1 hour')

PostgreSQL's GIN operator class for arrays supports @>, &&, and <@, but not scalar = ANY(array). Even with the full ix_notification GIN index and the partial GIN index added by 0210 (#877), every trigger call fell back to a parallel sequential scan of the entire 8 GB table.

Confirmed in prod on cd94ede:

->  Parallel Seq Scan on notification  (cost=0.00..963930.62)
       Filter: type='reward_in_cooldown' AND scalar = ANY(user_ids)
       Rows Removed by Filter: 7,846,045
       Execution: 13,640 ms

And pg_stat_user_indexes showed ix_notification_cooldown_user_ids.idx_scan = 0 since it was built — completely unused.
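
A check along these lines reproduces the idx_scan observation. It is a sketch, assuming the statistics have not been reset since the index was built; the index_size column is added here for context and is not part of the original check.

```sql
-- Scan counters for every index on the notification table.
-- idx_scan is the number of index scans initiated on that index
-- since the statistics were last reset.
SELECT indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
  FROM pg_stat_user_indexes
 WHERE relname = 'notification'
 ORDER BY idx_scan;
```

An index that stays at idx_scan = 0 while the queries it was built for keep running is a sign that the planner cannot match the predicate to it.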

Fix

Rewrite the predicate to the canonical @> form. Both forms test array membership and select the same rows in a WHERE clause, but only @> is GIN-eligible. The same EXPLAIN on the same row, with the same data:

| Form | Plan | Execution |
| --- | --- | --- |
| new.user_id = ANY(user_ids) (before) | Parallel Seq Scan | 13,640 ms |
| user_ids @> ARRAY[new.user_id] (after) | Bitmap Index Scan on ix_notification_cooldown_user_ids | 2 ms |
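
The comparison can be reproduced outside the trigger with a pair of queries like the following. This is a sketch, not the exact statements run in prod: 12345 is a hypothetical user id standing in for new.user_id, now() stands in for new.completed_at, and user_ids is assumed to be integer[] (cast the array literal if the column type differs).

```sql
-- Before: scalar = ANY(array_column) is not an indexable GIN condition,
-- so the planner has to filter rows one by one.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM notification
 WHERE type = 'reward_in_cooldown'
   AND 12345 = ANY(user_ids)          -- 12345: hypothetical user id
   AND timestamp >= (now() - interval '1 hour');

-- After: array containment, which the GIN array operator class can index.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM notification
 WHERE type = 'reward_in_cooldown'
   AND user_ids @> ARRAY[12345]       -- same membership test, index-eligible
   AND timestamp >= (now() - interval '1 hour');
```

The two predicates differ only when NULLs are involved: a failed = ANY match against an array that contains NULL yields NULL, while @> yields false. A WHERE clause rejects the row in both cases, so the dedupe lookups return the same ids.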

Three call sites updated:

| File | What it does |
| --- | --- |
| ddl/functions/handle_user_challenges.sql | reward_in_cooldown dedupe path of handle_on_user_challenge(); fires on every is_complete=true write to user_challenges |
| ddl/functions/handle_challenge_disbursements.sql (×2) | challenge_reward dedupe in both handle_challenge_disbursement() (legacy table) and handle_sol_reward_disbursement() (new indexer's table) |

Schema dump (sql/01_schema.sql) and migration tracker checksums updated to match.

Impact

  • Challenge job: per-upsert trigger cost drops from ~13 s → ~2 ms. The IndexChallengesJob first-tick wedge clears once a fresh backend picks up the new function (so a core-indexer pod restart after this deploys is the last manual step, unless the deploy itself replaces the pod).
  • Rewards / disbursements: per-disbursement trigger cost drops by the same factor; the rewards attester is no longer rate-limited by trigger latency.
  • All other cooldown_days > 0 challenges (p, u, the Phase 2 ones, etc.) get the same speedup whenever they fire the trigger.

Out of scope

The wider codebase has ~50 other = any(user_ids) occurrences across other trigger functions (notification triggers added in #851, etc.). Same anti-pattern — same fix. Worth a separate sweep PR; I left it out here to keep this one minimal and reviewable.
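
For that follow-up sweep, the functions still using the old form could be listed from the catalog. This is a sketch, assuming the functions are already installed in the database being queried; the regular expression is an illustrative guess at the spelling variations and only matches predicates written against a column named user_ids.

```sql
-- List functions whose body still contains "= any(user_ids)".
-- Returns one row per function, not one row per occurrence.
SELECT n.nspname AS schema_name,
       p.proname AS function_name
  FROM pg_proc p
  JOIN pg_namespace n ON n.oid = p.pronamespace
 WHERE p.prosrc ~* '=\s*any\s*\(\s*user_ids\s*\)'   -- case-insensitive match
   AND n.nspname NOT IN ('pg_catalog', 'information_schema')
 ORDER BY schema_name, function_name;
```

Because it counts functions rather than occurrences, the result will not line up one-to-one with the ~50 figure; a text search over the function sources in the repository would give the per-occurrence view.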

Test plan

  • go build ./..., go vet ./... clean (no Go changes; sanity check).
  • Confirmed in prod via EXPLAIN (ANALYZE, BUFFERS) that the @> form picks ix_notification_cooldown_user_ids and completes in 2 ms (vs 13.6 s for the = ANY form).
  • Function checksums in sql/03_migration_tracker.sql updated so pg_migrate.sh re-applies them on deploy.
  • sql/01_schema.sql updated in lockstep so a fresh test template reflects the new function bodies.

🤖 Generated with Claude Code

… can be used

The on_user_challenge and on_challenge_disbursement triggers both do per-row
lookups against the 8 GB notification table to dedupe reward / cooldown
notifications, e.g.:

    SELECT id FROM notification
     WHERE type = 'reward_in_cooldown'
       AND new.user_id = ANY(user_ids)
       AND timestamp >= (new.completed_at - interval '1 hour')

PostgreSQL's GIN operator class supports @>, &&, <@ — but NOT `scalar = ANY(array)`.
So even with the full ix_notification GIN, AND with the partial GIN added by
0210 (#877), every trigger call fell back to a parallel sequential scan of
the entire 8 GB table:

    Parallel Seq Scan on notification  cost=0..963930.62
    Rows Removed by Filter: 7,846,045   Execution: 13,640 ms

pg_stat_user_indexes confirmed idx_scan = 0 on the new partial GIN since
creation.

Rewriting the predicate to the canonical @> form lets the planner pick the
partial GIN — same EXPLAIN drops to a Bitmap Index Scan / 2 ms / 5 buffers.
Semantics are identical (both forms test array membership).

Three call sites updated:
  - handle_user_challenges.sql        (reward_in_cooldown dedupe)
  - handle_challenge_disbursements.sql (challenge_reward dedupe, both
    legacy and sol_reward_disbursement copies)

This is the actual fix the #877 partial GIN was supposed to enable — the
index was correct, but the trigger SQL couldn't reach it. After this deploys,
the IndexChallengesJob first-tick wedge clears (the per-upsert trigger cost
drops from ~13s to ~2ms) and challenge_disbursement throughput rises with it.

The wider codebase has ~50 other `= any(user_ids)` occurrences across other
trigger functions; cleaning those up is a separate sweep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@raymondjacobson raymondjacobson merged commit c5668a8 into main May 29, 2026
5 checks passed
@raymondjacobson raymondjacobson deleted the api/triggers-array-contains branch May 29, 2026 21:20