perf(near_v1_signer_users): convert daily full rebuild to incremental append #9764
a-monteiro wants to merge 1 commit into
PR Summary
Medium Risk
Overview: The sign-request logic is unchanged but wrapped in a [...]. Full refreshes also get a [...].
Reviewed by Cursor Bugbot for commit 10b3d6d.
tomfutago left a comment
Regression Report: PR #9764
Overall status: pass. CI output for near_v1_signer_users matches production exactly.
near_v1_signer_users
- Profile: grain=`(account_id, derivation_path, key_version)` dimension; time=none; bound=none
- Relations: prod=`near_v1_signer.users`; ci=`dune.dune_spellbook_ci__tmp_pr9764_27405626462_1.near_v1_signer_users`
- Row count: prod `6,136`, ci `6,136` -- pass
- Summary metrics: both have `575` distinct accounts, `0` null `account_id`, `0` null `derivation_path`, `655` null `key_version`
- Time coverage/range: skipped because the model has no time or block columns
- Key-match: `ci_only=0`, `prod_only=0` -- pass
- Uniqueness: both prod and CI have `6,136` distinct keys and `0` duplicate rows
Verification queries:

`near_v1_signer.users` was a daily full rebuild that scanned all of `near.actions` (12B rows, 1.38 TB) plus `near.logs` (11.8B rows, 386 GB) on every run, costing 1.77 TB of IO and ~3.9 CPU-hrs per day, to produce a ~6,100-row dimension of distinct `(account_id, derivation_path, key_version)` triples. The output is a monotonic set: triples are only ever added, never restated, so rebuilding history daily is pure waste.

This converts the model to `incremental` with `strategy='append'` plus a `NOT EXISTS` anti-join against `{{ this }}`. Merge with a `unique_key` was deliberately avoided: `key_version` is NULL for 655 of 6,136 rows (and `derivation_path` is nullable), and a merge `ON` clause never matches NULL keys, which would silently duplicate those rows. The anti-join uses `IS NOT DISTINCT FROM` to handle NULLs correctly (same pattern as #9754).

Incremental runs read only the last 3 days of `block_date` partitions on both sides of the join (`incremental_predicate` on `action.block_date` and `log.block_date`; both tables are `block_date`-partitioned). A constant `block_date >= DATE '2024-08-01'` floor (the date of block 124788114, the v1.signer deployment block) also bounds full refreshes. This is proven to be a semantic no-op: 0 rows exist with `block_height >= 124788114` and an earlier `block_date`.

Proofs (read-only, prod data, spellbook-cd-large, UTC)
- `count(*)` of rows with `block_height >= 124788114 AND block_date < DATE '2024-08-01'` = 0 on both `near.actions` and `near.logs`.
- `checksum()` on all three output columns.

A/B (medians of warm runs, same cluster)
Projected per-day on spellbook-daily (1 build/day): 3.9 CPU-hrs -> ~0.01; 1.77 TB IO -> ~4.4 GB.
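As a quick sanity check on the figures quoted in this description, the arithmetic can be reproduced as follows. This is only a sketch: it assumes decimal units (1 TB = 1000 GB), and the resulting ratios are approximate because the projected values (~0.01 CPU-hrs, ~4.4 GB) are themselves rounded.

```python
# Figures copied from the PR description; decimal units are an assumption.
actions_tb = 1.38        # near.actions scan size
logs_gb = 386            # near.logs scan size

# Full-rebuild IO per run: should round to the quoted 1.77 TB.
full_scan_tb = actions_tb + logs_gb / 1000

# Approximate reduction factors implied by the projected per-day numbers.
io_reduction = 1.77 * 1000 / 4.4     # TB -> GB, then divide by ~4.4 GB
cpu_reduction = 3.9 / 0.01           # CPU-hrs before / after

print(f"full scan: {full_scan_tb:.2f} TB")
print(f"IO reduction: ~{io_reduction:.0f}x, CPU reduction: ~{cpu_reduction:.0f}x")
```

Both factors come out at roughly two and a half orders of magnitude, consistent with reading about 3 days of partitions instead of the full history.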
No backfill needed: the existing table is already the correct full-history set; the first incremental run simply appends to it. A `--full-refresh` reproduces the table exactly (proven by checksum above).
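The NULL-key reasoning behind choosing append plus an anti-join over a merge can be illustrated with a small self-contained sketch. It is not the model's SQL: it uses Python's `sqlite3` with SQLite's `IS` operator standing in for `IS NOT DISTINCT FROM` (both are NULL-safe equality), and the `target`/`batch` tables and the account names and paths are hypothetical sample data.

```python
import sqlite3

# Hypothetical sample data: 'target' plays the role of {{ this }},
# 'batch' plays the role of the rows found in the incremental window.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target (account_id TEXT, derivation_path TEXT, key_version INTEGER);
    CREATE TABLE batch  (account_id TEXT, derivation_path TEXT, key_version INTEGER);
    INSERT INTO target VALUES ('alice.near', 'eth-1', NULL), ('bob.near', 'btc-1', 0);
    INSERT INTO batch VALUES
        ('alice.near', 'eth-1', NULL),
        ('bob.near', 'btc-1', 0),
        ('carol.near', 'sol-1', 1);
""")

def rows_to_append(op):
    """Return batch rows with no match in target, comparing keys with `op`."""
    sql = f"""
        SELECT b.account_id, b.derivation_path, b.key_version
        FROM batch AS b
        WHERE NOT EXISTS (
            SELECT 1 FROM target AS t
            WHERE t.account_id {op} b.account_id
              AND t.derivation_path {op} b.derivation_path
              AND t.key_version {op} b.key_version
        )
        ORDER BY b.account_id
    """
    return conn.execute(sql).fetchall()

# Plain '=' never matches the NULL key_version, so alice's existing row
# looks new and would be duplicated.
plain = rows_to_append("=")

# NULL-safe comparison treats NULL as equal to NULL: only carol is new.
null_safe = rows_to_append("IS")

# Appending the NULL-safe result and re-running finds nothing more to add.
conn.executemany("INSERT INTO target VALUES (?, ?, ?)", null_safe)
rerun = rows_to_append("IS")
total = conn.execute("SELECT count(*) FROM target").fetchone()[0]

print(plain)      # [('alice.near', 'eth-1', None), ('carol.near', 'sol-1', 1)]
print(null_safe)  # [('carol.near', 'sol-1', 1)]
print(rerun, total)  # [] 3
```

The re-run step is what makes an overlapping 3-day lookback safe in this sketch: rows seen again in a later window are filtered out by the anti-join rather than appended twice.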