
explorer: heatmap SQL pre-aggregation + adaptive radius (#233)#241

Merged
rdhyee merged 1 commit into
isamplesorg:mainfrom
rdhyee:feat/heatmap-sql-aggregation
May 28, 2026

Conversation

@rdhyee
Contributor

@rdhyee rdhyee commented May 28, 2026

Summary

Follow-up to PR #240 (heatmap phase 1). Two related changes:

  1. SQL pre-aggregation — replaces the LIMIT 100000 raw-row scan + JS per-pixel binning with a DuckDB GROUP BY that does the binning server-side. Removes the cap honestly: every sample in the bbox is counted, regardless of total sample count.
  2. Adaptive radius + maxOpacity: fixes the "everything red" symptom RY surfaced at world view after PR #240 (heatmap overlay spike, phase 1) shipped.

Why #1: LIMIT 100000 was geographically biased

LIMIT 100000 returned the first 100k rows in parquet storage order — not random, not geographic. At world view, the heatmap silently showed whichever source happened to be physically first in the file (likely SESAR — the largest source by row count). The "(capped)" status warning from #240 disclosed the problem but didn't fix it.

This PR pushes the binning into DuckDB. SQL computes (x_bin, y_bin) pixel coordinates server-side using FLOOR/LEAST/GREATEST, then GROUP BY (x_bin, y_bin) returning one row per non-empty pixel with COUNT(*) as the count. Result cardinality is bounded by canvas pixels (≤ 512² = 262k), independent of bbox sample count. No LIMIT needed — every sample counted into its true pixel bucket.
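
The binning query described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged code: the table `lite` and the `latitude` / `longitude` columns come from the earlier raw-row query, while `buildHeatmapSql`, `pixelBin`, the bbox object shape, and the linear degrees-to-pixels mapping are hypothetical. The sketch only covers a non-wrapping bbox (west < east). `pixelBin` mirrors the same FLOOR / LEAST / GREATEST arithmetic in JS so the clamping can be checked without a database.

```javascript
// Hypothetical sketch; function names and bbox shape are assumptions.
// Assumes west < east and a linear mapping from degrees to canvas pixels.
function buildHeatmapSql({ west, south, east, north }, width, height) {
  const xScale = width / (east - west);
  const yScale = height / (north - south);
  // LEAST/GREATEST clamp samples on the east/south edge into the last pixel.
  return `
    SELECT
      LEAST(${width - 1}, GREATEST(0,
        CAST(FLOOR((longitude - (${west})) * ${xScale}) AS INTEGER))) AS x_bin,
      LEAST(${height - 1}, GREATEST(0,
        CAST(FLOOR(((${north}) - latitude) * ${yScale}) AS INTEGER))) AS y_bin,
      COUNT(*) AS n
    FROM lite
    WHERE longitude BETWEEN ${west} AND ${east}
      AND latitude BETWEEN ${south} AND ${north}
    GROUP BY x_bin, y_bin`;
}

// JS mirror of the SQL arithmetic above, for checking edge cases.
function pixelBin(lon, lat, { west, south, east, north }, width, height) {
  const clamp = (v, max) => Math.min(max, Math.max(0, v));
  const x = Math.floor((lon - west) * (width / (east - west)));
  const y = Math.floor((north - lat) * (height / (north - south)));
  return { x_bin: clamp(x, width - 1), y_bin: clamp(y, height - 1) };
}
```

Because the result has at most one row per pixel, the row count stays bounded by `width * height` however many samples fall inside the bbox.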

Antimeridian handled: when bbox wraps (west > east), SQL shifts longitude < west by +360 so pixel arithmetic works in a continuous coordinate space.
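
A minimal sketch of that longitude shift, assuming the wrap test is simply west > east. The helper names (`unwrapLongitude`, `unwrappedEast`, `lonExprSql`) are hypothetical, and the SQL fragment is one plausible way to express the same shift, not necessarily the expression used in this PR.

```javascript
// Hypothetical helpers illustrating the +360 shift for a wrapping bbox.
function unwrapLongitude(lon, west, east) {
  const wraps = west > east;
  // Only longitudes on the far side of the antimeridian are shifted.
  return wraps && lon < west ? lon + 360 : lon;
}

// East edge expressed in the same continuous coordinate space.
function unwrappedEast(west, east) {
  return west > east ? east + 360 : east;
}

// SQL expression applying the same shift to the longitude column.
function lonExprSql(west, east) {
  return west > east
    ? `(CASE WHEN longitude < (${west}) THEN longitude + 360 ELSE longitude END)`
    : 'longitude';
}
```

For a bbox with west = 170 and east = -170, a sample at longitude -175 becomes 185 and the east edge becomes 190, so `(lon - west)` is positive and increases monotonically across the view.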

Verified counts vs the existing samples table summary line (= true sample count for the current view):

view                      | heatmap   | table     | match
--------------------------|-----------|-----------|------
PKAP (100km alt)          | 77,840    | 77,840    | ✅
Cyprus medium (500km)     | 100,970   | 100,970   | ✅ (was capped at 100k)
Cyprus regional (1,500km) | 682,029   | 682,029   | ✅ (was capped at 100k)
Africa (1.9Mkm)           | 12,875    | 12,875    | ✅
World view (15Mkm)        | 5,980,282 | 5,980,282 | ✅ (was capped at 100k)

Render time at world view (~6M samples → 35k cells): ~7s on localhost — similar to or faster than the LIMIT 100k version. Status text always reports the true count; the (capped) branch is removed.
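
With one row per non-empty pixel, the true count for the status text is just the sum of the per-cell counts. A small sketch, assuming each result row carries its count in a field named `n` (a hypothetical name) and that the count may arrive as a BigInt depending on the DuckDB client in use:

```javascript
// Hypothetical sketch: sums per-cell counts into the total shown in the status text.
// Number() accepts both plain numbers and BigInt values.
function totalSampleCount(cells) {
  return cells.reduce((sum, cell) => sum + Number(cell.n), 0);
}
```

This total is the figure that can be compared against the samples table summary line, as in the verification table above.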

Why #2: adaptive radius

After (1), RY tested staging and reported world view "everything is red." Cause: with 35k+ pixel cells on a 512² canvas, heatmap.js's default 25-pixel blur radius made each cell's Gaussian cover ~1% of canvas. 35k × 1% ≈ 350× the canvas area, far above 100%, so linear-additive blending saturated everything to full red.
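
The saturation arithmetic can be checked with a rough estimate. This sketch treats each point's blur footprint as a flat disc of the given radius, which is an approximation of the Gaussian falloff; `blurCoverage` is a hypothetical helper, not part of heatmap.js.

```javascript
// Hypothetical estimate: summed blur footprint as a multiple of the canvas area.
// A value above 1 means the footprints overlap on average.
function blurCoverage(cellCount, radiusPx, canvasSidePx) {
  const discArea = Math.PI * radiusPx ** 2;
  return (cellCount * discArea) / (canvasSidePx ** 2);
}

// One 25 px disc on a 512 x 512 canvas covers a bit under 1% of it.
const perCell = blurCoverage(1, 25, 512);
// 35k such discs add up to a few hundred times the canvas area.
const worldView = blurCoverage(35000, 25, 512);
```

Any value far above 1 means nearly every pixel receives contributions from many overlapping points, which is the condition that drove the gradient to its maximum everywhere.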

Two complementary fixes:

  • maxOpacity: 0.6 on the heatmap.js instance — caps the rendered alpha so dense areas don't fully wash out the satellite imagery
  • Per-point radius computed from sqrt(canvas_pixels / cell_count) × 2, clamped to [6, 30]. World view (35k cells) → radius ≈ 6 (tight pixel dots, no overlap saturation). Cyprus medium (~400 cells) → radius = 30 cap (smooth blobs as before).

World view now shows geographic structure instead of solid red. Tight zooms unchanged visually.
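
The radius rule can be written down directly from the formula above. In this sketch, `adaptiveRadius` and `toHeatmapPoints` are hypothetical names, the `x_bin` / `y_bin` / `n` field names are assumed, and no rounding is applied because the description does not mention any. The point shape (`x`, `y`, `value`, `radius`) follows the data-point format heatmap.js accepts.

```javascript
// Hypothetical sketch of the radius rule: sqrt(canvas_pixels / cell_count) * 2,
// clamped to [6, 30].
function adaptiveRadius(canvasPixels, cellCount, min = 6, max = 30) {
  if (cellCount <= 0) return max; // nothing to draw; fall back to the cap
  const raw = Math.sqrt(canvasPixels / cellCount) * 2;
  return Math.min(max, Math.max(min, raw));
}

// Converts aggregated cells into heatmap.js-style points with a per-point radius.
function toHeatmapPoints(cells, canvasPixels) {
  const radius = adaptiveRadius(canvasPixels, cells.length);
  return cells.map(({ x_bin, y_bin, n }) => ({
    x: x_bin,
    y: y_bin,
    value: Number(n),
    radius,
  }));
}

// Assumed usage (not executed here):
//   const heatmap = h337.create({ container, maxOpacity: 0.6 });
//   heatmap.setData({ max, data: toHeatmapPoints(cells, 512 * 512) });
```

On a 512² canvas, 35k cells give a raw value of about 5.5, which clamps up to 6, and 400 cells give about 51, which clamps down to 30, matching the two cases described above.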

Test plan

  • tests/playwright/heatmap-overlay.spec.js 5/5 pass on localhost
  • Visual verified on rdhyee staging at the URLs RY surfaced (Africa-wide alt=7.4Mkm, Atlantic alt=15Mkm). World view shows structure; smaller zooms unchanged.
  • Numbers match table exactly at all zoom levels (table above)
  • Verify on production after merge

Out of scope

  • Cluster dots still don't align with heatmap hotspots at cluster-mode altitudes (cluster = H3 centroids; heatmap = real positions). Phase 3 (third-mode promotion that hides cluster dots when heatmap is on) — separate work.
  • The HEATMAP_LIMIT constant (= 100,000) is kept in the code but no longer referenced; left in place for phase 2 in case a safety cap on cell count is reintroduced.

Provenance

Authored by Claude in response to RY feedback ("wondering whether we can do better geographic random sampling"). Approach (SQL pre-aggregation by pixel cell) chosen over TABLESAMPLE because it removes the cap entirely rather than just making the sampling random. Adaptive radius added in response to RY's second feedback that world view was washing out to red.

Cross-refs

@rdhyee rdhyee force-pushed the feat/heatmap-sql-aggregation branch from b631b4d to fb85ff0 Compare May 28, 2026 01:30

Two related changes that follow up PR isamplesorg#240 (heatmap phase 1):

1. SQL pre-aggregation removes the LIMIT 100000 cap honestly.
2. Adaptive per-point radius + maxOpacity caps avoid blur-overlap
   saturation at high-cell-count views (world view "everything
   red" symptom RY surfaced after isamplesorg#240 shipped).

## (1) SQL pre-aggregation

Previously: `SELECT latitude, longitude FROM lite WHERE bbox AND
filters LIMIT 100000`, then bin per pixel in JS. Two problems:

  - LIMIT 100000 returned the first 100k rows in parquet storage
    order — NOT random, NOT geographic. At world view, the
    heatmap silently showed whichever source happened to be
    physically first in the file (likely SESAR, the largest by
    row count). The "(capped)" status warning disclosed the
    problem but didn't fix it.
  - For views with more samples than the cap, the heatmap
    reflected only those first 100k rows, so the rendered
    density was unfaithful.

Now: SQL computes pixel cell coords server-side using FLOOR /
LEAST / GREATEST, then GROUP BY (x, y) returning one row per
non-empty pixel with COUNT(*) as the count. Result cardinality
is bounded by canvas pixels (≤ 512² = 262k), independent of
how many samples the bbox contains. No LIMIT needed — every
sample counted into its true pixel bucket.

Antimeridian handled: when bbox wraps (west > east), SQL shifts
longitudes < west by +360 so pixel arithmetic works in a
continuous coordinate space.

Verified counts vs `samples table` summary line (= true sample
count for the current view):

  view              | heatmap  | table    | match
  ------------------|----------|----------|------
  PKAP (100km)      |  77,840  |  77,840  | ✅
  Cyprus medium     | 100,970  | 100,970  | ✅  (was capped at 100k)
  Cyprus regional   | 682,029  | 682,029  | ✅  (was capped at 100k)
  Africa (1.9Mkm)   |  12,875  |  12,875  | ✅
  World view        | 5.98M    | 5.98M    | ✅  (was capped at 100k)

Render time at world view (~6M samples → 35k cells): ~7s on
localhost, similar to or faster than the LIMIT 100k version.

`HEATMAP_LIMIT` constant left in place but no longer used (kept
for back-compat in case phase 2 reintroduces a safety cell-count
cap).

## (2) Adaptive radius + maxOpacity

After (1), RY tested staging and reported world view "everything
is red." Cause: with 35k+ pixel cells on a 512² canvas, heatmap.js's
default 25-pixel blur radius made each cell's Gaussian blur cover
~1% of canvas. 35k × 1% ≈ 350× the canvas area, far above 100%,
so linear-additive blending saturated everything.

Two complementary fixes:

  - `maxOpacity: 0.6` on the heatmap.js instance config. Caps the
    rendered alpha so dense areas don't fully wash out the
    satellite imagery underneath.
  - Per-point radius computed from `sqrt(canvas_pixels /
    cell_count) * 2`, clamped to [6, 30]. World view (35k cells)
    → radius ≈ 6px (tight pixel dots, no overlap). Cyprus medium
    (~400 cells) → radius = 30px (cap, smooth blobs as before).

Together: world view shows geographic structure instead of
solid red. Tight zooms unchanged visually.

## Test plan

- `tests/playwright/heatmap-overlay.spec.js` 5/5 still pass
  on localhost.
- Visual verified on rdhyee staging at the URLs RY surfaced
  (Africa-wide, Atlantic alt=15Mkm). World view now shows
  structure; tight zooms unchanged.

## Provenance

Authored by Claude, prompted by RY ("wondering whether we can
do better geographic random sampling"). Approach (Option C from
Claude's menu: SQL pre-aggregation by pixel cell) recommended
over TABLESAMPLE because it removes the cap entirely rather
than just making the sampling random.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rdhyee rdhyee force-pushed the feat/heatmap-sql-aggregation branch from fb85ff0 to 4a74b8f Compare May 28, 2026 01:34
@rdhyee rdhyee merged commit f9535ee into isamplesorg:main May 28, 2026
1 check passed