explorer: heatmap overlay spike — phase 1 (#233) by rdhyee · Pull Request #240 · isamplesorg/isamplesorg.github.io

rdhyee · 2026-05-27T23:15:01Z

Summary

Phase 1 of the #233 progressive heatmap spike. Adds a toggleable heatmap overlay in the explorer that renders a filter-aware density layer from in-viewport sample coordinates via DuckDB-WASM → heatmap.js → Cesium SingleTileImageryProvider. Toggle off restores the existing cluster/point view unchanged.

Scope deliberately narrow: answer the spike's viability question (does heatmap.js + Cesium imagery + DuckDB-WASM compose?) with minimal architectural change. Progressive refinement, cache, kernel/color tuning, and third-mode promotion all deferred to phase 2+.

What you'll see

A new Heatmap (filtered density) checkbox under the source legend. When checked:

On moveEnd (debounced 250 ms) and on every filter change, queries samples_map_lite.parquet for sample coordinates in the current viewport bbox, applying the source and material/feature/specimen filters (LIMIT 100,000; see the caveat below).
Bins the result per pixel into a 512×512 grid before passing it to heatmap.js (a performance optimization added by Codex that keeps the data array under 262k entries however many samples match).
Renders to an offscreen canvas, then swaps in a Cesium SingleTileImageryProvider.
Status text (italic) reports the point count per refresh and shows an explicit cap warning when the query hits the 100k LIMIT.
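
The per-pixel binning step can be sketched roughly as follows. This is a minimal sketch, not the code in explorer.qmd: the function name `binPoints`, the `{lon, lat}` point shape, and the bbox field names are assumptions made for illustration. Only the 512×512 grid comes from the description above; the `{x, y, value}` output matches the data-point shape heatmap.js accepts.

```javascript
// Hypothetical helper: bin (lon, lat) points into a size × size pixel grid.
// Returns one { x, y, value } entry per non-empty cell.
function binPoints(points, bbox, size = 512) {
  const { west, south, east, north } = bbox;
  const counts = new Map(); // key: y * size + x, value: sample count

  for (const { lon, lat } of points) {
    // Fractional position inside the bbox; canvas y grows downward.
    const fx = (lon - west) / (east - west);
    const fy = (north - lat) / (north - south);
    // Clamp so points on the east/south edge land in the last cell.
    const x = Math.min(size - 1, Math.max(0, Math.floor(fx * size)));
    const y = Math.min(size - 1, Math.max(0, Math.floor(fy * size)));
    const key = y * size + x;
    counts.set(key, (counts.get(key) || 0) + 1);
  }

  return Array.from(counts, ([key, count]) => ({
    x: key % size,
    y: Math.floor(key / size),
    value: count,
  }));
}
```

Because the output has at most `size * size` entries, the cost of the heatmap.js render depends on the canvas size rather than on the number of matching samples.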

| State | Sample count | Notes |
| --- | --- | --- |
| Cyprus alt=500 km, no filter | 96,694 | Below LIMIT, so the density is honest |
| Cyprus alt=500 km, +material=organicmaterial | 13 | Filter regenerates the heatmap with a dramatically smaller density |
| World view, no filter | LIMIT-capped | Status warns "first 100,000 samples (capped — zoom or filter for full density)" |

Cyprus numbers (96,694 / 13) match raw DuckDB bbox queries against the same predicates exactly.

Architectural decisions

| # | Decision | Rationale |
| --- | --- | --- |
| D1 | Additive overlay, not exclusive third mode | Minimizes change to the altitude-driven `getMode()` state machine. Phase 3 can promote to third-mode if the overlay proves the concept. |
| D2 | heatmap.js v2.0.5 via jsDelivr CDN | Matches how Cesium is loaded; no build-tool changes. |
| D3 | `SingleTileImageryProvider` | Simplest Cesium integration; matches #233's spec recommendation. |
| D4 | Query `samples_map_lite.parquet` exact viewport | The point of the heatmap is "what's in the rectangle I'm looking at." |
| D5 | `heatmapReqId` cancellation | Matches existing `facetCountsReqId` / `requestId` patterns. |
| D6 | Triggers: moveEnd, source-filter, material/feature/specimen | Same triggers that fire `refreshFacetCounts` and `loadViewportSamples`. |
| D7 (Codex r1) | `heatmapLastKey` set ONLY after successful layer swap; cleared on error and on moveStart cancellation | The original implementation set it before the render fired, so a toggle+camera-gesture race could wedge the overlay. |
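
Decisions D5 and D7 interact, so a small sketch of the bookkeeping may help. This is an illustration under assumptions, not the explorer.qmd implementation: `createHeatmapState` and its method names are hypothetical, and the real code tracks `heatmapReqId` and `heatmapLastKey` as described in the table. The sketch shows a request token that goes stale on cancellation and a dedupe key that is recorded only after a successful layer swap.

```javascript
// Hypothetical bookkeeping for request cancellation (D5) and the dedupe key (D7).
function createHeatmapState() {
  let reqId = 0; // bumped for every new request and every cancellation
  let lastKey = null; // (viewport, filter) key of the last SUCCESSFUL render

  return {
    // Start a refresh. Returns a token, or null when the same key already rendered.
    begin(key) {
      if (key === lastKey) return null;
      reqId += 1;
      return reqId;
    },
    // moveStart: invalidate any in-flight request and forget the dedupe key.
    cancel() {
      reqId += 1;
      lastKey = null;
    },
    // Call after the layer swap. Records the key only if the token is still current.
    succeed(token, key) {
      if (token !== reqId) return false; // stale result, drop it
      lastKey = key;
      return true;
    },
    // Error path: clear the key so the next trigger retries.
    fail(token) {
      if (token === reqId) lastKey = null;
    },
  };
}
```

With this shape, a refresh that is cancelled mid-flight never leaves a key behind, so the next moveEnd cannot early-return on a render that did not happen.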

Caveats / known limitations

LIMIT 100,000: at global views with no filter, the lite parquet has ~6M rows. The 100k cap shows an arbitrary first 100k, not honest density. Status text explicitly warns when capped. Phase 2 progressive refinement (TABLESAMPLE 1% → 10% → 100% passes) removes the cap properly.
Antimeridian rectangle: the wrapped-bbox path uses east + 360. Cesium normally expects west > east for wrapped rectangles. Codex flagged this for a dateline visual/spec test; deferred until someone reports a problem at the dateline.
No cache yet: every (viewport, filter) combo re-queries. Phase 2 adds LRU cache keyed on (viewport-hash, filter-hash).
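
One possible shape for the phase 2 cache is sketched below. Everything here is an assumption made for illustration: the `LruCache` class, the `cacheKey` helper, the rounding precision, and the filter object layout are hypothetical and are not part of this PR. The sketch relies on the fact that a JavaScript `Map` iterates in insertion order.

```javascript
// Hypothetical LRU cache: the first key in the Map is always the least recently used.
class LruCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert to mark the entry as most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // Evict the oldest entry.
      this.map.delete(this.map.keys().next().value);
    }
  }
}

// Hypothetical key builder: rounds the bbox so tiny camera jitter reuses an entry,
// and sorts filter names and values so ordering does not change the key.
function cacheKey(bbox, filters) {
  const viewport = [bbox.west, bbox.south, bbox.east, bbox.north]
    .map((v) => v.toFixed(3))
    .join(",");
  const filterPart = Object.keys(filters)
    .sort()
    .map((name) => `${name}=${[...filters[name]].sort().join("|")}`)
    .join("&");
  return `${viewport}#${filterPart}`;
}
```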

Test plan

tests/playwright/heatmap-overlay.spec.js 4/4 pass on localhost in 45.1s
- heatmap toggle exists
- toggle on → visible layer + lastPointCount > 0
- toggle off → layer removed
- source + material filter changes → lastImageHash changes (asserts on image-hash change, not just timestamp — so the error path doesn't satisfy)
tests/playwright/facet-viewport.spec.js 4/4 still pass (no regression in the work from PR #237, explorer: B1 viewport-aware facet counts (#234 step 3))
Patient visual probe confirms Cyprus numbers (96,694 / 13) match raw bbox queries. Image hashes change per refresh.
Verify on rdhyee fork staging (will mirror after this round of fixes lands)
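
The image-hash assertion depends on some digest of the rendered pixels. How explorer.qmd actually computes `lastImageHash` is not shown in this PR text, so the following is only a sketch under assumptions: `fnv1aHex` is a hypothetical helper that applies the 32-bit FNV-1a hash to a byte array, such as the `data` array returned by a 2D canvas context's `getImageData`.

```javascript
// Hypothetical helper: 32-bit FNV-1a over a byte array, returned as 8 hex digits.
function fnv1aHex(bytes) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i];
    // Multiply by the FNV prime and keep the result as an unsigned 32-bit value.
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}
```

A spec can then compare the hash before and after a filter change. Unlike a timestamp, the hash only changes when the rendered pixels change, which is why an error path that renders nothing cannot satisfy the assertion.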

Out of scope for this PR

Phase 2 (~1–2 days): progressive refinement, cache, kernel adaptation, color-ramp tuning, alpha tuning
Phase 3 (~1 day): third-mode promotion (mutually exclusive cluster | point | heatmap), retire #facetNote apology copy when heatmap mode active
Custom WebGL primitive (option 3 in #233, explorer: spike a progressive heatmap layer as filter-honest alternative to cluster mode): deferred indefinitely

What success of phase 1 unlocks (from the plan)

Phase 2 + 3 become worth doing
Retires the C3 work (#234 step 5: auto-promote-to-point with density cap) entirely
Retires #239's bug class (interactive map state can diverge from cold-reload state of the same URL) as a UX concern, because the heatmap doesn't go stale the way cluster dots can
Reduces A1's urgency — heatmap answers "where is my filtered data?" visually, even before search-as-global-filter ships

Implementation provenance

Bulk of the explorer.qmd diff (~252 LOC) was authored by OpenAI Codex CLI from a Claude-authored phase-1 plan. Codex jumped past "review the plan" straight to "execute the plan." Claude reviewed the resulting implementation, verified it works (spec passes + visual probe matches raw queries), then asked Codex for a round-1 PR review.

Codex's round-1 review caught real bugs:

Stale dedupe key (toggle+camera race wedges overlay): fixed in this PR
Silent LIMIT 100k cap: fixed, status now explicitly warns
PR diff included the commits from PR #238 (tests: extract URL helper for sub-path-safe page.goto across the suite) because of a branch-base issue: fixed by rebasing onto upstream/main
Spec only asserted lastRefreshAt, which the error path could satisfy: fixed, now asserts lastImageHash
Antimeridian convention questioned: deferred (no easy repro)
PR text overclaim: addressed in this revision

Commit credits Codex as co-author per repo conventions.

Cross-refs

explorer: architectural direction — make filter semantics coherent across all surfaces #234 — explorer-filter-coherence roadmap (this is parallel to A1/FTS work, per Q6 sign-off)
explorer: interactive map state can diverge from cold-reload state of the same URL #239 — interactive-vs-cold-reload state divergence (heatmap addresses the underlying class of bug)
explorer: dense point overlap saturates to yellow, looks like Smithsonian dots #231 — point-overlap saturation (heatmap supplants point-mode at dense zoom)
explorer: B1 viewport-aware facet counts (#234 step 3) #237 — B1 viewport-aware facet counts (just shipped; heatmap reuses bbox + filter patterns)
tests: extract URL helper for sub-path-safe page.goto across the suite #238 — Playwright URL helper (just shipped; this PR uses it)

…isamplesorg#240)

…eploy (mirrors PR isamplesorg#240)

Adds a toggleable heatmap overlay as a third visualization alongside cluster (H3 dots) and point (individual samples) mode. Phase 1 of the isamplesorg#233 spike: answer the viability question of heatmap.js + Cesium SingleTileImageryProvider + DuckDB-WASM composing into a filter-aware density layer.

In scope this commit:
- Loads heatmap.js v2.0.5 via jsDelivr CDN (alongside Cesium).
- New `#heatmapToggle` checkbox in the source legend.
- `refreshHeatmap()` queries `samples_map_lite.parquet` for in-viewport sample coords (applying source + material/feature/specimen filters), bins them per pixel into a 512x512 grid before passing to heatmap.js (perf optimization: keeps the data array under 262k regardless of how many samples match; see the cap warning below), renders to an offscreen canvas, and swaps a `SingleTileImageryProvider` into the Cesium imagery layers.
- Cancellation via `heatmapReqId` (matches the existing `facetCountsReqId` / `requestId` patterns).
- Refresh triggers: `camera.moveEnd` (debounced 250 ms), source-filter change, material-filter change. moveStart bumps the reqId and shows "waiting for camera" status.
- Toggle off removes the imagery layer.
- Skip-if-same-key optimization on identical (viewport, filter) combos, marked "done" only after a successful render.
- Status text (italic) reports point count per refresh, with explicit cap warning when at LIMIT.
- New `tests/playwright/heatmap-overlay.spec.js` (4 tests): toggle exists, toggle on renders, toggle off clears, filter change regenerates (asserts on lastImageHash, not just lastRefreshAt, so the error path doesn't satisfy the test).

Codex round-1 fixes baked in:
- Stale dedupe key bug: `heatmapLastKey` is now set ONLY after a successful layer swap, and cleared on (a) error path, and (b) moveStart cancellation. Previously a toggle+camera-gesture race could leave the key set without a render having happened, wedging the overlay (next moveEnd would early-return).
- Silent LIMIT 100k cap: status text now explicitly says "(capped — zoom or filter for full density)" when at LIMIT. Lite parquet has ~6M rows; the cap shows an arbitrary first 100k, not honest density. Phase 2 progressive refinement removes the cap.
- `_heatmapOverlay.capped` field exposed for tests.

Verified:
- 4/4 spec tests pass on localhost in 45.1s (post-fixes)
- Patient probe at Cyprus alt=500km confirms numbers match raw bbox query: no filter = 96,694 samples; +organicmaterial filter = 13 samples; image hash changes per refresh
- No regression in facet-viewport.spec.js (4/4 still pass)

Out of scope for this PR (deferred to phase 2+):
- Progressive refinement (TABLESAMPLE 1% → 10% → 100%), which removes the LIMIT cap properly
- Cache by (viewport-hash, filter-hash)
- Kernel / color-ramp tuning, alpha tuning
- Third-mode promotion (currently overlay, not exclusive mode)
- Interaction with `#facetNote` apology copy
- Antimeridian rectangle convention: current path uses `east + 360` for wrapped bboxes; Cesium normally expects `west > east`. Codex flagged for a dateline test, deferred.

Implementation provenance: Bulk of explorer.qmd diff (~252 LOC) authored by OpenAI Codex CLI from a Claude-authored phase-1 plan that was sent for "review" but Codex jumped straight to implementation. Claude reviewed the implementation against the plan, verified it works (spec + visual probe), and asked Codex for a round-1 PR review. Codex caught real bugs (stale dedupe key, silent cap); those are addressed in this amended commit.

Co-Authored-By: OpenAI Codex CLI <noreply@openai.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ors PR isamplesorg#240)

Replaces the LIMIT 100000 raw-row scan + JS per-pixel binning with a single DuckDB GROUP BY query that does the binning in SQL (inside DuckDB-WASM, not in JS). Removes the arbitrary cap honestly: every sample in the bbox is counted into its true pixel cell, regardless of total sample count.

Why the LIMIT was bad: `LIMIT 100000` returned the first 100k rows in parquet storage order, which is neither random nor geographic. At world view, the heatmap silently showed whichever source happened to be physically first in the file (likely SESAR, the largest source by row count). The "(capped)" status warning disclosed the problem but didn't fix it. RY feedback 2026-05-27 on PR isamplesorg#240 ("wondering whether we can do better geographic random sampling").

How the SQL pushdown works: compute `(x_bin, y_bin)` pixel coordinates from `latitude`/`longitude` in the query using FLOOR / LEAST / GREATEST, then GROUP BY (x_bin, y_bin), returning one row per non-empty pixel with COUNT(*) as the sample count. Result cardinality is bounded by canvas pixels (≤ 512² = 262k), independent of bbox sample count. JS just iterates the aggregated rows and applies the same log(1+n) scaling for heatmap.js.

Verified counts vs `samples table` summary line (= true sample count for the current view):

view              | heatmap | table   | match
------------------|---------|---------|------
PKAP (100km)      | 77,840  | 77,840  | ✅
Cyprus medium     | 100,970 | 100,970 | ✅ (was capped at 100k)
Cyprus regional   | 682,029 | 682,029 | ✅ (was capped at 100k)
World view        | 5.98M   | 5.98M   | ✅ (was capped at 100k)

Render time at world view (~6M samples → 35k cells): ~7s on localhost, similar to or faster than the LIMIT 100k version at smaller zooms.

Removes the "(capped)" status branch. The `HEATMAP_LIMIT` constant becomes unused (left in place for now in case Phase 2 progressive refinement reintroduces a safety cap on cell count).

Side effect of removing the cap: the per-pixel max-bias is now even more extreme at high-density views, but the log(1+n) scaling from PR isamplesorg#240 handles it.

Verified: 5/5 heatmap-overlay.spec.js still pass on localhost. The spec asserts `lastPointCount > 0`, which is still true. No test asserted the capped behavior for large views, so no spec changes are needed here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
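
The pre-aggregation query described in this commit could be assembled along the following lines. This is a hedged sketch, not the query in explorer.qmd: `buildHeatmapSql`, the `filterSql` parameter, and the bare `samples_map_lite.parquet` path are assumptions, and the wrapped-bbox branch is one plausible way to handle a viewport that crosses the antimeridian. The `latitude` / `longitude` column names and the FLOOR / LEAST / GREATEST / GROUP BY structure come from the commit text.

```javascript
// Hypothetical builder for a per-pixel aggregation query (one row per non-empty cell).
function buildHeatmapSql(bbox, size = 512, filterSql = "TRUE") {
  const { west, south, east, north } = bbox;
  const wraps = west > east; // viewport crosses the antimeridian
  const eastContinuous = wraps ? east + 360 : east;

  // Shift longitudes west of the seam by +360 so x arithmetic stays continuous.
  const lon = wraps
    ? `(CASE WHEN longitude < (${west}) THEN longitude + 360 ELSE longitude END)`
    : "longitude";
  const lonInBbox = wraps
    ? `(longitude >= (${west}) OR longitude <= (${east}))`
    : `longitude BETWEEN (${west}) AND (${east})`;

  // Clamp each bin index into [0, size - 1].
  const clamp = (expr) =>
    `LEAST(${size - 1}, GREATEST(0, CAST(FLOOR(${expr}) AS INTEGER)))`;
  const xBin = clamp(`(${lon} - (${west})) / (${eastContinuous - west}) * ${size}`);
  const yBin = clamp(`((${north}) - latitude) / (${north - south}) * ${size}`);

  return [
    `SELECT ${xBin} AS x_bin, ${yBin} AS y_bin, COUNT(*) AS n`,
    `FROM read_parquet('samples_map_lite.parquet')`,
    `WHERE latitude BETWEEN (${south}) AND (${north}) AND ${lonInBbox} AND (${filterSql})`,
    `GROUP BY x_bin, y_bin`,
  ].join("\n");
}
```

The sketch assumes a bbox with non-zero width and height, and numeric bbox values; real code would also need to guard against a degenerate viewport and build `filterSql` safely from the active filters.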

Two related changes that follow up PR #240 (heatmap phase 1):

1. SQL pre-aggregation removes the LIMIT 100000 cap honestly.
2. Adaptive per-point radius + maxOpacity caps avoid blur-overlap saturation at high-cell-count views (world view "everything red" symptom RY surfaced after #240 shipped).

## (1) SQL pre-aggregation

Previously: `SELECT latitude, longitude FROM lite WHERE bbox AND filters LIMIT 100000`, then bin per pixel in JS. Two problems:

- LIMIT 100000 returned the first 100k rows in parquet storage order: NOT random, NOT geographic. At world view, the heatmap silently showed whichever source happened to be physically first in the file (likely SESAR, the largest by row count). The "(capped)" status warning disclosed the problem but didn't fix it.
- For sample sets above the cap, the density was unfaithful.

Now: SQL computes pixel cell coords inside the DuckDB query (not in JS) using FLOOR / LEAST / GREATEST, then GROUP BY (x, y), returning one row per non-empty pixel with COUNT(*) as the count. Result cardinality is bounded by canvas pixels (≤ 512² = 262k), independent of how many samples the bbox contains. No LIMIT needed: every sample is counted into its true pixel bucket.

Antimeridian handled: when bbox wraps (west > east), SQL shifts longitudes < west by +360 so pixel arithmetic works in a continuous coordinate space.

Verified counts vs `samples table` summary line (= true sample count for the current view):

view              | heatmap  | table    | match
------------------|----------|----------|------
PKAP (100km)      | 77,840   | 77,840   | ✅
Cyprus medium     | 100,970  | 100,970  | ✅ (was capped at 100k)
Cyprus regional   | 682,029  | 682,029  | ✅ (was capped at 100k)
Africa (1.9Mkm)   | 12,875   | 12,875   | ✅
World view        | 5.98M    | 5.98M    | ✅ (was capped at 100k)

Render time at world view (~6M samples → 35k cells): ~7s on localhost, similar to or faster than the LIMIT 100k version.

`HEATMAP_LIMIT` constant left in place but no longer used (kept for back-compat in case phase 2 reintroduces a safety cell-count cap).

## (2) Adaptive radius + maxOpacity

After (1), RY tested staging and reported world view "everything is red." Cause: with 35k+ pixel cells on a 512² canvas, heatmap.js's default 25-pixel blur radius made each cell's Gaussian blur cover ~1% of the canvas. 35k cells × ~1% each is roughly 350× the canvas area, so linear-additive blending saturated everything.

Two complementary fixes:

- `maxOpacity: 0.6` on the heatmap.js instance config. Caps the rendered alpha so dense areas don't fully wash out the satellite imagery underneath.
- Per-point radius computed from `sqrt(canvas_pixels / cell_count) * 2`, clamped to [6, 30]. World view (35k cells) → radius ≈ 6px (tight pixel dots, no overlap). Cyprus medium (~400 cells) → radius = 30px (cap, smooth blobs as before).

Together: world view shows geographic structure instead of solid red. Tight zooms unchanged visually.

## Test plan

- `tests/playwright/heatmap-overlay.spec.js` 5/5 still pass on localhost.
- Visual verified on rdhyee staging at the URLs RY surfaced (Africa-wide, Atlantic alt=15Mkm). World view now shows structure; tight zooms unchanged.

## Provenance

Authored by Claude, prompted by RY ("wondering whether we can do better geographic random sampling"). Approach (Option C from Claude's menu: SQL pre-aggregation by pixel cell) recommended over TABLESAMPLE because it removes the cap entirely rather than just making the sampling random.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
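
The radius rule and the log(1+n) scaling described above can be sketched as follows. The function names `adaptiveRadius` and `toHeatmapData` and the `{x, y, n}` cell shape are hypothetical; the formula, the [6, 30] clamp, and the log scaling come from the commit text. The returned object follows the `{max, data}` shape that heatmap.js's `setData` accepts, with an optional per-point `radius`.

```javascript
// Hypothetical helper: per-point blur radius that shrinks as more cells are drawn.
function adaptiveRadius(cellCount, canvasSize = 512, min = 6, max = 30) {
  if (cellCount <= 0) return max;
  const raw = Math.sqrt((canvasSize * canvasSize) / cellCount) * 2;
  return Math.min(max, Math.max(min, raw));
}

// Hypothetical helper: convert aggregated cells into heatmap.js-style data,
// applying log(1 + n) so a few very dense cells do not flatten everything else.
function toHeatmapData(cells, canvasSize = 512) {
  const radius = adaptiveRadius(cells.length, canvasSize);
  let max = 0;
  const data = cells.map(({ x, y, n }) => {
    const value = Math.log1p(n);
    if (value > max) max = value;
    return { x, y, value, radius };
  });
  return { max, data };
}
```

For 35,000 cells the raw radius is about 5.5 px, so the lower clamp of 6 applies; for about 400 cells the raw radius is about 51 px, so the upper clamp of 30 applies.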

rdhyee force-pushed the feat/heatmap-overlay-spike branch 2 times, most recently from 0a2bc05 to 8998455 Compare May 27, 2026 23:41

rdhyee added a commit to rdhyee/isamplesorg.github.io that referenced this pull request May 27, 2026

Merge feat/heatmap-overlay-spike for rdhyee staging deploy (mirrors PR …

e25fb49

…isamplesorg#240)

rdhyee force-pushed the feat/heatmap-overlay-spike branch from 8998455 to 6b63944 Compare May 28, 2026 00:10

rdhyee added a commit to rdhyee/isamplesorg.github.io that referenced this pull request May 28, 2026

Merge feat/heatmap-overlay-spike (round-2 fixes) for rdhyee staging d…

a54c7df

…eploy (mirrors PR isamplesorg#240)

rdhyee force-pushed the feat/heatmap-overlay-spike branch from 6b63944 to 2755b1f Compare May 28, 2026 00:31

rdhyee added a commit to rdhyee/isamplesorg.github.io that referenced this pull request May 28, 2026

Merge feat/heatmap-overlay-spike (log-scale) for rdhyee staging (mirr…

6272d34

…ors PR isamplesorg#240)

rdhyee merged commit 60ac865 into isamplesorg:main May 28, 2026
1 check passed

rdhyee mentioned this pull request May 28, 2026

explorer: heatmap SQL pre-aggregation + adaptive radius (#233) #241

Merged

4 tasks

rdhyee merged 1 commit into isamplesorg:main from rdhyee:feat/heatmap-overlay-spike
