explorer: heatmap overlay spike — phase 1 (#233) #240
Merged
0a2bc05 to 8998455
rdhyee added a commit to rdhyee/isamplesorg.github.io that referenced this pull request (May 27, 2026)
8998455 to 6b63944
rdhyee added a commit to rdhyee/isamplesorg.github.io that referenced this pull request (May 28, 2026)
Adds a toggleable heatmap overlay as a third visualization alongside the cluster (H3 dots) and point (individual samples) modes. Phase 1 of the isamplesorg#233 spike: answer the viability question of whether heatmap.js + Cesium `SingleTileImageryProvider` + DuckDB-WASM compose into a filter-aware density layer.

In scope for this commit:
- Loads heatmap.js v2.0.5 via the jsDelivr CDN (alongside Cesium).
- New `#heatmapToggle` checkbox in the source legend.
- `refreshHeatmap()` queries `samples_map_lite.parquet` for in-viewport sample coords (applying source + material/feature/specimen filters), bins them per pixel into a 512x512 grid before passing them to heatmap.js (a performance optimization: the data array stays at or below 512² = 262,144 points no matter how many samples match, though see the cap warning below), renders to an offscreen canvas, and swaps a `SingleTileImageryProvider` into the Cesium imagery layers.
- Cancellation via `heatmapReqId` (matches the existing `facetCountsReqId` / `requestId` patterns).
- Refresh triggers: `camera.moveEnd` (debounced 250 ms), source-filter change, material-filter change. `moveStart` bumps the reqId and shows a "waiting for camera" status.
- Toggle off removes the imagery layer.
- Skip-if-same-key optimization for identical (viewport, filter) combos; a key is only marked "done" after a successful render.
- Status text (italic) reports the point count per refresh, with an explicit cap warning when the result hits LIMIT.
- New `tests/playwright/heatmap-overlay.spec.js` (4 tests): toggle exists, toggle on renders, toggle off clears, filter change regenerates (asserts on `lastImageHash`, not just `lastRefreshAt`, so the error path can't satisfy the test).

Codex round-1 fixes baked in:
- Stale dedupe key bug: `heatmapLastKey` is now set ONLY after a successful layer swap, and cleared on (a) the error path and (b) `moveStart` cancellation. Previously a toggle + camera-gesture race could leave the key set without a render having happened, wedging the overlay (the next `moveEnd` would early-return).
- Silent LIMIT 100k cap: the status text now explicitly says "(capped — zoom or filter for full density)" when at LIMIT. The lite parquet has ~6M rows; the cap shows an arbitrary first 100k rows, not an honest density. Phase 2 progressive refinement removes the cap.
- `_heatmapOverlay.capped` field exposed for tests.

Verified:
- 4/4 spec tests pass on localhost in 45.1s (post-fixes).
- Patient probe at Cyprus alt=500km confirms the numbers match a raw bbox query: no filter = 96,694 samples; +organicmaterial filter = 13 samples; the image hash changes on every refresh.
- No regression in `facet-viewport.spec.js` (4/4 still pass).

Out of scope for this PR (deferred to phase 2+):
- Progressive refinement (TABLESAMPLE 1% → 10% → 100%), which removes the LIMIT cap properly.
- Cache by (viewport-hash, filter-hash).
- Kernel / color-ramp tuning, alpha tuning.
- Third-mode promotion (currently an overlay, not an exclusive mode).
- Interaction with the `#facetNote` apology copy.
- Antimeridian rectangle convention: the current path uses `east + 360` for wrapped bboxes, whereas Cesium normally expects `west > east`. Codex flagged this for a dateline test; deferred.

Implementation provenance: the bulk of the `explorer.qmd` diff (~252 LOC) was authored by OpenAI Codex CLI from a Claude-authored phase-1 plan that was sent for "review", but Codex jumped straight to implementation. Claude reviewed the implementation against the plan, verified it works (spec + visual probe), and asked Codex for a round-1 PR review. Codex caught real bugs (stale dedupe key, silent cap); those are addressed in this amended commit.

Co-Authored-By: OpenAI Codex CLI <noreply@openai.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
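A minimal sketch of the cancellation and dedupe rules above, assuming a simplified async `render(key)` callback that stands in for the query + canvas + imagery-layer swap. `createHeatmapController`, `render`, and the `viewport|filter` key format are hypothetical; only the `heatmapReqId` / `heatmapLastKey` semantics (bump on each request and on `moveStart`, commit the key only after a successful swap, clear it on error) come from the commit message.

```javascript
// Hypothetical sketch: request-id cancellation plus a dedupe key that is
// committed only after a successful render (the round-1 stale-key fix).
function createHeatmapController(render) {
  let heatmapReqId = 0;      // bumped on every new request and on moveStart
  let heatmapLastKey = null; // set ONLY after a successful layer swap

  async function refresh(viewportKey, filterKey) {
    const key = `${viewportKey}|${filterKey}`;
    if (key === heatmapLastKey) return "skipped"; // identical (viewport, filter)
    const myId = ++heatmapReqId;
    try {
      const layer = await render(key);
      if (myId !== heatmapReqId) return "stale"; // superseded while in flight
      heatmapLastKey = key; // commit the key only once the swap succeeded
      return layer;
    } catch (err) {
      heatmapLastKey = null; // error path: never leave a key without a render
      return "error";
    }
  }

  // moveStart: cancel in-flight work and forget the key, so the next
  // moveEnd is not short-circuited by a render that never happened.
  function onMoveStart() {
    heatmapReqId++;
    heatmapLastKey = null;
  }

  return { refresh, onMoveStart };
}
```

The key point of the fix is ordering: the dedupe key is written after the `await` and after the request-id check, so a cancelled or failed request can never mark a (viewport, filter) pair as done.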
6b63944 to 2755b1f
rdhyee added a commit to rdhyee/isamplesorg.github.io that referenced this pull request (May 28, 2026)
rdhyee added a commit to rdhyee/isamplesorg.github.io that referenced this pull request (May 28, 2026)
Replaces the LIMIT 100000 raw-row scan + JS per-pixel binning with a single DuckDB GROUP BY query that does the binning in SQL (inside DuckDB-WASM) instead of in a JS loop. Removes the arbitrary cap honestly: every sample in the bbox is counted into its true pixel cell, regardless of the total sample count.

Why the LIMIT was bad: `LIMIT 100000` returned the first 100k rows in parquet storage order, which is neither random nor geographic. At world view, the heatmap silently showed whichever source happened to be physically first in the file (likely SESAR, the largest source by row count). The "(capped)" status warning disclosed the problem but didn't fix it. Prompted by RY's feedback on 2026-05-27 on PR isamplesorg#240 ("wondering whether we can do better geographic random sampling").

How the SQL pushdown works: compute `(x_bin, y_bin)` pixel coordinates from `latitude`/`longitude` in SQL using FLOOR / LEAST / GREATEST, then GROUP BY (x_bin, y_bin), returning one row per non-empty pixel with COUNT(*) as the sample count. Result cardinality is bounded by the canvas pixel count (≤ 512² = 262,144), independent of the bbox sample count. JS just iterates the aggregated rows and applies the same log(1+n) scaling for heatmap.js.

Verified counts vs the `samples table` summary line (= true sample count for the current view):

view              | heatmap | table   | match
------------------|---------|---------|------
PKAP (100km)      | 77,840  | 77,840  | ✅
Cyprus medium     | 100,970 | 100,970 | ✅ (was capped at 100k)
Cyprus regional   | 682,029 | 682,029 | ✅ (was capped at 100k)
World view        | 5.98M   | 5.98M   | ✅ (was capped at 100k)

Render time at world view (~6M samples → 35k cells): ~7s on localhost, similar to or faster than the LIMIT 100k version at smaller zooms.

Removes the "(capped)" status branch; the `HEATMAP_LIMIT` constant becomes unused (left in place for now in case Phase 2 progressive refinement reintroduces a safety cap on cell count).
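The pushdown described above can be sketched as a small query builder. The table name `lite` and the `latitude` / `longitude` columns come from the commit text; `buildHeatmapBinSql`, the `filterSql` fragment, and the exact expression shape are hypothetical, and this version assumes a non-wrapping bbox (no antimeridian handling).

```javascript
// Hypothetical sketch: build a DuckDB query that assigns each sample to a
// canvas pixel cell and counts samples per cell. The result has at most
// width * height rows, no matter how many samples fall inside the bbox.
// Assumes a non-wrapping bbox (west < east) and trusted numeric inputs.
function buildHeatmapBinSql({ west, south, east, north }, width, height, filterSql = "TRUE") {
  // FLOOR maps a coordinate to a cell index; GREATEST/LEAST clamp edge
  // samples (e.g. longitude == east) into the valid range [0, size - 1].
  const xBin = `LEAST(GREATEST(CAST(FLOOR((longitude - ${west}) / (${east} - ${west}) * ${width}) AS INTEGER), 0), ${width - 1})`;
  // Canvas y grows downward, so latitude is measured from the north edge.
  const yBin = `LEAST(GREATEST(CAST(FLOOR((${north} - latitude) / (${north} - ${south}) * ${height}) AS INTEGER), 0), ${height - 1})`;
  return [
    `SELECT ${xBin} AS x_bin, ${yBin} AS y_bin, COUNT(*) AS n`,
    "FROM lite",
    `WHERE latitude BETWEEN ${south} AND ${north}`,
    `  AND longitude BETWEEN ${west} AND ${east}`,
    `  AND (${filterSql})`,
    "GROUP BY x_bin, y_bin",
  ].join("\n");
}
```

Interpolating numbers straight into the SQL string is only reasonable for values the app computes itself (bbox, canvas size); user-chosen filter values would be better passed as bound parameters.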
Side effect of removing the cap: the per-pixel max bias is now even more extreme at high-density views, but the log(1+n) scaling from PR isamplesorg#240 handles it.

Verified: 5/5 `heatmap-overlay.spec.js` tests still pass on localhost. (The spec asserts `lastPointCount > 0`, which still holds. The spec was written with capped behavior for large views in mind, which is worth a follow-up, but no test currently asserts that behavior, so no spec changes are needed here.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
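A minimal sketch of how the aggregated rows could become a heatmap.js payload with the log(1+n) scaling mentioned above. heatmap.js v2 `setData` takes an object of the form `{ min, max, data: [{ x, y, value }] }`; the row field names `x_bin`, `y_bin`, and `n` are assumptions (they match a hypothetical GROUP BY query), and `toHeatmapPayload` is an illustrative name.

```javascript
// Hypothetical sketch: turn per-cell counts into a heatmap.js setData() payload.
// Math.log1p computes log(1 + n), compressing the range so one very dense
// cell does not push every other cell down to near-zero intensity.
function toHeatmapPayload(rows) {
  let max = 0;
  const data = rows.map(({ x_bin, y_bin, n }) => {
    // COUNT(*) is a BIGINT in DuckDB and may arrive in JS as a BigInt,
    // which Math.log1p rejects, so convert explicitly.
    const value = Math.log1p(Number(n));
    if (value > max) max = value;
    return { x: Number(x_bin), y: Number(y_bin), value };
  });
  return { min: 0, max, data };
}
```

Usage would look like `heatmapInstance.setData(toHeatmapPayload(rows))`, where `heatmapInstance` is the object returned by `h337.create(...)`.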
rdhyee added a commit to rdhyee/isamplesorg.github.io that referenced this pull request (May 28, 2026)
Two related changes that follow up PR isamplesorg#240 (heatmap phase 1):

1. SQL pre-aggregation removes the LIMIT 100000 cap honestly.
2. Adaptive per-point radius + maxOpacity caps avoid blur-overlap saturation at high-cell-count views (the world-view "everything red" symptom RY surfaced after isamplesorg#240 shipped).

## (1) SQL pre-aggregation

Previously: `SELECT latitude, longitude FROM lite WHERE bbox AND filters LIMIT 100000`, then bin per pixel in JS. Two problems:
- LIMIT 100000 returned the first 100k rows in parquet storage order: NOT random, NOT geographic. At world view, the heatmap silently showed whichever source happened to be physically first in the file (likely SESAR, the largest by row count). The "(capped)" status warning disclosed the problem but didn't fix it.
- For sample sets above the cap, the density was unfaithful.

Now: SQL computes pixel cell coords inside DuckDB using FLOOR / LEAST / GREATEST, then GROUP BY (x, y), returning one row per non-empty pixel with COUNT(*) as the count. Result cardinality is bounded by the canvas pixel count (≤ 512² = 262,144), independent of how many samples the bbox contains. No LIMIT is needed: every sample is counted into its true pixel bucket.

Antimeridian handled: when the bbox wraps (west > east), the SQL shifts longitudes < west by +360 so the pixel arithmetic works in a continuous coordinate space.

Verified counts vs the `samples table` summary line (= true sample count for the current view):

view              | heatmap  | table    | match
------------------|----------|----------|------
PKAP (100km)      | 77,840   | 77,840   | ✅
Cyprus medium     | 100,970  | 100,970  | ✅ (was capped at 100k)
Cyprus regional   | 682,029  | 682,029  | ✅ (was capped at 100k)
Africa (1.9Mkm)   | 12,875   | 12,875   | ✅
World view        | 5.98M    | 5.98M    | ✅ (was capped at 100k)

Render time at world view (~6M samples → 35k cells): ~7s on localhost, similar to or faster than the LIMIT 100k version.
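The antimeridian shift described above can be illustrated with a small JavaScript mirror of the arithmetic: when `west > east`, longitudes below `west` get +360, so the bbox becomes the single continuous interval `[west, east + 360]`. `lonToXBin` is a hypothetical helper, not code from `explorer.qmd`; the clamp mirrors the LEAST/GREATEST step.

```javascript
// Hypothetical sketch: map a longitude to a canvas column for a bbox that may
// cross the antimeridian (west > east means the bbox wraps).
function lonToXBin(lon, west, east, width) {
  const wraps = west > east;
  const eastC = wraps ? east + 360 : east;            // continuous east edge
  const lonC = wraps && lon < west ? lon + 360 : lon; // shift the wrapped part
  const x = Math.floor(((lonC - west) / (eastC - west)) * width);
  return Math.min(Math.max(x, 0), width - 1);         // clamp like LEAST/GREATEST
}
```

For example, with `west = 170`, `east = -170`, and a 512-pixel canvas, longitude 180 and longitude -180 (the same meridian) both land in column 256, the middle of the 20-degree-wide view.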
`HEATMAP_LIMIT` constant left in place but no longer used (kept for back-compat in case phase 2 reintroduces a safety cell-count cap).

## (2) Adaptive radius + maxOpacity

After (1), RY tested staging and reported that world view was "everything is red." Cause: with 35k+ pixel cells on a 512² canvas, heatmap.js's default 25-pixel blur radius made each cell's Gaussian blur cover ~1% of the canvas. 35k cells × ~1% each ≈ 350× the canvas area, far beyond 100%, so linear-additive blending saturated everything.

Two complementary fixes:
- `maxOpacity: 0.6` in the heatmap.js instance config. Caps the rendered alpha so dense areas don't fully wash out the satellite imagery underneath.
- Per-point radius computed as `sqrt(canvas_pixels / cell_count) * 2`, clamped to [6, 30]. World view (35k cells) → ≈ 5.5px, clamped to 6px (tight pixel dots, no overlap). Cyprus medium (~400 cells) → ≈ 51px, clamped to 30px (the cap; smooth blobs as before).

Together: world view shows geographic structure instead of solid red. Tight zooms are visually unchanged.

## Test plan
- `tests/playwright/heatmap-overlay.spec.js` 5/5 still pass on localhost.
- Visually verified on rdhyee staging at the URLs RY surfaced (Africa-wide, Atlantic alt=15Mkm). World view now shows structure; tight zooms unchanged.

## Provenance
Authored by Claude, prompted by RY ("wondering whether we can do better geographic random sampling"). The approach (Option C from Claude's menu: SQL pre-aggregation by pixel cell) was recommended over TABLESAMPLE because it removes the cap entirely rather than just making the sampling random.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
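The adaptive radius rule above, written out as a function. The constants (512×512 canvas, factor 2, clamp to [6, 30]) come from this commit; the name `adaptiveRadius` and the empty-input fallback are assumptions for illustration.

```javascript
// Sketch of the per-point radius rule: share the canvas area evenly among the
// non-empty cells, take the side length of that share, double it, then clamp.
const CANVAS_PIXELS = 512 * 512;
const MIN_RADIUS = 6;
const MAX_RADIUS = 30;

function adaptiveRadius(cellCount) {
  if (cellCount <= 0) return MAX_RADIUS; // nothing to draw; any value works
  const r = Math.sqrt(CANVAS_PIXELS / cellCount) * 2;
  return Math.min(Math.max(r, MIN_RADIUS), MAX_RADIUS);
}
```

heatmap.js lets each data point carry its own `radius` next to `x`, `y`, and `value`, and `maxOpacity` is an option to `h337.create`, so both fixes fit in the existing calls: `{ x, y, value, radius: adaptiveRadius(rows.length) }` per point, plus `maxOpacity: 0.6` in the instance config.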
rdhyee added a commit to rdhyee/isamplesorg.github.io that referenced this pull request (May 28, 2026)
rdhyee added a commit to rdhyee/isamplesorg.github.io that referenced this pull request (May 28, 2026)
rdhyee added a commit that referenced this pull request (May 28, 2026)
## Summary

Phase 1 of the #233 progressive heatmap spike. Adds a toggleable heatmap overlay to the explorer that renders a filter-aware density layer from in-viewport sample coordinates via DuckDB-WASM → heatmap.js → Cesium `SingleTileImageryProvider`. Toggling it off restores the existing cluster/point view unchanged.

Scope is deliberately narrow: answer the spike's viability question (do heatmap.js + Cesium imagery + DuckDB-WASM compose?) with minimal architectural change. Progressive refinement, caching, kernel/color tuning, and third-mode promotion are all deferred to phase 2+.
## What you'll see

A new "Heatmap (filtered density)" checkbox under the source legend. When checked:
- On `moveEnd` (debounced 250 ms) and on every filter change, queries `samples_map_lite.parquet` for sample coords in the current viewport bbox, applying source + material/feature/specimen filters (LIMIT 100,000; see caveat below).
- … `SingleTileImageryProvider`.

The Cyprus numbers (96,694 unfiltered / 13 with the organicmaterial filter) match raw DuckDB bbox queries against the same predicates exactly.
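A minimal sketch of the client-side step in this phase-1 pipeline, as the phase-1 commit describes it: the raw coordinates returned by the capped query are counted per pixel on a 512×512 grid before being handed to heatmap.js. `binPoints` and its input shape are hypothetical, and wrapped (antimeridian) bboxes are ignored here.

```javascript
// Hypothetical sketch of phase-1 client-side binning: count raw sample
// coordinates per canvas pixel so heatmap.js receives at most
// width * height points, however many rows the query returned.
function binPoints(points, { west, south, east, north }, width = 512, height = 512) {
  const counts = new Map(); // "x,y" -> number of samples in that pixel
  for (const { latitude, longitude } of points) {
    if (latitude < south || latitude > north || longitude < west || longitude > east) continue;
    // Canvas y grows downward, so latitude is measured from the north edge;
    // Math.min keeps samples on the east/south edge inside the last cell.
    const x = Math.min(width - 1, Math.floor(((longitude - west) / (east - west)) * width));
    const y = Math.min(height - 1, Math.floor(((north - latitude) / (north - south)) * height));
    const key = `${x},${y}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([key, n]) => {
    const [x, y] = key.split(",").map(Number);
    return { x, y, n };
  });
}
```

The grid bounds the size of what heatmap.js draws, but not what DuckDB returns: the rows still have to cross into JavaScript first, which is why this phase needed the LIMIT.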
## Architectural decisions

- … `getMode()` state machine. Phase 3 can promote it to a third mode if the overlay proves the concept.
- `SingleTileImageryProvider`
- `samples_map_lite.parquet`
- exact viewport
- `heatmapReqId` cancellation: … `facetCountsReqId` / `requestId` patterns.
- … `refreshFacetCounts` and `loadViewportSamples`.
- `heatmapLastKey` set ONLY after a successful layer swap; cleared on error and on moveStart cancellation

## Caveats / known limitations
- Antimeridian rectangle convention: the current path uses `east + 360` for wrapped bboxes. Cesium normally expects `west > east` for wrapped rectangles. Codex flagged this for a dateline visual/spec test; deferred until someone reports a problem at the dateline.
- … `(viewport-hash, filter-hash)`.

## Test plan
- `tests/playwright/heatmap-overlay.spec.js` 4/4 pass on localhost in 45.1s
  - … `lastPointCount > 0`
  - … `lastImageHash` changes (asserts on the image-hash change, not just the timestamp, so the error path can't satisfy it)
- `tests/playwright/facet-viewport.spec.js` 4/4 still pass (no regression in the work from #237, "explorer: B1 viewport-aware facet counts (#234 step 3)")

## Out of scope for this PR
- … `#facetNote` apology copy when heatmap mode is active

## What success of phase 1 unlocks (from the plan)
## Implementation provenance

The bulk of the `explorer.qmd` diff (~252 LOC) was authored by OpenAI Codex CLI from a Claude-authored phase-1 plan. Codex jumped past "review the plan" straight to "execute the plan." Claude reviewed the resulting implementation, verified it works (spec passes + visual probe matches raw queries), then asked Codex for a round-1 PR review.

Codex's round-1 review caught real bugs:
- … `lastRefreshAt`, so the error path could satisfy it; fixed: asserts `lastImageHash`

The commit credits Codex as co-author per repo conventions.
## Cross-refs