Skip to content

fix(indexer): normalize SDK playlist_contents to canonical {track,time,metadata_time}#876

Closed
dylanjeffers wants to merge 2 commits into
mainfrom
fix/playlist-contents-hashid-decode
Closed

fix(indexer): normalize SDK playlist_contents to canonical {track,time,metadata_time}#876
dylanjeffers wants to merge 2 commits into
mainfrom
fix/playlist-contents-hashid-decode

Conversation

@dylanjeffers
Copy link
Copy Markdown
Contributor

@dylanjeffers dylanjeffers commented May 29, 2026

Symptom (prod)

User adds a track to a new playlist: it shows in the UI optimistically, but after a hard refresh the playlist is empty. The on-chain UpdatePlaylist tx landed and was indexed successfully — no error in the indexer logs.

Root cause

A field-name mismatch between the TS SDK and the API readers, introduced when apps#840 cut over playlist indexing from the legacy Python discovery-provider to the vendored `go-openaudio/pkg/etl`.

The TS SDK schema for an UpdatePlaylist entry (apps/packages/sdk/.../playlists/types.ts):

```ts
playlistContents: z.array(z.object({
timestamp: z.number(),
metadataTimestamp: z.optional(z.number()),
trackId: HashId // hashid-encoded string
}))
```

PlaylistsApi runs the metadata through `snakecaseKeys` before signing, so the on-chain metadata carries each entry as:

```json
{"track_id": "", "timestamp": 1700000000, "metadata_timestamp": 1700000000}
```

The legacy Python indexer's `process_playlist_contents` (apps/packages/discovery-provider/.../entities/playlist.py) normalized every entry before persisting:

```python
updated_tracks.append({
"track": track_id, # decoded integer
"time": index_time, # block_integer_time (or prior entry's time)
"metadata_time": metadata_time,
})
return {"track_ids": updated_tracks}
```

ETL's `normalizePlaylistContentsJSON` (added in go-openaudio#265) does NOT replicate this — it only normalizes the outer wrapper to `{"track_ids": [...]}` and `json.Marshal`s the inner entries verbatim. So the JSONB column ends up with `{"track_id": "", "timestamp": ..., "metadata_timestamp": ...}`-shaped entries.

Both API readers in this repo expect the canonical Python shape:

  • v1_playlist_tracks.go:46 — `(e.value->>'track')::int` returns NULL when the entry has `track_id` instead of `track`. The JOIN `tracks t ON t.track_id = NULL` matches nothing, and the endpoint returns zero rows.
  • dbv1/jsonb_types.go:16 — `PlaylistContents.TrackIDs.Track` is `int64` with JSON tag `track`. Missing field unmarshals as 0, the `TrackMap[0]` lookup misses, and the playlist's `tracks` slice in dbv1/playlists.go:95 comes back empty.

Either path: playlist looks empty even though the row + the `playlist_tracks` junction are correct. (The junction is correct because `pickPlaylistTrackID` does check both `track` and `track_id` and decodes hashids — only the JSONB-write path is missing the normalization.)

Fix

Register an etl `PostHook` for `(Playlist, Create)` and `(Playlist, Update)` that runs in the same DB transaction as the handler. It re-reads the JSONB and rewrites each entry to the canonical shape:

  • Resolve `track` (integer): prefer existing `track`, fall back to `track_id`. String values decode via `trashid.DecodeHashId` (same Hashids salt + min length as the upstream junction-table decoder).
  • Copy `timestamp` → `time` when `time` is missing.
  • Copy `metadata_timestamp` (fallback `timestamp`) → `metadata_time` when missing.
  • Drop the SDK-style `track_id`/`timestamp`/`metadata_timestamp` keys after copying their values.

Canonical entries (everything already in the `{track, time, metadata_time}` shape) pass through byte-identical — no spurious UPDATE.

Gated on `params.Metadata["playlist_contents"]` being present — name-only updates don't touch the column. Errors are logged and swallowed per the upstream PostHook contract (no rollback for a failed normalization), mirroring `indexer/user_pubkey_hook.go`.

Why a hook, not an upstream patch + bump

`pkg/etl` is vendored at a `main` pseudo-version per apps#840's pinning policy and lives in the read-only Go module cache. A hook in this repo:

  • Lands the fix today on this repo's CI/CD instead of waiting for an upstream PR + version bump.
  • Stays surgical: only fixes the column the API readers care about, doesn't touch the junction-table path.
  • Is byte-identical no-op for the common case (every entry already canonical).

The same fix should land in `normalizePlaylistContentsJSON` upstream so other consumers of `pkg/etl` get the same correctness. Tracking that as a follow-up.

Test plan

`go test ./indexer/...` — all green (existing tests + 9 new cases):

  • `TestPlaylistContentsHook_RenamesSDKShape` — SDK entry `{track_id: , timestamp}` becomes canonical `{track: , time, metadata_time}` with SDK keys removed.
  • `TestPlaylistContentsHook_RenamesSDKShape_WithMetadataTimestamp` — Create path with both `timestamp` and `metadata_timestamp` — `metadata_time` comes from `metadata_timestamp`, not `timestamp` (matches Python precedence).
  • `TestPlaylistContentsHook_NoChangeForCanonicalEntries` — JSONB byte-identical after hook for the pre-cutover canonical shape.
  • `TestPlaylistContentsHook_SkipsWhenMetadataLacksPlaylistContents` — name-only update leaves the JSONB untouched even when it contains fixable entries.
  • `TestPlaylistContentsHook_MixedEntries` — only the SDK-shape entry is rewritten; the canonical one passes through.
  • `TestPlaylistContentsHook_NumericStringTrackID` — `"12345"` falls back to `strconv.Atoi` (same as `trashid.DecodeHashId`'s Hashids-or-Atoi fallback).
  • `TestPlaylistContentsHook_EmptyContents` — `{"track_ids":[]}` (last-track-removed case) untouched.
  • `TestPlaylistContentsHook_UndecodableTrackIDLeftAlone` — entry with a malformed track ID is logged and preserved, not silently dropped.
  • `TestPlaylistContentsHook_HashidOnTrackField` — hashid string in the canonical `track` slot is decoded too.

`go build ./...` clean, `go vet ./indexer/...` clean.

Out of scope

  • The `OPENAUDIO_ETL_ENTITY_MANAGER_DATA_TYPES` env-var gate (if prod set it to a list missing `Playlist`, all playlist handlers would be silently skipped) is not addressed here — that's a config check, not a code fix. Mentioning so reviewers can rule it out by inspecting the deploy.
  • The other silent-failure path (`normalizePlaylistContentsJSON` writing `{"track_ids":[]}` for completely unrecognized SDK shapes) isn't addressed here either — by the time a post-hook runs, the original payload is gone. Upstream fix would be needed for that path; doesn't apply to the current prod symptom.

🤖 Generated with Claude Code

dylanjeffers and others added 2 commits May 29, 2026 12:09
ETL's playlist Create/Update handlers decode hashid-encoded track IDs
only for the playlist_tracks junction (pickPlaylistTrackID). The
playlists.playlist_contents JSONB column is written verbatim via
normalizePlaylistContentsJSON, so an SDK-supplied "1aV5byE"-style
string leaks into the column unchanged. Both downstream readers fail
on it: v1_playlist_tracks.go casts (e.value->>'track')::int and
dbv1.PlaylistContents scans into int64.

Add an etl PostHook for (Playlist, Create) and (Playlist, Update) that
re-reads the JSONB in the same tx and rewrites any string track/track_id
to its decoded integer via trashid.DecodeHashId (same Hashids salt and
min length as the junction path). Integer-only rows are untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The actual root cause of the empty-playlist regression is field-name
mismatch, not (just) hashid encoding. The TS SDK uploads each entry as
{trackId, timestamp, metadataTimestamp?} and snakecase-keys converts
that to {track_id, timestamp, metadata_timestamp} before signing. ETL's
normalizePlaylistContentsJSON writes those entries to the JSONB column
verbatim, but the downstream API readers expect the canonical Python-
indexer shape {track, time, metadata_time}:

  - v1_playlist_tracks.go does (e.value->>'track')::int — returns NULL
    on a row whose entries only have 'track_id'.
  - dbv1.PlaylistContents.TrackIDs.Track is int64, so json.Unmarshal
    sees no 'track' field and leaves Track=0. The TrackMap lookup
    then misses, and the playlist's tracks slice comes back empty.

Either way: playlist appears empty after a hard refresh, even though
the row exists and the playlist_tracks junction is populated correctly
(the junction-table updater picks track_id via the same key precedence
the hook now does).

Expand the post-hook to fix the column in the same tx:

  - resolve 'track' from existing 'track' or 'track_id' (decoding
    hashid strings via trashid.DecodeHashId).
  - copy 'timestamp' to 'time' if 'time' is missing.
  - copy 'metadata_timestamp' (fallback 'timestamp') to 'metadata_time'
    if missing.
  - drop the SDK-style keys after copying their values, matching the
    Python apps process_playlist_contents output shape.

Canonical entries (already in the {track, time, metadata_time} shape)
pass through byte-identical — no spurious UPDATE.

Should also land in normalizePlaylistContentsJSON upstream; tracking
that separately. The hook here unblocks the urgent prod issue without
waiting for an ETL bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dylanjeffers dylanjeffers changed the title fix(indexer): decode hashid track IDs in playlist_contents JSONB fix(indexer): normalize SDK playlist_contents to canonical {track,time,metadata_time} May 29, 2026
@raymondjacobson
Copy link
Copy Markdown
Member

Opting for this instead:
OpenAudio/go-openaudio#321

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants