fix(indexer): normalize SDK playlist_contents to canonical {track,time,metadata_time}#876
Closed
dylanjeffers wants to merge 2 commits into
Closed
fix(indexer): normalize SDK playlist_contents to canonical {track,time,metadata_time}#876dylanjeffers wants to merge 2 commits into
dylanjeffers wants to merge 2 commits into
Conversation
ETL's playlist Create/Update handlers decode hashid-encoded track IDs only for the playlist_tracks junction (pickPlaylistTrackID). The playlists.playlist_contents JSONB column is written verbatim via normalizePlaylistContentsJSON, so an SDK-supplied "1aV5byE"-style string leaks into the column unchanged. Both downstream readers fail on it: v1_playlist_tracks.go casts (e.value->>'track')::int and dbv1.PlaylistContents scans into int64. Add an etl PostHook for (Playlist, Create) and (Playlist, Update) that re-reads the JSONB in the same tx and rewrites any string track/track_id to its decoded integer via trashid.DecodeHashId (same Hashids salt and min length as the junction path). Integer-only rows are untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The actual root cause of the empty-playlist regression is field-name
mismatch, not (just) hashid encoding. The TS SDK uploads each entry as
{trackId, timestamp, metadataTimestamp?} and snakecase-keys converts
that to {track_id, timestamp, metadata_timestamp} before signing. ETL's
normalizePlaylistContentsJSON writes those entries to the JSONB column
verbatim, but the downstream API readers expect the canonical Python-
indexer shape {track, time, metadata_time}:
- v1_playlist_tracks.go does (e.value->>'track')::int — returns NULL
on a row whose entries only have 'track_id'.
- dbv1.PlaylistContents.TrackIDs.Track is int64, so json.Unmarshal
sees no 'track' field and leaves Track=0. The TrackMap lookup
then misses, and the playlist's tracks slice comes back empty.
Either way: playlist appears empty after a hard refresh, even though
the row exists and the playlist_tracks junction is populated correctly
(the junction-table updater picks track_id via the same key precedence
the hook now does).
Expand the post-hook to fix the column in the same tx:
- resolve 'track' from existing 'track' or 'track_id' (decoding
hashid strings via trashid.DecodeHashId).
- copy 'timestamp' to 'time' if 'time' is missing.
- copy 'metadata_timestamp' (fallback 'timestamp') to 'metadata_time'
if missing.
- drop the SDK-style keys after copying their values, matching the
Python apps process_playlist_contents output shape.
Canonical entries (already in the {track, time, metadata_time} shape)
pass through byte-identical — no spurious UPDATE.
Should also land in normalizePlaylistContentsJSON upstream; tracking
that separately. The hook here unblocks the urgent prod issue without
waiting for an ETL bump.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Member
|
Opting for this instead: |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Symptom (prod)
User adds a track to a new playlist: it shows in the UI optimistically, but after a hard refresh the playlist is empty. The on-chain UpdatePlaylist tx landed and was indexed successfully — no error in the indexer logs.
Root cause
A field-name mismatch between the TS SDK and the API readers, introduced when apps#840 cut over playlist indexing from the legacy Python discovery-provider to the vendored `go-openaudio/pkg/etl`.
The TS SDK schema for an UpdatePlaylist entry (apps/packages/sdk/.../playlists/types.ts):
```ts
playlistContents: z.array(z.object({
timestamp: z.number(),
metadataTimestamp: z.optional(z.number()),
trackId: HashId // hashid-encoded string
}))
```
PlaylistsApi runs the metadata through `snakecaseKeys` before signing, so the on-chain metadata carries each entry as:
```json
{"track_id": "", "timestamp": 1700000000, "metadata_timestamp": 1700000000}
```
The legacy Python indexer's `process_playlist_contents` (apps/packages/discovery-provider/.../entities/playlist.py) normalized every entry before persisting:
```python
updated_tracks.append({
"track": track_id, # decoded integer
"time": index_time, # block_integer_time (or prior entry's time)
"metadata_time": metadata_time,
})
return {"track_ids": updated_tracks}
```
ETL's `normalizePlaylistContentsJSON` (added in go-openaudio#265) does NOT replicate this — it only normalizes the outer wrapper to `{"track_ids": [...]}` and `json.Marshal`s the inner entries verbatim. So the JSONB column ends up with `{"track_id": "", "timestamp": ..., "metadata_timestamp": ...}`-shaped entries.
Both API readers in this repo expect the canonical Python shape:
Either path: playlist looks empty even though the row + the `playlist_tracks` junction are correct. (The junction is correct because `pickPlaylistTrackID` does check both `track` and `track_id` and decodes hashids — only the JSONB-write path is missing the normalization.)
Fix
Register an etl `PostHook` for `(Playlist, Create)` and `(Playlist, Update)` that runs in the same DB transaction as the handler. It re-reads the JSONB and rewrites each entry to the canonical shape:
Canonical entries (everything already in the `{track, time, metadata_time}` shape) pass through byte-identical — no spurious UPDATE.
Gated on `params.Metadata["playlist_contents"]` being present — name-only updates don't touch the column. Errors are logged and swallowed per the upstream PostHook contract (no rollback for a failed normalization), mirroring `indexer/user_pubkey_hook.go`.
Why a hook, not an upstream patch + bump
`pkg/etl` is vendored at a `main` pseudo-version per apps#840's pinning policy and lives in the read-only Go module cache. A hook in this repo:
The same fix should land in `normalizePlaylistContentsJSON` upstream so other consumers of `pkg/etl` get the same correctness. Tracking that as a follow-up.
Test plan
`go test ./indexer/...` — all green (existing tests + 9 new cases):
`go build ./...` clean, `go vet ./indexer/...` clean.
Out of scope
🤖 Generated with Claude Code