Releases · Dim145/SubtitleExtractor

29 Jun 21:02

Dim145

v0.3.0

6cc2476

v0.3.0 — subtitle authoring, job re-run, video retention Latest

Subtitle authoring, job re-run, and video retention

Editor — add subtitles

Draw on the waveform to create a cue with exact in/out times (wavesurfer drag-selection).
Smart "Add cue" button: inserts at the playhead, clamped so it doesn't overrun the next cue, with a dropdown to insert above/below the selected cue.
Keyboard: N adds at the playhead, I / O mark in/out, alongside the existing [ / ] (set in/out) and ↑ / ↓ (select). New cues auto-select and focus their text field.
In-list affordances: a hover "insert" control between rows, a persistent "Add subtitle" row, and an empty-state call to action.
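The clamping behavior of the "Add cue" button can be illustrated with a minimal sketch. The function name, the list-of-start-times representation, and the 2-second default length are assumptions made for illustration; they are not taken from the project's code.

```python
from bisect import bisect_right


def new_cue_at(playhead: float, cue_starts: list[float],
               default_len: float = 2.0) -> tuple[float, float]:
    """Return (start, end) for a cue inserted at the playhead.

    cue_starts must be sorted ascending. default_len is a hypothetical
    default duration in seconds.
    """
    end = playhead + default_len
    # Index of the first existing cue that starts after the playhead.
    i = bisect_right(cue_starts, playhead)
    if i < len(cue_starts):
        # Clamp so the new cue does not overrun the next cue.
        end = min(end, cue_starts[i])
    return playhead, end
```

With cues starting at 5 s, 11 s, and 20 s, a cue added at 10 s would end at 11 s rather than 12 s.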

Jobs — re-run & source video

Re-run any finished job (succeeded, failed, or canceled) for a fresh extraction, as long as the source video is still stored. Non-destructive: existing subtitles are kept and the new run appends its own.
Delete only the source video to free storage while keeping the job and its subtitle files.

Admin — video retention

Scheduled cleanup deletes source videos older than a configurable window — default 7 days, daily at 03:00 (0 3 * * *) — and never touches jobs or subtitles. Enable/disable, retention days, and the cron schedule are editable in Settings.
A "Run cleanup now" button, plus a history of the last 7 runs (status, trigger, checked/deleted counts, storage freed). Runs that removed files open a modal listing the deleted videos.
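As a rough illustration of the retention rule, the sketch below selects the videos a cleanup run would delete. It assumes each stored video is keyed by a storage key and aged by the time its job finished; the function name and data shape are hypothetical.

```python
from datetime import datetime, timedelta


def expired_videos(videos: dict[str, datetime], now: datetime,
                   retention_days: int = 7) -> list[str]:
    """Return storage keys of source videos older than the retention window.

    videos maps a (hypothetical) storage key to the time its job finished.
    Only videos are selected; jobs and subtitles are never part of the result.
    """
    cutoff = now - timedelta(days=retention_days)
    return sorted(key for key, finished_at in videos.items()
                  if finished_at < cutoff)
```

A video that finished exactly at the cutoff is kept in this sketch; whether the real cleanup treats the boundary the same way is not stated in the release notes.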

Fixes

Worker result uploads now use a unique storage key, so re-running a job never overwrites a previously produced (possibly hand-edited) subtitle.
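One common way to get a unique storage key per upload is to put a random identifier in the key. The sketch below shows that idea only; the `results/<job>/<uuid>` layout and the function name are assumptions, not the project's actual key scheme.

```python
import uuid
from pathlib import PurePosixPath


def result_key(job_id: str, filename: str) -> str:
    """Build a storage key that differs on every call.

    The key layout is hypothetical. A random UUID means a second run of
    the same job cannot collide with an earlier result.
    """
    suffix = PurePosixPath(filename).suffix  # keep the extension, e.g. ".srt"
    return f"results/{job_id}/{uuid.uuid4().hex}{suffix}"
```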

Notes

Adds database migrations 0006 and 0007, applied automatically on startup.
Video cleanup is enabled by default: after upgrading, the first scheduled run removes the source videos of jobs that finished longer ago than the retention window. Adjust or disable it in Admin → Settings.

29 Jun 00:10

Dim145

v0.2.2

f12141b

v0.2.2 — fix S3 editor/video/download under strict CSP

Patch: completes the S3-safe download fix from v0.2.1.

Fixes

Editor subtitle loading, video preview, and result downloads now work with S3-compatible storage. v0.2.1 added the server proxy, but the browser still tried the presigned S3 URL first. Because that URL points at a different host, the app's Content-Security-Policy (connect-src/media-src 'self') blocked it: the editor showed "Failed to load subtitles", the video stayed black, and the console filled with CSP errors.
Now the client picks the right URL up front: a same-origin presigned URL (local backend → /api/files/…) is used directly, while a cross-origin presigned URL (S3 host) is streamed through the same-origin API proxy instead. No failed cross-origin request, and the CSP stays strict.
Applies to result downloads, the editor's subtitle load, and the editor's video preview. Local-storage deployments are unchanged.
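The up-front URL choice described above comes down to an origin comparison. The real client runs in the browser; the sketch below only illustrates the decision in Python, with hypothetical function names, and ignores default-port normalization for brevity.

```python
from urllib.parse import urlsplit


def is_same_origin(url: str, app_origin: str) -> bool:
    """True if url is relative or shares scheme and host:port with the app."""
    app = urlsplit(app_origin)
    target = urlsplit(url)
    if not target.scheme and not target.netloc:
        return True  # a relative path always resolves to the app's origin
    scheme = target.scheme or app.scheme  # protocol-relative URLs inherit the scheme
    return (scheme.lower(), target.netloc.lower()) == (
        app.scheme.lower(), app.netloc.lower())


def pick_url(presigned_url: str, app_origin: str, proxy_url: str) -> str:
    """Use the presigned URL only when it is same-origin, otherwise the proxy."""
    return presigned_url if is_same_origin(presigned_url, app_origin) else proxy_url
```

Deciding before the request is made means no cross-origin request is attempted, so nothing is blocked by the CSP.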

🤖 Generated with Claude Code

28 Jun 23:32

Dim145

v0.2.1

3f43164

v0.2.1 — S3-safe downloads (server-proxy fallback)

Patch: reliable result & video downloads with S3-compatible storage.

Fixes

Downloads no longer break on non-public S3/Garage buckets. Presigned URLs that a browser can't reach (private bucket, or an S3 SigV4 "Date is too old" clock-skew/signature error) now transparently fall back to a same-origin streaming proxy through the API. The proxy fetches the object with its own credentials, so neither a public bucket nor clock sync is required.
- New: GET /api/jobs/{id}/results/{resultId}/download (result download) and GET /api/jobs/{id}/video/raw (video stream, with HTTP Range for editor seeking).
- The results download button tries the presigned URL first, then the proxy; the editor video falls back to the proxy once if the presigned URL won't play.
Local-storage deployments are unaffected (already same-origin); the presigned path is still used whenever it works.

28 Jun 23:19

Dim145

v0.2.0

9e8be6c

v0.2.0 — rebuilt frontend, in-browser OCR, security & quality pass

SubtitleExtractor v0.2.0 — a from-scratch frontend, a real in-browser OCR path, and a broad security + quality pass.

Docker images for this release are built and pushed to GHCR by CI on publish. Set IMAGE_TAG=0.2.0 (or latest) in your .env to run prebuilt images via docker compose.

✨ Highlights

Frontend rebuilt from scratch ("Fusion")

New stack: Vite + React + TypeScript, Tailwind v4, TanStack Router + Query, Zustand, cmdk (⌘K palette), react-hook-form + zod, wavesurfer.js v7, self-hosted Geist fonts. Dark-first identity (cyan + amber).
Rebuilt every screen: login (local + OIDC), dashboard (drag-drop upload + live job list), job detail (live SSE log + results), admin (workers / substitutions / users / settings), and the subtitle editor.

Unified media player

One player across the app (preview and editor): play/pause, clickable/seekable progress bar, elapsed/total time, and keyboard transport (Space, ←/→ ±5s, Shift ±1s, Home). Drives both a native <video> and a WebCodecs/canvas backend for MKV/HEVC.

Subtitle editor & results

Save dialog: choose a filename and overwrite the current file or save as a new one (no more silent duplicate on every save).
Per-cue delete (X) in the cue table; per-result delete on the job page (deleting the last result removes the whole job + its files).
OCR language hint selector restored; zone layout + language remembered across extractions (localStorage).
Profile menu on the avatar: edit your display name / email / password (local accounts; OIDC profiles are read-only), Admin shortcut, sign out.

In-browser OCR (offline, privacy-preserving)

Fully working end-to-end and on par with server coverage for short clips.
Self-hosted OCR models and the onnxruntime WASM runtime (no third-party CDN at runtime → works offline).
Cross-origin isolation (COOP/COEP) enables multi-threaded WASM; WebGPU used when available.
Sequential frame decoding (true per-time frames instead of keyframe snaps) so OCR sees every subtitle; consensus voting + noise filtering for cleaner cues.

🔒 Security

Refuse to boot on placeholder JWT_SIGNING_KEY / INTERNAL_API_TOKEN; removed weak docker-compose credential fallbacks.
Rate limiting on /auth/login + /auth/register; session cookie Secure auto-enabled on HTTPS.
OIDC admin-claim now requires both claim and value (no accidental admin escalation).
Internal worker endpoints bound to the claiming worker; nginx security headers + CSP.
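The OIDC admin-claim rule can be pictured as a check that fails closed when either setting is missing. This is a sketch under assumptions: the function and parameter names are hypothetical, and the actual API is written in Go.

```python
def is_oidc_admin(claims: dict, admin_claim: str, admin_value: str) -> bool:
    """Grant admin only when both the claim name and expected value are set.

    If either setting is empty, nobody is promoted, which avoids treating
    the mere presence of a claim as proof of admin rights.
    """
    if not admin_claim or not admin_value:
        return False
    actual = claims.get(admin_claim)
    if isinstance(actual, (list, tuple, set)):
        return admin_value in actual  # e.g. a "groups" claim
    return actual == admin_value
```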

🐞 Fixes

Backend: atomic result/job deletes; handleSaveResult rejects oversize uploads and keeps storage keys consistent; worker progress/heartbeat on a vanished job returns 409 (worker stops cleanly instead of wedging).
Worker: accurate frame timestamps (real PTS) + safe ffmpeg→OpenCV fallback; prompt cancellation + per-job heartbeat on long jobs; SRT/VTT sanitization; PP-OCR no longer drops recognized lines; substitution-regex ReDoS hardening.
Frontend: editor follows the /video JSON {url} contract; WebCodecs decoder leak, SSE log duplication, and waveform drag jitter fixed; web-demuxer WASM served same-origin.

♿ Accessibility & polish

Accessible modals (role/aria + focus trap + Esc), larger touch targets, aria-labels on icon buttons, locale-aware number inputs.
Native checkboxes/radios replaced with custom switches/controls; clickable job rows, staggered entrance animations, skeleton loaders, route transitions (all respecting prefers-reduced-motion).

📦 Dependencies

Go: pgx 5.10, golang-jwt 5.3, go-oidc 3.19, env 11.4.1, x/crypto and x/net bumps; toolchain Go 1.25; base images golang:1.25-alpine / alpine:3.22; pinned MinIO images.
Web: React 19, Vite 8, TypeScript 6, zod 4, web-demuxer 4, jassub 2, lucide 1, TanStack/wavesurfer/tailwind minors; build image node:24-alpine.
Worker: onnxruntime ≥1.27, pillow ≥11, opencv 4.x floor.

28 Jun 16:47

Dim145

v0.1.5

309cccb

v0.1.5 — container healthchecks + friendly subtitle filenames

Changes since v0.1.4

Container healthchecks on api, web and the NVIDIA worker (postgres/minio already had them):
- api: HEALTHCHECK on /healthz.
- web: nginx now serves /healthz (200) and the image healthchecks it.
- nvidia-worker: a daemon thread touches a liveness file every ~5s (fresh even mid-job); unhealthy if missing or >60s old.
- compose: web and nvidia-worker now wait for the API with condition: service_healthy.
Friendly subtitle download names — downloads are now named after the source video (e.g. Movie.mkv → Movie.srt) instead of an opaque storage token. Applies to the job results download, the in-browser editor export, and browser-side extraction. The API sets Content-Disposition from a sanitized ?name=.
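Deriving a friendly download name from the source video could look like the sketch below. The sanitization rules (allowed characters, the "subtitles" fallback) and the function name are assumptions; the release notes only say that the name is sanitized.

```python
import re
from pathlib import PurePosixPath


def subtitle_name(video_name: str, ext: str = "srt") -> str:
    """Turn a video filename into a safe subtitle download name."""
    # Keep only the last path component so directories cannot leak through.
    base = PurePosixPath(video_name.replace("\\", "/")).name
    stem = PurePosixPath(base).stem
    # Replace anything outside a conservative allow-list (quotes, control
    # characters, slashes) that could break a Content-Disposition header.
    safe = re.sub(r"[^\w .()\-]+", "_", stem).strip(" .")
    return f"{safe or 'subtitles'}.{ext}"
```

For example, `subtitle_name("Movie.mkv")` yields `Movie.srt`, matching the example in the notes above.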

Container images (rebuilt by CI)

ghcr.io/dim145/subtitleextractor-api:0.1.5
ghcr.io/dim145/subtitleextractor-web:0.1.5
ghcr.io/dim145/subtitleextractor-worker-nvidia:0.1.5

macOS worker

subtitleextractor-worker-macos-v0.1.5.zip is attached — unzip → cp .env.example .env → set API_BASE_URL + INTERNAL_API_TOKEN → ./run.sh.

28 Jun 16:10

Dim145

v0.1.4

f5d6750

v0.1.4 — OCR substitutions, idle RAM reclaim, macOS worker zip

Changes since v0.1.3

Inter-worker OCR substitution rules — a dedicated Admin → Substitutions page (table editor: find→replace, regex toggle, per-language, inline regex validation). Rules are global and applied by every worker to recognized text after merging — for fixing recurring OCR mistakes or stripping watermarks. (DB migration 0004.)
Idle RAM reclamation — the OCR model now runs in a disposable child process that is killed after the idle grace period, returning all RAM and VRAM to the OS (the previous in-process unload freed VRAM but not host RAM). The worker parent process stays small.
Ready-to-run macOS worker zip attached to this release (see Assets below): subtitleextractor-worker-macos-v0.1.4.zip. Unzip → cp .env.example .env → set API_BASE_URL + INTERNAL_API_TOKEN → ./run.sh. First run sets up a venv and installs deps (MLX GPU backend on Apple Silicon).
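To make the substitution rules concrete, here is a minimal sketch of applying find→replace rules with a regex toggle and an optional language filter. The `Rule` fields and function name are hypothetical and do not mirror the project's database schema.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    find: str
    replace: str
    is_regex: bool = False
    language: Optional[str] = None  # None means the rule applies to every language


def apply_rules(text: str, rules: list[Rule], language: str) -> str:
    """Apply each matching rule in order to already-merged OCR text."""
    for rule in rules:
        if rule.language not in (None, language):
            continue  # rule is scoped to a different language
        if rule.is_regex:
            text = re.sub(rule.find, rule.replace, text)
        else:
            text = text.replace(rule.find, rule.replace)
    return text
```

A literal rule such as `Rule("|", "I")` fixes a recurring misread, while a regex rule such as `Rule(r"\s*www\.\S+", "", is_regex=True)` strips a watermark.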

Container images (rebuilt by CI)

ghcr.io/dim145/subtitleextractor-api:0.1.4
ghcr.io/dim145/subtitleextractor-web:0.1.4
ghcr.io/dim145/subtitleextractor-worker-nvidia:0.1.4

Assets 3

28 Jun 15:08

Dim145

v0.1.3

c6864d1

v0.1.3 — confidence-weighted character-level consensus

Reliability improvement to subtitle text accuracy.

Changes since v0.1.2

Character-level confidence-weighted consensus replaces whole-string majority voting when merging the frames of a cue. The text is now voted character by character across all frames of a subtitle, weighted by each frame's OCR confidence. This repairs single-character misreads that no single frame got fully right and lets a high-confidence reading override a wrong majority. This is the highest-impact reliability technique from the OCR research (documented 20-50% reduction of single-pass errors). The per-frame OCR confidence is now threaded through the worker pipeline (it was previously discarded).
Conservative post-OCR normalization (whitespace cleanup only — no character edits).
New char_voting toggle in the worker config (default on).
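The idea behind character-level, confidence-weighted voting can be shown with a simplified sketch. It assumes the readings are compared position by position and that only readings of the dominant length take part; the worker's real alignment strategy is not described in these notes, and the function name is hypothetical.

```python
from collections import defaultdict


def vote_characters(readings: list[tuple[str, float]]) -> str:
    """Merge per-frame (text, confidence) readings of one cue.

    Simplifying assumption: only readings whose length has the highest
    total confidence are voted on, position by position.
    """
    if not readings:
        return ""
    length_weight: dict[int, float] = defaultdict(float)
    for text, conf in readings:
        length_weight[len(text)] += conf
    target_len = max(length_weight, key=length_weight.get)

    merged = []
    for i in range(target_len):
        scores: dict[str, float] = defaultdict(float)
        for text, conf in readings:
            if len(text) == target_len:
                scores[text[i]] += conf  # each frame votes with its confidence
        merged.append(max(scores, key=scores.get))
    return "".join(merged)
```

Three readings "He1lo", "Hel1o", and "Hellq" at equal confidence merge to "Hello" even though no single frame read it correctly, and one reading of "car" at 0.9 outvotes two readings of "cat" at 0.2 each.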

Container images (rebuilt by CI)

ghcr.io/dim145/subtitleextractor-api:0.1.3
ghcr.io/dim145/subtitleextractor-web:0.1.3
ghcr.io/dim145/subtitleextractor-worker-nvidia:0.1.3

28 Jun 14:30

Dim145

v0.1.2

f1ce917

v0.1.2 — slim NVIDIA worker image

Image-size optimization of the NVIDIA worker.

Changes since v0.1.1

NVIDIA worker image slimmed ~13.7 GB → ~8 GB (~40%): rebuilt on python:3.13-slim-trixie (Debian 13, Python 3.13) instead of the all-in-one PaddlePaddle devel image. paddlepaddle-gpu's pip wheel already ships its CUDA userspace libs (nvidia-*-cu12), so the CUDA/Paddle base only duplicated them and dragged in TensorRT + the CUDA toolkit + build tools we never use. The remaining ~8 GB is the floor for GPU PaddleOCR (paddle ~3.4 GB + CUDA ~2.8 GB).
Added a .dockerignore so the local .venv (~1.1 GB) is never copied into the image.
Dockerfile layering ordered for cache reuse (paddle / paddleocr / worker layers).

Python 3.13 is the newest version with a paddlepaddle-gpu 3.3.1 wheel (cp38..cp313).

Container images

Rebuilt and pushed to GHCR by CI on this release:

ghcr.io/dim145/subtitleextractor-api:0.1.2
ghcr.io/dim145/subtitleextractor-web:0.1.2
ghcr.io/dim145/subtitleextractor-worker-nvidia:0.1.2

28 Jun 12:05

Dim145

v0.1.1

82d43f8

v0.1.1 — latest OCR stack + working NVIDIA image

Maintenance release: upgrades the OCR stack to the latest libraries/models and fixes the NVIDIA worker image.

Changes since v0.1.0

OCR stack upgraded:
- RapidOCR → unified rapidocr 3.x (better accent accuracy).
- PP-OCR → PaddleOCR 3.x (device= / .predict() API).
- PaddleOCR-VL (Apple GPU/MLX) → default model 1.5-8bit, with filtering of the model's "no text / too blurry" meta responses.
NVIDIA worker image fixed & modernized: now built on the official paddlepaddle/paddle:3.3.1-gpu-cuda12.6-cudnn9.5 base (bundles CUDA 12.6 + paddlepaddle-gpu). This replaces the v0.1.0 NVIDIA image, which was broken (deps silently failed to install). CUDA 12.6 chosen for broad host-driver compatibility (~R535+).
docker-compose: supports prebuilt GHCR images via image: + IMAGE_TAG (docker compose pull && docker compose up -d).

Container images

Rebuilt and pushed to GHCR by CI on this release:

ghcr.io/dim145/subtitleextractor-api:0.1.1
ghcr.io/dim145/subtitleextractor-web:0.1.1
ghcr.io/dim145/subtitleextractor-worker-nvidia:0.1.1

(The macOS OCR worker runs natively on the host — see the README.)

28 Jun 10:28

Dim145

v0.1.0

2208982

v0.1.0 — first release

First public release of SubtitleExtractor — extract hardcoded (burned-in) subtitles from video via OCR and edit them in the browser.

Highlights

Extraction pipeline: upload → automatic worker routing → OCR → downloadable SRT / ASS / WebVTT.
OCR backends (configurable per worker): RapidOCR (CPU), PP-OCR (CUDA), PaddleOCR-VL (Apple GPU / MLX).
Quality filters: text-mask change detection, presence gate, crop upscaling, persistence/duration/junk filters, majority-vote merge, and VLM anti-hallucination.
In-browser editor: live caption overlay, editable cue table, waveform timeline (wavesurfer.js).
100% browser extraction option (WebCodecs + onnxruntime-web / WebGPU).
Up to 2 subtitle zones with WebCodecs preview.
Accounts (local + OIDC), admin (users, settings, DB-backed dynamic worker config), live progress/logs via SSE.
Storage: local filesystem or S3/MinIO. Hardware decode: NVDEC / VideoToolbox.

Container images

Published to GHCR by CI on this release:

ghcr.io/dim145/subtitleextractor-api
ghcr.io/dim145/subtitleextractor-web
ghcr.io/dim145/subtitleextractor-worker-nvidia

(The macOS OCR worker runs natively on the host — see the README.)

Getting started

```bash
cp .env.example .env # set JWT_SIGNING_KEY, INTERNAL_API_TOKEN, ...
docker compose up --build
```

Licensed under AGPL-3.0.
