Releases: Dim145/SubtitleExtractor
v0.3.0 — subtitle authoring, job re-run, video retention
Subtitle authoring, job re-run, and video retention
Editor — add subtitles
- Draw on the waveform to create a cue with exact in/out times (wavesurfer drag-selection).
- Smart "Add cue" button: inserts at the playhead, clamped so it doesn't overrun the next cue, with a dropdown to insert above/below the selected cue.
- Keyboard: `N` adds at the playhead, `I`/`O` mark in/out, alongside the existing `[`/`]` (set in/out) and `↑`/`↓` (select). New cues auto-select and focus their text field.
- In-list affordances: a hover "insert" control between rows, a persistent "Add subtitle" row, and an empty-state call to action.
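The clamping behavior of the "Add cue" button can be illustrated with a minimal sketch. The `Cue` type, the function name, and the 2-second default duration are assumptions made for illustration; the editor itself runs in the browser and its real implementation may differ.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str = ""


def add_cue_at_playhead(cues, playhead, default_duration=2.0):
    """Insert a cue at the playhead, clamped so it does not overrun the next cue.

    `default_duration` is a hypothetical default, not a documented setting.
    """
    # Start time of the first cue that begins after the playhead, if any.
    next_start = min((c.start for c in cues if c.start > playhead), default=None)
    end = playhead + default_duration
    if next_start is not None:
        end = min(end, next_start)  # clamp to the next cue's start
    new_cue = Cue(start=playhead, end=end)
    cues.append(new_cue)
    cues.sort(key=lambda c: c.start)  # keep the list in timeline order
    return new_cue
```

With cues at 0–1 s and 5–6 s, adding at a playhead of 4 s would yield a cue from 4 s to 5 s rather than 4 s to 6 s.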
Jobs — re-run & source video
- Re-run any finished job (succeeded, failed, or canceled) for a fresh extraction, as long as the source video is still stored. Non-destructive: existing subtitles are kept and the new run appends its own.
- Delete only the source video to free storage while keeping the job and its subtitle files.
Admin — video retention
- Scheduled cleanup deletes source videos older than a configurable window (default 7 days, daily at 03:00, `0 3 * * *`) and never touches jobs or subtitles. Enable/disable, retention days, and the cron schedule are editable in Settings.
- A "Run cleanup now" button plus a history of the last 7 runs (status, trigger, checked/deleted counts, storage freed). Runs that removed files open a modal listing the deleted videos.
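As a rough sketch of the retention rule described above: a video is eligible for deletion once its job finished more than the retention window ago, and each run reports what it checked, deleted, and freed. The dict fields (`key`, `finished_at`, `size`) and function names are hypothetical, not the project's schema.

```python
from datetime import datetime, timedelta, timezone


def expired_videos(videos, retention_days=7, now=None):
    """Return the videos whose job finished before the retention cutoff.

    Each video is a dict with hypothetical fields: 'key', 'finished_at'
    (timezone-aware datetime) and 'size' (bytes).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [v for v in videos if v["finished_at"] < cutoff]


def summarize_run(videos, expired):
    """Build the per-run counters shown in the cleanup history."""
    return {
        "checked": len(videos),
        "deleted": len(expired),
        "freed_bytes": sum(v["size"] for v in expired),
    }
```

Only the video objects are selected here; jobs and subtitle files are never part of the candidate set, which matches the "never touches jobs or subtitles" guarantee.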
Fixes
- Worker result uploads now use a unique storage key, so re-running a job never overwrites a previously produced (possibly hand-edited) subtitle.
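One common way to get a unique storage key per upload is to embed a random identifier in the object path. The sketch below assumes a hypothetical `jobs/<id>/results/...` layout; the actual key format used by the worker is not documented here.

```python
import uuid
from pathlib import PurePosixPath


def result_storage_key(job_id, filename):
    """Build a storage key that is unique for every upload.

    The path layout is an assumption for illustration only.
    """
    suffix = PurePosixPath(filename).suffix  # keep the extension, e.g. ".srt"
    return f"jobs/{job_id}/results/{uuid.uuid4().hex}{suffix}"
```

Because every call produces a new key, a re-run writes next to the earlier result instead of replacing it.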
Notes
- Adds database migrations `0006` and `0007`, applied automatically on startup.
- Video cleanup is enabled by default: after upgrading, the first scheduled run removes source videos of jobs finished more than the retention window ago. Adjust or disable it in Admin → Settings.
v0.2.2 — fix S3 editor/video/download under strict CSP
Patch: completes the S3-safe download fix from v0.2.1.
Fixes
- Editor subtitle loading, video preview, and result downloads now work with S3-compatible storage. v0.2.1 added the server proxy, but the browser still tried the presigned S3 URL first, which the app's Content-Security-Policy (`connect-src`/`media-src 'self'`) blocks because it points at a different host. As a result, the editor showed "Failed to load subtitles", the video stayed black, and the console filled with CSP errors.
- Now the client picks the right URL up front: a same-origin presigned URL (local backend → `/api/files/…`) is used directly, while a cross-origin presigned URL (S3 host) is streamed through the same-origin API proxy instead. No failed cross-origin request, and the CSP stays strict.
- Applies to result downloads, the editor's subtitle load, and the editor's video preview. Local-storage deployments are unchanged.
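The up-front URL choice can be sketched as an origin comparison. This is an illustration in Python, not the client code (which runs in the browser); the function name, the example hosts, and the `EXAMPLE_TOKEN` path segment are placeholders.

```python
from urllib.parse import urlsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(url):
    """Return (scheme, host, port) with default ports filled in."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port or _DEFAULT_PORTS.get(parts.scheme))


def pick_fetch_url(presigned_url, app_origin, proxy_url):
    """Use the presigned URL only when it is same-origin; otherwise use the proxy."""
    parts = urlsplit(presigned_url)
    if not parts.scheme and not parts.netloc:
        return presigned_url  # relative URL, same-origin by definition
    if _origin(presigned_url) == _origin(app_origin):
        return presigned_url
    return proxy_url  # cross-origin (e.g. S3 host): a strict CSP would block it
```

For example, `pick_fetch_url("/api/files/EXAMPLE_TOKEN", "https://app.example.com", "/api/jobs/1/video/raw")` keeps the local URL, while a presigned URL on a separate S3 host resolves to the proxy path.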
🤖 Generated with Claude Code
v0.2.1 — S3-safe downloads (server-proxy fallback)
Patch: reliable result & video downloads with S3-compatible storage.
Fixes
- Downloads no longer break on non-public S3/Garage buckets. Presigned URLs that a browser can't reach (private bucket, or an S3 SigV4 "Date is too old" clock-skew/signature error) now transparently fall back to a same-origin streaming proxy through the API — which fetches the object with its own credentials, so no public bucket and no clock sync are required.
  - New: `GET /api/jobs/{id}/results/{resultId}/download` (result download) and `GET /api/jobs/{id}/video/raw` (video stream, with HTTP Range for editor seeking).
  - The results download button tries the presigned URL first, then the proxy; the editor video falls back to the proxy once if the presigned URL won't play.
- Local-storage deployments are unaffected (already same-origin); the presigned path is still used whenever it works.
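Editor seeking relies on the video stream honoring HTTP `Range` requests. As a minimal sketch of what that involves, the function below maps a single-range `Range` header to inclusive byte offsets. It is an illustration of the standard header semantics, not the API's actual handler, and it deliberately ignores multi-range requests.

```python
import re

_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_range(header, size):
    """Return (start, end) inclusive byte offsets, or None when the header is
    missing, malformed, or unsatisfiable for an object of `size` bytes."""
    if not header or size <= 0:
        return None
    match = _RANGE_RE.match(header.strip())
    if not match:
        return None
    first, last = match.groups()
    if not first and not last:
        return None
    if not first:
        # Suffix form "bytes=-N": the last N bytes of the object.
        length = int(last)
        if length == 0:
            return None
        return (max(size - length, 0), size - 1)
    start = int(first)
    if start >= size:
        return None
    end = min(int(last), size - 1) if last else size - 1
    if end < start:
        return None
    return (start, end)
```

A server would typically answer a satisfiable range with `206 Partial Content` and a matching `Content-Range` header, and fall back to a full `200` response when no range is given.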
v0.2.0 — rebuilt frontend, in-browser OCR, security & quality pass
SubtitleExtractor v0.2.0 — a from-scratch frontend, a real in-browser OCR path, and a broad security + quality pass.
Docker images for this release are built and pushed to GHCR by CI on publish. Set `IMAGE_TAG=0.2.0` (or `latest`) in your `.env` to run prebuilt images via `docker compose`.
✨ Highlights
Frontend rebuilt from scratch ("Fusion")
- New stack: Vite + React + TypeScript, Tailwind v4, TanStack Router + Query, Zustand, cmdk (⌘K palette), react-hook-form + zod, wavesurfer.js v7, self-hosted Geist fonts. Dark-first identity (cyan + amber).
- Rebuilt every screen: login (local + OIDC), dashboard (drag-drop upload + live job list), job detail (live SSE log + results), admin (workers / substitutions / users / settings), and the subtitle editor.
Unified media player
- One player across the app (preview and editor): play/pause, clickable/seekable progress bar, elapsed/total time, and keyboard transport (Space, ←/→ ±5s, Shift ±1s, Home). Drives both a native `<video>` and a WebCodecs/canvas backend for MKV/HEVC.
Subtitle editor & results
- Save dialog: choose a filename and overwrite the current file or save as a new one (no more silent duplicate on every save).
- Per-cue delete (X) in the cue table; per-result delete on the job page (deleting the last result removes the whole job + its files).
- OCR language hint selector restored; zone layout + language remembered across extractions (localStorage).
- Profile menu on the avatar: edit your display name / email / password (local accounts; OIDC profiles are read-only), Admin shortcut, sign out.
In-browser OCR (offline, privacy-preserving)
- Fully working end-to-end and on par with server coverage for short clips.
- Self-hosted OCR models and the onnxruntime WASM runtime (no third-party CDN at runtime → works offline).
- Cross-origin isolation (COOP/COEP) enables multi-threaded WASM; WebGPU used when available.
- Sequential frame decoding (true per-time frames instead of keyframe snaps) so OCR sees every subtitle; consensus voting + noise filtering for cleaner cues.
🔒 Security
- Refuse to boot on placeholder `JWT_SIGNING_KEY` / `INTERNAL_API_TOKEN`; removed weak docker-compose credential fallbacks.
- Rate limiting on `/auth/login` + `/auth/register`; session cookie `Secure` auto-enabled on HTTPS.
- OIDC admin-claim now requires both claim and value (no accidental admin escalation).
- Internal worker endpoints bound to the claiming worker; nginx security headers + CSP.
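The "refuse to boot on placeholder secrets" check could look roughly like the sketch below. The set of placeholder values is an assumption for illustration; the values the API actually rejects are not listed in these notes.

```python
REQUIRED_SECRETS = ("JWT_SIGNING_KEY", "INTERNAL_API_TOKEN")

# Hypothetical placeholder values; the real list is not documented here.
PLACEHOLDER_VALUES = {"", "changeme", "change-me", "secret"}


def find_placeholder_secrets(env):
    """Return the names of required secrets that are unset or still a placeholder."""
    return [
        name
        for name in REQUIRED_SECRETS
        if env.get(name, "").strip().lower() in PLACEHOLDER_VALUES
    ]


def ensure_secrets_configured(env):
    """Abort startup with a clear message instead of running with weak secrets."""
    bad = find_placeholder_secrets(env)
    if bad:
        raise RuntimeError("refusing to start: set real values for " + ", ".join(bad))
```

Calling `ensure_secrets_configured(dict(os.environ))` early in startup would fail fast, before any listener is opened.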
🐞 Fixes
- Backend: atomic result/job deletes; `handleSaveResult` rejects oversize uploads and keeps storage keys consistent; worker progress/heartbeat on a vanished job returns 409 (worker stops cleanly instead of wedging).
- Worker: accurate frame timestamps (real PTS) + safe ffmpeg→OpenCV fallback; prompt cancellation + per-job heartbeat on long jobs; SRT/VTT sanitization; PP-OCR no longer drops recognized lines; substitution-regex ReDoS hardening.
- Frontend: editor follows the `/video` JSON `{url}` contract; WebCodecs decoder leak, SSE log duplication, and waveform drag jitter fixed; web-demuxer WASM served same-origin.
♿ Accessibility & polish
- Accessible modals (role/aria + focus trap + Esc), larger touch targets, aria-labels on icon buttons, locale-aware number inputs.
- Native checkboxes/radios replaced with custom switches/controls; clickable job rows, staggered entrance animations, skeleton loaders, route transitions (all respecting `prefers-reduced-motion`).
📦 Dependencies
- Go: pgx 5.10, golang-jwt 5.3, go-oidc 3.19, env 11.4.1, `x/crypto` / `x/net` bumps; toolchain Go 1.25; base images `golang:1.25-alpine` / `alpine:3.22`; pinned MinIO images.
- Web: React 19, Vite 8, TypeScript 6, zod 4, web-demuxer 4, jassub 2, lucide 1, TanStack/wavesurfer/tailwind minors; build image `node:24-alpine`.
- Worker: onnxruntime ≥1.27, pillow ≥11, opencv 4.x floor.
v0.1.5 — container healthchecks + friendly subtitle filenames
Changes since v0.1.4
- Container healthchecks on api, web and the NVIDIA worker (postgres/minio already had them):
  - api: `HEALTHCHECK` on `/healthz`.
  - web: nginx now serves `/healthz` (200) and the image healthchecks it.
  - nvidia-worker: a daemon thread touches a liveness file every ~5s (fresh even mid-job); unhealthy if missing or >60s old.
  - compose: `web` and `nvidia-worker` now wait for the API `condition: service_healthy`.
- Friendly subtitle download names: downloads are now named after the source video (e.g. `Movie.mkv` → `Movie.srt`) instead of an opaque storage token. Applies to the job results download, the in-browser editor export, and browser-side extraction. The API sets `Content-Disposition` from a sanitized `?name=`.
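To illustrate the friendly-name behavior, here is a minimal sketch that derives a subtitle filename from the source video name and builds a `Content-Disposition` header from it. The sanitization rules and the `subtitles` fallback name are assumptions; the API's actual rules may be stricter or different.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import quote

# Control characters plus characters that are unsafe in filenames or headers.
_UNSAFE = re.compile(r'[\x00-\x1f\x7f"\\/:*?<>|]')


def friendly_subtitle_name(video_name, ext="srt"):
    """Derive a safe subtitle filename from the source video's name."""
    # Keep only the last path component so "../" tricks cannot survive.
    base = video_name.replace("\\", "/").rsplit("/", 1)[-1]
    stem = _UNSAFE.sub("_", PurePosixPath(base).stem).strip(" .")
    return f"{stem or 'subtitles'}.{ext}"  # 'subtitles' is a hypothetical fallback


def content_disposition(video_name, ext="srt"):
    """Build an attachment header with an ASCII fallback and a UTF-8 name."""
    name = friendly_subtitle_name(video_name, ext)
    ascii_name = name.encode("ascii", "replace").decode("ascii").replace("?", "_")
    return f"attachment; filename=\"{ascii_name}\"; filename*=UTF-8''{quote(name)}"
```

Sending both `filename` and `filename*` lets older clients use the ASCII fallback while modern browsers pick up the full UTF-8 name.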
Container images (rebuilt by CI)
- `ghcr.io/dim145/subtitleextractor-api:0.1.5`
- `ghcr.io/dim145/subtitleextractor-web:0.1.5`
- `ghcr.io/dim145/subtitleextractor-worker-nvidia:0.1.5`
macOS worker
`subtitleextractor-worker-macos-v0.1.5.zip` is attached: unzip → `cp .env.example .env` → set `API_BASE_URL` + `INTERNAL_API_TOKEN` → `./run.sh`.
v0.1.4 — OCR substitutions, idle RAM reclaim, macOS worker zip
Changes since v0.1.3
- Inter-worker OCR substitution rules — a dedicated Admin → Substitutions page (table editor: find→replace, regex toggle, per-language, inline regex validation). Rules are global and applied by every worker to recognized text after merging — for fixing recurring OCR mistakes or stripping watermarks. (DB migration 0004.)
- Idle RAM reclamation — the OCR model now runs in a disposable child process that is killed after the idle grace period, returning all RAM and VRAM to the OS (the previous in-process unload freed VRAM but not host RAM). The worker parent process stays small.
- Ready-to-run macOS worker zip attached to this release (see Assets below): `subtitleextractor-worker-macos-v0.1.4.zip`. Unzip → `cp .env.example .env` → set `API_BASE_URL` + `INTERNAL_API_TOKEN` → `./run.sh`. First run sets up a venv and installs deps (MLX GPU backend on Apple Silicon).
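The substitution rules from the first item above (find → replace, regex toggle, per-language) can be sketched as an ordered pass over the merged text. The `Rule` fields below mirror the columns named in the notes, but the class, field names, and the convention that `language=None` means "all languages" are assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    find: str
    replace: str
    is_regex: bool = False
    language: Optional[str] = None  # assumption: None applies to every language


def apply_rules(text, rules, language):
    """Apply each matching rule in order to the recognized text."""
    for rule in rules:
        if rule.language is not None and rule.language != language:
            continue  # rule is scoped to another language
        if rule.is_regex:
            text = re.sub(rule.find, rule.replace, text)
        else:
            text = text.replace(rule.find, rule.replace)
    return text
```

For instance, a literal rule turning `|` into `I` fixes a recurring misread, while a regex rule such as `\s*www\.\S+` with an empty replacement strips a watermark URL. This sketch applies patterns as-is and does not include any protection against pathological regexes.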
Container images (rebuilt by CI)
- `ghcr.io/dim145/subtitleextractor-api:0.1.4`
- `ghcr.io/dim145/subtitleextractor-web:0.1.4`
- `ghcr.io/dim145/subtitleextractor-worker-nvidia:0.1.4`
v0.1.3 — confidence-weighted character-level consensus
Reliability improvement to subtitle text accuracy.
Changes since v0.1.2
- Character-level confidence-weighted consensus replaces whole-string majority voting when merging the frames of a cue. The text is now voted character by character across all frames of a subtitle, weighted by each frame's OCR confidence. This repairs single-character misreads that no single frame got fully right and lets a high-confidence reading override a wrong majority. This is the highest-impact reliability technique from the OCR research (documented 20-50% reduction of single-pass errors). The per-frame OCR confidence is now threaded through the worker pipeline (it was previously discarded).
- Conservative post-OCR normalization (whitespace cleanup only, no character edits).
- New `char_voting` toggle in the worker config (default on).
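The idea behind character-level confidence-weighted voting can be shown with a simplified sketch. It assumes readings are compared position by position and only among readings of the winning length; how the worker actually aligns frames of different lengths is not described in these notes, so treat this as an illustration of the principle only.

```python
from collections import defaultdict


def char_vote(readings):
    """Merge per-frame OCR readings of one cue by confidence-weighted voting.

    `readings` is a list of (text, confidence) pairs. Simplifying assumption:
    characters are aligned by position, so only readings whose length matches
    the confidence-weighted winning length take part in the per-character vote.
    """
    if not readings:
        return ""
    # Vote on the text length first, weighting each reading by its confidence.
    length_votes = defaultdict(float)
    for text, conf in readings:
        length_votes[len(text)] += conf
    target_len = max(length_votes, key=length_votes.get)
    aligned = [(t, c) for t, c in readings if len(t) == target_len]

    merged = []
    for i in range(target_len):
        votes = defaultdict(float)
        for text, conf in aligned:
            votes[text[i]] += conf  # each frame votes with its confidence
        merged.append(max(votes, key=votes.get))
    return "".join(merged)
```

With readings `("Hel1o", 0.6)`, `("He1lo", 0.6)`, and `("Hcllo", 0.6)`, no frame is fully right, yet every position has a correct majority, so the merged text is `Hello`. With `("cat", 0.3)`, `("cat", 0.3)`, and `("oat", 0.9)`, the single high-confidence frame outweighs the majority at the first character.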
Container images (rebuilt by CI)
- `ghcr.io/dim145/subtitleextractor-api:0.1.3`
- `ghcr.io/dim145/subtitleextractor-web:0.1.3`
- `ghcr.io/dim145/subtitleextractor-worker-nvidia:0.1.3`
v0.1.2 — slim NVIDIA worker image
Image-size optimization of the NVIDIA worker.
Changes since v0.1.1
- NVIDIA worker image slimmed ~13.7 GB → ~8 GB (~40%): rebuilt on `python:3.13-slim-trixie` (Debian 13, Python 3.13) instead of the all-in-one PaddlePaddle devel image. paddlepaddle-gpu's pip wheel already ships its CUDA userspace libs (`nvidia-*-cu12`), so the CUDA/Paddle base only duplicated them and dragged in TensorRT + the CUDA toolkit + build tools we never use. The remaining ~8 GB is the floor for GPU PaddleOCR (paddle ~3.4 GB + CUDA ~2.8 GB).
- Added a `.dockerignore` so the local `.venv` (~1.1 GB) is never copied into the image.
- Dockerfile layering ordered for cache reuse (paddle / paddleocr / worker layers).
Python 3.13 is the newest version with a paddlepaddle-gpu 3.3.1 wheel (cp38..cp313).
Container images
Rebuilt and pushed to GHCR by CI on this release:
- `ghcr.io/dim145/subtitleextractor-api:0.1.2`
- `ghcr.io/dim145/subtitleextractor-web:0.1.2`
- `ghcr.io/dim145/subtitleextractor-worker-nvidia:0.1.2`
v0.1.1 — latest OCR stack + working NVIDIA image
Maintenance release: upgrades the OCR stack to the latest libraries/models and fixes the NVIDIA worker image.
Changes since v0.1.0
- OCR stack upgraded:
  - RapidOCR → unified `rapidocr` 3.x (better accent accuracy).
  - PP-OCR → PaddleOCR 3.x (`device=` / `.predict()` API).
  - PaddleOCR-VL (Apple GPU/MLX) → default model 1.5-8bit, with filtering of the model's "no text / too blurry" meta responses.
- NVIDIA worker image fixed & modernized: now built on the official `paddlepaddle/paddle:3.3.1-gpu-cuda12.6-cudnn9.5` base (bundles CUDA 12.6 + paddlepaddle-gpu). This replaces the v0.1.0 NVIDIA image, which was broken (deps silently failed to install). CUDA 12.6 chosen for broad host-driver compatibility (~R535+).
- docker-compose: supports prebuilt GHCR images via `image:` + `IMAGE_TAG` (`docker compose pull && docker compose up -d`).
Container images
Rebuilt and pushed to GHCR by CI on this release:
- `ghcr.io/dim145/subtitleextractor-api:0.1.1`
- `ghcr.io/dim145/subtitleextractor-web:0.1.1`
- `ghcr.io/dim145/subtitleextractor-worker-nvidia:0.1.1`
(The macOS OCR worker runs natively on the host — see the README.)
v0.1.0 — first release
First public release of SubtitleExtractor — extract hardcoded (burned-in) subtitles from video via OCR and edit them in the browser.
Highlights
- Extraction pipeline: upload → automatic worker routing → OCR → downloadable SRT / ASS / WebVTT.
- OCR backends (configurable per worker): RapidOCR (CPU), PP-OCR (CUDA), PaddleOCR-VL (Apple GPU / MLX).
- Quality filters: text-mask change detection, presence gate, crop upscaling, persistence/duration/junk filters, majority-vote merge, and VLM anti-hallucination.
- In-browser editor: live caption overlay, editable cue table, waveform timeline (wavesurfer.js).
- 100% browser extraction option (WebCodecs + onnxruntime-web / WebGPU).
- Up to 2 subtitle zones with WebCodecs preview.
- Accounts (local + OIDC), admin (users, settings, DB-backed dynamic worker config), live progress/logs via SSE.
- Storage: local filesystem or S3/MinIO. Hardware decode: NVDEC / VideoToolbox.
Container images
Published to GHCR by CI on this release:
- `ghcr.io/dim145/subtitleextractor-api`
- `ghcr.io/dim145/subtitleextractor-web`
- `ghcr.io/dim145/subtitleextractor-worker-nvidia`
(The macOS OCR worker runs natively on the host — see the README.)
Getting started
```bash
cp .env.example .env # set JWT_SIGNING_KEY, INTERNAL_API_TOKEN, ...
docker compose up --build
```
Licensed under AGPL-3.0.