Releases: Dim145/SubtitleExtractor

v0.3.0 — subtitle authoring, job re-run, video retention

29 Jun 21:02

Subtitle authoring, job re-run, and video retention

Editor — add subtitles

  • Draw on the waveform to create a cue with exact in/out times (wavesurfer drag-selection).
  • Smart "Add cue" button: inserts at the playhead, clamped so it doesn't overrun the next cue, with a dropdown to insert above/below the selected cue.
  • Keyboard: N adds at the playhead, I / O mark in/out, alongside the existing [ / ] (set in/out) and / (select). New cues auto-select and focus their text field.
  • In-list affordances: a hover "insert" control between rows, a persistent "Add subtitle" row, and an empty-state call to action.
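The clamping behaviour of the "Add cue" button can be pictured with a minimal sketch. It is not the editor's actual code: the `Cue` type, the 2-second default length, and the 0.1-second minimum are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str = ""


def new_cue_at(playhead: float, cues: List[Cue],
               default_len: float = 2.0, min_len: float = 0.1) -> Optional[Cue]:
    """Build a cue starting at the playhead, clamped to the next cue's start.

    default_len and min_len are hypothetical values, not taken from the release.
    """
    end = playhead + default_len
    later_starts = [c.start for c in cues if c.start > playhead]
    if later_starts:
        # Never run past the start of the next cue.
        end = min(end, min(later_starts))
    if end - playhead < min_len:
        # Too little room before the next cue to insert anything useful.
        return None
    return Cue(start=playhead, end=end)
```

With a single existing cue at 5.0 to 7.0 s, adding at 4.0 s yields a cue that ends at 5.0 s instead of 6.0 s. The sketch does not handle a playhead that sits inside an existing cue.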

Jobs — re-run & source video

  • Re-run any finished job (succeeded, failed, or canceled) for a fresh extraction, as long as the source video is still stored. Non-destructive: existing subtitles are kept and the new run appends its own.
  • Delete only the source video to free storage while keeping the job and its subtitle files.

Admin — video retention

  • Scheduled cleanup deletes source videos older than a configurable window — default 7 days, daily at 03:00 (0 3 * * *) — and never touches jobs or subtitles. Enable/disable, retention days, and the cron schedule are editable in Settings.
  • Run cleanup now button plus a history of the last 7 runs (status, trigger, checked/deleted counts, storage freed). Runs that removed files open a modal listing the deleted videos.
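As a rough illustration of the retention rule, the sketch below selects expired videos and builds the counts shown in the run history. The record fields (`finished_at`, `size`) and the function names are assumptions; the real backend and its schema are not described in these notes.

```python
from datetime import datetime, timedelta, timezone


def select_expired(videos, now, retention_days=7):
    """Return the videos whose job finished before the retention cutoff."""
    cutoff = now - timedelta(days=retention_days)
    return [v for v in videos if v["finished_at"] < cutoff]


def summarize_run(videos, expired):
    """Build the kind of counters a run-history entry could show."""
    return {
        "checked": len(videos),
        "deleted": len(expired),
        "freed_bytes": sum(v["size"] for v in expired),
    }


now = datetime(2025, 7, 10, 3, 0, tzinfo=timezone.utc)
videos = [
    {"id": "a", "finished_at": datetime(2025, 7, 1, tzinfo=timezone.utc), "size": 100},
    {"id": "b", "finished_at": datetime(2025, 7, 8, tzinfo=timezone.utc), "size": 50},
]
expired = select_expired(videos, now)
summary = summarize_run(videos, expired)
```

Only the video records are selected for deletion here; jobs and subtitle files are never part of the input, which mirrors the guarantee stated above.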

Fixes

  • Worker result uploads now use a unique storage key, so re-running a job never overwrites a previously produced (possibly hand-edited) subtitle.
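One common way to get a unique storage key per upload is to add a random component. The sketch below assumes a `jobs/<id>/results/` key layout and a UUID suffix, neither of which is confirmed by the release notes.

```python
import uuid


def result_storage_key(job_id: str, extension: str) -> str:
    """Return a fresh key on every call so a re-run never reuses an old one.

    The "jobs/<id>/results/" prefix is a hypothetical layout.
    """
    return f"jobs/{job_id}/results/{uuid.uuid4().hex}.{extension.lstrip('.')}"
```

Two calls for the same job produce two different keys, so an earlier (possibly hand-edited) subtitle file stays untouched.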

Notes

  • Adds database migrations 0006 and 0007, applied automatically on startup.
  • Video cleanup is enabled by default: after upgrading, the first scheduled run removes the source videos of jobs that finished longer ago than the retention window. Adjust or disable it in Admin → Settings.

v0.2.2 — fix S3 editor/video/download under strict CSP

29 Jun 00:10

Patch: completes the S3-safe download fix from v0.2.1.

Fixes

  • Editor subtitle loading, video preview, and result downloads now work with S3-compatible storage. v0.2.1 added the server proxy but the browser still tried the presigned S3 URL first, which the app's Content-Security-Policy (connect-src/media-src 'self') blocks because it points at a different host — so the editor showed "Failed to load subtitles", the video stayed black, and the console filled with CSP errors.
  • Now the client picks the right URL up front: a same-origin presigned URL (local backend → /api/files/…) is used directly, while a cross-origin presigned URL (S3 host) is streamed through the same-origin API proxy instead. No failed cross-origin request, and the CSP stays strict.
  • Applies to result downloads, the editor's subtitle load, and the editor's video preview. Local-storage deployments are unchanged.
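The up-front URL choice comes down to an origin comparison. The real client runs in the browser; the sketch below only illustrates the decision in Python, and the function names and example hosts are made up.

```python
from urllib.parse import urljoin, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str):
    """Return (scheme, host, port), filling in the default port when omitted."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(parts.scheme))


def pick_media_url(presigned_url: str, page_url: str, proxy_path: str) -> str:
    """Use the presigned URL only when it is same-origin, else the API proxy."""
    resolved = urljoin(page_url, presigned_url)  # also resolves relative /api/files/... URLs
    if origin(resolved) == origin(page_url):
        return resolved
    return urljoin(page_url, proxy_path)


page = "https://app.example.com/jobs/42"
local = pick_media_url("/api/files/abc?sig=1", page, "/api/jobs/42/video/raw")
remote = pick_media_url("https://s3.example.net/bucket/abc?sig=1", page, "/api/jobs/42/video/raw")
```

Here `local` keeps the same-origin file URL, while `remote` is rewritten to the proxy route, so no cross-origin request is attempted.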

🤖 Generated with Claude Code

v0.2.1 — S3-safe downloads (server-proxy fallback)

28 Jun 23:32

Patch: reliable result & video downloads with S3-compatible storage.

Fixes

  • Downloads no longer break on non-public S3/Garage buckets. Presigned URLs that a browser can't reach (private bucket, or an S3 SigV4 "Date is too old" clock-skew/signature error) now transparently fall back to a same-origin streaming proxy through the API — which fetches the object with its own credentials, so no public bucket and no clock sync are required.
    • New: GET /api/jobs/{id}/results/{resultId}/download (result download) and GET /api/jobs/{id}/video/raw (video stream, with HTTP Range for editor seeking).
    • The results download button tries the presigned URL first, then the proxy; the editor video falls back to the proxy once if the presigned URL won't play.
  • Local-storage deployments are unaffected (already same-origin); the presigned path is still used whenever it works.

v0.2.0 — rebuilt frontend, in-browser OCR, security & quality pass

28 Jun 23:19

SubtitleExtractor v0.2.0 — a from-scratch frontend, a real in-browser OCR path, and a broad security + quality pass.

Docker images for this release are built and pushed to GHCR by CI on publish. Set IMAGE_TAG=0.2.0 (or latest) in your .env to run prebuilt images via docker compose.

✨ Highlights

Frontend rebuilt from scratch ("Fusion")

  • New stack: Vite + React + TypeScript, Tailwind v4, TanStack Router + Query, Zustand, cmdk (⌘K palette), react-hook-form + zod, wavesurfer.js v7, self-hosted Geist fonts. Dark-first identity (cyan + amber).
  • Rebuilt every screen: login (local + OIDC), dashboard (drag-drop upload + live job list), job detail (live SSE log + results), admin (workers / substitutions / users / settings), and the subtitle editor.

Unified media player

  • One player across the app (preview and editor): play/pause, clickable/seekable progress bar, elapsed/total time, and keyboard transport (Space, ←/→ ±5s, Shift ±1s, Home). Drives both a native <video> and a WebCodecs/canvas backend for MKV/HEVC.

Subtitle editor & results

  • Save dialog: choose a filename and overwrite the current file or save as a new one (no more silent duplicate on every save).
  • Per-cue delete (X) in the cue table; per-result delete on the job page (deleting the last result removes the whole job + its files).
  • OCR language hint selector restored; zone layout + language remembered across extractions (localStorage).
  • Profile menu on the avatar: edit your display name / email / password (local accounts; OIDC profiles are read-only), Admin shortcut, sign out.

In-browser OCR (offline, privacy-preserving)

  • Fully working end-to-end and on par with server coverage for short clips.
  • Self-hosted OCR models and the onnxruntime WASM runtime (no third-party CDN at runtime → works offline).
  • Cross-origin isolation (COOP/COEP) enables multi-threaded WASM; WebGPU used when available.
  • Sequential frame decoding (true per-time frames instead of keyframe snaps) so OCR sees every subtitle; consensus voting + noise filtering for cleaner cues.

🔒 Security

  • Refuse to boot on placeholder JWT_SIGNING_KEY / INTERNAL_API_TOKEN; removed weak docker-compose credential fallbacks.
  • Rate limiting on /auth/login + /auth/register; session cookie Secure auto-enabled on HTTPS.
  • OIDC admin-claim now requires both claim and value (no accidental admin escalation).
  • Internal worker endpoints bound to the claiming worker; nginx security headers + CSP.
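The refuse-to-boot check on secrets could look roughly like the sketch below. The placeholder list is hypothetical, since the notes do not say which values the project treats as placeholders.

```python
REQUIRED_SECRETS = ("JWT_SIGNING_KEY", "INTERNAL_API_TOKEN")

# Hypothetical placeholder values; the project's actual list is not given.
PLACEHOLDERS = {"", "changeme", "change-me", "secret"}


def check_secrets(env) -> None:
    """Abort startup when a required secret is missing or still a placeholder."""
    bad = [name for name in REQUIRED_SECRETS
           if env.get(name, "").strip().lower() in PLACEHOLDERS]
    if bad:
        raise SystemExit("refusing to start: set real values for " + ", ".join(bad))
```

A service would call `check_secrets(os.environ)` first thing at startup, before opening any listener.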

🐞 Fixes

  • Backend: atomic result/job deletes; handleSaveResult rejects oversize uploads and keeps storage keys consistent; worker progress/heartbeat on a vanished job returns 409 (worker stops cleanly instead of wedging).
  • Worker: accurate frame timestamps (real PTS) + safe ffmpeg→OpenCV fallback; prompt cancellation + per-job heartbeat on long jobs; SRT/VTT sanitization; PP-OCR no longer drops recognized lines; substitution-regex ReDoS hardening.
  • Frontend: editor follows the /video JSON {url} contract; WebCodecs decoder leak, SSE log duplication, and waveform drag jitter fixed; web-demuxer WASM served same-origin.

♿ Accessibility & polish

  • Accessible modals (role/aria + focus trap + Esc), larger touch targets, aria-labels on icon buttons, locale-aware number inputs.
  • Native checkboxes/radios replaced with custom switches/controls; clickable job rows, staggered entrance animations, skeleton loaders, route transitions (all respecting prefers-reduced-motion).

📦 Dependencies

  • Go: pgx 5.10, golang-jwt 5.3, go-oidc 3.19, env 11.4.1, x/crypto and x/net bumps; toolchain Go 1.25; base images golang:1.25-alpine / alpine:3.22; pinned MinIO images.
  • Web: React 19, Vite 8, TypeScript 6, zod 4, web-demuxer 4, jassub 2, lucide 1, TanStack/wavesurfer/tailwind minors; build image node:24-alpine.
  • Worker: onnxruntime ≥1.27, pillow ≥11, opencv 4.x floor.

v0.1.5 — container healthchecks + friendly subtitle filenames

28 Jun 16:47

Changes since v0.1.4

  • Container healthchecks on api, web and the NVIDIA worker (postgres/minio already had them):
    • api: HEALTHCHECK on /healthz.
    • web: nginx now serves /healthz (200) and the image healthchecks it.
    • nvidia-worker: a daemon thread touches a liveness file every ~5s (fresh even mid-job); unhealthy if missing or >60s old.
    • compose: web and nvidia-worker now wait for the API (condition: service_healthy).
  • Friendly subtitle download names: downloads are now named after the source video (e.g. Movie.mkv → Movie.srt) instead of an opaque storage token. Applies to the job results download, the in-browser editor export, and browser-side extraction. The API sets Content-Disposition from a sanitized ?name=.
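The liveness-file healthcheck described for the NVIDIA worker can be sketched as follows. The function names are illustrative; the 5-second interval and 60-second limit come from the notes above.

```python
import os
import threading
import time
from pathlib import Path


def start_heartbeat(path: str, interval: float = 5.0) -> threading.Event:
    """Touch `path` every `interval` seconds from a daemon thread.

    Returns an Event; setting it stops the thread.
    """
    stop = threading.Event()

    def beat() -> None:
        while not stop.is_set():
            Path(path).touch()   # creates the file or refreshes its mtime
            stop.wait(interval)  # sleeps, but wakes early when stopped

    threading.Thread(target=beat, daemon=True).start()
    return stop


def is_healthy(path: str, max_age: float = 60.0, now: float = None) -> bool:
    """Healthy only if the file exists and was touched within `max_age` seconds."""
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return False  # missing file counts as unhealthy
    current = time.time() if now is None else now
    return current - mtime <= max_age
```

Because the heartbeat runs in its own thread, the file stays fresh while a long job occupies the main thread. A container healthcheck command would then only need to evaluate something like `is_healthy`.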

Container images (rebuilt by CI)

  • ghcr.io/dim145/subtitleextractor-api:0.1.5
  • ghcr.io/dim145/subtitleextractor-web:0.1.5
  • ghcr.io/dim145/subtitleextractor-worker-nvidia:0.1.5

macOS worker

subtitleextractor-worker-macos-v0.1.5.zip is attached: unzip → cp .env.example .env → set API_BASE_URL + INTERNAL_API_TOKEN → ./run.sh.

v0.1.4 — OCR substitutions, idle RAM reclaim, macOS worker zip

28 Jun 16:10

Changes since v0.1.3

  • Inter-worker OCR substitution rules — a dedicated Admin → Substitutions page (table editor: find→replace, regex toggle, per-language, inline regex validation). Rules are global and applied by every worker to recognized text after merging — for fixing recurring OCR mistakes or stripping watermarks. (DB migration 0004.)
  • Idle RAM reclamation — the OCR model now runs in a disposable child process that is killed after the idle grace period, returning all RAM and VRAM to the OS (the previous in-process unload freed VRAM but not host RAM). The worker parent process stays small.
  • Ready-to-run macOS worker zip attached to this release (see Assets below): subtitleextractor-worker-macos-v0.1.4.zip. Unzip → cp .env.example .env → set API_BASE_URL + INTERNAL_API_TOKEN → ./run.sh. First run sets up a venv and installs deps (MLX GPU backend on Apple Silicon).
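To make the substitution rules concrete, here is a minimal sketch of applying find→replace rules with a regex toggle and an optional language filter. The `Rule` fields are assumptions modelled on the table editor described above, not the project's schema.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    find: str
    replace: str
    is_regex: bool = False
    language: Optional[str] = None  # None means the rule applies to every language


def apply_rules(text: str, rules: List[Rule], language: str) -> str:
    """Apply each matching rule in order to the merged OCR text."""
    for rule in rules:
        if rule.language not in (None, language):
            continue  # rule is scoped to another language
        if rule.is_regex:
            text = re.sub(rule.find, rule.replace, text)
        else:
            text = text.replace(rule.find, rule.replace)
    return text


rules = [
    Rule("|", "I"),                                  # recurring OCR misread
    Rule(r"\s*www\.\S+", "", is_regex=True),         # strip a watermark URL
    Rule("0", "O", language="fr"),                   # French-only fix
]
cleaned = apply_rules("| am here www.example.com", rules, "en")
```

In this example `cleaned` is `"I am here"`; the French-only rule is skipped because the text is tagged as English.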

Container images (rebuilt by CI)

  • ghcr.io/dim145/subtitleextractor-api:0.1.4
  • ghcr.io/dim145/subtitleextractor-web:0.1.4
  • ghcr.io/dim145/subtitleextractor-worker-nvidia:0.1.4

v0.1.3 — confidence-weighted character-level consensus

28 Jun 15:08

Reliability improvement to subtitle text accuracy.

Changes since v0.1.2

  • Character-level confidence-weighted consensus replaces whole-string majority
    voting when merging the frames of a cue. The text is now voted character by
    character across all frames of a subtitle, weighted by each frame's OCR
    confidence — repairing single-character misreads that no single frame got fully
    right, and letting a high-confidence reading override a wrong majority. This is
    the highest-impact reliability technique from the OCR research (documented
    20-50% reduction of single-pass errors). The per-frame OCR confidence is now
    threaded through the worker pipeline (it was previously discarded).
  • Conservative post-OCR normalization (whitespace cleanup only — no character edits).
  • New char_voting toggle in the worker config (default on).
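The idea behind character-level voting can be shown with a simplified sketch. It assumes the readings of one cue are already aligned, and it only votes among readings of the most strongly supported length. The worker's real alignment strategy is not described in these notes, so treat this as an illustration rather than the shipped algorithm.

```python
from collections import defaultdict
from typing import List, Tuple


def vote_chars(readings: List[Tuple[str, float]]) -> str:
    """Merge (text, confidence) readings of one cue, character by character."""
    # Choose the text length backed by the most total confidence.
    length_weight = defaultdict(float)
    for text, conf in readings:
        length_weight[len(text)] += conf
    target = max(length_weight, key=length_weight.get)
    candidates = [(t, c) for t, c in readings if len(t) == target]

    merged = []
    for i in range(target):
        # Sum the confidence behind each character seen at position i.
        weights = defaultdict(float)
        for text, conf in candidates:
            weights[text[i]] += conf
        merged.append(max(weights, key=weights.get))
    return "".join(merged)


frames = [("Hel1o world", 0.9), ("Hello w0rld", 0.9), ("He1lo world", 0.6)]
merged = vote_chars(frames)
```

No single frame above reads "Hello world", yet the vote recovers it. Similarly, `vote_chars([("cat", 0.95), ("cot", 0.3), ("cot", 0.3)])` returns `"cat"`, because one high-confidence reading outweighs two weak ones.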

Container images (rebuilt by CI)

  • ghcr.io/dim145/subtitleextractor-api:0.1.3
  • ghcr.io/dim145/subtitleextractor-web:0.1.3
  • ghcr.io/dim145/subtitleextractor-worker-nvidia:0.1.3

v0.1.2 — slim NVIDIA worker image

28 Jun 14:30

Image-size optimization of the NVIDIA worker.

Changes since v0.1.1

  • NVIDIA worker image slimmed ~13.7 GB → ~8 GB (~40%): rebuilt on
    python:3.13-slim-trixie (Debian 13, Python 3.13) instead of the all-in-one
    PaddlePaddle devel image. paddlepaddle-gpu's pip wheel already ships its CUDA
    userspace libs (nvidia-*-cu12), so the CUDA/Paddle base only duplicated them
    and dragged in TensorRT + the CUDA toolkit + build tools we never use. The
    remaining ~8 GB is the floor for GPU PaddleOCR (paddle ~3.4 GB + CUDA ~2.8 GB).
  • .dockerignore so the local .venv (~1.1 GB) is never copied into the image.
  • Dockerfile layering ordered for cache reuse (paddle / paddleocr / worker layers).

Python 3.13 is the newest version with a paddlepaddle-gpu 3.3.1 wheel (cp38..cp313).

Container images

Rebuilt and pushed to GHCR by CI on this release:

  • ghcr.io/dim145/subtitleextractor-api:0.1.2
  • ghcr.io/dim145/subtitleextractor-web:0.1.2
  • ghcr.io/dim145/subtitleextractor-worker-nvidia:0.1.2

v0.1.1 — latest OCR stack + working NVIDIA image

28 Jun 12:05

Maintenance release: upgrades the OCR stack to the latest libraries/models and fixes the NVIDIA worker image.

Changes since v0.1.0

  • OCR stack upgraded:
    • RapidOCR → unified rapidocr 3.x (better accent accuracy).
    • PP-OCR → PaddleOCR 3.x (device= / .predict() API).
    • PaddleOCR-VL (Apple GPU/MLX) → default model 1.5-8bit, with filtering of the model's "no text / too blurry" meta responses.
  • NVIDIA worker image fixed & modernized: now built on the official paddlepaddle/paddle:3.3.1-gpu-cuda12.6-cudnn9.5 base (bundles CUDA 12.6 + paddlepaddle-gpu). This replaces the v0.1.0 NVIDIA image, which was broken (deps silently failed to install). CUDA 12.6 chosen for broad host-driver compatibility (~R535+).
  • docker-compose: supports prebuilt GHCR images via image: + IMAGE_TAG (docker compose pull && up -d).

Container images

Rebuilt and pushed to GHCR by CI on this release:

  • ghcr.io/dim145/subtitleextractor-api:0.1.1
  • ghcr.io/dim145/subtitleextractor-web:0.1.1
  • ghcr.io/dim145/subtitleextractor-worker-nvidia:0.1.1

(The macOS OCR worker runs natively on the host — see the README.)

v0.1.0 — first release

28 Jun 10:28

First public release of SubtitleExtractor — extract hardcoded (burned-in) subtitles from video via OCR and edit them in the browser.

Highlights

  • Extraction pipeline: upload → automatic worker routing → OCR → downloadable SRT / ASS / WebVTT.
  • OCR backends (configurable per worker): RapidOCR (CPU), PP-OCR (CUDA), PaddleOCR-VL (Apple GPU / MLX).
  • Quality filters: text-mask change detection, presence gate, crop upscaling, persistence/duration/junk filters, majority-vote merge, and VLM anti-hallucination.
  • In-browser editor: live caption overlay, editable cue table, waveform timeline (wavesurfer.js).
  • 100% browser extraction option (WebCodecs + onnxruntime-web / WebGPU).
  • Up to 2 subtitle zones with WebCodecs preview.
  • Accounts (local + OIDC), admin (users, settings, DB-backed dynamic worker config), live progress/logs via SSE.
  • Storage: local filesystem or S3/MinIO. Hardware decode: NVDEC / VideoToolbox.

Container images

Published to GHCR by CI on this release:

  • ghcr.io/dim145/subtitleextractor-api
  • ghcr.io/dim145/subtitleextractor-web
  • ghcr.io/dim145/subtitleextractor-worker-nvidia

(The macOS OCR worker runs natively on the host — see the README.)

Getting started

```bash
cp .env.example .env # set JWT_SIGNING_KEY, INTERNAL_API_TOKEN, ...
docker compose up --build
```

Licensed under AGPL-3.0.