SubtitleExtractor

Extract hardcoded (burned-in) subtitles from videos (mp4/mkv) using OCR, and edit the resulting subtitles directly in the browser. Containerized web app.

See PLAN.md for the full architecture, decisions, and roadmap.

Status

End-to-end extraction plumbing is in place: upload a video → a worker claims it → OCR pipeline → downloadable SRT/ASS. The control plane (M1–M3) is verified end-to-end.

Milestone	Scope	State
M1	API foundation, auth, DB, storage	✅ done
M2	Upload + jobs (queue via Postgres `SKIP LOCKED`)	✅ done
M3	macOS worker + claim protocol	✅ done¹
M4	SSE real-time progress/logs	✅ done (EventSource + polling fallback)
M6	NVIDIA worker	✅ built (Docker overlay + ppocr GPU) — unverified (no NVIDIA hw)
M5	Frontend: auth + dashboard + job detail + editor	✅ done²
M7	Client-side (browser) extraction	✅ built (WebCodecs + onnxruntime-web/WebGPU, code-split)
—	Admin (users, settings, workers) + DB-backed dynamic worker config	✅ done
—	2-zone subtitle-area selector (WebCodecs, browser-only)	✅ done³

¹ The Go control plane + claim protocol are tested end-to-end. The worker's OCR pipeline compiles and is wired; running it on real video needs ffmpeg + the Python deps on the host (brew install ffmpeg, then ./worker/run-macos.sh).
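As an illustration of the `SKIP LOCKED` claim pattern named in the M2 row, here is a minimal Python sketch. It is not the project's implementation: the real control plane is written in Go and the tech stack lists River as the job queue. The `jobs` table, its columns (`status`, `worker_id`, `claimed_at`, `created_at`), and the `claim_next_job` helper are all hypothetical names chosen for the example. `conn` is assumed to be a DB-API 2.0 connection to PostgreSQL whose driver accepts `%(name)s` parameters (psycopg does).

```python
# Hypothetical schema: a "jobs" table with status, worker_id, claimed_at, created_at.
CLAIM_SQL = """
UPDATE jobs
SET status = 'running', worker_id = %(worker_id)s, claimed_at = now()
WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'queued'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id
"""


def claim_next_job(conn, worker_id):
    """Atomically claim the oldest queued job, or return None if there is none.

    Rows already locked by another worker's in-flight claim are skipped
    instead of waited on, so concurrent workers never receive the same job.
    """
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_SQL, {"worker_id": worker_id})
        row = cur.fetchone()
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
    return row[0] if row else None
```

A worker loop under these assumptions would call `claim_next_job(conn, "macos-1")` and sleep briefly whenever it returns `None`.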

² React + Vite frontend in web/ — "Cutting Room" dark pro-tool theme (amber/cyan, Archivo/Geist/JetBrains Mono). Login (local + OIDC), dashboard (upload + live job list), job detail (progress/logs/downloads), and the subtitle editor (video preview with live ASS overlay via JASSUB/libass-wasm, editable cue table synced to playback, \an alignment, SRT/ASS export) — all verified against the running API. The waveform timeline (wavesurfer) + save-to-server are part of the editor. Dev: cd web && npm install && npm run dev (proxies /api to localhost:8080).

³ Workers register themselves in the DB and report in via heartbeat. Their OCR config (backend, fps, confidence, zones…) is admin-editable and pushed to workers through the heartbeat's config_version, so changes apply without a restart. Admin pages (/admin, admin-only) manage users, site settings (registration toggle, defaults), and workers (status, enable/disable, per-worker config, delete). Job routing is automatic. The subtitle-area selector lets users draw up to two zones over the video; MKV/HEVC files are decoded in-browser via WebCodecs + a WASM demuxer, while MP4/H.264 falls back to a plain <video> element. The zones are merged into one ASS file, with each zone's \an alignment derived from its position.
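To make "\an alignment from each zone's position" concrete, the sketch below maps a zone rectangle to an ASS `\an` value. The numpad layout itself is standard ASS (1 to 3 bottom row, 4 to 6 middle, 7 to 9 top, each running left to right). The rule used here, splitting the frame into thirds by the zone's center point, is an assumption for illustration, and `an_alignment` is a hypothetical name; the project may choose alignments differently.

```python
def an_alignment(zone, frame_w, frame_h):
    """Return the ASS \\an value (1-9) for a zone given as (x, y, w, h) in pixels.

    Assumed rule: locate the zone's center, then pick the third of the frame
    it falls in, horizontally and vertically.
    """
    x, y, w, h = zone
    cx = (x + w / 2) / frame_w  # normalized center, 0.0 = left edge
    cy = (y + h / 2) / frame_h  # normalized center, 0.0 = top edge

    col = 0 if cx < 1 / 3 else (1 if cx < 2 / 3 else 2)  # left, center, right
    row = 2 if cy < 1 / 3 else (1 if cy < 2 / 3 else 0)  # top row is 7-9, bottom is 1-3
    return row * 3 + col + 1
```

For a 1920×1080 frame, a full-width strip near the bottom edge lands on `\an2` (bottom center) and a full-width strip at the top lands on `\an8` (top center), which matches the usual placement of dialogue and on-screen-sign translations.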

Layout

api/      Go API (chi) — auth, jobs, storage, SSE, /internal worker protocol
worker/   Python OCR worker (shared pipeline; macOS + NVIDIA backends)
web/      React + Vite frontend (subtitle editor)

Quick start

cp .env.example .env          # then edit secrets (JWT_SIGNING_KEY, INTERNAL_API_TOKEN, ...)

Run prebuilt images (from GitHub Container Registry — no local build):

docker compose pull           # fetch ghcr.io/dim145/subtitleextractor-* images
docker compose up -d          # starts postgres + minio + api + web

Pin a version instead of latest with IMAGE_TAG (in .env or inline): IMAGE_TAG=0.1.0 docker compose up -d.

Or build locally (for development):

docker compose up --build

# App (frontend):  http://localhost:3000   (nginx serves the SPA + proxies /api)
# API:             http://localhost:8080
# health check:    http://localhost:8080/healthz
# MinIO console:   http://localhost:9001

The macOS OCR worker runs natively on the host (Docker on macOS can't reach the GPU): brew install ffmpeg && cd worker && ./run-macos.sh.

The API runs database migrations automatically on startup.
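Because migrations run at startup, the API may not answer immediately after `docker compose up -d`. A small polling helper like the one below can gate scripts on the health check URL listed above. It is a sketch under assumptions: `wait_for_health` is a hypothetical name, and treating any 2xx response as "healthy" is a guess about what `/healthz` returns.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_health(url="http://localhost:8080/healthz", timeout=60.0, interval=2.0):
    """Poll `url` until it returns a 2xx status; give up after `timeout` seconds.

    Returns True once the endpoint responds with 2xx, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Connection refused, HTTP error status, or a network timeout:
            # the API is not ready yet, so fall through and retry.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Typical use would be `if not wait_for_health(): raise SystemExit("API did not become healthy")` before running anything that depends on the API.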

Tech stack

API: Go, chi, pgx, River (job queue), coreos/go-oidc, argon2id, minio-go
DB: PostgreSQL 16
Worker: Python — ffmpeg + RapidOCR / PP-OCRv5 / PaddleOCR-VL (configurable)
Frontend: React + Vite + TypeScript; ass-compiler, JASSUB, wavesurfer.js

License

Licensed under the GNU Affero General Public License v3.0 or later (AGPL-3.0-or-later); see LICENSE. If you run a modified version to provide a network service, the AGPL requires you to offer your modified source to its users.
