Professional AI-powered tool for removing hard-coded subtitles from videos and images
Features | Installation | Usage | Configuration | CLI | Troubleshooting
Video Subtitle Remover Pro uses real AI neural networks to remove hard-coded subtitles and text watermarks from videos and images. Unlike simple blur or crop methods, it intelligently fills in removed areas with content that matches the surrounding video.
Based on YaoFANGUK/video-subtitle-remover, enhanced with a professional interface, real LaMa inpainting, multi-engine detection, and 12-language support.
- Real Video Inpainting -- Temporal Background Exposure (TBE) reconstructs the true background from neighbouring frames where the subtitle is absent. No external model weight downloads required.
- Real AI Inpainting -- LaMa neural network for still-frame and residual refinement (via `simple-lama-inpainting`)
- AUTO Inpaint Routing -- Per-batch routing between TBE and LaMa based on exposure score
- Multi-Engine Detection -- RapidOCR (ONNX PP-OCR, 4-5x faster, leak-free) > PaddleOCR > Surya (GPL opt-in) > EasyOCR > OpenCV fallback chain (automatic)
- Lossless Pipeline -- FFV1 lossless intermediate (only the final encode is lossy) for noticeably cleaner outputs than the legacy mp4v intermediate
- HEVC + AV1 Output -- Pick H.264 / H.265 / AV1 from a dropdown; NVENC/QSV/AMF for HW encoding, libx265 / libsvtav1 software fallback
- Multi-region Masks -- Draw multiple subtitle rects on a scrubbable video frame; backend honours every rect
- Inpaint Preview -- "Preview cleanup" button runs detect + inpaint on the selected frame so you can A/B settings before committing
- Seamless Boundaries -- Gaussian alpha feathering at every inpaint boundary, no visible cut lines
- ~50 Language Support -- English / Chinese / Japanese / Korean / European, plus Thai, Vietnamese, Polish, Greek, Ukrainian, Filipino, Hebrew, Czech, and more
- GPU Acceleration -- NVIDIA CUDA, AMD/Intel DirectML, hardware-decode hints (D3D11 / VAAPI / MFX), CPU fallback
- Subtitle Region Selector -- Scrub to any frame and draw one or more rectangles
- Batch Processing -- Queue files or drag entire folders; per-item cancellation
- Multi-track Audio + Loudness Normalisation -- Pass through every audio track on Bluray rips; optional per-stream EBU R128 normalisation to LUFS targets (YouTube -14, Apple -16, broadcast -23)
- Quality Self-Test -- PSNR / SSIM report with an ROI-cropped metric (measures the inpaint region, not the unchanged background) and an optional side-by-side comparison PNG
- CLI + Presets -- `python -m backend.processor --pattern ... --preset "YouTube (default)"`; six built-in presets + user presets persisted to `%APPDATA%`
- Chyron vs Subtitle Filter -- Keep persistent text (logos, lower-thirds) and remove dialogue, or vice versa
- Karaoke Grouping -- Per-syllable boxes fuse into a single line mask so highlighted lyrics do not leak through the gaps
- Live Preview During Processing -- 15 FPS throttled preview piped from the backend worker
- Pre-batch ETA Estimate -- 30-frame detect probe seeds the ETA so users see "about X left" from the very first frame
- Crash-Resume Checkpointing -- SHA-256 input fingerprint per file; re-running a glob skips finished work
- Premium Dark UI -- Cohesive design system with custom sliders, toggles, status chips, taskbar progress, onboarding modal
- Settings Persistence -- All knobs saved/restored between sessions; versioned schema with backfill migration
- CI/CD Releases -- Automated Windows builds via GitHub Actions, pip-audit dependency scan included
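The crash-resume checkpointing in the list above can be illustrated with a minimal sketch. It assumes the checkpoint is a JSON list of SHA-256 digests; the function names and the file format are hypothetical and are not taken from `backend/processor.py`.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_done(checkpoint: Path) -> set[str]:
    """Load finished fingerprints (hypothetical format: a JSON list)."""
    if not checkpoint.exists():
        return set()
    return set(json.loads(checkpoint.read_text(encoding="utf-8")))


def mark_done(checkpoint: Path, digest: str) -> None:
    """Record a finished input, writing through a temp file first."""
    done = load_done(checkpoint)
    done.add(digest)
    tmp = checkpoint.with_suffix(".tmp")
    tmp.write_text(json.dumps(sorted(done)), encoding="utf-8")
    tmp.replace(checkpoint)  # rename over the old checkpoint


def pending(files: list[Path], checkpoint: Path) -> list[Path]:
    """Return only the inputs whose fingerprint is not checkpointed yet."""
    done = load_done(checkpoint)
    return [f for f in files if fingerprint(f) not in done]
```

Hashing the file content rather than its name means a renamed but unchanged input is still skipped, while an edited file with the same name is processed again.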
| Component | Minimum | Recommended |
|---|---|---|
| OS | Windows 10 | Windows 11 |
| CPU | Intel i5 / AMD Ryzen 5 | Intel i7 / AMD Ryzen 7 |
| RAM | 8 GB | 16+ GB |
| GPU | Any (CPU mode) | NVIDIA RTX 2060+ |
| VRAM | - | 6+ GB |
| Python | 3.10 | 3.12 |
- Download or clone this repository
- Double-click `Run_VSR_Pro.bat` -- first run automatically:
  - Creates a virtual environment
  - Detects your GPU and installs appropriate packages
  - Installs PaddleOCR, EasyOCR, and LaMa inpainting
  - Launches the application
- Use `Run_VSR_Pro_Debug.bat` if you want the same bootstrap flow with a visible console for troubleshooting
```powershell
cd VideoSubtitleRemover

# Create virtual environment
python -m venv venv
.\venv\Scripts\activate

# Install PyTorch (choose one):
# NVIDIA:
pip install torch==2.7.0 torchvision==0.22.0 --index-url https://download.pytorch.org/whl/cu118
# CPU:
pip install torch==2.7.0 torchvision==0.22.0 --index-url https://download.pytorch.org/whl/cpu

# Install dependencies
pip install -r requirements.txt

# Run
python VideoSubtitleRemover.py
```

Install FFmpeg:

```powershell
winget install ffmpeg
```

Run the test suite:

```powershell
python -m unittest discover -s tests -v
```

- Launch via `Run_VSR_Pro.bat`
- Add files -- Click to browse, press `Ctrl+O`, right-click for folders, or drag & drop
- Select algorithm -- LAMA (recommended), STTN, or ProPainter
- Set language if subtitles are non-English
- Optionally set region -- Click "Set Region" to draw a rectangle on the subtitle area
- Start Processing and monitor progress
- Select a queue item to preview it, use Review mask to confirm detection, and double-click the preview for a larger source frame
| Algorithm | Inpainting Engine | Speed | Quality | Best For |
|---|---|---|---|---|
| STTN | Temporal Background Exposure | Fastest | Great | Live-action video with changing subtitles (default) |
| LAMA | Neural (LaMa) | Medium | Best still-frame | Images, animations, static backgrounds |
| ProPainter | TBE + LaMa refinement | Slowest | Best motion | Motion-heavy footage, thick/decorative text |
All three modes now do real inpainting. STTN recovers the literal background from adjacent frames where the subtitle is absent -- this works because hard-coded subtitles are sparse in time, and the pixels behind them are revealed whenever the text changes or disappears. LAMA is a single-frame neural fill. ProPainter is a hybrid: TBE reconstructs the background, then LaMa refines any residual.
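The temporal idea behind TBE can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the project's implementation: it assumes a static camera within the batch, uses a per-pixel median over exposed frames, and the names `tbe_fill` and `exposure_score` are hypothetical. The `min_coverage` parameter mirrors the "TBE Coverage" setting (default 3).

```python
import warnings

import numpy as np


def tbe_fill(frames: np.ndarray, masks: np.ndarray, min_coverage: int = 3):
    """Fill masked pixels from frames where the same pixel is unmasked.

    frames: (T, H, W, C) uint8 batch.
    masks:  (T, H, W) bool, True where subtitle text covers the pixel.
    Returns (filled, residual); residual marks masked pixels that had
    fewer than min_coverage exposed samples and still need a neural fill.
    """
    exposed = ~masks                                  # (T, H, W)
    coverage = exposed.sum(axis=0)                    # (H, W)
    # Hide covered samples behind NaN so the median sees exposed values only.
    samples = np.where(exposed[..., None], frames.astype(np.float32), np.nan)
    with warnings.catch_warnings():
        # Pixels that are never exposed produce an all-NaN slice warning.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        background = np.nanmedian(samples, axis=0)    # (H, W, C)
    background = np.nan_to_num(background).round().astype(np.uint8)

    trusted = coverage >= min_coverage                # (H, W)
    fill = masks & trusted[None]                      # (T, H, W)
    residual = masks & ~trusted[None]

    filled = frames.copy()
    filled[fill] = np.broadcast_to(background, frames.shape)[fill]
    return filled, residual


def exposure_score(masks: np.ndarray, residual: np.ndarray) -> float:
    """Fraction of masked pixels that TBE could recover (assumed definition)."""
    total = masks.sum()
    return 1.0 if total == 0 else float(1.0 - residual.sum() / total)
```

A score near 1.0 suggests the batch can be handled temporally, while a low score (for example, a subtitle that never leaves the screen) suggests falling back to a single-frame fill for the residual. A practical version would restrict the median to the mask's bounding box to keep memory use down.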
The app automatically selects the best available engine:
| Priority | Engine | Install | Languages | Notes |
|---|---|---|---|---|
| 1 | RapidOCR (ONNX PP-OCR) | `pip install rapidocr` | 100+ | 4-5x faster than PaddleOCR, leak-free (default) |
| 2 | PaddleOCR (PP-OCRv5) | `pip install "paddleocr>=3.0.0"` | 106 | High accuracy reference implementation |
| 3 | Surya | `pip install surya-ocr` | 90+ | Layout-aware (GPL) |
| 4 | EasyOCR | `pip install easyocr` | 80+ | Legacy fallback |
| 5 | OpenCV fallback | Built-in | Any | Threshold-based |
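A fallback chain like the one in the table can be approximated by probing for installed packages in priority order. The sketch below is an assumption about the general approach, not the app's actual selection code; the import names and the `pick_engine` helper are hypothetical.

```python
from importlib.util import find_spec

# (display name, import name) in the priority order of the table.
# The import names are assumptions about how each package is imported.
ENGINE_CHAIN = [
    ("RapidOCR", "rapidocr"),
    ("PaddleOCR", "paddleocr"),
    ("Surya", "surya"),
    ("EasyOCR", "easyocr"),
]


def pick_engine(allow_gpl: bool = False, is_installed=None) -> str:
    """Return the first usable engine, else the built-in OpenCV fallback."""
    if is_installed is None:
        # Default probe: a module is usable if Python can locate it.
        is_installed = lambda module: find_spec(module) is not None
    for name, module in ENGINE_CHAIN:
        if name == "Surya" and not allow_gpl:
            continue  # Surya is GPL-licensed, so it stays opt-in
        if is_installed(module):
            return name
    return "OpenCV fallback"
```

Passing `is_installed` as a parameter keeps the probe replaceable, which makes the ordering easy to check without installing any OCR package.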
Process files from the command line:
```powershell
python -m backend.processor -i input.mp4 -o output.mp4 -m lama --lang en --crf 20
```

| Flag | Description | Default |
|---|---|---|
| `-i, --input` | Input file path | Required |
| `-o, --output` | Output file path | Required |
| `--pattern` | Glob pattern for batch (e.g. `inputs/*.mp4`) | - |
| `--out-dir` | Output directory for batch mode | - |
| `--config` | JSON config overlay | - |
| `--preset NAME` | Apply a built-in or user preset by name | - |
| `--list-presets` | List every preset and exit | - |
| `-m, --mode` | Algorithm (sttn/lama/propainter/auto) | sttn |
| `--codec` | Output codec (h264/h265/av1) | h264 |
| `-g, --gpu` | GPU device ID (-1 for CPU) | 0 |
| `-l, --lang` | Detection language | en |
| `--crf` | Output quality (15-35, lower=better) | 23 |
| `--skip-detection` | Use manual region only | Off |
| `--fast` | LAMA fast mode | Off |
| `--no-audio` | Strip audio | Off |
| `--single-audio` | Mux only first audio stream | Off |
| `--loudnorm <LUFS>` | EBU R128 loudness target (0 disables) | 0 |
| `--frame-skip N` | Reuse mask for N frames (0=every frame) | 0 |
| `--mask-dilate N` | Expand masks by N pixels | 8 |
| `--no-hw-encode` | Force software encoding | Off |
| `--decode-accel` | HW decode hint (off/auto/d3d11/vaapi/mfx) | off |
| `--keep-chyrons` | Leave persistent text (logos / lower-thirds) | Off |
| `--keep-subtitles` | Leave dialogue subtitles | Off |
| `--karaoke-grouping` | Fuse per-syllable boxes on the same line | Off |
| `--quality-report` | Compute PSNR/SSIM after each run | Off |
| `--quality-sheet` | Side-by-side comparison PNG | Off |
| `--validate-config` | Print resolved config and exit | Off |
| `--skip-existing` | Skip files whose output already exists | Off |
| `--no-prefetch` | Disable worker-thread frame prefetcher | Off |
| `--json-log PATH` | Append a structured JSON-line log | - |
Settings are stored in %APPDATA%\VideoSubtitleRemoverPro\settings.json and persist across sessions.
| Setting | Description | Default | Range |
|---|---|---|---|
| Neighbor Stride | STTN temporal window | 10 | 5-30 |
| Reference Length | STTN reference frames | 10 | 5-30 |
| Max Load Frames | Batch size | 30 | 10-100 |
| CRF Quality | Output quality (lower=better) | 23 | 15-35 |
| Output Codec | H.264 / H.265 / AV1 | h264 | h264/h265/av1 |
| Frame Skip | Reuse detection mask for N frames | 0 | 0-10 |
| Mask Dilate | Expand detected regions (px) | 8 | 0-20 |
| Mask Feather | Soft alpha-blend at boundary (px) | 4 | 0-15 |
| TBE Coverage | Min frames a pixel must be unmasked to trust its exposure | 3 | 1-10 |
| HW Encoding | Use NVENC/QSV/AMF if available | On | On/Off |
| HW Decode Hint | cv2 HW-accel hint with software fallback | off | off/auto/d3d11/vaapi/mfx |
| Loudness Target | EBU R128 LUFS target (0 = off) | 0 | 0 or -70..-5 |
| Multi-track Audio | Pass through every audio stream | On | On/Off |
| Quality Sheet | Side-by-side PNG next to output | Off | On/Off |
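To show what the Mask Feather setting controls, here is a NumPy-only sketch of Gaussian alpha feathering. It is an illustration, not the project's code: the function names are hypothetical, and tying the Gaussian sigma to half the feather radius is an assumption.

```python
import numpy as np


def gaussian_kernel(radius: int) -> np.ndarray:
    """1-D Gaussian kernel; sigma = radius / 2 is an assumed relation."""
    if radius <= 0:
        return np.ones(1, dtype=np.float32)
    x = np.arange(-radius, radius + 1, dtype=np.float32)
    kernel = np.exp(-(x ** 2) / (2.0 * (radius / 2.0) ** 2))
    return (kernel / kernel.sum()).astype(np.float32)


def feather_mask(mask: np.ndarray, feather_px: int = 4) -> np.ndarray:
    """Blur a boolean (H, W) mask into a float alpha map in [0, 1]."""
    kernel = gaussian_kernel(feather_px)
    pad = len(kernel) // 2
    alpha = mask.astype(np.float32)
    # Separable blur: filter along rows, then columns, with edge padding.
    for axis in (0, 1):
        widths = [(pad, pad) if a == axis else (0, 0) for a in (0, 1)]
        padded = np.pad(alpha, widths, mode="edge")
        alpha = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="valid"), axis, padded)
    return np.clip(alpha, 0.0, 1.0)


def blend(original: np.ndarray, inpainted: np.ndarray, mask: np.ndarray,
          feather_px: int = 4) -> np.ndarray:
    """Alpha-blend the inpainted frame over the original, (H, W, C) uint8."""
    alpha = feather_mask(mask, feather_px)[..., None]
    mixed = (alpha * inpainted.astype(np.float32)
             + (1.0 - alpha) * original.astype(np.float32))
    return np.clip(mixed.round(), 0, 255).astype(np.uint8)
```

In this sketch alpha reaches 1.0 only at pixels at least `feather_px` inside the mask. With the documented defaults (dilate 8 px, feather 4 px), the dilated mask extends past the detected text by more than the feather radius, so the text itself would stay fully replaced while only the surrounding margin is blended.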
CUDA out of memory
- Reduce Max Load Frames in Advanced Settings
- Switch to LAMA mode (lower VRAM)
- Use CPU mode as fallback
No audio in output
- Install FFmpeg: `winget install ffmpeg`
- Ensure "Preserve original audio" is checked
Poor detection accuracy
- Try changing the detection language to match your subtitles
- Use "Set Region" to manually define the subtitle area
- Install PaddleOCR for best detection accuracy
Application won't start
- Ensure Python 3.10+ is installed
- Delete `venv` folder and re-run setup
- Try `Run_VSR_Pro_Debug.bat` to keep the console open during startup
- Check the log file: `%APPDATA%\VideoSubtitleRemoverPro\vsr_pro.log`
- GUI log panel (collapsible, click "Open Log File" for full log)
- File log: `%APPDATA%\VideoSubtitleRemoverPro\vsr_pro.log` (5 MB rotating)
```
VideoSubtitleRemover/
|-- VideoSubtitleRemover.py      # Main GUI application
|-- backend/
|   |-- __init__.py              # Module exports
|   |-- processor.py             # Core processing (detection + inpainting + mux)
|   |-- presets.py               # Shared preset library (GUI + CLI)
|   `-- model_hashes.py          # Vendored SHA-256 weight hashes
|-- docs/
|   `-- architecture.md          # Pipeline map for new contributors
|-- ROADMAP.md                   # Shipped log + ordered backlog + research bench
|-- TODO.md                      # Active checklist (single source of truth)
|-- RESEARCH_FEATURE_PLAN.md     # Audit companion (historical analysis)
|-- setup.py                     # First-time environment setup
|-- Run_VSR_Pro.bat              # Windows launcher
|-- Run_VSR_Pro_Debug.bat        # Windows launcher with a visible console
|-- build_exe.bat                # PyInstaller build script
|-- requirements.txt             # Python dependencies
|-- tests/                       # Focused regression coverage for hardened paths
|-- .github/workflows/
|   `-- build.yml                # CI/CD release workflow + pip-audit
|-- assets/                      # Application assets
|-- models/                      # AI model weights (auto-downloaded)
`-- output/                      # Default output location
```
See docs/architecture.md for a walkthrough of the detect -> tracker -> mask -> TBE -> refine -> mux pipeline and the "add a new feature" checklist.
- Original project: YaoFANGUK/video-subtitle-remover
- LaMa inpainting: simple-lama-inpainting
- EasyOCR: JaidedAI/EasyOCR
- STTN: Learning Joint Spatial-Temporal Transformations
- ProPainter: sczhou/ProPainter
This project is licensed under the MIT License.
Video Subtitle Remover Pro -- Built by SysAdminDoc