gapsong

ML Engineer · LLM Efficiency & Quantization

I work at the intersection of extreme model compression and production-grade fine-tuning.

What I actually do: I make large language models run on hardware that wasn't supposed to support them — without destroying what makes them useful.

Shipped:

🔧 Official QA-LoRA implementation in Hugging Face PEFT → PR #2571 · PR #2664
📄 Master's thesis @ TU Berlin (supervised by Prof. Samek & Prof. Müller, Fraunhofer HHI): "Accelerating Quantization-Aware Training of 2-bit Compact LLMs"
→ Proposed SA-SVD: -63% training VRAM vs. standard LoRA, and recovers 150 perplexity points on broken 2-bit models
→ Proposed DRA: error-based adapter initialization for parameter-efficient fine-tuning on resilient architectures
🧪 SA-SVD reference implementation (open source, reproducible) → gapsong/sa-svd-qa-lora
→ Measured at 2-bit across three LLMs: lower WikiText perplexity on every model tested (-15% to -48%) at an identical training budget
→ Makes Qwen2-1.5B trainable where standard random-init QA-LoRA diverges with inf gradients

Stack: PyTorch · Hugging Face (PEFT, Transformers, TRL) · GPTQ · bitsandbytes · Slurm · CUDA · AWS
