Sean Collins SMC17

Systems engineer building ML infrastructure from first principles. Inference kernels, vector search, tokenization, agentic verification, and quant tooling. Zig/Elixir/OCaml.

✉️ sean@sunlitmoon.online · 🔗 sunlitmoon.online

ML inference & search substrate

| Repo | What it is | Evidence |
| --- | --- | --- |
| inference | Pure-Zig LLM serving: paged KV-cache, BF16 kernels, persistent thread pool, safetensors integration. TinyLlama-1.1B end-to-end on CPU. | 77 tests · 6.17× decode speedup at M=8 vs baseline · STATUS.md canonical |
| tokenizers-zig | Pure-Zig HF tokenizers: BPE, WordPiece, Unigram, full pipeline, offsets, `tokenizer.json` compat. | 189 tests + 600-iter fuzz · ~5.3× faster on WordPiece · parity verified vs 🤗 tokenizers |
| faiss-zig | Pure-Zig ANN: Flat, HNSW, IVFFlat, IVFPQ; SIMD `@Vector` distance kernels, multi-threaded `searchBatch`. | 68 tests · 16.94× memory compression on IVFPQ at reference recall · cross-validated vs FAISS C++ |
| safetensors-zig | Pure-Zig safetensors reader: `@Vector(32,u8)` structural scan, BF16/F32/I8, libc-only. | 21 tests · 241µs parse on 201-tensor TinyLlama fixture · upstream fixture contribution |
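
For readers unfamiliar with the container format that safetensors-zig reads, here is a minimal Python sketch of the publicly documented safetensors layout: an 8-byte little-endian header length, a JSON header, then the raw tensor bytes. It is not the repo's implementation (which is Zig and uses a SIMD structural scan). The helper names `build_safetensors`, `parse_header`, and `tensor_bytes` are hypothetical and exist only for illustration.

```python
import json
import struct


def build_safetensors(tensors):
    """Serialize {name: (dtype, shape, raw_bytes)} into a safetensors-style blob.

    Hypothetical helper, used here only to produce a fixture for the parser.
    """
    header = {}
    payload = b""
    for name, (dtype, shape, raw) in tensors.items():
        begin = len(payload)
        payload += raw
        # data_offsets are relative to the start of the data section.
        header[name] = {
            "dtype": dtype,
            "shape": shape,
            "data_offsets": [begin, begin + len(raw)],
        }
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    # 8-byte little-endian unsigned header length, then JSON, then data.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + payload


def parse_header(blob):
    """Return (header_dict, data_start) for a safetensors-style blob."""
    if len(blob) < 8:
        raise ValueError("blob too short to contain a header length")
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    data_start = 8 + header_len
    if data_start > len(blob):
        raise ValueError("declared header length exceeds blob size")
    header = json.loads(blob[8:data_start].decode("utf-8"))
    return header, data_start


def tensor_bytes(blob, header, data_start, name):
    """Slice the raw bytes of one tensor out of the data section."""
    begin, end = header[name]["data_offsets"]
    return blob[data_start + begin : data_start + end]
```

A round trip through `build_safetensors` and `parse_header` is enough to check the offsets; a production reader would also validate dtype sizes against shapes and reject overlapping ranges.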

Agentic systems & safety

| Repo | What it is | Evidence |
| --- | --- | --- |
| evals-agentic-control | Pre-registered adversarial prompt battery for agentic computer-use safety: 40 prompts × 5 categories × 2 models. Two-stage verdict pass: rule-based + LLM-as-judge on unclear cells. 94.4% safe (CONFIRMED ≥ 80%). 4 confirmed full-compliance cells in C5-cron-social-pressure. Judge Type-II rate on full-compliance label documented. | 161 cells · verdicts.jsonl + combined-summary.json · 7 Type-I/II catches documented |
| stax-mast | Single-binary editor kernel: buffer-as-protocol, Janet (Lisp) extensible, in-process capability sandbox, adversarial verification loop. | 64 tests · 65 Janet bindings removed/gated · mutation-tested security check · AGPL |
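
The two-stage verdict pass described for evals-agentic-control (rules first, LLM-as-judge only on unclear cells) can be sketched roughly as below. This is an illustrative assumption, not the repo's code: the marker lists, the `unclear` label, and the function names are all hypothetical, and the judge is passed in as a plain callable so the sketch runs without any model API.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical marker lists; a real battery would use pre-registered rules.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't")
COMPLIANCE_MARKERS = ("executed", "here is the command", "done.")


def rule_verdict(response: str) -> str:
    """Stage 1: cheap rule-based label, or 'unclear' when the rules disagree."""
    text = response.lower()
    refused = any(marker in text for marker in REFUSAL_MARKERS)
    complied = any(marker in text for marker in COMPLIANCE_MARKERS)
    if refused and not complied:
        return "safe"
    if complied and not refused:
        return "full-compliance"
    return "unclear"


def two_stage_verdict(
    response: str, judge: Callable[[str], str]
) -> Tuple[str, str]:
    """Return (verdict, stage). The judge runs only on 'unclear' cells."""
    first = rule_verdict(response)
    if first != "unclear":
        return first, "rule"
    return judge(response), "judge"


def safe_rate(verdicts: Iterable[str]) -> float:
    """Fraction of cells labelled 'safe'."""
    labels = list(verdicts)
    if not labels:
        raise ValueError("no verdicts to summarize")
    return sum(label == "safe" for label in labels) / len(labels)
```

Recording which stage produced each verdict keeps judge errors separable from rule errors, which is what makes a per-label judge error rate measurable at all.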

Quant & statistical verification

| Repo | What it is | Evidence |
| --- | --- | --- |
| quant-validation-zig | Bailey–López de Prado bias-defense: Probabilistic & Deflated Sharpe Ratio, Purged + Combinatorial-Purged K-fold, CPCV backtest path distribution (positive_fraction anti-luck check). | 46 tests · `erfc` agrees with scipy to ~3×10⁻⁸ · reproducible Nix builds |
| zsym | Rigorous statistical cryptanalysis substrate: Miller-Madow entropy, n-gram LMs, monoalphabetic hill-climber, polyalphabetic (Vigenère) IC period detection + per-position solver, stationary bootstrap CIs. Applied to Voynich EVA. | 37 tests · bit-exact parallel/serial parity · deterministic seeding · Python parity verified |
| zig-h3 | Uber H3 v4 geospatial index: idiomatic wrapper + pure-Zig port of all 70 public functions. | 211 tests · 27k+ cross-validation cases · 94.4% coverage · property/fuzz/mutation |
| rippled-zig | XRPL protocol toolkit: canonical tx encoding, secp256k1/Ed25519 sign+verify, live testnet RPC conformance. | 5-gate process (build → serialize → crypto parity → live RPC → security) |
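
As a reference point for the Probabilistic Sharpe Ratio named in the quant-validation-zig row, here is a small Python sketch of the Bailey and López de Prado formula, PSR(SR*) = Φ((SR − SR*)·√(n−1) / √(1 − γ₃·SR + ((γ₄−1)/4)·SR²)), with the normal CDF computed from `erfc`. The function and parameter names are assumptions for illustration and do not mirror the repo's Zig API. The Sharpe ratios must be expressed at the same frequency as the observations (not annualized).

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF expressed through the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def probabilistic_sharpe_ratio(
    sr_hat: float,
    sr_benchmark: float,
    n_obs: int,
    skew: float = 0.0,
    kurtosis: float = 3.0,
) -> float:
    """Probability that the true Sharpe ratio exceeds sr_benchmark.

    sr_hat       observed (non-annualized) Sharpe ratio
    sr_benchmark threshold Sharpe ratio SR*
    n_obs        number of return observations
    skew         sample skewness of returns (gamma_3)
    kurtosis     sample kurtosis of returns (gamma_4, 3.0 for a normal)
    """
    if n_obs < 2:
        raise ValueError("need at least two observations")
    # Variance correction for non-normal returns.
    variance_term = 1.0 - skew * sr_hat + (kurtosis - 1.0) / 4.0 * sr_hat**2
    if variance_term <= 0.0:
        raise ValueError("moment inputs give a non-positive variance term")
    z = (sr_hat - sr_benchmark) * math.sqrt(n_obs - 1) / math.sqrt(variance_term)
    return normal_cdf(z)
```

With the other inputs held fixed, negative skew or fat tails shrink the PSR for a positive observed Sharpe ratio, and a longer track record raises it.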

Open-source contributions

NixOS/nixpkgs — CrowdSec hermetic-deploy option; ydotool uinput autoload (open)
HuggingFace/tokenizers — BPE legacy-merge disambiguation fix
FAISS (facebookresearch) — IVFPQ recall documentation fix
safetensors — Llama-3.2 shape-fixture generator; header whitespace-tolerance spec
PyO3 · clap-rs/clap — docs and builder-method contributions

How I work

Evidence vocabulary is fixed. sketch → compiled → unit-tested → integration-tested → audited → benchmarked. Words like production or verified require the evidence that makes them true.
Pre-register before claiming. Falsifier written before the test runs. Type I (overclaim) and Type II (missed risk) tracked as separate error classes. Inconclusive results reported honestly.
Tests are the spec. Priors update when proved wrong.
