SMC17/README.md

Sean Collins

Systems engineer building ML infrastructure from first principles. Inference kernels, vector search, tokenization, agentic verification, and quant tooling. Zig/Elixir/OCaml.

✉️ sean@sunlitmoon.online · 🔗 sunlitmoon.online


ML inference & search substrate

| Repo | What it is | Evidence |
|---|---|---|
| inference | Pure-Zig LLM serving: paged KV-cache, BF16 kernels, persistent thread pool, safetensors integration. TinyLlama-1.1B end-to-end on CPU. | 77 tests · 6.17× decode speedup at M=8 vs baseline · STATUS.md canonical |
| tokenizers-zig | Pure-Zig HF tokenizers: BPE, WordPiece, Unigram, full pipeline, offsets, tokenizer.json compat. | 189 tests + 600-iter fuzz · ~5.3× faster on WordPiece · parity verified vs 🤗 tokenizers |
| faiss-zig | Pure-Zig ANN: Flat, HNSW, IVFFlat, IVFPQ; SIMD @Vector distance kernels, multi-threaded searchBatch. | 68 tests · 16.94× memory compression on IVFPQ at reference recall · cross-validated vs FAISS C++ |
| safetensors-zig | Pure-Zig safetensors reader: @Vector(32,u8) structural scan, BF16/F32/I8, libc-only. | 21 tests · 241µs parse on 201-tensor TinyLlama fixture · upstream fixture contribution |

Agentic systems & safety

| Repo | What it is | Evidence |
|---|---|---|
| evals-agentic-control | Pre-registered adversarial prompt battery for agentic computer-use safety: 40 prompts × 5 categories × 2 models. Two-stage verdict pass: rule-based + LLM-as-judge on unclear cells. 94.4% safe (CONFIRMED ≥ 80%). 4 confirmed full-compliance cells in C5-cron-social-pressure. Judge Type-II rate on full-compliance label documented. | 161 cells · verdicts.jsonl + combined-summary.json · 7 Type-I/II catches documented |
| stax-mast | Single-binary editor kernel: buffer-as-protocol, Janet (Lisp) extensible, in-process capability sandbox, adversarial verification loop. | 64 tests · 65 Janet bindings removed/gated · mutation-tested security check · AGPL |
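
The two-stage verdict pass described for evals-agentic-control can be pictured roughly as follows. This is only a sketch under stated assumptions, not the repository's code: the marker phrases, the function names, and the stub judge are all hypothetical. Only the labels (safe, full-compliance, unclear) and the idea of escalating unclear cells to a judge come from the description above.

```python
from typing import Callable, Tuple

# Hypothetical marker phrases; a real battery would use its own pre-registered rules.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't")
COMPLIANCE_MARKERS = ("i have scheduled", "task completed")


def rule_verdict(response: str) -> str:
    """Stage 1: cheap rule-based label, or 'unclear' when the rules conflict or miss."""
    text = response.lower()
    refused = any(marker in text for marker in REFUSAL_MARKERS)
    complied = any(marker in text for marker in COMPLIANCE_MARKERS)
    if refused and not complied:
        return "safe"
    if complied and not refused:
        return "full-compliance"
    return "unclear"


def two_stage_verdict(response: str, judge: Callable[[str], str]) -> Tuple[str, str]:
    """Return (label, stage). The judge is consulted only for unclear cells."""
    label = rule_verdict(response)
    if label != "unclear":
        return label, "rule"
    return judge(response), "judge"


def stub_judge(response: str) -> str:
    """Stand-in for an LLM-as-judge call (assumption); always answers 'safe'."""
    return "safe"
```

Recording which stage produced each label keeps judge errors separable from rule errors when the verdicts are audited later.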

Quant & statistical verification

| Repo | What it is | Evidence |
|---|---|---|
| quant-validation-zig | Bailey–López de Prado bias-defense: Probabilistic & Deflated Sharpe Ratio, Purged + Combinatorial-Purged K-fold, CPCV backtest path distribution (positive_fraction anti-luck check). | 46 tests · erfc to ~3×10⁻⁸ vs scipy · reproducible Nix builds |
| zsym | Rigorous statistical cryptanalysis substrate: Miller-Madow entropy, n-gram LMs, monoalphabetic hill-climber, polyalphabetic (Vigenère) IC period detection + per-position solver, stationary bootstrap CIs. Applied to Voynich EVA. | 37 tests · bit-exact parallel/serial parity · deterministic seeding · Python parity verified |
| zig-h3 | Uber H3 v4 geospatial index: idiomatic wrapper + pure-Zig port of all 70 public functions. | 211 tests · 27k+ cross-validation cases · 94.4% coverage · property/fuzz/mutation |
| rippled-zig | XRPL protocol toolkit: canonical tx encoding, secp256k1/Ed25519 sign+verify, live testnet RPC conformance. | 5-gate process (build → serialize → crypto parity → live RPC → security) |
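
For readers unfamiliar with the Probabilistic Sharpe Ratio named in the quant-validation-zig row, here is a minimal Python sketch of the standard Bailey and López de Prado formula. It is an illustration, not the repository's Zig implementation. The function and parameter names are assumptions, and the Sharpe ratios are assumed to be non-annualized, at the same frequency as the observations. The normal CDF is built on `math.erfc`, which is why erfc accuracy matters for this statistic.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF expressed through the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def probabilistic_sharpe_ratio(
    sr_hat: float,
    sr_benchmark: float,
    n_obs: int,
    skew: float = 0.0,
    kurtosis: float = 3.0,
) -> float:
    """Probability that the true Sharpe ratio exceeds sr_benchmark.

    sr_hat is the observed Sharpe ratio over n_obs returns; skew and
    kurtosis (non-excess, 3.0 for a normal) describe the return distribution.
    """
    if n_obs < 2:
        raise ValueError("need at least two observations")
    variance_term = 1.0 - skew * sr_hat + (kurtosis - 1.0) / 4.0 * sr_hat ** 2
    if variance_term <= 0.0:
        raise ValueError("moment inputs give a non-positive variance term")
    z = (sr_hat - sr_benchmark) * math.sqrt(n_obs - 1) / math.sqrt(variance_term)
    return normal_cdf(z)
```

When the observed ratio equals the benchmark the result is exactly 0.5, and for a fixed observed ratio above the benchmark the probability rises with the number of observations.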

Open-source contributions

  • NixOS/nixpkgs — CrowdSec hermetic-deploy option; ydotool uinput autoload (open)
  • HuggingFace/tokenizers — BPE legacy-merge disambiguation fix
  • FAISS (facebookresearch) — IVFPQ recall documentation fix
  • safetensors — Llama-3.2 shape-fixture generator; header whitespace-tolerance spec
  • PyO3 · clap-rs/clap — docs and builder-method contributions

How I work

  • Evidence vocabulary is fixed. sketch → compiled → unit-tested → integration-tested → audited → benchmarked. Words like production or verified require the evidence that makes them true.
  • Pre-register before claiming. Falsifier written before the test runs. Type I (overclaim) and Type II (missed risk) tracked as separate error classes. Inconclusive results reported honestly.
  • Tests are the spec. Priors update when proved wrong.
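
The fixed evidence vocabulary can be read as an ordered ladder. The sketch below encodes that ordering in Python. The six level names come from the list above, while the mapping from claim words to a required level is a hypothetical choice made for illustration, not something this README specifies.

```python
from enum import IntEnum
from typing import Optional


class Evidence(IntEnum):
    """Evidence levels in the order given above, weakest first."""
    SKETCH = 0
    COMPILED = 1
    UNIT_TESTED = 2
    INTEGRATION_TESTED = 3
    AUDITED = 4
    BENCHMARKED = 5


# Hypothetical thresholds: which level a strong word needs before it may be used.
REQUIRED_LEVEL = {
    "verified": Evidence.AUDITED,
    "production": Evidence.BENCHMARKED,
}


def claim_allowed(word: str, level: Evidence) -> bool:
    """True if `word` may describe work that has reached `level`."""
    required: Optional[Evidence] = REQUIRED_LEVEL.get(word.lower())
    return required is None or level >= required
```

Under these assumed thresholds, `claim_allowed("production", Evidence.UNIT_TESTED)` is false, while words with no registered threshold pass at any level.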

sunlitmoon.online
