I build end-to-end data systems — from real-time ingestion pipelines to production-ready ML infrastructure.
Currently working as a Data Lead, I own the full data lifecycle at my company: SQL Server administration, ETL pipelines, REST APIs, and BI delivery. Outside of work, I build MLOps and motorsport analytics systems that push into streaming, model tracking, and race simulation.
I don't just build models — I build the systems around them.
Kafka · Spark Structured Streaming · MLflow · Docker Compose · PySpark
- End-to-end streaming architecture: producer → Spark processing → ML inference → MLflow tracking
- Fraud detection models tuned for 96% recall, a deliberate business trade-off that accepts lower precision in exchange for fewer missed fraud cases
- ROC-AUC improved from 0.53 to 0.77 through feature engineering within the streaming pipeline
- Fully reproducible environment via Docker Compose with domain-based service architecture
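
The recall-over-precision trade-off above can be illustrated with a small, library-free sketch. It is not the project's actual code: the function names and the toy data in the usage note are hypothetical, and the idea is simply to pick the highest score threshold that still meets a target recall, then read off the precision that results.

```python
from __future__ import annotations

from typing import Sequence


def recall_precision(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> tuple[float, float]:
    """Return (recall, precision) when scores >= threshold are flagged as fraud."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


def threshold_for_recall(
    y_true: Sequence[int], scores: Sequence[float], target_recall: float
) -> float:
    """Return the highest observed score whose recall meets the target."""
    # Walk candidate thresholds from strictest to loosest; recall can only
    # grow as the threshold drops, so the first hit is the strictest one.
    for threshold in sorted(set(scores), reverse=True):
        recall, _ = recall_precision(y_true, scores, threshold)
        if recall >= target_recall:
            return threshold
    # Only reached when there are no positive labels at all.
    return min(scores)
```

With hypothetical labels `[0, 0, 1, 0, 1, 1, 0, 1]` and scores `[0.1, 0.4, 0.35, 0.8, 0.9, 0.7, 0.2, 0.6]`, asking for a recall of 1.0 pushes the threshold down to 0.35, and precision at that threshold drops to 4/6. That is the cost a recall-first fraud model accepts.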
FastAPI · OpenAI Whisper · MongoDB · Docker Compose
- Microservices architecture separating API layer from async batch processing workers
- Transcribes thousands of F1 team radio files at ~2.2 s per audio file, with <50 ms API latency
- Enables full-text search across race communications via structured MongoDB indexing
- Production-level repo: MIT license, contributing guidelines, modular codebase
🔗 palomacdev/openf1-transcribe
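
The split between a fast API layer and slower batch workers can be sketched in a single process with `asyncio`. This is only an illustration under stated assumptions: the real project runs separate services, and `transcribe`, `worker`, and `run_batch` are hypothetical names, with `transcribe` standing in for an actual speech-to-text call.

```python
import asyncio


async def transcribe(path: str) -> str:
    """Placeholder for a speech-to-text call; returns a fake transcript."""
    await asyncio.sleep(0)  # yield control, as a real model call would
    return f"transcript of {path}"


async def worker(queue: asyncio.Queue, results: dict) -> None:
    """Pull audio paths off the queue until a None sentinel arrives."""
    while True:
        path = await queue.get()
        if path is None:
            queue.task_done()
            break
        results[path] = await transcribe(path)
        queue.task_done()


async def run_batch(paths: list, n_workers: int = 2) -> dict:
    """Enqueue all paths, let the workers drain the queue, return transcripts."""
    queue: asyncio.Queue = asyncio.Queue()
    results: dict = {}
    workers = [
        asyncio.create_task(worker(queue, results)) for _ in range(n_workers)
    ]
    # The "API side": enqueueing is cheap and never waits on transcription.
    for path in paths:
        queue.put_nowait(path)
    # One sentinel per worker so every worker shuts down cleanly.
    for _ in workers:
        queue.put_nowait(None)
    await asyncio.gather(*workers)
    return results
```

Calling `asyncio.run(run_batch(["a.mp3", "b.mp3"]))` returns a dict mapping each path to its placeholder transcript. The point of the design is that the enqueueing side stays fast no matter how long each transcription takes.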
XGBoost · Scikit-learn · SHAP · FastF1 · Python
- Qualifying grid prediction model with a mean absolute error (MAE) of ~3 grid positions
- XGBoost selected over Random Forest based on lower prediction error and stronger generalization
- Race simulation engine modeling strategy scenarios and tire degradation
- SHAP explainability for model interpretation and validation
- Custom feature engineering from telemetry, track characteristics, and driver performance history
- Machine learning for real-time decision systems
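
For readers unfamiliar with the metric, here is a minimal sketch of how a position MAE is computed. The function name `grid_mae` and the numbers in the usage note are hypothetical and are not results from the model.

```python
from typing import Sequence


def grid_mae(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Mean absolute error between predicted and actual grid positions."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average of per-driver absolute position errors.
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)
```

For example, predictions `[1, 4, 2, 8, 5]` against actual positions `[2, 1, 3, 5, 9]` give per-driver errors of 1, 3, 1, 3, and 4, so the MAE is 12 / 5 = 2.4 positions.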
- MLOps and model lifecycle in production
- Feature engineering on streaming data
- Motorsport analytics and simulation systems
- 📧 palomacordeiro2009@hotmail.com
- 🔬 Architecture & experimental projects: palomahub-arch

