palomacdev/README.md

Hi, I'm Paloma Cordeiro 👋

Data Engineer · MLOps Engineer · Motorsport Analytics

I build end-to-end data systems — from real-time ingestion pipelines to production-ready ML infrastructure.

Currently working as a Data Lead, I own the full data lifecycle at my company: SQL Server administration, ETL pipelines, REST APIs, and BI delivery. Outside of work, I build MLOps and motorsport analytics systems that push into streaming, model tracking, and race simulation.

I don't just build models — I build the systems around them.


🚀 Featured Projects

⚡ Real-Time Fraud Detection Pipeline

Kafka · Spark Structured Streaming · MLflow · Docker Compose · PySpark

  • End-to-end streaming architecture: producer → Spark processing → ML inference → MLflow tracking
  • Fraud detection models tuned for 96% recall, a deliberate business trade-off that accepts lower precision in order to miss fewer fraud cases
  • ROC-AUC improved from 0.53 to 0.77 through feature engineering within the streaming pipeline
  • Fully reproducible environment via Docker Compose with domain-based service architecture
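
The recall-over-precision trade-off mentioned above can be illustrated with a small threshold search. This is a minimal sketch, not the code from ml-lab: the function name `threshold_for_recall`, the toy scores, and the labels are all hypothetical, and it assumes the model emits a fraud score per transaction where higher means more suspicious.

```python
def threshold_for_recall(scores, labels, target_recall):
    """Return (threshold, recall, precision) for the strictest threshold
    whose recall is at least target_recall.

    scores: model scores, higher means more likely fraud (assumption).
    labels: 1 for fraud, 0 for legitimate.
    """
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one positive label")

    # Try candidate thresholds from strictest to most lenient.
    for threshold in sorted(set(scores), reverse=True):
        flagged = [s >= threshold for s in scores]
        tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
        fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
        recall = tp / positives
        if recall >= target_recall:
            return threshold, recall, tp / (tp + fp)

    # Only reachable if target_recall is above 1.0.
    raise ValueError("target_recall cannot be met")


# Hypothetical toy data, not taken from the project.
scores = [0.95, 0.80, 0.70, 0.40, 0.35, 0.10]
labels = [1, 1, 0, 1, 0, 0]

threshold, recall, precision = threshold_for_recall(scores, labels, 0.96)
print(f"threshold={threshold} recall={recall:.2f} precision={precision:.2f}")
```

On this toy data, reaching the recall target means lowering the threshold far enough that one legitimate transaction is also flagged, which is the cost the trade-off accepts.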

🔗 palomacdev/ml-lab


🎤 OpenF1 Transcribe — Real-Time Audio Processing

FastAPI · OpenAI Whisper · MongoDB · Docker Compose

  • Microservices architecture separating API layer from async batch processing workers
  • Transcribes thousands of F1 team radio files at ~2.2 s per audio file, with <50 ms API latency
  • Enables full-text search across race communications via structured MongoDB indexing
  • Production-level repo: MIT license, contributing guidelines, modular codebase
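
To show what full-text search over transcripts involves, here is a minimal in-memory inverted index. It is a stand-in for illustration only: it does not use MongoDB or reproduce the indexing in openf1-transcribe, and the transcript ids and texts are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(transcripts):
    """Map each token to the set of transcript ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in transcripts.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of transcripts containing every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical radio transcripts, not real data.
transcripts = {
    "radio_001": "Box box, box this lap",
    "radio_002": "Tyres are gone, we need to box",
    "radio_003": "Push now, gap is two seconds",
}

index = build_index(transcripts)
print(sorted(search(index, "box")))
print(sorted(search(index, "box lap")))
```

The sketch requires every query token to match. A database-backed text index may rank or combine terms differently, so treat the matching rule here as a simplification.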

🔗 palomacdev/openf1-transcribe


🏎️ DRS Data — Motorsport ML & Simulation Platform

XGBoost · Scikit-learn · SHAP · FastF1 · Python

  • Qualifying grid prediction model achieving a mean absolute error (MAE) of ~3 grid positions
  • XGBoost selected over Random Forest based on lower prediction error and stronger generalization
  • Race simulation engine modeling strategy scenarios and tire degradation
  • SHAP explainability for model interpretation and validation
  • Custom feature engineering from telemetry, track characteristics, and driver performance history
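
For reference, the MAE metric quoted above is the average absolute gap between predicted and actual grid positions. The sketch below uses hypothetical positions and a hypothetical helper name, `position_mae`. It is not code from drs_data.

```python
def position_mae(predicted, actual):
    """Mean absolute error between predicted and actual grid positions."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and the same length")
    total_error = sum(abs(p - a) for p, a in zip(predicted, actual))
    return total_error / len(predicted)


# Hypothetical grid positions for five drivers.
predicted = [1, 3, 2, 6, 4]
actual = [2, 1, 3, 4, 5]

# Absolute errors are 1, 2, 1, 2, 1, so the MAE is 7 / 5.
print(position_mae(predicted, actual))
```

An MAE of about 3 therefore means the predicted grid slot is, on average, about three places away from where the driver actually qualified.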

🔗 palomacdev/drs_data


🛠️ Tech Stack

⚙️ Data Engineering

Apache Kafka Apache Spark Apache Airflow

🧠 Machine Learning & MLOps

MLflow FastAPI Docker Scikit-learn XGBoost

🗄️ Databases

SQL Server MongoDB PostgreSQL

☁️ Cloud & Languages

AWS GCP Python Pandas


🎯 Current Focus

  • Machine Learning for real-time decision systems
  • MLOps and model lifecycle in production
  • Feature engineering on streaming data
  • Motorsport analytics and simulation systems


Pinned

  1. drs_data · Public

    A high-performance Machine Learning system for Formula 1® analysis, prediction, and race simulation.

    Python 2

  2. ml-lab · Public

    Real-Time Fraud Detection Pipeline with Kafka, Spark & MLflow

    Python

  3. openf1-transcribe · Public

    AI-powered transcription service for F1 team radios — FastAPI, Whisper, MongoDB, Docker

    Python

  4. f1-race-analysis · Public

    Comprehensive performance analysis of the Grand Prix using official F1 telemetry data

    Jupyter Notebook