ivanrivasgr/README.md

👋 Hi, I'm Ivan F Gruber

Data Engineer · Analytics Pipelines · Sports Data · BI & Visualization

Remote · Corpus Christi, TX

LinkedIn · GitHub · Email


🚀 About Me

Data Engineer with 5+ years designing and operating analytics pipelines on GCP and AWS, with a focus on sports data infrastructure, real-time event processing, and cloud-based ETL systems.

Built production pipelines handling live MLB game feeds at Sportradar/Synergy Sports. At Vikua, delivered GCP analytical models that cut time-to-insight by 45%, maintained 99.7% pipeline uptime, and reduced cloud compute costs by 18% across 6 client environments.

Currently completing an MIT MicroMasters program in Statistical Modeling & Computation. Fluent in English and Spanish.

💡 My focus: turning technical execution into measurable business impact.


⚙️ Tech Stack

  • Languages: Python, SQL, Ruby
  • Cloud: GCP (BigQuery, Cloud Composer, Cloud Storage), AWS (S3, Redshift), Azure SQL
  • ETL & Orchestration: Airflow, dbt, Tray.io, Zapier, REST APIs, Pandas, Terraform
  • Streaming: Kafka / Redpanda, Apache Flink (event time, windowing)
  • Lakehouse: Apache Iceberg, Parquet, DuckDB
  • BI & Viz: Power BI, Metabase, Plotly, Streamlit, ParaView, QGIS
  • Sports Data: Statcast, pybaseball, MLB StatsAPI, pitch-by-pitch tracking
  • Quality / Ops: Great Expectations, pytest, GitHub Actions, OpenLineage


📊 Featured Projects

⚾ BaseballIQ — Production MLB Analytics Platform

End-to-end MLB platform: Statcast ingestion → Bronze/Silver/Gold (DuckDB) → XGBoost CSW model (+SHAP) → AI scouting reports backed by Claude → Streamlit dashboard. Live demo · Repo
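
For readers unfamiliar with the metric, CSW is commonly defined as called strikes plus whiffs divided by total pitches. The sketch below is illustrative only and is not taken from the BaseballIQ repo; the function name `csw_rate` is hypothetical, and it assumes Statcast-style `description` strings, with the exact set of values counted as whiffs being an assumption that varies between sources.

```python
from collections import Counter

# Assumption: Statcast-style pitch descriptions that count toward CSW.
CSW_DESCRIPTIONS = {"called_strike", "swinging_strike", "swinging_strike_blocked"}


def csw_rate(descriptions):
    """Return (called strikes + whiffs) / total pitches, or 0.0 for no pitches."""
    counts = Counter(descriptions)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    csw = sum(counts[d] for d in CSW_DESCRIPTIONS)
    return csw / total


# Hypothetical eight-pitch sample.
pitches = [
    "called_strike", "ball", "swinging_strike", "foul",
    "hit_into_play", "ball", "swinging_strike_blocked", "ball",
]
rate = csw_rate(pitches)
print(f"CSW rate: {rate:.3f}")
```

A model such as the XGBoost one mentioned above would typically predict the per-pitch probability of a CSW outcome rather than this aggregate rate.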

🩺 Sports Injury Risk — Data Architecture & ML Pipeline

12-page interactive app on real injury data (Real Madrid 2021–2025): medallion stack, feature store, point-in-time-correct features, drift detection, CI/CD — with explicit epistemic limits. Live demo · Repo
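
As a rough illustration of what "point-in-time-correct" means, the sketch below looks up, for each label date, only the latest feature value observed on or before that date, so nothing from the future leaks into training rows. It is a minimal standard-library sketch, not the project's feature store; `point_in_time_value`, `minutes_history`, and the sample dates are all hypothetical.

```python
from bisect import bisect_right
from datetime import date


def point_in_time_value(history, as_of):
    """Return the latest value observed on or before `as_of`, else None.

    `history` is a list of (observed_on, value) tuples sorted by observed_on.
    """
    dates = [observed_on for observed_on, _ in history]
    idx = bisect_right(dates, as_of)
    return history[idx - 1][1] if idx > 0 else None


# Hypothetical feature history: cumulative minutes played, by observation date.
minutes_history = [
    (date(2024, 1, 5), 90),
    (date(2024, 1, 12), 180),
    (date(2024, 1, 19), 250),
]

# Hypothetical label dates; the first one precedes every observation.
label_dates = [date(2024, 1, 1), date(2024, 1, 12), date(2024, 1, 15)]
features = [point_in_time_value(minutes_history, d) for d in label_dates]
print(features)
```

Whether a same-day observation should count as "known" is a design choice; a stricter variant would use `bisect_left` to admit only earlier dates.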

⚽ Soccer Data Platform

End-to-end soccer tracking pipeline: ingestion, validation, Parquet transforms, analytics layer, CI/CD, an Airflow DAG and Terraform provisioning a 3-layer AWS S3 data lake. Live demo · Repo

☁️ GCP Data Architecture with PII Anonymization

PII-safe pipeline unifying heterogeneous sources into a Master User Model in BigQuery (Bronze/Silver/Gold), with SHA256 hashing and boolean masking, orchestrated via Cloud Composer. Repo
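
The two anonymization techniques named above can be sketched as follows. This is a minimal standard-library example under stated assumptions, not the pipeline's actual code: the salt, the field names, and the reading of "boolean masking" as replacing a sensitive value with a presence flag are all assumptions made for illustration.

```python
import hashlib


def hash_identifier(value, salt):
    """Return a SHA-256 hex digest of a normalized identifier plus a salt."""
    normalized = value.strip().lower()
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()


def mask_as_boolean(value):
    """Replace a sensitive value with a flag saying only whether it was present."""
    return bool(value and value.strip())


def anonymize(record, salt):
    """Build a PII-free row: hashed join key, presence flag, non-sensitive fields."""
    return {
        "user_key": hash_identifier(record["email"], salt),
        "has_phone": mask_as_boolean(record.get("phone")),
        "country": record.get("country"),
    }


# Hypothetical source record and salt.
raw = {"email": " Ana@Example.com ", "phone": "555-0100", "country": "US"}
safe = anonymize(raw, salt="demo-salt")
print(safe)
```

Normalizing before hashing keeps the key stable across sources that format the same identifier differently, which is what allows hashed records to be joined into a single user model.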

🏗️ Financial Analytics for Construction Projects

SQL models + interactive Metabase dashboards tracking income, expenses, and profitability across multiple construction projects. Repo

🔁 Bullpen Signal — Streaming vs Batch Decision Engine

Dual-path (Flink streaming + dbt/Iceberg batch) engine for pitcher fatigue, bullpen readiness, and matchup leverage, with a reconciliation layer measuring where each architecture wins. Architecture complete; implementation in progress. Repo
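
One way to picture a reconciliation layer is a per-key comparison of the streaming and batch outputs. The sketch below is a hypothetical illustration, not the project's design: `reconcile`, the tolerance parameter, and the sample pitch counts are assumptions.

```python
def reconcile(streaming, batch, tolerance=0.0):
    """Compare two {key: value} outputs and classify each key."""
    report = {"match": [], "mismatch": [], "streaming_only": [], "batch_only": []}
    for key in sorted(set(streaming) | set(batch)):
        if key not in batch:
            report["streaming_only"].append(key)
        elif key not in streaming:
            report["batch_only"].append(key)
        elif abs(streaming[key] - batch[key]) <= tolerance:
            report["match"].append(key)
        else:
            report["mismatch"].append(key)
    return report


# Hypothetical per-pitcher pitch counts from each path.
streaming_pitch_counts = {"pitcher_a": 41, "pitcher_b": 17, "pitcher_c": 8}
batch_pitch_counts = {"pitcher_a": 41, "pitcher_b": 18, "pitcher_d": 22}
report = reconcile(streaming_pitch_counts, batch_pitch_counts)
print(report)
```

Keys that appear in only one path often point to late-arriving events or window boundaries, which is where streaming and batch results tend to diverge.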


🧠 Interests

  • Sports data infrastructure & real-time event pipelines
  • Cloud data architecture & orchestration (Airflow, Terraform, dbt)
  • Streaming vs batch trade-offs (Flink, Kafka, Iceberg)
  • BI automation, geospatial & scientific visualization

📬 Contact

📧 ivanfgruber@gmail.com

🌐 linkedin.com/in/ifrg


"Architecture is not about storing data — it's about how data flows to create value."

Pinned

  1. ruby-dropbox-file-automation Public

    Automated workflow that reads files from Dropbox, transforms CSVs (cleaning and formatting data), and sends them to a data pipeline — fully serverless and powered by Ruby + Cron scheduling.

    Ruby

  2. financial_analytics_construction_projects Public

    End-to-end financial analytics for construction companies: SQL models + interactive Metabase dashboards for income, expenses, and profitability.
