Het Prajapati Het415

Het Prajapati

Data Scientist · ML Engineer · NLP & Agentic AI

MS in Data Science @ Northeastern University. I build end-to-end ML systems, from ETL pipelines and predictive models to production-deployed LLM applications. I focus on retail analytics, agentic AI, and turning messy data into decisions.

Projects

ListingLens — Amazon Seller Intelligence Platform

Multi-stage NLP pipeline processing 250 reviews per product · BERT sentiment scoring · XGBoost return-risk classifier (96.5% accuracy, 0.997 ROC-AUC) · RAG pipeline with FAISS + LLaMA 3 70B · Deployed on Railway + Vercel

Distributed Backtesting Engine — Algorithmic Trading

PySpark parallel framework · 123 strategies × 100 S&P 500 stocks · 12,300 backtests on 303K real market records · 5-step data governance pipeline · 9-panel Plotly BI dashboards

Spotify Breakout Predictor — Viral Music Classification

99.2% accuracy · 0.998 ROC-AUC · temporal + 5-fold cross-validation · TikTok views as the dominant predictor (41% feature importance) · Interactive Streamlit dashboard

Stack

Languages · Python · R · SQL · Java · JavaScript

ML / AI · Scikit-learn · XGBoost · PyTorch · TensorFlow · HuggingFace · LangChain · FAISS · RAG Pipelines

Data Engineering · PySpark · ETL · Data Warehousing · Snowflake · MySQL · PostgreSQL

Deployment · FastAPI · Next.js · Railway · Vercel · AWS (CLF-C02)

Experience

Data Science Associate · Compatible Solutions (Jul 2024 – Jun 2025)

Built ETL pipelines in Python and SQL processing 100K+ records from CRM and transactional systems, enabling faster analytics for business teams
Engineered 15+ behavioral features (purchase frequency, recency scores, seasonality indices), improving forecast accuracy by 20% over the legacy baseline
Designed 5+ KPI dashboards using Matplotlib and Seaborn, reducing manual reporting and supporting data-driven stakeholder decisions
Implemented data validation pipelines achieving 95%+ data quality scores across BI reporting systems

Data Science Intern · Yhills / IIT Hyderabad (Mar 2023 – May 2023)

Built ML models for H1N1 vaccine prediction (84% accuracy) and NYC taxi fare prediction (RMSE: $3.20) using Random Forest and XGBoost on 50K+ records
Engineered 25+ features from temporal, geographic, and demographic data, improving model performance by 30%
Communicated insights to non-technical stakeholders through visual dashboards built with Matplotlib and Seaborn

Currently Exploring

LLM applications & agentic workflows
Scalable ML systems & deep learning
Distributed data processing (Spark + cloud)
