End-to-end Data Science project — Sales Analytics + LightGBM forecasting model + Interactive Plotly Dash dashboard for Rossmann Store Sales.
Updated May 21, 2026 - Jupyter Notebook
Comprehensive Machine Learning Portfolio: Real-world data science, classification, regression, and business analytics in Python
Automated classification of 7 different types of dry beans using machine learning techniques. This project leverages computer vision-extracted geometric and shape features (such as Area, Perimeter, and Shape Factors) to accurately identify bean varieties including Barbunya, Bombay, Cali, Dermason, Horoz, Seker, and Sira.
ScoutIQ is a football intelligence and match prediction platform that uses FIFA-style data to deliver scouting insights, team comparisons, EDA, feature engineering, ML model benchmarking, explainability, and Flask-based win probability predictions.
Open-source proof-of-concept repository for Sensera, developed as Founding ML Engineer, implementing an end-to-end pipeline from data ingestion and epoch-based feature extraction to anomaly detection, risk scoring, and explainable analysis. Built using scikit-learn and TensorFlow.
Credit Risk Prediction System is an end-to-end machine learning project that predicts loan default risk using customer financial data. It applies EDA, feature encoding, and advanced models like Random Forest and XGBoost, and is deployed via Streamlit for real-time credit risk assessment.
Retail margin restoration project: Identified and eliminated 1.42% margin erosion by fixing flawed discount and retention Opex allocation.
Predicting customer booking completion for British Airways using Random Forest & XGBoost. Built as part of the Forage Data Science Virtual Job Simulation. Covers EDA, feature engineering, class imbalance handling, threshold tuning, and ROC-AUC model comparison.
Two-part project combining a PySpark MLlib pipeline (83.12% accuracy) with a GCP cloud architecture proposal for real-time patient monitoring. Covers feature engineering, Random Forest classification, and HIPAA-compliant healthcare infrastructure using BigQuery, Vertex AI, and Cloud Healthcare API.
Comprehensive AutoML framework that automates data preprocessing, feature engineering, model selection, hyperparameter tuning, and deployment. Features neural architecture search and automated data cleaning pipelines.
An end-to-end machine learning project predicting laptop prices from hardware specs. Includes advanced data cleaning, feature engineering (regex-based extraction of resolution and touchscreen features), and benchmarking of Linear Regression against Random Forest regressors. Achieved a 14% improvement in MAE via ensemble modeling. Built with Python and scikit-learn.
An end-to-end machine learning project for predicting house prices using regression models, feature engineering, and hyperparameter tuning.
Energy prediction model that compares linear and nonlinear regression for building efficiency.
This repository contains my machine learning homework tasks and their implementations. It includes data preprocessing, feature engineering, model training, evaluation, and prediction pipelines using Python and popular ML libraries.
Exploratory data analysis projects using Python, Pandas, NumPy, Matplotlib, and Seaborn. Covers data cleaning, visualization, statistical analysis, and insight extraction from real-world datasets.
ML-based IPO listing gain predictor using subscription demand and market sentiment, with planned extensions including grey market premium and company fundamentals for improved investment decision insights.
Customer churn prediction using machine learning classification models and feature engineering.