🛥️ European Boat Market: Full-Stack Data Science Analysis

Capstone Project: Predictive Modeling, Clustering & Time-Series Forecasting

🎯 Project Overview

This project is a comprehensive data science exploration of the European boat market, designed to provide the marketing team of a yacht sales platform with actionable insights to power their weekly newsletter. I conducted an end-to-end data science workflow, moving from business-centric Exploratory Data Analysis (EDA) to advanced predictive modeling and market segmentation.

📊 Data Sourcing & Enrichment

Primary Dataset: Sourced from Kaggle (2021). It contains nearly 10,000 listings of yachts and boats across Europe, including technical specifications (material, age, size) and engagement metrics (number of views in the last 7 days).
Macro-Economic Data: To enrich the analysis, complementary time-series data was pulled from FRED (Federal Reserve Economic Data) via the Quandl API to analyze the Producer Price Index (PPI) trends within the industry.

🤖 Machine Learning & Advanced Statistics

1. Unsupervised Learning: K-means Clustering & PCA

To identify distinct market segments and buyer profiles, I implemented a robust clustering process:

Dimensionality Reduction: Used PCA (Principal Component Analysis) to reduce the number of features while preserving 80% of the variance, which helps produce more stable and interpretable clusters.
Optimization: Applied the elbow method to choose the number of clusters ($k=6$).
Impact: Identified 6 unique boat segments based on price-to-size ratios, allowing the marketing team to tailor newsletter content for specific owner profiles.
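
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook code from this project: the data is synthetic, the four feature columns are hypothetical stand-ins for the cleaned boat features, and `random_state` and `n_init` are arbitrary choices. Only the 80% variance threshold and $k=6$ come from the description above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in for the cleaned boat features
# (hypothetical columns such as price, length, width, age).
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
X[:, 1] += 0.8 * X[:, 0]  # correlate two columns, as price and size often are

# Standardize so that no feature dominates because of its units.
X_scaled = StandardScaler().fit_transform(X)

# A float n_components keeps the fewest components whose cumulative
# explained variance reaches that fraction (here 80%).
pca = PCA(n_components=0.80, svd_solver="full")
X_pca = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_.sum()

# Elbow method: record the within-cluster sum of squares (inertia) for
# each k; plotting inertia against k shows where the curve flattens.
inertias = {}
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_pca)
    inertias[k] = km.inertia_

# Final model with the k reported above.
labels = KMeans(n_clusters=6, n_init=10, random_state=42).fit_predict(X_pca)
```

On real listings, the `labels` array would be joined back to the original rows so that each segment can be profiled by price and size.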

2. Predictive Modeling: Multiple Linear Regression

I developed models to test whether boat attributes (price, age, and size) could predict user engagement:

Approach: Progressed from Simple to Multiple Linear Regression using scikit-learn, incorporating Price, Age, and Boat Area ($m^2$).
Key Finding: With an $R^2$ of 0.025, the model explains only about 2.5% of the variance in "Visits". Price, age, and area are therefore weak linear predictors of engagement, which suggests that visits depend on factors outside the model (such as location or manufacturer prestige) or on non-linear relationships.
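
A minimal sketch of this kind of model is shown below. It is an illustration under stated assumptions, not the project's notebook: the data is synthetic and deliberately noisy, and the variable names (`price`, `age`, `area`, `views`) and the 70/30 split are hypothetical choices. It shows how a low $R^2$ arises when the features carry little linear signal about the target.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic listings; the real Kaggle data is not used here.
rng = np.random.default_rng(0)
n = 1000
price = rng.lognormal(mean=11, sigma=1, size=n)  # skewed, like real prices
age = rng.integers(0, 50, size=n)                # years
area = rng.uniform(5, 80, size=n)                # m^2
X = np.column_stack([price, age, area])

# Views are mostly noise with a faint dependence on area, so the
# features can explain only a small share of the variance.
views = 150 + 0.2 * area + rng.normal(0, 100, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, views, test_size=0.3, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
r2_train = model.score(X_train, y_train)           # R^2 on training data
r2_test = r2_score(y_test, model.predict(X_test))  # R^2 on held-out data

print(f"train R^2 = {r2_train:.3f}, test R^2 = {r2_test:.3f}")
```

Reporting $R^2$ on held-out data, as here, avoids overstating the fit; a value near zero means the model predicts little better than the mean of the target.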

3. Time-Series Analysis: PPI Trends

I analyzed long-term PPI trends for the boat industry to help sellers time their listings:

Testing: Performed the Augmented Dickey-Fuller Test to check for stationarity.
Transformation: Applied differencing to make the series stationary, which reduced the ADF $p$-value from 0.96 to near zero ($2.59 \times 10^{-16}$), so the series is ready for forecasting.

🛠️ Tech Stack & Skills

Analysis: Python (pandas, NumPy, SciPy).
Machine Learning: scikit-learn (LinearRegression, KMeans, PCA).
Time-Series: Statsmodels (ADF Test, Decomposition, ACF).
Visualization: Seaborn, Matplotlib, Scikit-Plot, and Tableau Public.
Data Wrangling: Standardization, handling missing values, and outlier detection.

📂 Project Structure

01 Management: Project Brief and strategic marketing goals.
02 Data: Raw datasets, cleaned versions (Excel/CSV), and standardized data for PCA.
03 Scripts: Jupyter Notebooks covering the entire pipeline: EDA, Regression (Simple & Multiple), Clustering, and Time-Series Analysis.
04 Analysis: Visualizations, Tableau Dashboards, and the final executive report.

🚀 Presentation Tableau Public

Note: This project was developed as part of a professional Data Analytics certification by CareerFoundry.
