This repository was archived by the owner on Feb 22, 2026. It is now read-only.

Mmm11222/Boat-Sales-strategic-insights


🛥️ European Boat Market: Full-Stack Data Science Analysis

Capstone Project: Predictive Modeling, Clustering & Time-Series Forecasting


🎯 Project Overview

This project is a comprehensive data science exploration of the European boat market, designed to provide the marketing team of a yacht sales platform with actionable insights to power their weekly newsletter. I conducted an end-to-end data science workflow, moving from business-centric Exploratory Data Analysis (EDA) to advanced predictive modeling and market segmentation.

📊 Data Sourcing & Enrichment

  • Primary Dataset: Sourced from Kaggle (2021). It contains nearly 10,000 listings of yachts and boats across Europe, including technical specifications (material, age, size) and engagement metrics (number of views in the last 7 days).
  • Macro-Economic Data: To enrich the analysis, complementary time-series data was pulled from FRED (Federal Reserve Economic Data) via the Quandl API to analyze the Producer Price Index (PPI) trends within the industry.

🤖 Machine Learning & Advanced Statistics

1. Unsupervised Learning: K-means Clustering & PCA

To identify distinct market segments and buyer profiles, I implemented a robust clustering process:

  • Dimensionality Reduction: Used PCA (Principal Component Analysis) on the standardized features, keeping enough components to retain 80% of the variance. This reduces noise and redundancy before clustering and makes the resulting clusters easier to interpret.
  • Optimization: Applied the elbow method (inertia plotted against the number of clusters) to choose the number of clusters ($k=6$).
  • Impact: Identified 6 unique boat segments based on price-to-size ratios, allowing the marketing team to tailor newsletter content for specific owner profiles.
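
The clustering steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the notebook code from this repository: the feature matrix, its four columns, and the random seeds are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned listings table (hypothetical features,
# e.g. price, length, width, age). The real data lives in "02 Data".
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
X[:, 1] += 0.8 * X[:, 0]  # make two features correlated so PCA has something to compress

# Standardize first: PCA and K-means are both sensitive to feature scale.
X_scaled = StandardScaler().fit_transform(X)

# A float n_components keeps the fewest components whose cumulative
# explained variance reaches that fraction (here 80%).
pca = PCA(n_components=0.80, svd_solver="full")
X_reduced = pca.fit_transform(X_scaled)

# Elbow method: record inertia (within-cluster sum of squares) for each k.
inertias = {}
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_reduced)
    inertias[k] = km.inertia_

# Fit the final model with the k chosen in this project (k=6).
final_model = KMeans(n_clusters=6, n_init=10, random_state=0).fit(X_reduced)
labels = final_model.labels_
```

Plotting `inertias` against `k` and looking for the point where the curve flattens is the usual way to read the elbow. On random synthetic data the elbow will not necessarily fall at 6.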

2. Predictive Modeling: Multiple Linear Regression

I developed models to test if physical boat attributes could forecast user engagement:

  • Approach: Progressed from Simple to Multiple Linear Regression using scikit-learn, incorporating Price, Age, and Boat Area ($m^2$).
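  • Key Finding: The model reached an $R^2$ of only 0.025, meaning Price, Age, and Boat Area together explain about 2.5% of the variance in "Visits". Engagement therefore appears to depend on factors outside these attributes, possibly location or manufacturer prestige, or on non-linear relationships that a linear model cannot capture.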
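
As a rough sketch of the regression step, the example below fits a multiple linear regression with scikit-learn and scores it on held-out data. The data is synthetic and deliberately dominated by noise to mimic a weak relationship; the variable names, distributions, and train/test split are assumptions for illustration, not the project's actual setup.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000

# Hypothetical predictors standing in for Price, Age, and Boat Area (m^2).
price = rng.lognormal(mean=11, sigma=1, size=n)
age = rng.integers(0, 50, size=n)
area = rng.uniform(5, 80, size=n)

# Synthetic target: views depend only weakly on area, the rest is noise.
views = 150 + 0.2 * area + rng.normal(scale=100, size=n)

X = np.column_stack([price, age, area])
X_train, X_test, y_train, y_test = train_test_split(
    X, views, test_size=0.3, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# R^2 on the test set: the share of variance in views explained by the model.
r2 = r2_score(y_test, model.predict(X_test))
print(f"coefficients: {model.coef_}, test R^2: {r2:.3f}")
```

An $R^2$ close to zero here says only that these three predictors carry little linear signal. It does not identify which other factors drive engagement.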

3. Time-Series Analysis: PPI Trends

Analyzed long-term economic trends for the boat industry to help sellers time their listings:

  • Testing: Performed the Augmented Dickey-Fuller Test to check for stationarity.
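  • Transformation: Applied differencing to make the series stationary. The ADF $p$-value dropped from 0.96 to near zero ($2.59 \times 10^{-16}$), so the unit-root null hypothesis is rejected for the differenced series, which is then suitable for forecasting models that assume stationarity.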

🛠️ Tech Stack & Skills

  • Analysis: Python (Pandas, NumPy, Scipy).
  • Machine Learning: Scikit-Learn (LinearRegression, KMeans, PCA).
  • Time-Series: Statsmodels (ADF Test, Decomposition, ACF).
  • Visualization: Seaborn, Matplotlib, Scikit-Plot, and Tableau Public.
  • Data Wrangling: Standardization, handling missing values, and outlier detection.

📂 Project Structure

  • 01 Management: Project Brief and strategic marketing goals.
  • 02 Data: Raw datasets, cleaned versions (Excel/CSV), and standardized data for PCA.
  • 03 Scripts: Jupyter Notebooks covering the entire pipeline: EDA, Regression (Simple & Multiple), Clustering, and Time-Series Analysis.
  • 04 Analysis: Visualizations, Tableau Dashboards, and the final executive report.

🚀 Presentation Tableau Public


Note: This project was developed as part of a professional Data Analytics certification by CareerFoundry.

About

End-to-end analysis of the European Boat Market using Multiple Regression, K-means Clustering (PCA), and Time-Series forecasting (ADF test) to drive marketing insights.
