Machine Learning

This repository is a hands-on machine learning lab built around Jupyter notebooks. It covers the path from first-principles algorithm implementation to applied projects using scikit-learn, NLTK, Keras, and TensorFlow.

The notebooks are organized by topic: regression, classification, decision trees, random forests, KNN, Naive Bayes, SVM, NLP, neural networks, TensorFlow, and end-to-end prediction projects. Several notebooks implement core algorithms manually and then compare the result with library implementations, making the repository useful for both learning the mathematics and practicing real modeling workflows.

Repository Snapshot

49 Jupyter notebooks across classical ML, NLP, and neural network topics
20+ bundled CSV datasets and prediction outputs
Manual implementations for linear regression, gradient descent, decision tree splitting, Naive Bayes, and neural network forward/backward passes
Applied projects for Titanic survival prediction, Twitter airline sentiment analysis, Boston/CCPP-style regression, MNIST digit classification, and breast cancer classification
Uses numpy, pandas, matplotlib, scikit-learn, nltk, keras, and TensorFlow v1-style APIs

Learning Flow

flowchart LR
    A["Data Exploration"] --> B["Preprocessing"]
    B --> C["Model Training"]
    C --> D["Evaluation"]
    D --> E["Prediction Output"]

    B --> B1["Scaling"]
    B --> B2["Text Cleaning"]
    B --> B3["Missing Value Handling"]

    C --> C1["From-Scratch Algorithms"]
    C --> C2["scikit-learn Models"]
    C --> C3["Neural Networks"]

    D --> D1["Accuracy"]
    D --> D2["Confusion Matrix"]
    D --> D3["Classification Report"]
    D --> D4["Regression Score"]

Project Structure

.
|-- CNN/
|   `-- Untitled.ipynb
|-- Classification Measures/
|   |-- Confusion Matrix.ipynb
|   `-- iris.csv
|-- Decision tree/
|   |-- Code Using Sklearn Decision Tree.ipynb
|   |-- Decision Tree Implementation.ipynb
|   |-- DecisionTreeImplementation_Base File.ipynb
|   |-- decision_tree_ta.ipynb
|   `-- iris.pdf
|-- Feature Scaling/
|   `-- Feature Scaling in Sklearn.ipynb
|-- KNN/
|   |-- KNN.ipynb
|   |-- Cross_Validation.ipynb
|   `-- KNN_from_scratch.ipynb
|-- Keras/
|   `-- Keras_Intro.ipynb
|-- Linear Regression/
|   |-- Analysis of LR using dummy Data.ipynb
|   |-- diabetes.ipynb
|   |-- linear_regression_by_diffrentiation.ipynb
|   `-- diabetes_train.csv / diabetes_test.csv
|-- Logistic Regression/
|   `-- Logistic regression examples
|-- MultiVariable Regression and Gradient Descent/
|   |-- Gradient Descent.ipynb
|   `-- Complex Boundaries.ipynb
|-- NLP/
|   |-- NLTK.ipynb
|   `-- Movie_review.ipynb
|-- NLP-2/
|   `-- Movie review classification notebooks
|-- Naive Bayes/
|   `-- Naive Bayes from scratch and sklearn comparison
|-- Neural Network-2/
|   `-- Neural network forward/backward propagation notebooks
|-- Neural Networks - 1/
|   `-- MLP Classifier in Sklearn.ipynb
|-- Project - Logistic Regression/
|   `-- Logistic Regression - Titanic Dataset.ipynb
|-- Project Twitter Sentiment Analysis/
|   `-- Twitter US Airline Sentiment Analysis.ipynb
|-- Projects - Gradient Descent/
|   `-- Boston and Combined Cycle Power Plant regression notebooks
|-- Random Forests/
|   `-- Random forest and decision tree comparison notebooks
|-- SVM/
|   `-- SVM decision-boundary notebooks
`-- Tensor Flow/
    |-- MNIST Tensorflow.ipynb
    |-- Digit prediction notebooks
    |-- input_data.py
    `-- MNIST_data/

Topic Guide

Area	What It Covers	Representative Files
Linear Regression	Closed-form slope/intercept, cost function, R2-style score, sklearn comparison	`Linear Regression/linear_regression_by_diffrentiation.ipynb`
Gradient Descent	Manual gradient descent loops, cost tracking, multivariable regression	`MultiVariable Regression and Gradient Descent/Gradient Descent.ipynb`, `Projects - Gradient Descent/Gradient Descent - Boston Dataset.ipynb`
Logistic Regression	Classification with sklearn logistic regression and prediction export	`Project - Logistic Regression/Logistic Regression - Titanic Dataset.ipynb`
Decision Trees	Entropy, information gain, categorical binning, sklearn tree usage	`Decision tree/Decision Tree Implementation.ipynb`, `Decision tree/Code Using Sklearn Decision Tree.ipynb`
Random Forests	Titanic data preprocessing, decision tree vs random forest comparison	`Random Forests/Random Forest vs Decision Trees.ipynb`
KNN	Breast cancer classification, cross-validation over neighbor counts	`KNN/KNN.ipynb`, `KNN/Cross_Validation.ipynb`
Naive Bayes	From-scratch probability tables, Laplace smoothing, sklearn comparison	`Naive Bayes/Implementation of Naive Bayes .ipynb`
SVM	SVM classification on Iris/dummy data, visual decision boundaries	`SVM/SVM-Iris.ipynb`, `SVM/SVM_Dummy_data.ipynb`
NLP	Tokenization, stopword removal, POS tagging, lemmatization, text classification	`NLP/NLTK.ipynb`, `NLP-2/movie_review_by_sklearn.ipynb`
Neural Networks	Forward propagation, hidden-layer experiments, MLPClassifier	`Neural Network-2/forward_propagation.ipynb`, `Neural Networks - 1/MLP Classifier in Sklearn.ipynb`
Keras	Dense neural network for breast cancer classification	`Keras/Keras_Intro.ipynb`
TensorFlow	TensorFlow v1-style variables, placeholders, MNIST digit prediction	`Tensor Flow/MNIST Tensorflow.ipynb`, `Tensor Flow/Digit_prediction_using_neural_network.ipynb`
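
The closed-form linear regression idea from the first table row can be sketched roughly as below. This is a minimal sketch assuming NumPy and scikit-learn. It is not the notebook's code. The data is a hypothetical noisy toy line rather than `Linear Regression/data.csv`, and the helper names `fit_line` and `coefficient_of_determination` are illustrative. It computes the least-squares slope and intercept directly, scores them with R² = 1 - SS_res / SS_tot, and cross-checks both against `LinearRegression` and `r2_score`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score


def fit_line(x, y):
    """Closed-form least-squares slope and intercept for y ~ slope * x + intercept."""
    x_mean, y_mean = x.mean(), y.mean()
    slope = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
    intercept = y_mean - slope * x_mean
    return slope, intercept


def coefficient_of_determination(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


# Hypothetical noisy toy data (not one of the repository's CSV files)
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=50)
y = 3.0 * x + 4.0 + rng.normal(scale=1.0, size=50)

slope, intercept = fit_line(x, y)
score = coefficient_of_determination(y, slope * x + intercept)

# scikit-learn expects a 2-D feature matrix, hence the reshape
sk_model = LinearRegression().fit(x.reshape(-1, 1), y)
sk_score = r2_score(y, sk_model.predict(x.reshape(-1, 1)))
print(slope, intercept, score)
print(sk_model.coef_[0], sk_model.intercept_, sk_score)
```

The two approaches should agree to floating-point precision. That agreement is the kind of check the manual-versus-library notebooks perform.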

Highlight Projects

Titanic Survival Prediction

Location: Project - Logistic Regression/

This notebook builds a logistic regression classifier for Titanic survival prediction. It performs categorical conversion for gender and embarked port, fills missing age values, removes high-cardinality/non-numeric columns, trains a LogisticRegression model, and writes predictions to output.csv.

Key ideas:

Binary classification
Missing value handling
Basic categorical encoding
Prediction export
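
The preprocessing and export steps above might look roughly like the following. This is a minimal sketch assuming pandas and scikit-learn. The column names (`Sex`, `Embarked`, `Age`, `Name`, `Survived`) follow the common Kaggle Titanic layout and are an assumption; the rows are hypothetical toy data rather than `titanic_train.csv`. The integer codes for the ports and the choice to fill missing ages with the training median are illustrative choices, not necessarily what the notebook does.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy rows with Kaggle-style Titanic column names (assumed, not read from titanic_train.csv)
train = pd.DataFrame({
    "Pclass":   [3, 1, 3, 1, 3, 2, 3, 2],
    "Name":     ["A", "B", "C", "D", "E", "F", "G", "H"],
    "Sex":      ["male", "female", "female", "female", "male", "male", "female", "female"],
    "Age":      [22, 38, None, 35, 35, None, 27, 14],
    "Fare":     [7.25, 71.28, 7.92, 53.10, 8.05, 13.00, 11.13, 30.07],
    "Embarked": ["S", "C", "S", "S", "S", "Q", "S", "C"],
    "Survived": [0, 1, 1, 1, 0, 0, 1, 1],
})


def preprocess(df, age_fill):
    """Encode categoricals, fill missing ages, and drop the non-numeric Name column."""
    df = df.copy()
    df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
    # Assumed integer codes for the ports; a missing port is treated as "S"
    df["Embarked"] = df["Embarked"].map({"S": 0, "C": 1, "Q": 2}).fillna(0)
    df["Age"] = df["Age"].fillna(age_fill)
    return df.drop(columns=["Name"])  # high-cardinality text column


age_fill = train["Age"].median()  # computed on training data only
X_train = preprocess(train.drop(columns=["Survived"]), age_fill)
model = LogisticRegression(max_iter=1000).fit(X_train, train["Survived"])

# Stand-in for the real test set: reuse a few training rows without labels
test = train.drop(columns=["Survived"]).head(3)
predictions = model.predict(preprocess(test, age_fill))
pd.DataFrame({"Survived": predictions}).to_csv("output.csv", index=False)
```

Reusing the training median for the test set keeps the two files consistent. Computing a separate median on the test data would leak information and shift the encoding.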

Twitter US Airline Sentiment Analysis

Location: Project Twitter Sentiment Analysis/

This project classifies airline-related tweets by sentiment. It cleans raw tweet text, removes stopwords and punctuation, applies POS-aware lemmatization, vectorizes text with TF-IDF n-grams, and trains SVM / Multinomial Naive Bayes classifiers.

Key ideas:

NLP preprocessing with NLTK
Lemmatization and POS tagging
TfidfVectorizer with n-grams
Text classification with SVM and Naive Bayes
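
The vectorize-then-classify part of this workflow can be sketched with a scikit-learn pipeline. This is a hedged sketch, not the project's notebook. The tweets and labels are hypothetical, and it skips the NLTK lemmatization step, using scikit-learn's built-in English stop-word list instead. `LinearSVC` stands in for the notebook's SVM variant, which is an assumption. Each pipeline turns text into TF-IDF weighted unigrams and bigrams (`ngram_range=(1, 2)`) and fits a linear classifier.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy tweets; the project reads real ones from train.csv
tweets = [
    "thanks for the great flight and friendly crew",
    "love this airline, great service as always",
    "flight delayed again, terrible service",
    "lost my bag, worst customer service ever",
    "what time does boarding start for my flight",
    "is the flight to boston on schedule",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

predictions = {}
for classifier in (MultinomialNB(), LinearSVC()):
    # Unigrams + bigrams weighted by TF-IDF, followed by a linear text classifier
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), stop_words="english"), classifier)
    model.fit(tweets, labels)
    predictions[type(classifier).__name__] = model.predict(["great crew, thanks"])[0]
print(predictions)
```

Wrapping the vectorizer and classifier in one pipeline means `predict` applies the same vocabulary and IDF weights learned during `fit`. This avoids a mismatch between training and test features.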

Gradient Descent Regression Projects

Location: Projects - Gradient Descent/

These notebooks apply gradient descent to regression datasets such as Boston-style housing data and Combined Cycle Power Plant data. They demonstrate how model parameters are iteratively updated, how cost changes during training, and how predictions are saved.

Key ideas:

Batch gradient descent
Multivariable linear regression
Cost minimization
Regression prediction files
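
The update loop these notebooks describe can be summarized in a short NumPy sketch. This is an illustrative sketch under stated assumptions, not the notebooks' code. The data is synthetic and noise-free, the cost is mean squared error, and the helper name `batch_gradient_descent` is hypothetical. Each iteration uses every sample to compute the gradient 2/n · Xᵀ(Xθ − y), records the cost, and takes one step of size `learning_rate`.

```python
import numpy as np


def batch_gradient_descent(X, y, learning_rate=0.1, iterations=1000):
    """Fit y ~ X @ w + b with full-batch gradient descent on the MSE cost."""
    n_samples = X.shape[0]
    Xb = np.hstack([X, np.ones((n_samples, 1))])  # append a bias column
    theta = np.zeros(Xb.shape[1])
    costs = []
    for _ in range(iterations):
        error = Xb @ theta - y
        costs.append((error ** 2).mean())          # cost before this update
        gradient = 2.0 * Xb.T @ error / n_samples  # gradient of the MSE cost
        theta -= learning_rate * gradient
    return theta, costs


# Synthetic, already-standardized features (hypothetical data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_theta = np.array([1.5, -2.0, 0.5, 3.0])  # last entry is the intercept
y = X @ true_theta[:3] + true_theta[3]

theta, costs = batch_gradient_descent(X, y)
print(theta.round(3), costs[0], costs[-1])
```

A learning rate of 0.1 works here because the features are on a unit scale. On unscaled data such as raw housing features, the same rate can diverge. That is one reason to scale features before running gradient descent.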

MNIST Digit Prediction

Location: Tensor Flow/

The TensorFlow notebooks work with the bundled MNIST gzip files and input_data.py. They use TensorFlow v1-style placeholders, variables, sessions, and softmax classification to predict handwritten digits.

Key ideas:

TensorFlow graph execution
Placeholders and variables
Softmax classification
MNIST image data loading
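
The computation behind the softmax graph can be restated in plain NumPy. This is a deliberate swap, not the notebooks' TensorFlow code. It uses scikit-learn's small bundled 8x8 `load_digits` set instead of the MNIST gzip files, so it runs without `input_data.py` or TensorFlow. The learning rate and step count are illustrative choices. The model computes softmax(XW + b), measures cross-entropy against one-hot labels, and applies plain gradient descent. Those are the same pieces a v1 graph wires together with placeholders, variables, and an optimizer.

```python
import numpy as np
from sklearn.datasets import load_digits


def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


digits = load_digits()
X = digits.data / 16.0                # pixel values 0..16 scaled to [0, 1]
y_onehot = np.eye(10)[digits.target]  # one-hot labels, one column per digit

W = np.zeros((X.shape[1], 10))        # plays the role of a TF weight Variable
b = np.zeros(10)                      # plays the role of a TF bias Variable
learning_rate = 0.1
losses = []
for step in range(500):
    probs = softmax(X @ W + b)
    losses.append(-(y_onehot * np.log(probs + 1e-12)).sum(axis=1).mean())
    grad_logits = (probs - y_onehot) / X.shape[0]  # d(mean cross-entropy)/d(logits)
    W -= learning_rate * X.T @ grad_logits
    b -= learning_rate * grad_logits.sum(axis=0)

accuracy = (softmax(X @ W + b).argmax(axis=1) == digits.target).mean()
print(f"training accuracy: {accuracy:.3f}")
```

Starting from zero weights gives uniform probabilities, so the first loss equals log(10). Gradient descent on this convex loss should then push it steadily down.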

Breast Cancer Classification

Locations: KNN/, Keras/, Logistic Regression/, Neural Networks - 1/

Multiple notebooks use the sklearn breast cancer dataset to compare classical and neural-network approaches, including KNN, logistic regression, Keras dense networks, and sklearn MLP.

Key ideas:

Train/test split
Standard scaling
KNN neighbor search
Dense neural networks
Model evaluation
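
A compact version of the shared classical workflow might look like the following. It is a sketch assuming scikit-learn's bundled `load_breast_cancer` data. The split size, `random_state`, and `n_neighbors=5` are illustrative choices, not values taken from the notebooks. It splits the data, fits a `StandardScaler` on the training portion only, trains KNN on the scaled features, and reports test accuracy.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the scaler on training data only, then apply the same transform to the test set
scaler = StandardScaler().fit(X_train)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(scaler.transform(X_train), y_train)

accuracy = accuracy_score(y_test, knn.predict(scaler.transform(X_test)))
print(f"KNN test accuracy: {accuracy:.3f}")
```

Scaling matters for KNN because it ranks neighbors by Euclidean distance. Without scaling, large-valued features such as area would dominate small-valued ones such as smoothness.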

From-Scratch Implementations

This repository is especially useful because several notebooks build ML logic manually before leaning on libraries:

Linear regression slope, intercept, cost, and coefficient-of-determination style score
Batch gradient descent for simple and multivariable regression
Decision tree splitting with entropy and information gain
Naive Bayes probability estimation with Laplace smoothing
Neural network forward propagation with NumPy
Basic hidden-layer neural network training logic

That mix helps connect the math behind each model with the API-level workflow used in real projects.
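
The decision-tree splitting criterion from the list above can be written in a few lines of standard-library Python. This is a minimal sketch, not the notebook's implementation. The helper names and the binned toy features are hypothetical stand-ins for the categorical binning the decision tree notebook applies to Iris. Information gain is the parent node's entropy minus the size-weighted entropy of the child nodes created by splitting on a feature.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total) for count in Counter(labels).values())


def information_gain(labels, feature_values):
    """Parent entropy minus the size-weighted entropy of each child after the split."""
    children = {}
    for value, label in zip(feature_values, labels):
        children.setdefault(value, []).append(label)
    total = len(labels)
    weighted_child_entropy = sum(len(child) / total * entropy(child) for child in children.values())
    return entropy(labels) - weighted_child_entropy


# Hypothetical binned features, not taken from iris.csv
labels = ["setosa", "setosa", "versicolor", "versicolor"]
petal_length_bin = ["low", "low", "high", "high"]  # separates the classes perfectly
sepal_width_bin = ["low", "high", "low", "high"]   # carries no class information

print(information_gain(labels, petal_length_bin))  # → 1.0
print(information_gain(labels, sepal_width_bin))   # → 0.0
```

A tree builder applies this at every node. It picks the feature with the highest gain, splits on it, and recurses until the children are pure or no features remain.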

Datasets Included

Dataset / File	Used For
`Classification Measures/iris.csv`	Confusion matrix and classification metric practice
`Linear Regression/data.csv`	Simple linear regression from scratch
`Linear Regression/diabetes_train.csv`, `diabetes_test.csv`	Diabetes regression experiments
`MultiVariable Regression and Gradient Descent/data.csv`	Basic gradient descent experiments
`Projects - Gradient Descent/boston_test.csv`	Boston-style regression prediction
`Project - Logistic Regression/titanic_train.csv`, `titanic_test.csv`	Titanic survival classification
`Random Forests/titanic.csv` and split CSVs	Decision tree / random forest comparison
`Project Twitter Sentiment Analysis/train.csv`, `test.csv`	Airline sentiment classification
`Tensor Flow/MNIST_data/*.gz`	MNIST digit classification

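Several notebooks also use built-in scikit-learn datasets, including Iris, Breast Cancer Wisconsin, Boston housing style data, and Diabetes. Any notebook that loads Boston housing through `sklearn.datasets.load_boston` needs a scikit-learn release older than 1.2, where that loader was removed.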

Prerequisites

Recommended:

Python 3.7 or compatible Python 3.x environment
Jupyter Notebook or JupyterLab
Core packages:
- numpy
- pandas
- matplotlib
- scikit-learn
- nltk
- pydotplus
- keras
- tensorflow

For the TensorFlow notebooks, the code uses TensorFlow v1-style APIs such as:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

Use a TensorFlow version that supports tensorflow.compat.v1.

Setup

Clone the repository:

git clone https://github.com/devthedevil/Machine-Learning.git
cd Machine-Learning

Create and activate a virtual environment:

python3 -m venv .venv
source .venv/bin/activate

Install common dependencies:

pip install jupyter numpy pandas matplotlib scikit-learn nltk pydotplus keras tensorflow

Download common NLTK resources used by the NLP notebooks:

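import nltk

nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")
nltk.download("averaged_perceptron_tagger")
nltk.download("movie_reviews")

# Newer NLTK releases look up these resource names for tokenization and POS tagging:
nltk.download("punkt_tab")
nltk.download("averaged_perceptron_tagger_eng")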

Start Jupyter:

jupyter notebook

Then open any notebook from the topic folders.

Running Notes

Run notebooks from inside their own folder when they reference local files such as train.csv, data.csv, boston_test.csv, or MNIST_data/.
Some notebooks generate output files such as output.csv, twitter.csv, and prediction CSVs.
The TensorFlow notebooks depend on the local Tensor Flow/input_data.py helper and bundled Tensor Flow/MNIST_data/ files.
The repository does not currently include a requirements.txt; the package list above was inferred from notebook imports.
The CNN/Untitled.ipynb notebook is currently empty.

Evaluation Techniques Used

Across the notebooks, the project uses:

Train/test splitting
Cross-validation
Confusion matrices
Classification reports
Accuracy scores
Regression score comparisons
Cost-function tracking during gradient descent
Visual decision boundaries for SVM experiments
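
Several of these techniques combine naturally in one evaluation loop, sketched below. This is an illustrative sketch on scikit-learn's bundled Iris data, not code from the notebooks. The odd neighbor counts from 1 to 15, the 70/30 split, and the `random_state` are assumptions. Cross-validation runs on the training split only to choose `n_neighbors`. The chosen model is then scored once on the held-out split with a confusion matrix and a classification report.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)

# 5-fold cross-validation on the training split to choose the neighbor count
cv_scores = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X_train, y_train, cv=5).mean()
    for k in range(1, 16, 2)
}
best_k = max(cv_scores, key=cv_scores.get)

# Refit on the full training split, then evaluate once on the held-out test split
model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
y_pred = model.predict(X_test)
matrix = confusion_matrix(y_test, y_pred)
print("best k:", best_k)
print(matrix)
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

Running cross-validation only on the training split keeps the test set untouched until the final score. The reported accuracy then is not inflated by the hyperparameter search.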

Current Limitations

No centralized dependency file is included.
Most work is notebook-based rather than packaged into reusable Python modules.
Some notebooks use older APIs, including TensorFlow v1-style code and older scikit-learn defaults.
Several notebooks depend on being executed from a specific folder because CSV paths are relative.
The CNN notebook is empty and can be removed or replaced with a complete convolutional neural network example.

Possible Improvements

Add requirements.txt or environment.yml
Move reusable logic into Python modules under a src/ folder
Add notebook execution checks with nbconvert
Convert major projects into clean scripts or pipelines
Add exploratory data analysis sections to project notebooks
Add saved visualizations for model comparison
Modernize TensorFlow notebooks to TensorFlow 2 / Keras
Add a completed CNN notebook for image classification

What This Repository Demonstrates

Practical understanding of core supervised learning algorithms
Ability to implement ML fundamentals from scratch
Experience with data preprocessing, feature engineering, and model evaluation
Familiarity with NLP workflows using NLTK and scikit-learn
Exposure to neural-network workflows in Keras and TensorFlow
Comfort working with Jupyter notebooks, CSV datasets, and iterative experiments
