This project analyzes the Titanic dataset in Python, with the goal of predicting passenger survival. The data comes from Kaggle: https://www.kaggle.com/datasets/yasserh/titanic-dataset
The analysis includes:
- Data preprocessing and cleaning
- Exploratory Data Analysis (EDA)
- Feature engineering
- Machine learning model preparation
Requirements:
- Python 3.x
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
The project uses the famous Titanic dataset, which includes the following features:
| Feature | Description |
|---|---|
| Survived | Survival (0 = No, 1 = Yes) |
| Pclass | Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd) |
| Sex | Gender |
| Age | Age in years |
| SibSp | Number of siblings/spouses aboard |
| Parch | Number of parents/children aboard |
| Fare | Passenger fare |
| Embarked | Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton) |
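To make the table concrete, here is a minimal sketch that checks a DataFrame for these columns and reports their dtypes and missing values. The function name `describe_features` is hypothetical, and the sketch assumes the target column is named `Survived` in the CSV.

```python
import pandas as pd

# Columns from the feature table; the target is assumed to be named "Survived".
EXPECTED_COLUMNS = [
    "Survived", "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked",
]


def describe_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return dtype, missing count, and unique count for each expected column."""
    absent = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if absent:
        raise KeyError(f"columns not found: {absent}")
    subset = df[EXPECTED_COLUMNS]
    return pd.DataFrame({
        "dtype": subset.dtypes.astype(str),
        "n_missing": subset.isna().sum(),
        "n_unique": subset.nunique(),
    })


# Three made-up rows, only to show the call; they are not taken from the dataset.
sample = pd.DataFrame({
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 2],
    "Sex": ["male", "female", "female"],
    "Age": [30.0, 45.0, None],
    "SibSp": [1, 0, 0],
    "Parch": [0, 2, 0],
    "Fare": [8.0, 80.0, 12.5],
    "Embarked": ["S", "C", "S"],
})
print(describe_features(sample))
```

On the real data, the same call would be `describe_features(pd.read_csv("data/train.csv"))`, assuming the file sits at that path.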
The notebook follows these steps:
- Initial data loading and inspection
- Basic data cleaning (removing unnecessary columns)
- Feature encoding for categorical variables
- Correlation analysis using heatmap
- Data splitting into training and test sets
- Feature engineering
- Model selection and training
- Model evaluation
- Predictions on test data
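The steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the notebook's actual code: the dropped columns (`PassengerId`, `Name`, `Ticket`, `Cabin`), the median and mode imputation, the 80/20 split, and the choice of `LogisticRegression` are all assumptions, as are the function names.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop identifier-like columns, fill gaps, and encode categoricals."""
    # Assumed "unnecessary" columns; errors="ignore" skips any that are absent.
    df = df.drop(columns=["PassengerId", "Name", "Ticket", "Cabin"], errors="ignore")
    # Simple imputation (an assumption): median age, most frequent port.
    df = df.assign(
        Age=df["Age"].fillna(df["Age"].median()),
        Embarked=df["Embarked"].fillna(df["Embarked"].mode()[0]),
    )
    # Binary-encode Sex, then one-hot encode Embarked.
    df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
    return pd.get_dummies(df, columns=["Embarked"], dtype=int)


def train_and_evaluate(df: pd.DataFrame, seed: int = 42):
    """Run preprocessing, correlation, split, training, and evaluation."""
    clean = preprocess(df)

    # Pairwise correlations of the encoded columns.
    # seaborn.heatmap(corr, annot=True) would draw this matrix as a heatmap.
    corr = clean.corr()

    X = clean.drop(columns=["Survived"])
    y = clean["Survived"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )

    # Baseline classifier; any scikit-learn estimator could be swapped in.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, corr, accuracy
```

A call such as `model, corr, accuracy = train_and_evaluate(pd.read_csv("data/train.csv"))` would run the whole sketch, assuming the file contains the columns named above.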
Repository layout:
├── data/
│   ├── train.csv
│   └── test.csv
└── sample.ipynb
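Given this layout, loading both files might look like the sketch below. The helper name `load_titanic` is hypothetical, and the default path assumes the code runs from the repository root.

```python
from pathlib import Path

import pandas as pd


def load_titanic(data_dir="data"):
    """Read train.csv and test.csv from data_dir and return both DataFrames."""
    data_dir = Path(data_dir)
    train = pd.read_csv(data_dir / "train.csv")
    test = pd.read_csv(data_dir / "test.csv")
    return train, test
```

After `train, test = load_titanic()`, calls such as `train.head()` and `train.info()` give a quick first look at the data.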
This project is currently under development. Future updates will include:
- Complete feature engineering
- Implementation of various machine learning models
- Model performance comparison
- Final predictions and analysis
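As a rough idea of what the planned model comparison could look like, the sketch below scores two classifiers with 5-fold cross-validation. The two candidate models, the helper name `compare_models`, and the synthetic data from `make_classification` are all assumptions for illustration; the project has not yet fixed its model list.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def compare_models(X, y, cv=5):
    """Return the mean cross-validated accuracy for each candidate model."""
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in candidates.items()
    }


# Synthetic stand-in data; replace with the preprocessed Titanic features.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
for name, score in compare_models(X, y).items():
    print(f"{name}: {score:.3f}")
```

Cross-validation is used here instead of a single split so that each model is scored on several held-out folds, which makes the comparison less sensitive to one particular split.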
To run the analysis:
- Clone the repository
- Ensure you have all required dependencies installed
- Run the Jupyter notebook to see the analysis
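Before opening the notebook, a quick check like the one below can confirm that the listed dependencies are importable. It is a small sketch using only the standard library; the helper name `missing_dependencies` is hypothetical, and the mapping reflects that scikit-learn is imported as `sklearn`.

```python
from importlib.util import find_spec

# Package name as listed in this README -> module name used in import statements.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
}


def missing_dependencies(required=REQUIRED):
    """Return the package names whose modules cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


missing = missing_dependencies()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed dependencies are importable.")
```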