A complete end-to-end movie recommendation system built using traditional collaborative filtering techniques and modern Graph Neural Networks (GNNs) on the MovieLens 20M dataset.
This project compares classical recommendation approaches such as ItemKNN and Matrix Factorization (MF-BPR) with graph-based deep learning models including LightGCN and GraphSAGE. The system also integrates domain-specific knowledge such as movie genres and tags using heterogeneous graph representations.
Recommendation systems are a core component of modern platforms like Netflix, Amazon, Spotify, and YouTube. Their purpose is to reduce information overload by suggesting content that users are most likely to prefer.
In this project, I developed a large-scale movie recommendation system using the MovieLens 20M dataset containing:
- 20 Million+ ratings
- 138K+ users
- 27K+ movies
- 465K+ tag interactions
The project explores how graph-based learning can improve recommendation systems by modeling users, movies, genres, and tags as interconnected graph structures.
The major goals of this project were:
- Build and compare multiple recommendation models
- Explore Graph Neural Networks for recommendation systems
- Incorporate domain-specific knowledge using genres and tags
- Evaluate ranking performance using recommendation metrics
- Deploy an interactive recommendation dashboard using Streamlit
- Store and share trained models using Hugging Face
ItemKNN:
- Traditional collaborative filtering approach
- Uses cosine similarity between movies
- Recommends movies similar to previously liked movies
- Strong baseline model for recommendation systems
- Simple and interpretable
- Effective on dense interaction datasets
- Uses nearest-neighbor similarity
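As a rough illustration of the ItemKNN idea described above, the following plain-Python sketch computes cosine similarity between movies from binary user-movie interactions and scores unseen movies by their similarity to a user's history. It is a minimal sketch, not the project's implementation: the function names, the toy data, and the choice to treat interactions as binary (ignoring rating values) are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def item_cosine_similarity(user_items):
    """Cosine similarity between items, from binary user -> set-of-items data."""
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    sims = defaultdict(dict)
    items = sorted(item_users)
    for idx, a in enumerate(items):
        for b in items[idx + 1:]:
            overlap = len(item_users[a] & item_users[b])
            if overlap:
                # For binary vectors, cosine = |A and B| / sqrt(|A| * |B|).
                score = overlap / math.sqrt(len(item_users[a]) * len(item_users[b]))
                sims[a][b] = score
                sims[b][a] = score
    return sims

def recommend(user, user_items, sims, k=10):
    """Rank unseen items by summed similarity to the user's seen items."""
    seen = user_items[user]
    scores = defaultdict(float)
    for item in seen:
        for other, score in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += score
    return sorted(scores, key=lambda m: (-scores[m], m))[:k]

# Toy data (hypothetical): four users, four movies.
toy = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"B", "C"},
    "u4": {"D"},
}
sims = item_cosine_similarity(toy)
top = recommend("u1", toy, sims, k=2)
```

On MovieLens-scale data the pairwise loop would be replaced by sparse matrix operations; the loop form is used here only to keep the logic visible.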
MF-BPR (Matrix Factorization with Bayesian Personalized Ranking):
- Learns latent embeddings for users and movies
- Optimized using pairwise ranking loss
- Captures hidden user preference patterns
- Embedding Dimension: 64
- Learning Rate: 0.001
- Epochs: 5
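The pairwise ranking loss used by MF-BPR can be sketched for a single (user, liked movie, sampled movie) triple as follows. This is a minimal plain-Python sketch under stated assumptions: the project itself is built on PyTorch, the helper names are hypothetical, and the learning rate in the demo is a toy value chosen to make the effect visible (the 64-dimensional embeddings match the setting listed above).

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bpr_loss(user_vec, pos_vec, neg_vec):
    """BPR loss for one triple: -log(sigmoid(score_pos - score_neg))."""
    diff = dot(user_vec, pos_vec) - dot(user_vec, neg_vec)
    return math.log1p(math.exp(-diff))

def bpr_sgd_step(user_vec, pos_vec, neg_vec, lr=0.001):
    """One in-place SGD step on the BPR loss (no regularization)."""
    diff = dot(user_vec, pos_vec) - dot(user_vec, neg_vec)
    grad = -1.0 / (1.0 + math.exp(diff))  # d(loss) / d(diff)
    for d in range(len(user_vec)):
        u, p, n = user_vec[d], pos_vec[d], neg_vec[d]
        user_vec[d] -= lr * grad * (p - n)  # d(diff)/d(u) = p - n
        pos_vec[d] -= lr * grad * u         # d(diff)/d(p) = u
        neg_vec[d] -= lr * grad * (-u)      # d(diff)/d(n) = -u

# Toy demo with random 64-dimensional embeddings.
rng = random.Random(0)
dim = 64
user = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
pos = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
neg = [rng.uniform(-0.1, 0.1) for _ in range(dim)]

loss_before = bpr_loss(user, pos, neg)
for _ in range(200):
    bpr_sgd_step(user, pos, neg, lr=0.05)  # toy learning rate for the demo
loss_after = bpr_loss(user, pos, neg)
```

Repeated steps push the liked movie's score above the sampled movie's score, so the loss shrinks. A real training loop would sample many triples per epoch and usually add L2 regularization.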
LightGCN:
- Simplified Graph Convolutional Network for recommendation
- Uses user-item interaction graph
- Focuses on neighborhood aggregation
- Embedding Dimension: 128
- Propagation Layers: 3
- Learning Rate: 0.001
- Epochs: 15
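LightGCN's neighborhood aggregation can be sketched without any deep learning library: each layer replaces a node's embedding with a degree-normalized sum of its neighbors' embeddings (no weight matrices or nonlinearities), and the final embedding is the mean over all layers. The plain-Python version below is an illustrative sketch only; the function name and toy graph are assumptions, and the project's PyTorch implementation may differ in details.

```python
import math

def lightgcn_propagate(edges, embeddings, num_layers=3):
    """Propagate embeddings over a bipartite user-item graph.

    edges: list of (user_node, item_node) pairs
    embeddings: dict mapping node -> list of floats (layer-0 embeddings)
    """
    neighbors = {node: [] for node in embeddings}
    for u, i in edges:
        neighbors[u].append(i)
        neighbors[i].append(u)

    layers = [embeddings]
    for _ in range(num_layers):
        prev = layers[-1]
        nxt = {}
        for node, nbrs in neighbors.items():
            dim = len(prev[node])
            agg = [0.0] * dim
            for nb in nbrs:
                # Symmetric normalization: 1 / sqrt(deg(node) * deg(neighbor)).
                norm = 1.0 / math.sqrt(len(nbrs) * len(neighbors[nb]))
                for d in range(dim):
                    agg[d] += norm * prev[nb][d]
            nxt[node] = agg
        layers.append(nxt)

    # Final embedding: uniform average over layers 0..num_layers.
    final = {}
    for node, emb in embeddings.items():
        final[node] = [
            sum(layer[node][d] for layer in layers) / len(layers)
            for d in range(len(emb))
        ]
    return final

# Toy graph (hypothetical): one user connected to one movie.
toy_embeddings = {"u1": [1.0, 0.0], "m1": [0.0, 1.0]}
final = lightgcn_propagate([("u1", "m1")], toy_embeddings, num_layers=3)
```

In the toy graph the two nodes swap embeddings at every layer, so averaging over four layers (0 to 3) gives each node an equal mix of both.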
GraphSAGE:
- Learns node embeddings through neighborhood aggregation
- Supports heterogeneous graphs
- Integrates:
  - Users
  - Movies
  - Genres
  - Tags
- Hidden Dimension: 128
- Layers: 3
- Dropout: 0.05
- Epochs: 15
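One GraphSAGE layer with a mean aggregator can be sketched as: average the neighbors' features, concatenate the result with the node's own features, apply a weight matrix, then a ReLU. The sketch below is plain Python with hypothetical names and a hand-picked weight matrix. It uses a single shared weight for all node types, which is a simplification; a heterogeneous implementation would typically learn separate weights per relation.

```python
def mean_vec(vectors, dim):
    """Element-wise mean of a list of vectors (zeros if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def sage_layer(features, neighbors, weight):
    """One mean-aggregator layer: relu(W @ concat(h_v, mean(h_neighbors)))."""
    out = {}
    for node, h in features.items():
        agg = mean_vec([features[n] for n in neighbors.get(node, [])], len(h))
        combined = h + agg  # list concatenation: [h_v ; mean of neighbors]
        out[node] = [
            max(0.0, sum(w * x for w, x in zip(row, combined)))
            for row in weight
        ]
    return out

# Toy heterogeneous neighborhood (hypothetical node names and features).
features = {
    "movie:1": [1.0, 0.0],
    "genre:Comedy": [0.0, 1.0],
    "user:7": [1.0, 1.0],
}
neighbors = {"movie:1": ["genre:Comedy", "user:7"]}

# 2x4 weight chosen by hand so the output equals h_v + mean(h_neighbors).
weight = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
]
out = sage_layer(features, neighbors, weight)
```

Here the movie node's output mixes its own features with the average of its genre and user neighbors, which is how genre and tag information can reach movie embeddings.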
The project uses the MovieLens 20M benchmark dataset released by GroupLens Research.
| Attribute | Value |
|---|---|
| Total Ratings | 20,000,263 |
| Total Users | 138,493 |
| Total Movies | 27,278 |
| Total Tag Applications | 465,564 |
Dataset files used:
- ratings.csv
- movies.csv
- tags.csv
The preprocessing pipeline included:
- Data cleaning
- Missing value handling
- Duplicate verification
- User and movie ID encoding
- Genre extraction
- Tag processing
- Graph construction
Node types:
- Users
- Movies
- Genres
- Tags
Edge types:
- User-Movie
- Movie-Genre
- Movie-Tag
This graph representation enabled the models to capture both collaborative and semantic relationships.
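The ID encoding, genre extraction, tag processing, and graph construction steps can be sketched together in plain Python. This is a minimal sketch under assumptions: the relation names (`rates`, `has_genre`, `has_tag`), the function names, and the tag normalization (strip and lowercase) are hypothetical, and the toy rows are made up. It relies on the fact that MovieLens stores genres as a pipe-separated string.

```python
def encode_ids(raw_ids):
    """Map raw IDs to contiguous 0-based indices in first-seen order."""
    mapping = {}
    for raw in raw_ids:
        if raw not in mapping:
            mapping[raw] = len(mapping)
    return mapping

def normalize_tag(tag):
    return tag.strip().lower()

def build_hetero_edges(ratings, movies, tags):
    """Build typed edge lists for a user/movie/genre/tag graph.

    ratings: list of (userId, movieId)
    movies:  dict movieId -> pipe-separated genre string
    tags:    list of (movieId, tag text)
    """
    user_idx = encode_ids(u for u, _ in ratings)
    movie_idx = encode_ids(movies)
    genre_idx = encode_ids(g for s in movies.values() for g in s.split("|"))
    tag_idx = encode_ids(normalize_tag(t) for _, t in tags)

    edges = {
        ("user", "rates", "movie"): [
            (user_idx[u], movie_idx[m]) for u, m in ratings
        ],
        ("movie", "has_genre", "genre"): [
            (movie_idx[m], genre_idx[g])
            for m, s in movies.items()
            for g in s.split("|")
        ],
        ("movie", "has_tag", "tag"): [
            (movie_idx[m], tag_idx[normalize_tag(t)]) for m, t in tags
        ],
    }
    return edges, {"user": user_idx, "movie": movie_idx,
                   "genre": genre_idx, "tag": tag_idx}

# Toy rows (hypothetical), shaped like ratings.csv, movies.csv, and tags.csv.
ratings = [(10, 1), (10, 2), (20, 2)]
movies = {1: "Adventure|Comedy", 2: "Comedy|Romance"}
tags = [(1, "Pixar"), (2, "pixar ")]

edges, index = build_hetero_edges(ratings, movies, tags)
```

Edge lists in this (source index, target index) form can then be converted to tensors for a graph library.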
The recommendation models were evaluated using ranking-based metrics:
- Precision@10 / Precision@20
- Recall@10 / Recall@20
- NDCG@10 / NDCG@20
- HitRate@10 / HitRate@20
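For reference, these metrics can be computed per user as in the sketch below, assuming exactly one held-out relevant item per user (as in a leave-last-out protocol). The function name is hypothetical. Under this assumption Recall@K and HitRate@K are identical, and NDCG@K reduces to `1 / log2(rank + 2)` for a 0-based rank, because the ideal DCG is 1.

```python
import math

def metrics_at_k(ranked_items, held_out_item, k):
    """Ranking metrics for one user with a single held-out relevant item."""
    top_k = ranked_items[:k]
    if held_out_item in top_k:
        rank = top_k.index(held_out_item)  # 0-based position in the ranking
        hit = 1.0
        ndcg = 1.0 / math.log2(rank + 2)
    else:
        hit = 0.0
        ndcg = 0.0
    return {
        "precision": hit / k,  # at most one relevant item among K
        "recall": hit,         # one relevant item in total
        "ndcg": ndcg,
        "hit_rate": hit,
    }

# Toy example: the held-out movie "b" is ranked second.
result = metrics_at_k(["a", "b", "c"], "b", k=2)
miss = metrics_at_k(["a", "b", "c"], "c", k=2)
```

Dataset-level scores are the averages of these per-user values.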
A leave-last-interaction-out strategy was used for testing.
Negative sampling was also applied using:
- 1 positive item
- 499 negative samples
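A leave-last-interaction-out split with sampled negatives might look like the following sketch. The function names are hypothetical, and it assumes negatives are drawn uniformly from movies the user has not interacted with, which the description above does not specify.

```python
import random

def leave_last_out(interactions):
    """Hold out each user's most recent interaction.

    interactions: list of (user, item, timestamp)
    Returns (train, test): train maps user -> earlier items in time order,
    test maps user -> the single held-out item.
    """
    by_user = {}
    for user, item, ts in interactions:
        by_user.setdefault(user, []).append((ts, item))

    train, test = {}, {}
    for user, events in by_user.items():
        events.sort()  # oldest first
        *history, (_, last_item) = events
        train[user] = [item for _, item in history]
        test[user] = last_item
    return train, test

def sample_candidates(positive, seen_items, all_items, num_negatives=499, rng=None):
    """Return [positive] followed by sampled unseen negatives."""
    rng = rng or random.Random(0)
    pool = [i for i in all_items if i not in seen_items and i != positive]
    negatives = rng.sample(pool, min(num_negatives, len(pool)))
    return [positive] + negatives

# Toy interactions (hypothetical): (user, movie, timestamp).
interactions = [(1, 10, 100), (1, 11, 200), (1, 12, 300), (2, 10, 50)]
train, test = leave_last_out(interactions)
candidates = sample_candidates(test[1], set(train[1]), range(1000))
```

Each model then scores the 500 candidates, and the metrics are computed from the position of the held-out item in the resulting ranking.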
| Model | Recall@10 | NDCG@10 | HitRate@10 |
|---|---|---|---|
| ItemKNN | 0.6817 | 0.4166 | 0.6817 |
| MF-BPR | 0.5751 | 0.3397 | 0.5751 |
| LightGCN | 0.4888 | 0.2810 | 0.4888 |
| GraphSAGE | 0.4888 | 0.2810 | 0.4888 |
- Built a complete recommendation system pipeline from preprocessing to deployment
- Successfully implemented and compared 4 recommendation models
- Constructed heterogeneous graphs using domain-specific knowledge
- Evaluated recommendation quality using ranking metrics
- Developed a real-time Streamlit recommendation dashboard
- Hosted trained models on Hugging Face for reproducibility
Through this project, I gained hands-on experience in:
- Collaborative filtering techniques
- Ranking-based recommendation evaluation
- Latent factor models
- LightGCN
- GraphSAGE
- Graph construction for recommender systems
- Node embeddings and neighborhood aggregation
- Large-scale data preprocessing
- Model evaluation pipelines
- Negative sampling strategies
- Hyperparameter tuning
- Streamlit dashboard development
- Hugging Face model hosting
- Real-time recommendation generation
An interactive Streamlit application was developed to:
- Select recommendation models
- Enter User IDs
- Generate Top-K recommendations
- Compare outputs from different algorithms
Dashboard features:
- Real-time recommendations
- Model comparison
- Top-10 and Top-20 movie recommendations
- Genre and tag visualization
All trained models were uploaded to Hugging Face for reproducibility and deployment.
Hosted models include:
- ItemKNN
- MF-BPR
- LightGCN
- GraphSAGE
| Model | Hugging Face Link |
|---|---|
| ItemKNN | View Model |
| MF-BPR | View Model |
| LightGCN | View Model |
| GraphSAGE | View Model |
git clone https://github.com/Anjali2220/graph-recommender-system-movielens.git
cd graph-recommender-system-movielens
pip install -r requirements.txt
streamlit run src/eda/streamlit_app.py
Technologies used:
- Python
- Pandas
- NumPy
- PyTorch
- PyTorch Geometric
- Scikit-learn
- Streamlit
- Matplotlib
- Hugging Face
Future improvements:
- Add transformer-based recommendation models
- Integrate temporal user interaction modeling
- Deploy the Streamlit dashboard publicly
- Optimize graph training for scalability
- Explore knowledge graph-based recommendation systems
Project structure:
├── data/
├── notebooks/
├── models/
├── streamlit_app/
├── preprocessing/
├── evaluation/
├── requirements.txt
└── README.md

