Aug 10, 2025

YouTube Recommendation System

Umang Dadhich

🎬 YouTube Recommendation System | Python + Machine Learning

A Machine Learning-based video recommendation engine inspired by YouTube’s algorithm.
This project uses Natural Language Processing (NLP) and Content-Based Filtering to recommend videos based on metadata like title, tags, views, likes, and categories from the Kaggle YouTube Trending Dataset.

📁 Notebook: youtubeIndia.ipynb
📊 Dataset: INvideos.csv (YouTube Trending Videos in India)

📜 Overview

This system mimics a simplified version of YouTube’s recommendation algorithm by:

Analyzing video metadata (title, tags, description, category)
Processing text using TF-IDF Vectorization
Computing Cosine Similarity to find related content
Suggesting the Top N most similar videos

🔍 Features

✅ Content-Based Filtering – Recommends videos based on textual similarity of metadata
✅ NLP Integration – Uses TF-IDF to convert text into machine-readable vectors
✅ Customizable Recommendations – Choose how many recommendations to display
✅ Clean Data Pipeline – Removes duplicates, handles missing values, and formats text
✅ Interactive Testing – Get recommendations for any input video title

🧠 Machine Learning Workflow

1. Data Preprocessing

Load INvideos.csv into a pandas DataFrame
Remove duplicates & null values
Merge title, tags, and description into a single text feature
Normalize text (lowercase, remove punctuation, clean stopwords)
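
The preprocessing steps above could look roughly like the sketch below. It is not taken from the notebook: the column names (title, tags, description), the choice to de-duplicate on title, and the use of scikit-learn's built-in English stopword list are all assumptions, and a tiny in-memory DataFrame stands in for INvideos.csv.

```python
import re

import pandas as pd
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

# Assumed column names; adjust to match the actual CSV header.
TEXT_COLUMNS = ["title", "tags", "description"]


def normalize_text(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop English stopwords."""
    text = text.lower()
    # Punctuation (including the '|' tag separator) becomes a space.
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in ENGLISH_STOP_WORDS]
    return " ".join(tokens)


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy with one row per title and a merged 'text' column."""
    # A trending dataset can list the same video on several days; keep the first row.
    df = df.drop_duplicates(subset="title").dropna(subset=["title"]).copy()
    # Missing tags or descriptions become empty strings so concatenation works.
    df[TEXT_COLUMNS] = df[TEXT_COLUMNS].fillna("")
    combined = df["title"] + " " + df["tags"] + " " + df["description"]
    df["text"] = combined.map(normalize_text)
    return df.reset_index(drop=True)


# In the notebook this would be pd.read_csv("INvideos.csv"); a stand-in is used here.
sample = pd.DataFrame({
    "title": ["Best Cricket Moments!", "Best Cricket Moments!", "Easy Paneer Recipe", None],
    "tags": ['cricket|"india vs australia"', "cricket", None, "music"],
    "description": ["Top catches of the year.", "dup", "A quick recipe.", "x"],
})
clean = preprocess(sample)
print(clean[["title", "text"]])
```

Keeping the merged text in its own column leaves the original title untouched, so it can still be used for lookups and for display.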

2. Feature Engineering with NLP

Apply TF-IDF Vectorization to convert text into numerical form
Weight distinctive words higher: TF-IDF down-weights terms that appear in most videos, so rarer, topic-specific terms drive the similarity score
Handle multi-word tags and special characters
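
As a rough illustration of this step, the sketch below fits scikit-learn's TfidfVectorizer on three made-up documents. Using ngram_range=(1, 2) is one possible way to capture multi-word tags as bigrams; it is an assumption here, not necessarily what the notebook does.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up stand-ins for the merged title/tags/description text.
docs = [
    "india vs australia cricket highlights",
    "cricket world cup final highlights",
    "easy paneer butter masala recipe",
]

# Unigrams plus bigrams, with English stopwords removed before n-grams are built.
vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
tfidf_matrix = vectorizer.fit_transform(docs)  # sparse matrix, one row per document

vocab = vectorizer.vocabulary_  # term -> column index
idf = vectorizer.idf_           # inverse document frequency per column

# A term found in two documents gets a lower IDF than a term found in one.
print("cricket:", idf[vocab["cricket"]])
print("paneer: ", idf[vocab["paneer"]])
```

Bigrams enlarge the vocabulary considerably, so on the full dataset it may be worth capping it with max_features or min_df.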

3. Recommendation Logic

Compute Cosine Similarity matrix on TF-IDF vectors
For a given video title:
- Find its index in the dataset
- Sort that video's similarity scores in descending order
- Return the top N most similar videos, excluding the query video itself
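
The lookup described above can be sketched as follows. The recommend function, the videos DataFrame, and its text column are hypothetical names used for illustration, and the four rows are made-up data rather than entries from INvideos.csv.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up videos; "text" plays the role of the merged, normalized metadata.
videos = pd.DataFrame({
    "title": [
        "Cricket Highlights",
        "Cricket World Cup Final",
        "Paneer Recipe",
        "Butter Chicken Recipe",
    ],
    "text": [
        "cricket highlights india australia match",
        "cricket world cup final match highlights",
        "paneer recipe easy indian cooking",
        "butter chicken recipe indian cooking",
    ],
})

tfidf_matrix = TfidfVectorizer().fit_transform(videos["text"])
similarity = cosine_similarity(tfidf_matrix)  # shape: (n_videos, n_videos)


def recommend(title, top_n=5):
    """Return the titles of the top_n videos most similar to `title`."""
    matches = videos.index[videos["title"] == title]
    if len(matches) == 0:
        raise KeyError(f"Title not found: {title!r}")
    idx = matches[0]
    # Drop the query video itself; its similarity to itself is always 1.
    scores = pd.Series(similarity[idx], index=videos.index).drop(idx)
    top = scores.sort_values(ascending=False).head(top_n)
    return videos.loc[top.index, "title"].tolist()


print(recommend("Cricket Highlights", top_n=1))
```

The full similarity matrix grows with the square of the number of videos. For a large dataset, computing a single row on demand with cosine_similarity(tfidf_matrix[idx], tfidf_matrix) avoids holding the whole matrix in memory.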

4. Evaluation & Testing

Manually test by querying video titles
Compare the returned videos and their similarity scores against the query to judge recommendation quality
Visualize popular categories & tags using Matplotlib/Seaborn
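
For the tag visualization, a minimal sketch with Matplotlib might look like this. It assumes tags are stored as a single "|"-separated string per video, with optional double quotes around each tag and "[none]" marking videos without tags; the helper names top_tags and plot_top_tags are hypothetical.

```python
from collections import Counter

import matplotlib.pyplot as plt


def top_tags(tag_strings, n=10):
    """Count individual tags across videos and return the n most common."""
    counts = Counter()
    for raw in tag_strings:
        for tag in raw.split("|"):
            tag = tag.strip().strip('"').lower()
            # Assumed placeholder for "no tags"; skip it and any empty strings.
            if tag and tag != "[none]":
                counts[tag] += 1
    return counts.most_common(n)


def plot_top_tags(tag_strings, n=10):
    """Draw a horizontal bar chart of the n most common tags and return the Axes."""
    pairs = top_tags(tag_strings, n)
    labels = [tag for tag, _ in pairs]
    values = [count for _, count in pairs]
    fig, ax = plt.subplots(figsize=(8, 4))
    # Reverse so the most common tag ends up at the top of the chart.
    ax.barh(labels[::-1], values[::-1])
    ax.set_xlabel("Number of videos")
    ax.set_title(f"Top {n} tags")
    fig.tight_layout()
    return ax


# Made-up tag strings standing in for the dataset's tags column.
sample_tags = ['cricket|"india vs australia"', "Cricket|music", "[none]"]
ax = plot_top_tags(sample_tags, n=3)
```

In a Jupyter notebook the figure renders inline; in a script, call plt.show() afterwards.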

🛠️ Technologies Used

Python 3.x
Pandas – Data manipulation
NumPy – Numerical computations
Scikit-learn – TF-IDF Vectorizer & Cosine Similarity
Matplotlib / Seaborn – Data visualization
Jupyter Notebook – Development environment

🚀 Installation & Usage

1. Clone the Repository

git clone https://github.com/your-username/youtube-recommendation-system.git
cd youtube-recommendation-system

---


# Visualizing the Metrics

```python
import matplotlib.pyplot as plt

def plot_metrics(metrics, k_values):
    """
    Plot Precision, Recall, and F1 scores against different values of K.

    Parameters:
    - metrics: Dictionary containing 'precision', 'recall', and 'f1' lists.
    - k_values: List of K values to plot.
    """
    plt.figure(figsize=(10, 6))
    plt.plot(k_values, metrics['precision'], label='Precision@K', marker='o')
    plt.plot(k_values, metrics['recall'], label='Recall@K', marker='s')
    plt.plot(k_values, metrics['f1'], label='F1@K', marker='^')
    plt.xlabel('K (Number of Recommendations)')
    plt.ylabel('Score')
    plt.title('Evaluation Metrics vs. Number of Recommendations')
    plt.legend()
    plt.grid(True)
    plt.show()

# Example usage (illustrative placeholder scores, not measured results)
metrics = {
    'precision': [0.40, 0.50, 0.60, 0.70],
    'recall': [0.67, 0.75, 0.80, 0.85],
    'f1': [0.50, 0.60, 0.69, 0.77]
}
k_values = [5, 10, 15, 20]

plot_metrics(metrics, k_values)