A Machine Learning-based video recommendation engine inspired by YouTube’s algorithm.
This project uses Natural Language Processing (NLP) and Content-Based Filtering to recommend videos based on metadata like title, tags, views, likes, and categories from the Kaggle YouTube Trending Dataset.
📁 Notebook: youtubeIndia.ipynb
📊 Dataset: INvideos.csv (YouTube Trending Videos in India)
This system mimics a simplified version of YouTube’s recommendation algorithm and offers the following features:
✅ Content-Based Filtering – Recommends videos based on textual similarity of metadata
✅ NLP Integration – Uses TF-IDF to convert text into machine-readable vectors
✅ Customizable Recommendations – Choose how many recommendations to display
✅ Clean Data Pipeline – Removes duplicates, handles missing values, and formats text
✅ Interactive Testing – Get recommendations for any input video title
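
The features above can be sketched as a small content-based recommender. The notebook's own implementation is not shown here, so this is a minimal, dependency-free sketch that uses only the Python standard library. The toy `videos` list and the helper names (`tokenize`, `tfidf_vectors`, `cosine`, `recommend`) are hypothetical and exist only for illustration. The IDF formula is one common smoothed variant, chosen as an assumption.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, videos, top_n=3):
    """Return the titles of the top_n videos most similar to the given title."""
    titles = [v["title"] for v in videos]
    if title not in titles:
        raise ValueError(f"Unknown title: {title!r}")
    # Combine title, tags, and description into a single text feature.
    texts = [" ".join([v["title"], v["tags"], v["description"]]) for v in videos]
    vectors = tfidf_vectors(texts)
    idx = titles.index(title)
    scored = [
        (cosine(vectors[idx], vec), titles[i])
        for i, vec in enumerate(vectors)
        if i != idx
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in scored[:top_n]]


# Hypothetical toy data standing in for rows of the dataset.
videos = [
    {"title": "Easy Paneer Recipe", "tags": "cooking|recipe|paneer",
     "description": "Quick paneer curry at home"},
    {"title": "Butter Chicken Recipe", "tags": "cooking|recipe|chicken",
     "description": "Restaurant style curry at home"},
    {"title": "IPL Match Highlights", "tags": "cricket|ipl|sports",
     "description": "Best moments from the match"},
    {"title": "Cricket Batting Tips", "tags": "cricket|sports|batting",
     "description": "Improve your batting technique"},
]

print(recommend("Easy Paneer Recipe", videos, top_n=1))
```

The `top_n` argument corresponds to the "choose how many recommendations to display" feature. On the real dataset, a vectorized library implementation would likely be preferable to these pure-Python loops.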
Load `INvideos.csv` into a Pandas DataFrame

Combine `title`, `tags`, and `description` into a single text feature

    git clone https://github.com/your-username/youtube-recommendation-system.git
    cd youtube-recommendation-system

---

# Visualizing the Metrics

```python
import matplotlib.pyplot as plt


def plot_metrics(metrics, k_values):
    """
    Plot Precision, Recall, and F1 scores against different values of K.

    Parameters:
    - metrics: Dictionary containing 'precision', 'recall', and 'f1' lists.
    - k_values: List of K values to plot.
    """
    plt.figure(figsize=(10, 6))
    plt.plot(k_values, metrics['precision'], label='Precision@K', marker='o')
    plt.plot(k_values, metrics['recall'], label='Recall@K', marker='s')
    plt.plot(k_values, metrics['f1'], label='F1@K', marker='^')
    plt.xlabel('K (Number of Recommendations)')
    plt.ylabel('Score')
    plt.title('Evaluation Metrics vs. Number of Recommendations')
    plt.legend()
    plt.grid(True)
    plt.show()


# Example usage
metrics = {
    'precision': [0.40, 0.50, 0.60, 0.70],
    'recall': [0.67, 0.75, 0.80, 0.85],
    'f1': [0.50, 0.60, 0.69, 0.77]
}
k_values = [5, 10, 15, 20]
plot_metrics(metrics, k_values)
```
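
The plotting helper takes precomputed scores, so it may help to see how Precision@K, Recall@K, and F1@K can be calculated for a single query. This README does not state how the scores were obtained or how relevance is defined, so the following is only a sketch under assumptions: `precision_recall_f1_at_k` is a hypothetical helper, and the `recommended` and `relevant` video IDs are made up for illustration.

```python
def precision_recall_f1_at_k(recommended, relevant, k):
    """
    Compute Precision@K, Recall@K, and F1@K for one ranked recommendation list.

    - recommended: ranked list of recommended item IDs (best first).
    - relevant: collection of item IDs considered relevant (assumed ground truth).
    - k: number of top recommendations to evaluate.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    relevant_set = set(relevant)
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant_set)
    # Precision@K divides by k, even if fewer than k items were recommended.
    precision = hits / k
    recall = hits / len(relevant_set) if relevant_set else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 5 recommendations, 3 relevant videos, 2 of them retrieved.
recommended = ["v1", "v2", "v3", "v4", "v5"]
relevant = {"v2", "v5", "v9"}

for k in (2, 5):
    p, r, f1 = precision_recall_f1_at_k(recommended, relevant, k)
    print(f"K={k}: precision={p:.2f}, recall={r:.2f}, f1={f1:.2f}")
```

Averaging these per-query values over many test videos would give one score per K, which is the shape of input that `plot_metrics` expects.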