I. Data Preparation
The dataset used in this study was drawn from the Electronics_5.json corpus [1]. To keep computation tractable, only the first 5,000 entries were extracted for analysis. A pandas DataFrame was constructed retaining three attributes: reviewerID (user identifier), asin (product identifier), and overall (rating). To make user behavior modeling more reliable, only users who had rated at least five products were retained. A user–item rating matrix was then constructed in which rows correspond to users, columns to products, and entries to rating values.
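The preparation steps above can be sketched as follows. This is a minimal illustration, not the study's actual code: it assumes the corpus stores one JSON object per line, and the function names (load_reviews, filter_active_users, build_rating_matrix) and the toy records are hypothetical. Duplicate ratings of the same product by the same user are averaged here, which is also an assumption.

```python
import json
from itertools import islice

import pandas as pd


def load_reviews(lines, limit=5000):
    """Parse at most `limit` JSON-lines records and keep the three attributes."""
    records = [json.loads(line) for line in islice(lines, limit)]
    return pd.DataFrame(records)[["reviewerID", "asin", "overall"]]


def filter_active_users(df, min_ratings=5):
    """Keep only users who rated at least `min_ratings` distinct products."""
    counts = df.groupby("reviewerID")["asin"].nunique()
    keep = counts[counts >= min_ratings].index
    return df[df["reviewerID"].isin(keep)]


def build_rating_matrix(df):
    """Rows are users, columns are products, entries are ratings (NaN if unrated)."""
    # Assumption: repeated ratings of one product by one user are averaged.
    return df.pivot_table(index="reviewerID", columns="asin",
                          values="overall", aggfunc="mean")


# Toy records standing in for the real file (hypothetical data).
sample_lines = [
    json.dumps({"reviewerID": u, "asin": p, "overall": r})
    for u, p, r in [
        ("U1", "P1", 5.0), ("U1", "P2", 3.0),
        ("U2", "P1", 4.0),
        ("U3", "P2", 2.0), ("U3", "P3", 4.0),
    ]
]

# A threshold of 2 is used only so the toy data survives the filter.
ratings = filter_active_users(load_reviews(sample_lines), min_ratings=2)
matrix = build_rating_matrix(ratings)
print(matrix)
```

With the real corpus, the same functions could be applied to an open file handle, for example load_reviews(open("Electronics_5.json")), with min_ratings left at its default of 5.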
II. Exploratory Data Analysis
Exploratory Data Analysis (EDA) was performed to examine the structural properties of the dataset. Descriptive statistics of the ratings were computed to assess their distribution. The number of unique users and products was determined, and the sparsity of the user–item matrix (the fraction of user–product pairs with no rating, i.e. one minus the rating density) was calculated. Finally, the ten users with the most ratings were identified, highlighting the most active participants in the dataset.
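A minimal sketch of these summary quantities is given below. The function name summarize_ratings and the toy DataFrame are hypothetical; sparsity is computed as one minus the share of user–product pairs that carry a rating, which is the usual definition but is not spelled out in the text.

```python
import pandas as pd


def summarize_ratings(df, top_n=10):
    """Return descriptive statistics, counts, sparsity, and the most active users."""
    n_users = df["reviewerID"].nunique()
    n_products = df["asin"].nunique()
    # Count each (user, product) pair once, even if it was rated repeatedly.
    n_rated = len(df.drop_duplicates(["reviewerID", "asin"]))
    density = n_rated / (n_users * n_products)
    return {
        "rating_stats": df["overall"].describe(),
        "n_users": n_users,
        "n_products": n_products,
        "density": density,
        "sparsity": 1.0 - density,
        "top_users": df["reviewerID"].value_counts().head(top_n),
    }


# Hypothetical toy data: 3 users, 3 products, 4 ratings.
toy = pd.DataFrame({
    "reviewerID": ["U1", "U1", "U2", "U3"],
    "asin": ["P1", "P2", "P1", "P3"],
    "overall": [5.0, 3.0, 4.0, 2.0],
})

summary = summarize_ratings(toy)
print(summary["rating_stats"])
print("sparsity:", round(summary["sparsity"], 3))
print(summary["top_users"])
```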
III. Popularity-Based Recommendation
A baseline recommendation model was implemented using a popularity-based approach. Products were ranked according to the number of reviews received in the training dataset. Based on this metric, a function was designed to recommend the top five most popular products to any user. This approach serves as a non-personalized benchmark for evaluating more sophisticated recommendation techniques.
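The baseline can be illustrated with the short sketch below. The names rank_by_popularity and recommend_popular are hypothetical, as is the toy training set; ties in review count are broken alphabetically by asin, an assumption made only so the output is deterministic.

```python
import pandas as pd


def rank_by_popularity(train_df):
    """Rank products by the number of reviews they received."""
    counts = train_df.groupby("asin").size().rename("n_reviews").reset_index()
    # Assumption: ties are broken by product identifier for reproducibility.
    return counts.sort_values(["n_reviews", "asin"], ascending=[False, True])


def recommend_popular(ranking, n=5):
    """Return the same top-n products for every user (non-personalized)."""
    return ranking["asin"].head(n).tolist()


# Hypothetical toy training set: P1 has 3 reviews, P2 has 2, P3 has 1.
train = pd.DataFrame({
    "reviewerID": ["U1", "U2", "U3", "U1", "U2", "U3"],
    "asin": ["P1", "P1", "P1", "P2", "P2", "P3"],
    "overall": [5.0, 4.0, 3.0, 4.0, 2.0, 5.0],
})

ranking = rank_by_popularity(train)
top_products = recommend_popular(ranking, n=2)
print(top_products)
```

Because the ranking ignores the identity of the user, it needs to be computed only once and can be reused for every request.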
IV. Model-Based Recommendation using SVD
To achieve personalized recommendations, Singular Value Decomposition (SVD) was applied to the user–item ratings matrix. SVD enabled dimensionality reduction by decomposing the matrix into latent factors that capture user preferences and product characteristics. Predicted ratings were then generated by reconstructing the matrix from these latent representations. Finally, a function was implemented to recommend items by ranking products according to their predicted ratings for a given user.
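The procedure above can be sketched with NumPy as follows. Several details are assumptions, since the text does not state them: missing ratings are filled with zero before factorization, the number of latent factors k is chosen arbitrarily, and products the user has already rated are excluded from the recommendations. The function names predict_ratings and recommend_for_user are hypothetical.

```python
import numpy as np
import pandas as pd


def predict_ratings(matrix, k=2):
    """Reconstruct the rating matrix from its k leading singular components."""
    # Assumption: unrated entries are treated as zero before decomposition.
    filled = matrix.fillna(0.0).to_numpy(dtype=float)
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    k = min(k, len(s))
    # Rank-k approximation: U_k * diag(s_k) * Vt_k.
    approx = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return pd.DataFrame(approx, index=matrix.index, columns=matrix.columns)


def recommend_for_user(matrix, predictions, user_id, n=5):
    """Rank the user's unrated products by predicted rating."""
    # Assumption: already-rated products are not recommended again.
    unseen = matrix.loc[user_id].isna()
    scores = predictions.loc[user_id][unseen]
    return scores.sort_values(ascending=False).head(n).index.tolist()


# Hypothetical toy user-item matrix (NaN marks an unrated product).
matrix = pd.DataFrame(
    [[5.0, 4.0, np.nan, 1.0],
     [4.0, 5.0, 1.0, np.nan],
     [1.0, np.nan, 5.0, 4.0]],
    index=["U1", "U2", "U3"],
    columns=["P1", "P2", "P3", "P4"],
)

predictions = predict_ratings(matrix, k=2)
recs = recommend_for_user(matrix, predictions, "U1", n=5)
print(predictions.round(2))
print(recs)
```

Zero-filling biases predictions for unrated products toward low values; centering each user's ratings on their mean before decomposition is a common alternative, though the text does not indicate which was used.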
References
[1] Dataset Source: Electronics_5.json, Amazon Product Review Dataset, Stanford SNAP.