🛍️ Customer Satisfaction Prediction in E-Commerce

This project trains machine learning models to predict whether an e-commerce customer is satisfied (1) or dissatisfied (0) based on delivery behavior, payment value, freight cost, and product attributes. It covers class imbalance handling, feature importance analysis, and a comparison of four classifiers.

📌 Project Overview

Problem Type: Binary classification
Target: Customer Satisfaction (0 = Dissatisfied, 1 = Satisfied)
Dataset: Brazilian E-Commerce Dataset (Kaggle)
Key Features: Delivery delay, freight ratio, payment value, product category
Models Used: Logistic Regression, Random Forest, XGBoost, MLP

🔍 Class Distribution

🔄 Before Binary Classification

✅ After Binary Classification

⚖️ Handling Imbalanced Data

Converted review scores into binary classes
Used class_weight='balanced' in training
Applied stratified train-test split
Evaluated using weighted precision, recall, and F1, plus ROC AUC
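
The steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's notebook code: the cutoff of 4 stars for "satisfied", the two stand-in features, and the choice of logistic regression are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 2000

# Synthetic stand-ins for real features and 1-5 star review scores.
delivery_days = rng.integers(1, 40, size=n)
payment_value = rng.gamma(shape=2.0, scale=80.0, size=n)
# Illustrative only: longer deliveries make a low score more likely.
p_low = np.clip(delivery_days / 60.0, 0.05, 0.8)
review_score = np.where(
    rng.random(n) < p_low,
    rng.integers(1, 4, size=n),   # scores 1-3
    rng.integers(4, 6, size=n),   # scores 4-5
)

# ASSUMPTION: scores of 4 or 5 count as satisfied (1), the rest as dissatisfied (0).
SATISFIED_THRESHOLD = 4
y = (review_score >= SATISFIED_THRESHOLD).astype(int)
X = np.column_stack([delivery_days, payment_value])

# A stratified split keeps the class ratio nearly identical in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# class_weight="balanced" weights each class inversely to its frequency.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="weighted", zero_division=0
)
roc_auc = roc_auc_score(y_test, y_prob)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} roc_auc={roc_auc:.3f}")
```

Weighted averaging applies to precision, recall, and F1, which are computed per class and then averaged by class support. ROC AUC is computed from the predicted probability of the positive class.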

🧠 Feature Importance

Random Forest ranked these as most important:

payment_value
order_freight_ratio
freight_value
delivery_days
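
A ranking like this can be read from a fitted Random Forest via its `feature_importances_` attribute. The sketch below uses synthetic data, so its ranking will not match the one above. The definition of `order_freight_ratio` as freight divided by payment value is an assumption, as are the hyperparameters.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1500

# Synthetic stand-ins for the four features listed above.
payment_value = rng.gamma(2.0, 80.0, size=n)
freight_value = rng.gamma(2.0, 10.0, size=n)
delivery_days = rng.integers(1, 40, size=n).astype(float)
# ASSUMPTION: the ratio is freight cost divided by payment value.
order_freight_ratio = freight_value / payment_value

feature_names = ["payment_value", "order_freight_ratio", "freight_value", "delivery_days"]
X = np.column_stack([payment_value, order_freight_ratio, freight_value, delivery_days])
# Synthetic label: slow deliveries are more often dissatisfied (0).
y = (rng.random(n) > delivery_days / 60.0).astype(int)

forest = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
forest.fit(X, y)

# Impurity-based importances; they are normalised to sum to 1.
importances = forest.feature_importances_
ranking = sorted(zip(feature_names, importances), key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name:22s} {score:.3f}")
```

Impurity-based importances tend to favor continuous, high-cardinality features, so `sklearn.inspection.permutation_importance` on a held-out split is a useful cross-check.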

🚚 Delivery Performance

📊 Satisfaction vs Delivery Status

Late deliveries were strongly associated with dissatisfaction.
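
One way to quantify an association like this is a row-normalised cross-tabulation of delivery status against the satisfaction label. The sketch below uses a tiny made-up sample, and the column names `is_late` and `satisfied` are hypothetical.

```python
import pandas as pd

# Made-up sample: 6 on-time orders and 4 late orders.
orders = pd.DataFrame({
    "is_late":   [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    "satisfied": [1, 1, 1, 1, 1, 0, 0, 0, 0, 1],
})

# normalize="index" turns counts into per-row shares,
# i.e. the satisfaction breakdown within each delivery status.
rates = pd.crosstab(orders["is_late"], orders["satisfied"], normalize="index")
print(rates)

# Share of dissatisfied customers (satisfied == 0) for each delivery status.
on_time_dissatisfied = rates.loc[0, 0]
late_dissatisfied = rates.loc[1, 0]
print(f"dissatisfied when on time: {on_time_dissatisfied:.2f}")
print(f"dissatisfied when late:    {late_dissatisfied:.2f}")
```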

📦 Top Product Categories by Delays

Some categories had more frequent shipping delays.
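
A per-category late rate can be computed with a simple group-by, sketched here on made-up rows. The column names and category labels are hypothetical. Reporting the order count next to the rate helps avoid over-reading categories with very few orders.

```python
import pandas as pd

# Made-up sample of orders with a late-delivery flag.
orders = pd.DataFrame({
    "product_category": [
        "furniture", "furniture", "furniture",
        "toys", "toys",
        "books", "books", "books", "books",
    ],
    "is_late": [1, 1, 0, 0, 1, 0, 0, 0, 1],
})

# Mean of a 0/1 flag is the share of late orders; size is the order count.
late_by_category = (
    orders.groupby("product_category")["is_late"]
    .agg(late_rate="mean", n_orders="size")
    .sort_values("late_rate", ascending=False)
)
print(late_by_category)
```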

💸 Freight Value Insights

Categories with high freight value totals.

🤖 Model Evaluation

Model	Accuracy	Weighted F1	ROC AUC
Logistic Regression	72.4%	72.8%	65.9%
Random Forest	80.6%	77.4%	69.9%
XGBoost	72.8%	73.9%	70.1%
MLP Neural Network	73.1%	73.9%	68.9%
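
A comparison table like this can be produced with a single loop over the models. The sketch below is an assumption-laden illustration: it runs on synthetic data from `make_classification`, uses illustrative hyperparameters rather than the project's settings, and leaves out XGBoost so that it depends only on scikit-learn (`XGBClassifier` exposes the same `fit` / `predict_proba` interface and could be added to the dictionary).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the real feature matrix.
X, y = make_classification(
    n_samples=2000, n_features=6, n_informative=4,
    weights=[0.25, 0.75], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Illustrative settings only. MLPClassifier has no class_weight parameter.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(class_weight="balanced", max_iter=1000),
    ),
    "Random Forest": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0
    ),
    "MLP Neural Network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]  # probability of class 1
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "weighted_f1": f1_score(y_test, y_pred, average="weighted"),
        "roc_auc": roc_auc_score(y_test, y_prob),
    }

for name, m in results.items():
    print(f"{name:20s} acc={m['accuracy']:.3f} f1={m['weighted_f1']:.3f} auc={m['roc_auc']:.3f}")
```

Accuracy and weighted F1 depend on the 0.5 decision threshold, while ROC AUC measures ranking quality across all thresholds, so the three metrics do not always favor the same model.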

📈 ROC AUC Comparison

🧠 Key Insights

Delivery delays, high freight costs, and lower order values were the factors most associated with dissatisfaction.
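Random Forest offered the best balance between accuracy and interpretability: it had the highest accuracy (80.6%) and weighted F1 (77.4%), while XGBoost's ROC AUC was only marginally higher (70.1% vs. 69.9%).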
Structured metadata alone can predict satisfaction fairly well (ROC AUC between 65.9% and 70.1% across models), even without NLP or text embeddings.

📁 Repository Structure

customer_satisfaction_prediction/
├── data/                           # Raw & processed CSVs (not included in repo)
├── notebook/                       # Jupyter notebook used for training and evaluation
├── plots/                          # Visualizations (moved to root for ReadyTensor)
├── models/                         # (Optional) Trained model files
└── README.md                       # This file

🔮 Future Enhancements

Add sentiment analysis from review text
Use SHAP values for explainability
Build an interactive Streamlit dashboard
Explore ensemble models and meta-learning

📎 Dataset Reference

🗃️ Kaggle: Brazilian E-Commerce Dataset

👨‍💻 Author

Created by Abhishek Malaviya and team
Powered by Python, Scikit-learn, XGBoost, and JupyterLab
