This project uses machine learning models to predict whether an e-commerce customer is satisfied (1) or dissatisfied (0) based on delivery behavior, payment value, freight cost, and product attributes. It covers class imbalance handling, feature importance analysis, and model evaluation across four models: Logistic Regression, Random Forest, XGBoost, and an MLP neural network.


`class_weight='balanced'` was used in training.

Random Forest ranked these features as most important:

- `payment_value`
- `order_freight_ratio`
- `freight_value`
- `delivery_days`
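
To make the `class_weight='balanced'` setting more concrete, here is a minimal sketch of the weighting rule that scikit-learn documents for that option: each class gets the weight `n_samples / (n_classes * class_count)`, so the rarer class counts more during training. The function name `balanced_class_weights` and the toy labels are illustrative assumptions, not code from this project's notebook.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: n_samples / (n_classes * class_count)} for each class."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical toy labels: 8 satisfied (1) and 2 dissatisfied (0) customers.
toy_labels = [1] * 8 + [0] * 2
weights = balanced_class_weights(toy_labels)
print(weights)
```

In this toy case the minority class (0) receives a weight four times larger than the majority class, which is the effect the option is meant to have on an imbalanced satisfaction label.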
Late deliveries were strongly associated with dissatisfaction.
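
One way to check an association like this is to compare the dissatisfaction rate of late orders against on-time orders. The sketch below does that on a handful of made-up records; the field names `delivered_days`, `estimated_days`, and `satisfied` are hypothetical and may not match the columns used in the notebook.

```python
def dissatisfaction_rate_by_lateness(orders):
    """Return {is_late: share of dissatisfied orders} for late and on-time orders."""
    totals = {True: 0, False: 0}
    dissatisfied = {True: 0, False: 0}
    for order in orders:
        # An order counts as late when it took longer than the estimate.
        late = order["delivered_days"] > order["estimated_days"]
        totals[late] += 1
        dissatisfied[late] += int(order["satisfied"] == 0)
    return {late: dissatisfied[late] / totals[late] for late in totals if totals[late]}


# Hypothetical toy records, for illustration only (not project data).
toy_orders = [
    {"delivered_days": 12, "estimated_days": 10, "satisfied": 0},
    {"delivered_days": 15, "estimated_days": 10, "satisfied": 0},
    {"delivered_days": 11, "estimated_days": 10, "satisfied": 1},
    {"delivered_days": 5, "estimated_days": 10, "satisfied": 1},
    {"delivered_days": 7, "estimated_days": 10, "satisfied": 1},
    {"delivered_days": 9, "estimated_days": 10, "satisfied": 0},
]
rates = dissatisfaction_rate_by_lateness(toy_orders)
print(rates)
```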

Some categories had more frequent shipping delays.

Categories with high freight value totals:

| Model | Accuracy | Weighted F1 | ROC AUC |
|---|---|---|---|
| Logistic Regression | 72.4% | 72.8% | 65.9% |
| Random Forest | 80.6% | 77.4% | 69.9% |
| XGBoost | 72.8% | 73.9% | 70.1% |
| MLP Neural Network | 73.1% | 73.9% | 68.9% |
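
For readers unfamiliar with the three metrics in the table, the following is a small dependency-free sketch of how each one is defined. It is not the evaluation code from the notebook, which most likely relies on scikit-learn's `accuracy_score`, `f1_score(average='weighted')`, and `roc_auc_score`; the helper names and toy values here are assumptions for illustration.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    n = len(y_true)
    return sum(
        f1_for_class(y_true, y_pred, cls) * y_true.count(cls) / n
        for cls in sorted(set(y_true))
    )


def roc_auc(y_true, y_score):
    """Probability that a random positive scores above a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels and predicted probabilities of class 1.
y_true = [1, 1, 1, 1, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.3, 0.4, 0.2]
y_pred = [int(s >= 0.5) for s in y_score]  # threshold at 0.5

acc = accuracy(y_true, y_pred)
wf1 = weighted_f1(y_true, y_pred)
auc = roc_auc(y_true, y_score)
print(acc, wf1, auc)
```

Accuracy and weighted F1 depend on the chosen decision threshold, while ROC AUC is computed from the raw scores, which is one reason a model can lead on one metric and trail on another.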

customer_satisfaction_prediction/
├── data/ # Raw & processed CSVs (not included in repo)
├── notebook/ # Jupyter notebook used for training and evaluation
├── plots/ # Visualizations (moved to root for ReadyTensor)
├── models/ # (Optional) Trained model files
└── README.md # This file

🗃️ Kaggle: Brazilian E-Commerce Dataset

Created by Abhishek Malaviya and team

Powered by Python, Scikit-learn, XGBoost, and JupyterLab