Abstract
This project focuses on forecasting weekly sales for Walmart stores using historical data. The goal is to help Walmart improve its inventory management, staffing, and marketing strategies by accurately predicting future sales.
Using a dataset of store, department, and promotional information, we trained and compared four regression models: Linear Regression, Decision Tree, Random Forest, and XGBoost. The best model, Random Forest, reached an R² of 0.92 on the held-out test set (MAE 1153.24, RMSE 1378.45), which is accurate enough to support planning decisions by business stakeholders.
Introduction
Sales forecasting is a crucial component in retail operations, enabling businesses to optimize inventory, manage resources, and maximize profits. Walmart, one of the world's largest retailers, operates thousands of stores with different departments and promotions, making sales forecasting a challenging task.
This project aims to build a machine learning model that can accurately predict weekly sales based on historical data. The dataset includes features such as store number, department, sales, holidays, and promotions. By leveraging these features, we intend to create a robust forecasting model to assist in business planning.
Methodology
The approach followed in this project is outlined below:
Data Collection & Cleaning
- The dataset was imported from a `.csv` file containing historical sales data.
- Missing values were identified and handled using appropriate imputation techniques.
- Outliers and anomalies were addressed to maintain data consistency.
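The cleaning steps above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the report does not say which imputation or outlier technique was used, so median imputation and clipping at the 1.5 × IQR fences are assumptions, as are the column names `Weekly_Sales`, `Store`, and `Dept` and the helper `clean_sales`.

```python
import numpy as np
import pandas as pd


def clean_sales(df: pd.DataFrame, target: str = "Weekly_Sales") -> pd.DataFrame:
    """Impute missing target values and clip extreme outliers (illustrative only)."""
    df = df.copy()  # leave the caller's frame untouched
    # Assumption: median imputation; the report only says "appropriate techniques".
    df[target] = df[target].fillna(df[target].median())
    # Assumption: clip values outside the 1.5 * IQR fences.
    q1, q3 = df[target].quantile([0.25, 0.75])
    iqr = q3 - q1
    df[target] = df[target].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)
    return df


# A small synthetic frame stands in for the result of pd.read_csv(...) on the real file.
raw = pd.DataFrame({
    "Store": [1, 1, 2, 2, 3],
    "Dept": [1, 2, 1, 2, 1],
    "Weekly_Sales": [20000.0, np.nan, 21000.0, 19500.0, 500000.0],
    "IsHoliday": [False, False, True, False, False],
})
cleaned = clean_sales(raw)
```

Clipping keeps every row, which may matter when each store and department pair needs a complete weekly history. Dropping outlier rows would be an alternative.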
Feature Engineering
- Created new features such as:
  - Day of the week
  - Holiday indicators
  - Department and Store IDs as categorical features
- Applied one-hot encoding and feature scaling where necessary.
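A minimal sketch of these feature-engineering steps with pandas is shown below. The column names `Date`, `Store`, and `Dept`, and the sample rows, are assumptions for illustration. Only `IsHoliday` is named in this report.

```python
import pandas as pd

# Hypothetical sample rows; the real data would come from the .csv file.
df = pd.DataFrame({
    "Store": [1, 1, 2],
    "Dept": [1, 2, 1],
    "Date": ["2012-02-10", "2012-02-17", "2012-02-24"],
    "IsHoliday": [True, False, False],
})

# Day of the week from the date column (Monday=0 ... Sunday=6).
df["Date"] = pd.to_datetime(df["Date"])
df["DayOfWeek"] = df["Date"].dt.dayofweek

# Holiday indicator as a 0/1 integer.
df["IsHoliday"] = df["IsHoliday"].astype(int)

# Treat Store and Dept IDs as categorical and one-hot encode them.
features = pd.get_dummies(
    df.drop(columns="Date"), columns=["Store", "Dept"], dtype=int
)
```

One-hot encoding adds one column per distinct store and department, so the feature matrix can become wide on the full dataset.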
Model Building
- Tried multiple regression models:
  - Linear Regression
  - Decision Tree
  - Random Forest
  - XGBoost
- Used grid search and cross-validation to fine-tune hyperparameters.
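As an illustration of grid search with cross-validation, the sketch below tunes a Random Forest on synthetic data using scikit-learn's `GridSearchCV`. The parameter grid, the 3-fold setting, and the MAE-based scoring are assumptions. The report does not list the values that were actually searched.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for the engineered features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Hypothetical grid; every combination is scored with 3-fold cross-validation.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="neg_mean_absolute_error",  # scikit-learn maximizes, so MAE is negated
)
search.fit(X, y)

best_params = search.best_params_
best_cv_mae = -search.best_score_  # flip the sign back to a positive MAE
```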
Model Evaluation
- Performance was assessed using:
  - Mean Absolute Error (MAE)
  - Root Mean Squared Error (RMSE)
  - R² Score
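For reference, the three metrics can be written out directly with NumPy. The helper names and the sample values below are illustrative only and are not taken from the project's results.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the errors."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root Mean Squared Error: penalizes large errors more than MAE."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    """R^2: share of the target's variance explained by the predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Made-up weekly sales and predictions.
y_true = [100, 200, 300, 400]
y_pred = [110, 190, 320, 380]
scores = (mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

MAE and RMSE are in the same units as weekly sales, while R² is unitless, which is why the three are usually reported together.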
Experiment
This section describes the experiments we ran to compare machine learning models and preprocessing techniques for sales forecasting.
Data Splitting
- The dataset was split into 80% training and 20% testing sets using `train_test_split()` from scikit-learn.
- A time-based split was not used because the data was not treated as an explicit time series.
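A minimal sketch of the 80/20 random split with `train_test_split()` follows. The arrays are placeholders, and `random_state=42` is an assumption added for reproducibility. The report does not state a seed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and target with 100 rows.
X = np.arange(200).reshape(100, 2)
y = np.arange(100, dtype=float)

# test_size=0.2 gives the 80/20 split; rows are shuffled before splitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```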
Feature Engineering
- Extracted useful features like:
  - Day of Week
  - IsHoliday (converted to binary)
  - Department-wise one-hot encoding
- Scaled numeric features using StandardScaler for algorithms sensitive to feature scale.
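The scaling step might look like the sketch below. Fitting the scaler on the training rows only and reusing it for the test rows is an assumption about good practice, not something the report states. The arrays are made up.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up numeric features for the training and test sets.
X_train = np.array([[10.0, 1.0], [20.0, 3.0], [30.0, 5.0]])
X_test = np.array([[40.0, 7.0]])

scaler = StandardScaler()
# Learn mean and standard deviation from the training data only.
X_train_scaled = scaler.fit_transform(X_train)
# Apply the same statistics to the test data to avoid leaking test information.
X_test_scaled = scaler.transform(X_test)
```

Tree-based models such as Decision Tree and Random Forest do not depend on feature scale, so scaling mainly matters for Linear Regression when regularization or gradient-based fitting is involved.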
Models Tested
- Linear Regression
- Decision Tree Regressor
- Random Forest Regressor
- XGBoost Regressor
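One way to compare these models on a common split is sketched below, using synthetic data and mostly default hyperparameters. Both are assumptions, so the scores it produces are unrelated to the results reported here. XGBoost is included only if the `xgboost` package is installed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
}
try:
    from xgboost import XGBRegressor  # optional dependency
    models["XGBoost"] = XGBRegressor(n_estimators=50, random_state=0)
except ImportError:
    pass  # skip XGBoost when the package is not available

# Synthetic data with a mildly non-linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit every model on the same training set and score it on the same test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "MAE": mean_absolute_error(y_test, pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
        "R2": r2_score(y_test, pred),
    }
```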
Evaluation Metrics
- Models were evaluated using:
  - Mean Absolute Error (MAE)
  - Root Mean Squared Error (RMSE)
  - R² Score
- Random Forest showed the best results with:
  - MAE: 1153.24
  - RMSE: 1378.45
  - R² Score: 0.92
The model generalizes well to the held-out test set and balances bias and variance effectively, making it suitable for sales forecasting.
Results
After training and evaluating multiple models, the following results were observed:
| Model | MAE | RMSE | R² Score |
|---|---|---|---|
| Linear Regression | 1975.12 | 2264.58 | 0.72 |
| Decision Tree | 1287.89 | 1543.67 | 0.85 |
| Random Forest | 1153.24 | 1378.45 | 0.92 |
| XGBoost | 1171.53 | 1410.72 | 0.91 |
- Random Forest Regressor outperformed all other models with the lowest MAE and RMSE, and the highest R² score.
- Feature importance analysis indicated that Store, IsHoliday, and Department were the most significant predictors.
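A feature importance analysis of this kind can be sketched with the `feature_importances_` attribute of a fitted Random Forest. The data below is synthetic and built so that `Store` dominates. It illustrates the mechanics only and does not reproduce the ranking reported above. The column name `Dept` and the coefficients are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic features; the target depends mostly on Store by construction.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "Store": rng.integers(1, 46, size=500),
    "Dept": rng.integers(1, 100, size=500),
    "IsHoliday": rng.integers(0, 2, size=500),
})
y = (
    500.0 * X["Store"]
    + 50.0 * X["Dept"]
    + 2000.0 * X["IsHoliday"]
    + rng.normal(scale=100.0, size=500)
)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Impurity-based importances sum to 1; sort them from most to least important.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
```

Calling `importances.plot.bar()` would produce a bar chart of this ranking, provided matplotlib is installed.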
Visualizations:
- Plotted actual vs predicted sales for better interpretability.
- Feature importance was visualized using bar charts.
Conclusion
This project successfully demonstrates the use of machine learning models for forecasting Walmart sales. The Random Forest Regressor provided the best performance among the tested models.
Key Takeaways:
- Feature engineering significantly improves model accuracy.
- Random Forest, which averages many decision trees, generalized better than a single Decision Tree (R² of 0.92 versus 0.85).
- Predictive analytics can provide valuable support in decision-making for retail operations.
Future Scope:
- Incorporate time-series forecasting methods like ARIMA or LSTM for better handling of temporal patterns.
- Integrate external factors like weather and local events for improved accuracy.