Time Series Forecasting of Alcohol Sales
This document outlines the process of forecasting alcohol sales using time series analysis techniques. The dataset used is Alcohol_Sales.csv, which contains monthly sales data. The goal is to analyze the data, test for stationarity, and build a forecasting model using ARIMA and SARIMAX.
Data Preparation
Importing Required Libraries
We begin by importing the libraries for data manipulation, plotting, and the stationarity test.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
Loading the Dataset
We load the dataset into a Pandas DataFrame.
df = pd.read_csv('Alcohol_Sales.csv')
Renaming Columns
For clarity, we rename the columns of the DataFrame.
df.columns = ["Month", "Sales"]
Converting Month to Datetime
We convert the 'Month' column to a datetime format and set it as the index.
df['Month'] = pd.to_datetime(df['Month'])
df.set_index('Month', inplace=True)
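As a quick sanity check on this step, the sketch below builds a tiny hypothetical table in place of Alcohol_Sales.csv, parses the month column, and declares a month-start frequency with asfreq("MS"). The sample values, the `sample` name, and the assumption that every date falls on the first of the month are illustrative and not taken from the dataset.

```python
import pandas as pd

# Hypothetical sample standing in for Alcohol_Sales.csv (values are made up).
sample = pd.DataFrame({
    "Month": ["2000-01-01", "2000-02-01", "2000-03-01", "2000-04-01"],
    "Sales": [100, 110, 120, 130],
})
sample["Month"] = pd.to_datetime(sample["Month"])
sample = sample.set_index("Month")

# Assumption: observations are stamped on the first day of each month.
# asfreq("MS") attaches that frequency and inserts NaN rows for absent months.
sample = sample.asfreq("MS")

missing_months = int(sample["Sales"].isna().sum())
print(sample.index.freqstr, missing_months)
```

An index with an explicit frequency lets statsmodels label forecasts with dates, and a non-zero count of missing months flags gaps to handle before modeling.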
Testing for Stationarity
Augmented Dickey-Fuller Test
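We perform the Augmented Dickey-Fuller (ADF) test to check for stationarity. Its null hypothesis is that the series has a unit root, meaning it is non-stationary, so a p-value at or below 0.05 lets us reject the null and treat the series as stationary.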
def adfuller_test(sales):
    # adfuller returns (statistic, p-value, lags used, n observations, ...)
    result = adfuller(sales)
    labels = ['ADF Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used']
    for value, label in zip(result, labels):
        print(label + ' : ' + str(value))
    # Null hypothesis: the series has a unit root (is non-stationary)
    if result[1] <= 0.05:
        print("Data is stationary.")
    else:
        print("Data is non-stationary.")

adfuller_test(df['Sales'])
Building the Forecasting Model
ARIMA Model
We fit an ARIMA model with order (1, 1, 1): one autoregressive term, one round of differencing, and one moving-average term.
from statsmodels.tsa.arima.model import ARIMA
model = ARIMA(df['Sales'], order=(1, 1, 1))
model_fit = model.fit()
print(model_fit.summary())
SARIMAX Model
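We also fit a SARIMAX model to account for seasonality. The seasonal order (1, 1, 1, 12) adds seasonal autoregressive, differencing, and moving-average terms with a 12-month period, which matches the monthly data.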
import statsmodels.api as sm
model = sm.tsa.statespace.SARIMAX(df['Sales'], order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
results = model.fit()
Forecasting Future Sales
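We generate dynamic predictions for observations 90 through 103 and plot them against the actual sales. With dynamic=True, each prediction after the start point builds on earlier predictions instead of the observed values. If the series has at least 104 rows, these positions lie within the observed data, so the plot is an in-sample check rather than a forecast past the last month.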
df['forecast'] = results.predict(start=90, end=103, dynamic=True)
df[['Sales', 'forecast']].plot(figsize=(12, 8))
plt.title('Sales vs Forecast')
plt.xlabel('Month')
plt.ylabel('Sales')
plt.show()
Conclusion
This document walked through loading monthly alcohol sales data, testing it for stationarity with the Augmented Dickey-Fuller test, and fitting ARIMA and SARIMAX models. Comparing the seasonal model's predictions with actual sales shows how well it captures trend and seasonal patterns, which can inform inventory and marketing decisions.
References
https://www.statsmodels.org/stable/index.html
https://pandas.pydata.org/pandas-docs/stable/