This project applies time series analysis to forecast HDFC stock closing prices over a 10-year horizon. Using historical daily price data, it fits ARIMA and exponential smoothing models (simple exponential smoothing and Holt's linear trend method) and evaluates them with standard accuracy metrics, RMSE and MAE.
The primary goal is to provide accurate forecasts of HDFC stock prices and compare the performance of different predictive models to identify the most suitable method for long-term forecasting.
1.1.1 Data Collection:
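• The input file, HDFC.csv, contains 4,977 daily records of HDFC stock data in 15 columns, including Date, Open, High, Low, Close, VWAP and Volume.
• The data is inspected (structure, summary statistics, missing values), rows with missing values are removed, and the Close prices are converted into a time series object for modelling.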
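The following is a minimal base-R sketch of this preparation step. It uses a small made-up data frame in place of HDFC.csv: the column names `Date` and `Close` match the file, but the values do not. The sketch drops incomplete rows and then builds the `ts` object from only the first date and a frequency, so the series length always equals the number of remaining rows.

```r
# Toy stand-in for HDFC.csv (made-up values, not real prices)
prices <- data.frame(
  Date  = as.Date("2000-01-03") + c(0, 1, 2, 3, 4, 7, 8),
  Close = c(293.5, 294.1, NA, 297.0, 296.2, 299.8, 301.4)
)

# Drop incomplete rows, as the project does with na.omit()
prices <- na.omit(prices)

# Build the series from the first date and the frequency only,
# so its length follows nrow(prices) exactly
first_date <- min(prices$Date)
close_ts <- ts(prices$Close,
               start = c(as.numeric(format(first_date, "%Y")),
                         as.numeric(format(first_date, "%j"))),
               frequency = 365.25)

length(close_ts)   # one observation per remaining row
```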
1.1.2 Forecasting Models:
• ARIMA (Autoregressive Integrated Moving Average):
o Captures the linear dependencies in the time series.
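o Applied after testing the series for stationarity (Augmented Dickey-Fuller test) and differencing it where needed.
• Simple Exponential Smoothing:
o Averages past observations with exponentially decreasing weights, so recent prices count most.
o Produces a smoothed level; its forecasts are flat (the same value for every future step).
• Holt's Exponential Smoothing:
o Extends simple exponential smoothing with a trend component, so forecasts follow a linear trend.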
1.1.3 Tools and Libraries:
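The following R libraries are used:
• prophet: Facebook's Prophet forecasting procedure (its code is included but commented out).
• quantmod: For financial data modelling.
• forecast: auto.arima(), forecast(), ses(), holt() and accuracy().
• tseries: adf.test() for the Augmented Dickey-Fuller stationarity test.
• readr and pastecs: read_csv() for loading the data and stat.desc() for descriptive statistics.
• ggplot2: For visualization.
• dplyr: For data manipulation.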
1.1.4 Evaluation Metrics:
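• Root Mean Square Error (RMSE): The square root of the mean squared forecast error. Squaring gives large errors more weight.
• Mean Absolute Error (MAE): The mean of the absolute forecast errors. Every error counts in proportion to its size, regardless of direction.
Both metrics are expressed in the same units as the closing price, and lower values are better.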
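Both metrics are simple to compute by hand. The helper names `rmse()` and `mae()` below are illustrative and do not belong to any package. For a fitted model, `forecast::accuracy()` reports the same two quantities computed on the training residuals.

```r
# RMSE: square root of the mean squared error (large errors weigh more)
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# MAE: mean of the absolute errors (each error counts linearly)
mae <- function(actual, predicted) mean(abs(actual - predicted))

actual    <- c(100, 102, 101, 105)
predicted <- c(101, 101, 101, 101)

rmse(actual, predicted)  # sqrt((1 + 1 + 0 + 16) / 4) = sqrt(4.5), about 2.121
mae(actual, predicted)   # (1 + 1 + 0 + 4) / 4 = 1.5
```

The single error of 4 contributes 16 of the 18 squared-error units, which is why RMSE sits noticeably above MAE here.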
1.2 ARIMA Model (AutoRegressive Integrated Moving Average):
ARIMA is a statistical model for time series forecasting. It combines three components, AutoRegressive (AR), Integrated (I) and Moving Average (MA), to predict future values from past values of the series. Differencing (the I component) lets it handle trending, non-stationary data. Seasonal patterns require its seasonal extension, SARIMA.
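1.2.1 Components of ARIMA:
• AR (p): The autoregressive part. The current value is regressed on its own previous p values.
• I (d): The integrated part. The series is differenced d times to make it stationary.
• MA (q): The moving average part. The current value depends on the previous q forecast errors.
The model is written ARIMA(p, d, q). The model selected in this project is ARIMA(0,1,0), which is a random walk.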
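All three components can be seen by simulating a known ARIMA(1,1,1) process and fitting it back. This sketch uses base R's `arima.sim()` and `arima()` instead of the `forecast` package used in the project. The coefficients (AR 0.6, MA 0.3) and the seed are arbitrary choices made for illustration.

```r
set.seed(42)

# Simulate ARIMA(1,1,1): the first differences follow an ARMA(1,1)
# with AR coefficient 0.6 and MA coefficient 0.3
x <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.6, ma = 0.3), n = 2000)

# Fit the same order; arima() applies the differencing (d = 1) internally
fit <- arima(x, order = c(1, 1, 1))

coef(fit)   # ar1 should be near 0.6 and ma1 near 0.3
```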
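1.2.2 Steps to Build an ARIMA Model:
• Convert the Close prices into a time series object and plot it.
• Check stationarity with ACF/PACF plots and the Augmented Dickey-Fuller test. The p-value of 0.5202 means non-stationarity is not rejected, so differencing is needed.
• Select the orders (p, d, q) automatically with auto.arima() using AIC. The best model is ARIMA(0,1,0).
• Check the residuals with ACF/PACF plots and Box-Pierce tests.
• Forecast with forecast() and evaluate the model with accuracy().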
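Below is a compact base-R version of these steps, run on a simulated random walk rather than HDFC data. It uses the Ljung-Box variant of `Box.test()` and omits the ADF test, which needs the `tseries` package. The last lines show that an ARIMA(0,1,0) model without drift repeats the last observed value at every forecast horizon.

```r
set.seed(1)

# A random walk: the non-stationary series that ARIMA(0,1,0) describes
y <- ts(100 + cumsum(rnorm(500)))

# Difference once to remove the stochastic trend
dy <- diff(y)

# Autocorrelation of the differenced series at lags 1 to 5
acf(dy, plot = FALSE)$acf[2:6]

# Fit ARIMA(0,1,0) and test the residuals for leftover autocorrelation
fit <- arima(y, order = c(0, 1, 0))
Box.test(residuals(fit), lag = 10, type = "Ljung-Box")

# Forecast: without drift, every point forecast equals the last value
fc <- predict(fit, n.ahead = 10)
fc$pred
y[length(y)]
```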
1.3 Exponential Smoothing Model:
1.3.1 Theory: Exponential Smoothing (ES) is a time series forecasting technique that assigns exponentially decreasing weights to past observations. It rests on the idea that recent observations are more relevant than older ones for predicting future values. Simple exponential smoothing suits data without a clear trend or seasonality. Holt's method adds a trend component, and the Holt-Winters method adds seasonality.
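The exponentially decreasing weights can be checked numerically. This sketch uses a made-up six-point series and computes the simple exponential smoothing level in two ways: recursively and as an explicit weighted average. It then compares the result with `stats::HoltWinters()` (with `beta = FALSE, gamma = FALSE`), which starts from the first observation in the same way.

```r
alpha <- 0.2
y <- c(10, 12, 11, 13, 15, 14)   # toy series (made-up values)
n <- length(y)

# Recursive form: level_t = alpha * y_t + (1 - alpha) * level_{t-1}, level_1 = y_1
level <- y[1]
for (t in 2:n) level <- alpha * y[t] + (1 - alpha) * level

# Weighted-average form: weight alpha * (1 - alpha)^k on the value k steps back,
# with the leftover weight (1 - alpha)^(n - 1) on the starting value y_1
k <- 0:(n - 2)
w <- alpha * (1 - alpha)^k
weighted <- sum(w * y[n:2]) + (1 - alpha)^(n - 1) * y[1]

# stats::HoltWinters without trend or seasonality is simple exponential smoothing
hw <- HoltWinters(ts(y), alpha = alpha, beta = FALSE, gamma = FALSE)

c(recursive = level, weighted = weighted,
  HoltWinters = unname(hw$coefficients["a"]))
```

The weights shrink by a factor of (1 - α) per step back, and together with the weight on the starting value they sum to 1.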
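1.3.2 Types of Exponential Smoothing:
• Simple Exponential Smoothing (SES): Models the level only, controlled by the smoothing parameter α. Its forecasts are flat.
• Holt's Linear Trend Method: Adds a trend equation with parameter β. Its forecasts follow a straight line.
• Holt-Winters Method: Adds a seasonal component with parameter γ (not used in this project).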
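Holt's method adds a second recursion for the trend. The function `holt_manual()` below is a hypothetical helper written for illustration. It uses the same starting values as `stats::HoltWinters()` (level equal to the second observation, trend equal to the first difference), so the two agree. `forecast::holt()`, which the project uses, optimises its initial states by default, so its numbers will generally differ.

```r
holt_manual <- function(y, alpha, beta) {
  # Same start values as stats::HoltWinters: level = y[2], trend = y[2] - y[1]
  level <- y[2]
  trend <- y[2] - y[1]
  for (t in 3:length(y)) {
    prev_level <- level
    level <- alpha * y[t] + (1 - alpha) * (prev_level + trend)  # level update
    trend <- beta * (level - prev_level) + (1 - beta) * trend   # trend update
  }
  c(a = level, b = trend)
}

y <- c(10, 12, 13, 15, 16, 18, 21, 22)   # toy trending series (made-up values)
manual <- holt_manual(y, alpha = 0.5, beta = 0.3)
hw <- HoltWinters(ts(y), alpha = 0.5, beta = 0.3, gamma = FALSE)

# h-step forecast from Holt's method: level + h * trend
h <- 1:3
manual["a"] + h * manual["b"]
predict(hw, n.ahead = 3)
```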
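1.3.3 Steps Involved:
• Use the same Close-price time series as for the ARIMA model.
• Fit simple exponential smoothing with ses() using a fixed α = 0.2, and forecast 100 steps ahead.
• Fit Holt's method with holt(), which estimates its own parameters, and forecast 100 steps ahead.
• Plot the forecasts and compare their accuracy() results with those of the ARIMA model.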
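The same workflow can be sketched in base R. Here `HoltWinters()` stands in for `ses()` and `holt()`, and the data are simulated. Holt's α and β are fixed at arbitrary values (0.8 and 0.1) to keep the example deterministic; omitting them lets `HoltWinters()` estimate both. The training errors come from the one-step-ahead fitted values, which is the kind of in-sample error that `accuracy()` reports.

```r
set.seed(7)

# Simulated series with an upward drift (not HDFC prices)
y <- ts(100 + cumsum(rnorm(300, mean = 0.5)))

# 1. Simple exponential smoothing with a fixed alpha (0.2, as in the project)
ses_fit <- HoltWinters(y, alpha = 0.2, beta = FALSE, gamma = FALSE)

# 2. Holt's method with fixed, illustrative alpha and beta
holt_fit <- HoltWinters(y, alpha = 0.8, beta = 0.1, gamma = FALSE)

# 3. Forecast 100 steps ahead
ses_fc  <- predict(ses_fit,  n.ahead = 100)
holt_fc <- predict(holt_fit, n.ahead = 100)

# 4. Training-set errors from the one-step-ahead fitted values
train_errors <- function(fit) {
  e <- fit$x - fit$fitted[, "xhat"]   # ts arithmetic aligns the time indexes
  c(RMSE = sqrt(mean(e^2)), MAE = mean(abs(e)))
}
rbind(SES = train_errors(ses_fit), Holt = train_errors(holt_fit))
```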
Code Files:
# install required libraries (run once)
install.packages(c("readxl", "ggplot2", "dplyr", "forecast", "readr", "pastecs",
                   "prophet", "lubridate", "tidyverse", "xts", "tseries"))

# set working directory (adjust to your own project folder)
setwd("C:/Users/manik/OneDrive/KARTHIK MANI/Karthik Projects/Stock Market Time Series")

# load installed libraries
library(ggplot2)
library(readxl)
library(dplyr)
library(forecast)
library(readr)
library(pastecs)
library(prophet)
library(lubridate)
library(tidyverse)
library(xts)
library(tseries)   # adf.test()

# import the data set
HDFC <- read_csv("C:/Users/manik/Downloads/HDFC.csv")
# Rows: 4977 Columns: 15
# Column specification:
# Delimiter: ","
# chr (2): Symbol, Series
# dbl (12): Prev Close, Open, High, Low, Last, Close, VWAP, Volume, Turnover, Trades, Deliverable Volume, %Deliverble
# date (1): Date

# make copies of the data set
HDFC_COPY <- data.frame(HDFC)
HDFC_COPY1 <- data.frame(HDFC_COPY)

# explore the data set
head(HDFC)                # first 6 rows (default)
head(HDFC, 10)            # first 10 rows
tail(HDFC)                # last 6 rows (default)
tail(HDFC, 15)            # last 15 rows
dim(HDFC)                 # number of rows and columns
summary(HDFC)             # summary statistics for each column
variable.names(HDFC)      # column names
str(HDFC)                 # column data types
is.na(HDFC)               # logical matrix of missing entries
colSums(is.na(HDFC))      # number of missing values per column
head(HDFC$Turnover, 20)   # first 20 values of a single column
stat.desc(HDFC)           # detailed descriptive statistics (pastecs)

# remove rows with missing values
HDFC <- na.omit(HDFC)

# line plot of the closing price over time
ggplot(HDFC, aes(x = Date, y = Close)) +
  geom_line(color = "blue") +
  labs(title = "HDFC Stock Price Over Time", x = "Date", y = "Close Price")

# ARIMA MODEL BUILDING...
class(HDFC)
# Build a time series of Close prices. Only 'start' is given so the series
# length equals the number of rows; also passing 'end' can make ts() recycle,
# truncate or reject the data when the date span does not match the row count
# (the rows are trading days, not calendar days).
first_date <- min(HDFC$Date)
HDFCtime <- ts(HDFC$Close,
               start = c(as.numeric(format(first_date, "%Y")),
                         as.numeric(format(first_date, "%j"))),
               frequency = 365.25)
class(HDFCtime)
plot(HDFCtime)

# autocorrelation BEFORE fitting the ARIMA model
acf(HDFCtime)
pacf(HDFCtime)
adf.test(HDFCtime)
# output from the original run:
# Augmented Dickey-Fuller Test
# data: HDFCtime
# Dickey-Fuller = -2.1372, Lag order = 14, p-value = 0.5202
# alternative hypothesis: stationary

# automatic ARIMA order selection by AIC
HDFCmodel <- auto.arima(HDFCtime, ic = "aic", trace = TRUE)
# ARIMA(0,1,0) : 31417.37
# Best model: ARIMA(0,1,0)
HDFCmodel

# autocorrelation of the residuals AFTER fitting the ARIMA model
acf(ts(HDFCmodel$residuals))
pacf(ts(HDFCmodel$residuals))

# forecast round(10 * 365.25) = 3652 steps ahead with a 95% prediction interval;
# because the rows are trading days, this spans more than 10 calendar years
Myforecast_HDFC <- forecast(HDFCmodel, level = c(95), h = round(10 * 365.25))
Myforecast_HDFC
plot(Myforecast_HDFC)

# validate the forecast: Box-Pierce tests on the residuals at several lags
Box.test(Myforecast_HDFC$residuals, lag = 15, type = "Box-Pierce")
Box.test(Myforecast_HDFC$residuals, lag = 10, type = "Box-Pierce")
Box.test(Myforecast_HDFC$residuals, lag = 5, type = "Box-Pierce")
Box.test(Myforecast_HDFC$residuals, lag = 20, type = "Box-Pierce")

# EXPONENTIAL SMOOTHING MODEL BUILDING...
# simple exponential smoothing with a fixed smoothing parameter alpha = 0.2
HDFC_ES_model <- ses(HDFCtime, alpha = 0.2, h = 100)
HDFC_ES_model
plot(HDFC_ES_model, main = "Forecast of the Exponential Smoothing")

# Holt's exponential smoothing (level + trend)
HDFC_HOLT <- holt(HDFCtime, h = 100)
HDFC_HOLT
plot(HDFC_HOLT, main = "Forecast of Exponential Smoothing using HOLT method")

# Facebook's Prophet model building... (not run)
# HDFC_DF <- data.frame(ds = HDFC_COPY$Date, y = as.numeric(HDFC_COPY$Close))
# HDFC_prophet <- prophet(HDFC_DF)
# future <- make_future_dataframe(HDFC_prophet, periods = 30)
# forecast_prophet <- predict(HDFC_prophet, future)
# plot(HDFC_prophet, forecast_prophet)

# CALCULATING THE ACCURACY OF THE ABOVE MODELS (training set)
accuracy(Myforecast_HDFC)
#                     ME    RMSE      MAE         MPE     MAPE       MASE       ACF1
# Training set 0.1903026 36.2822 14.36358 -0.03355592 1.324054 0.03537981 0.01778181

# Summary of ideal accuracy ranges for ARIMA (rules of thumb):
# RMSE: Lower is better, ideally < 5% of actual values.
# MAE: Lower is better, ideally < 5% of actual values.
# MAPE: Lower is better, ideally < 5% (or less than 10% for good models).
# AIC/BIC: Lower is better; use to compare different models.
# ACF of residuals: Should be close to 0 (no significant autocorrelation).

accuracy(HDFC_ES_model)
#                     ME     RMSE      MAE        MPE    MAPE       MASE      ACF1
# Training set 0.9015601 59.58318 23.98849 -0.1686734 2.30609 0.05908751 0.7930406

accuracy(HDFC_HOLT)
#                     ME     RMSE      MAE        MPE     MAPE       MASE       ACF1
# Training set 0.7432281 36.30499 14.39091 0.03704185 1.327261 0.03544713 0.01803578
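Model Accuracy (training set):
Model                            RMSE        MAE
ARIMA(0,1,0)                     36.2822     14.36358
Simple Exponential Smoothing     59.58318    23.98849
Holt's Method                    36.30499    14.39091
ARIMA(0,1,0) and Holt's method have almost identical training errors. Simple exponential smoothing with a fixed α = 0.2 has clearly higher errors, and its residual ACF1 of 0.79 shows strong leftover autocorrelation.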
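These figures are in-sample (training-set) errors, so they do not directly measure how well each model forecasts years ahead. A holdout comparison is one way to check that: fit each model on the earlier part of the series, forecast the held-back part, and score those forecasts. The sketch below uses simulated data and base-R stand-ins (`arima()` and `HoltWinters()`) for the project's models. The 100-observation test window and the Holt parameters are arbitrary choices.

```r
set.seed(11)

# Hypothetical price-like series (simulated, not HDFC data)
y <- ts(500 + cumsum(rnorm(600, mean = 0.3, sd = 5)))

# Hold out the last 100 observations as a test set
n_test <- 100
train <- window(y, end = length(y) - n_test)
test  <- window(y, start = length(y) - n_test + 1)

# Candidate models fitted on the training part only
naive_fit <- arima(train, order = c(0, 1, 0))                             # ARIMA(0,1,0)
ses_fit   <- HoltWinters(train, alpha = 0.2, beta = FALSE, gamma = FALSE) # SES
holt_fit  <- HoltWinters(train, alpha = 0.8, beta = 0.1, gamma = FALSE)   # Holt

# Multi-step forecasts over the whole test window
preds <- list(
  ARIMA_010 = as.numeric(predict(naive_fit, n.ahead = n_test)$pred),
  SES       = as.numeric(predict(ses_fit,   n.ahead = n_test)),
  Holt      = as.numeric(predict(holt_fit,  n.ahead = n_test))
)

# Out-of-sample RMSE and MAE for each model
actual <- as.numeric(test)
scores <- t(sapply(preds, function(p)
  c(RMSE = sqrt(mean((actual - p)^2)), MAE = mean(abs(actual - p)))))
scores
```

Because every forecast here is made from the same origin, errors grow with the horizon. This makes the holdout scores a closer match to the long-term forecasting goal than one-step training errors.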