Predicting stock prices means analyzing historical data and using models to forecast future trends. In this project, I try to predict Alibaba Group's stock prices with four time series models: ARIMA, SARIMA, LSTM, and RNN. The goal is to see which of them best predicts future prices from historical data collected from Yahoo Finance.
In this blog, I will discuss the dataset, the time series concepts and models behind the project, the evaluation metrics, my workflow, and the results.
I used a publicly available dataset that contains Alibaba's stock prices spanning from 2014 to 2025. This dataset is sourced from Kaggle and provides the necessary historical data for training and testing the models.
Time series analysis involves statistical techniques to model and predict data points indexed in time order. Stock prices are classic examples of time series data, where each price is associated with a specific point in time.
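ARIMA is a classical statistical model used for time series forecasting. It combines three components: autoregression (AR), which models the current value as a function of its own past values; integration (I), which differences the series to make it stationary; and moving average (MA), which models the current value as a function of past forecast errors.
ARIMA works well for data with linear patterns over time, but it cannot capture nonlinear relationships.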
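To make the AR and I parts concrete, here is a minimal sketch of an ARIMA(1,1,0)-style one-step forecast written by hand. It is not the code from my notebook and it leaves out the MA term and the constant; the function names and the tiny price list are made up for illustration. In practice a library class such as `statsmodels.tsa.arima.model.ARIMA` would do the fitting.

```python
def difference(series):
    """First-order differencing: the 'I' step, which removes a linear trend."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(diffs):
    """Least-squares estimate of phi in d[t] = phi * d[t-1] + noise (the 'AR' step)."""
    num = sum(prev * cur for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    return num / den if den else 0.0


def forecast_next(series):
    """One-step forecast for an ARIMA(1,1,0)-style model without a constant term."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    # Predict the next change, then undo the differencing by adding the last level.
    return series[-1] + phi * diffs[-1]


# Synthetic prices, for illustration only (not real Alibaba data).
prices = [100.0, 102.0, 103.0, 103.5, 103.75]
next_price = forecast_next(prices)
print(next_price)  # 103.875
```

The daily changes here halve each step (2, 1, 0.5, 0.25), so the fitted coefficient is 0.5 and the forecast adds half of the last change to the last price.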
SARIMA is an extension of ARIMA that also considers seasonality in time series data, making it more suitable for datasets that exhibit seasonal variations (e.g., holidays, quarterly earnings).
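One piece of what SARIMA adds is seasonal differencing: subtracting the value from one full season earlier. The sketch below shows only that step, on a synthetic series with a made-up period of 4. The full model also has seasonal AR and MA terms, which are not shown here.

```python
def seasonal_difference(series, period):
    """Subtract the value one full season earlier: y[t] - y[t - period]."""
    if period <= 0 or period >= len(series):
        raise ValueError("period must be between 1 and len(series) - 1")
    return [series[t] - series[t - period] for t in range(period, len(series))]


# Synthetic series: a repeating 4-step pattern plus a level that rises by 1 each season.
pattern = [0.0, 5.0, 2.0, -3.0]
series = [pattern[t % 4] + (t // 4) for t in range(12)]

deseasonalized = seasonal_difference(series, period=4)
print(deseasonalized)  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

After differencing, the repeating pattern is gone and only the season-over-season change of 1.0 remains, which is the kind of series the non-seasonal part of the model can then handle.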
LSTM (Long Short-Term Memory) is a type of Recurrent Neural Network (RNN) designed to capture long-range dependencies in sequential data such as stock prices. Its gated memory cell lets it decide what to keep, update, and forget at each step, so it can retain important information over many time steps, which a plain RNN struggles to do.
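For readers curious about what happens inside an LSTM, here is a minimal NumPy sketch of a single time step. It is an illustration under my own assumptions, not the Keras implementation: the weight names `W`, `U`, `b`, the gate ordering, and the random untrained weights are all chosen just for this example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4 * hidden, inputs), U: (4 * hidden, hidden), b: (4 * hidden,)
    Rows are stacked in the order: forget, input, candidate, output.
    """
    hidden = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[0:hidden])               # forget gate: how much old memory to keep
    i = sigmoid(z[hidden:2 * hidden])      # input gate: how much new information to write
    g = np.tanh(z[2 * hidden:3 * hidden])  # candidate values to write
    o = sigmoid(z[3 * hidden:4 * hidden])  # output gate: how much memory to expose
    c = f * c_prev + i * g                 # updated cell state (long-term memory)
    h = o * np.tanh(c)                     # updated hidden state (output)
    return h, c


rng = np.random.default_rng(0)
inputs, hidden = 1, 4
W = rng.normal(scale=0.1, size=(4 * hidden, inputs))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

h = np.zeros(hidden)
c = np.zeros(hidden)
for price in [0.10, 0.12, 0.11, 0.15]:     # scaled, synthetic prices
    h, c = lstm_step(np.array([price]), h, c, W, U, b)
print(h.shape, c.shape)
```

The cell state `c` is what carries information across many steps: the forget gate scales the old value and the input gate adds new content, instead of overwriting the memory at every step.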
RNN is a deep learning model that is particularly good for sequence prediction. It works by maintaining a "memory" of previous inputs and using that information to make predictions about future events. It's a useful model for tasks like speech recognition, language translation, and, of course, time series forecasting like stock price prediction.
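The "memory" is easiest to see in code. Below is a minimal sketch of a simple (Elman-style) RNN forward pass in NumPy, with hand-picked weights that I made up for illustration; a trained model would learn these values. Two sequences that differ only in their first value end up with different final states, which shows that earlier inputs still influence the result.

```python
import numpy as np


def rnn_forward(sequence, W_x, W_h, b):
    """Run a simple RNN over a 1-D sequence and return the final hidden state."""
    h = np.zeros(W_h.shape[0])
    for x_t in sequence:
        # The new state mixes the current input with the previous state.
        h = np.tanh(W_x * x_t + W_h @ h + b)
    return h


# Hypothetical, untrained weights for a 3-unit hidden state.
W_x = np.array([0.5, -0.3, 0.8])
W_h = np.array([[0.5, 0.1, 0.0],
                [0.0, 0.5, 0.1],
                [0.1, 0.0, 0.5]])
b = np.zeros(3)

h_a = rnn_forward([0.1, 0.2, 0.3], W_x, W_h, b)
h_b = rnn_forward([0.9, 0.2, 0.3], W_x, W_h, b)  # only the first input differs
print(h_a)
print(h_b)
```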
MAE is a common evaluation metric used in regression problems. It calculates the average of the absolute differences between predicted values and actual values. Lower MAE indicates better model performance.
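As a quick illustration, MAE can be computed in a few lines of plain Python. The numbers below are made up for the example and are not taken from my results.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [100.0, 102.0, 101.0, 105.0]
predicted = [101.0, 101.0, 103.0, 105.0]
print(mean_absolute_error(actual, predicted))  # 1.0
```

The absolute errors are 1, 1, 2, and 0, so the average is 1.0, in the same units as the prices.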
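RMSE is the square root of the average squared difference between predicted and actual values. Because errors are squared before averaging, RMSE penalizes large errors more heavily than MAE, which makes it a better indicator when occasional large misses matter. Like MAE, it is expressed in the same units as the data, and lower is better.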
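The difference between the two metrics shows up when the same total error is spread evenly versus concentrated in one bad prediction. This small sketch uses made-up numbers and helper names of my own choosing.

```python
import math


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


actual = [100.0, 100.0, 100.0, 100.0]
even_errors = [102.0, 98.0, 102.0, 98.0]       # four errors of size 2
one_big_error = [100.0, 100.0, 100.0, 108.0]   # one error of size 8

print(mae(actual, even_errors), rmse(actual, even_errors))      # 2.0 2.0
print(mae(actual, one_big_error), rmse(actual, one_big_error))  # 2.0 4.0
```

Both sets of predictions have the same MAE, but the single large miss doubles the RMSE.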
I loaded the dataset of Alibaba's stock prices, which comes from Yahoo Finance, with pandas. This gave me daily stock prices for the years 2014 to 2025, including the opening, closing, highest, and lowest prices.
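However the file was obtained, loading it usually looks something like the sketch below. The column names (`Date`, `Open`, `High`, `Low`, `Close`) are an assumption about the CSV layout, and the three rows are made-up values standing in for the real file so that the example runs on its own.

```python
import io

import pandas as pd

# Made-up rows, not real Alibaba prices; the column names are assumed.
csv_text = """Date,Open,High,Low,Close
2024-01-03,76.1,77.0,75.5,75.8
2024-01-02,75.0,76.5,74.2,76.0
2024-01-04,75.9,76.2,74.8,75.1
"""

# In practice, replace io.StringIO(csv_text) with the path to the downloaded CSV.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")
df = df.sort_index()   # make sure rows are in chronological order
close = df["Close"]    # the series a univariate model would forecast

print(df.shape)        # (3, 4)
print(close.index.min(), close.index.max())
```

Sorting by date matters for time series work, because the models and the train/test split both assume the rows are in chronological order.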
I visualized the trends and behavior of Alibaba's stock prices using Matplotlib and Seaborn. This allowed me to gain insights into the data, such as detecting any long-term upward or downward trends, identifying seasonality, and spotting anomalies.
I implemented four different models for stock price prediction: ARIMA, SARIMA, LSTM, and RNN.
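Statistical models such as ARIMA take the price series directly, while recurrent networks are typically trained on fixed-length windows of past values. Here is a minimal sketch of that preparation step plus a chronological train/test split. The window length of 3, the 80/20 split, and the function names are assumptions for illustration, not necessarily what the notebook uses.

```python
import numpy as np


def make_windows(values, window):
    """Turn a 1-D series into (samples, window) inputs and next-step targets."""
    values = np.asarray(values, dtype=float)
    if len(values) <= window:
        raise ValueError("series must be longer than the window")
    X = np.array([values[i:i + window] for i in range(len(values) - window)])
    y = values[window:]
    return X, y


def chronological_split(X, y, train_fraction=0.8):
    """Split without shuffling, so the test set is strictly later in time."""
    cut = int(len(X) * train_fraction)
    return X[:cut], X[cut:], y[:cut], y[cut:]


series = np.arange(10, dtype=float)   # stand-in for a scaled closing-price series
X, y = make_windows(series, window=3)
X_train, X_test, y_train, y_test = chronological_split(X, y)

print(X.shape, y.shape)   # (7, 3) (7,)
print(X[0], y[0])         # [0. 1. 2.] 3.0
print(len(X_train), len(X_test))  # 5 2
```

Keras recurrent layers expect 3-D input of shape (samples, timesteps, features), so for a single feature the windows would be reshaped with `X[..., np.newaxis]` before training.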
After training my models, I evaluated them using MAE and RMSE. Here are the results:
| Model | MAE | RMSE |
|---|---|---|
| ARIMA | 29.35 | 31.25 |
| SARIMA | 21.88 | 23.83 |
| LSTM | 2.33 | 3.30 |
| RNN | 2.05 | 2.61 |
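The LSTM and RNN models performed far better than the traditional statistical models (ARIMA and SARIMA) on both metrics. The RNN had the lowest errors overall, with an MAE of 2.05 compared with 29.35 for ARIMA, roughly 14 times lower. On this dataset, the deep learning models captured the patterns and trends in the price series much more closely than the statistical models did.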
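Install the required packages:
pip install pandas numpy matplotlib seaborn scikit-learn statsmodels tensorflow
Download the Jupyter Notebook and run it to reproduce the steps described above: loading the data, visualizing it, training the four models, and evaluating them with MAE and RMSE.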
You can adjust the LSTM and RNN models’ hyperparameters to improve their accuracy and performance.
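One simple way to organize that tuning is a small grid search. The sketch below only enumerates configurations and picks the one with the lowest score. The hyperparameter names and values are hypothetical, and the scoring function is a stand-in so the example runs; a real one would train a model with the given settings and return its validation RMSE.

```python
from itertools import product

# Hypothetical search space; adjust names and values to match your model code.
search_space = {
    "units": [32, 64],
    "window": [30, 60],
    "learning_rate": [1e-3, 1e-4],
}


def iter_configs(space):
    """Yield every combination of hyperparameter values as a dict."""
    keys = list(space)
    for combo in product(*(space[k] for k in keys)):
        yield dict(zip(keys, combo))


def pick_best(space, evaluate):
    """Return the configuration with the lowest validation error."""
    return min(iter_configs(space), key=evaluate)


def fake_validation_rmse(config):
    # Stand-in score, NOT a real training run.
    return abs(config["units"] - 64) + abs(config["window"] - 30) + config["learning_rate"]


configs = list(iter_configs(search_space))
best = pick_best(search_space, fake_validation_rmse)
print(len(configs))  # 8
print(best)          # {'units': 64, 'window': 30, 'learning_rate': 0.0001}
```

When scoring real models, it is safer to evaluate on a validation slice taken from the end of the training period rather than on the test set, so the final MAE and RMSE stay an honest measure.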
This project is open-source and available under the MIT License.
This project highlights the importance of combining traditional and modern machine learning techniques in stock price prediction. While classical models like ARIMA and SARIMA are effective for simpler datasets, deep learning models like LSTM and RNN have shown superior performance in capturing complex patterns in Alibaba's stock price data.