This research explores global land temperature trends from 1900 to 2010 and forecasts temperatures for 2011 to 2015 using time series analysis. Using PySpark, Pandas, and statistical modeling techniques, the study focuses on five countries (Brazil, India, Kenya, the United Kingdom, and the United States), each representing a different climatic zone. The Autoregressive Integrated Moving Average (ARIMA) model was applied to forecast future temperatures, with preprocessing steps including data cleaning, stationarity checks, and seasonal decomposition. The resulting models achieved high R² scores with low MAE and RMSE, particularly for India and the United States, demonstrating the utility of time series forecasting for climate-related decision-making in sectors such as agriculture.
Climate change has become a pressing global issue, with increased focus on understanding long-term temperature trends and predicting future patterns. This research investigates historical land temperature data between 1900 and 2010 and develops forecasting models for the period 2011 to 2015. By analyzing the temperature trends of five countries with diverse climate characteristics (Brazil, India, Kenya, the United Kingdom, and the United States), this study aims to provide a representative analysis of global climate behavior and to produce temperature forecasts with ARIMA models.
The dataset used in this study was sourced from Berkeley Earth, featuring over 1.6 billion temperature readings from 16 archival sources. A subset of the Global Land Temperature Dataset was extracted, focusing on the five aforementioned countries to represent a diverse climatic spectrum ranging from tropical to temperate zones.
Data Exploration
Initial data exploration was conducted using SQL queries (e.g., GROUP BY and ORDER BY clauses with COUNT and other aggregate functions) and Pandas to derive key insights. Visualizations created with Seaborn and Matplotlib highlighted long-term warming trends, monthly temperature variations, and the seasonal components of each series. For instance, line plots showed steady increases in average global land temperatures and clear seasonal behavior in countries like the UK and Brazil.
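The kind of aggregation described above can be sketched in Pandas as follows. This is a minimal illustration, not the study's actual queries: the data frame is a tiny synthetic stand-in, and the column names Country and AverageTemperature are assumptions (only the dt column is named in this study).

```python
import pandas as pd

# Tiny synthetic stand-in for the country-level table; the values are made up.
df = pd.DataFrame({
    "dt": pd.to_datetime(["1900-01-01", "1900-07-01", "1901-01-01", "1901-07-01"] * 2),
    "Country": ["India"] * 4 + ["Kenya"] * 4,
    "AverageTemperature": [17.0, 27.0, 18.0, 28.0, 24.0, 23.0, 25.0, 24.0],
})

# Rough Pandas equivalent of:
#   SELECT Country, COUNT(*), AVG(AverageTemperature) FROM t GROUP BY Country ORDER BY Country
summary = (
    df.groupby("Country")["AverageTemperature"]
      .agg(n_obs="count", mean_temp="mean")
      .sort_index()
)

# Yearly mean temperature per country, one column per country (handy for line plots).
yearly = (
    df.assign(year=df["dt"].dt.year)
      .groupby(["Country", "year"])["AverageTemperature"]
      .mean()
      .unstack("Country")
)

print(summary)
print(yearly)
```

The yearly table has one row per year and one column per country, which is a convenient shape for plotting long-term trends with Matplotlib or Seaborn.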
Data Preprocessing
Removed null values.
Converted the dt column to datetime format.
Filtered the data to retain records from 1900 to 2010.
Selected and isolated the five countries of interest, each yielding 1,332 observations.
Conducted stationarity testing using the Augmented Dickey-Fuller (ADF) test. Brazil and Kenya required first-order differencing to achieve stationarity.
Time Series Modeling
Time series models were developed using the ARIMA approach, with a separate model tuned and fitted for each country after verifying stationarity. ACF and PACF plots were used to select the three order parameters:
p (autoregressive terms)
d (differencing order)
q (moving average terms)
Models were trained on data from 1900 to 2010 and evaluated on forecasts for 2011 to 2015.
The ARIMA models provided forecasts for each country with varying performance:
India & United States: High R² scores with low MAE and RMSE, indicating strong predictive accuracy.
Brazil & Kenya: Moderate errors but good overall trend capture, particularly after first-order differencing was applied to achieve stationarity.
United Kingdom: Slightly higher MAE and RMSE, but still within acceptable limits for practical forecasting applications.
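For reference, the three metrics cited above can be computed as in the sketch below. The function name forecast_metrics and the four sample values are illustrative assumptions and are not results from this study.

```python
import numpy as np

def forecast_metrics(actual, predicted):
    """Return MAE, RMSE, and R² for a pair of equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted
    mae = np.mean(np.abs(errors))                      # mean absolute error
    rmse = np.sqrt(np.mean(errors ** 2))               # root mean squared error
    ss_res = np.sum(errors ** 2)                       # residual sum of squares
    ss_tot = np.sum((actual - actual.mean()) ** 2)     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                         # coefficient of determination
    return {"mae": mae, "rmse": rmse, "r2": r2}

# Made-up values purely to exercise the function.
metrics = forecast_metrics([10, 12, 14, 16], [11, 12, 13, 17])
print(metrics)
```

MAE and RMSE are expressed in the same units as the temperatures, while R² is unitless and can be negative when a forecast performs worse than predicting the mean of the observations.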
Visual inspection of the forecasts showed alignment with historical trends, and seasonal decomposition confirmed additive seasonality, especially in countries like Brazil and the UK.
This study successfully modeled and forecasted land temperatures for five climatically diverse countries using ARIMA models. While performance varied across regions, India and the US showed the highest model accuracy, and Brazil and Kenya benefited from preprocessing adjustments such as differencing. To further improve performance, future work will include data transformations to stabilize variance and normalize distributions.
Accurate temperature forecasting has vital applications in agriculture, enabling farmers to optimize planting and harvesting schedules, mitigate climate-related risks, and improve productivity. This research contributes toward better understanding and preparedness in the face of a changing global climate.