This project aims to forecast the Air Quality Index (AQI) for Kathmandu Valley using historical data from January 2023 to March 2025. The AQI data covers multiple pollutants: CO, NO, NO₂, O₃, SO₂, PM2.5, PM10, and NH₃. Using machine learning, specifically XGBoost's XGBRegressor wrapped in scikit-learn's MultiOutputRegressor, this project predicts future AQI values for all pollutants simultaneously.
The dataset used in this project contains AQI measurements for Kathmandu Valley from January 2023 to March 5, 2025, obtained through the OpenWeather API. The data is fetched within the notebook and stored in the datasets folder. The data is structured as a time series, with each row representing multiple AQI values for a specific date.
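As a rough illustration of the fetching step, the sketch below builds a request for OpenWeather's historical Air Pollution endpoint and flattens the JSON response into one record per timestamp. It is not the notebook's code: the helper names (build_history_url, flatten_records, fetch_history) and the Kathmandu coordinates are assumptions made for this example, and it uses the standard library's urllib rather than requests so that it runs without extra packages.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

# Historical endpoint of the OpenWeather Air Pollution API.
BASE_URL = "http://api.openweathermap.org/data/2.5/air_pollution/history"

# Approximate coordinates for Kathmandu (assumed; the notebook may use others).
KATHMANDU_LAT, KATHMANDU_LON = 27.7172, 85.3240


def build_history_url(api_key, start, end, lat=KATHMANDU_LAT, lon=KATHMANDU_LON):
    """Build the request URL for a time window given as timezone-aware datetimes."""
    params = {
        "lat": lat,
        "lon": lon,
        "start": int(start.timestamp()),  # the API expects Unix timestamps
        "end": int(end.timestamp()),
        "appid": api_key,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def flatten_records(payload):
    """Turn the nested API response into one flat dict per timestamp."""
    rows = []
    for entry in payload.get("list", []):
        row = {"datetime": datetime.fromtimestamp(entry["dt"], tz=timezone.utc)}
        row["aqi"] = entry["main"]["aqi"]
        # Pollutant keys as returned by the API:
        # co, no, no2, o3, so2, pm2_5, pm10, nh3
        row.update(entry["components"])
        rows.append(row)
    return rows


def fetch_history(api_key, start, end):
    """Download and flatten the records (performs a network request)."""
    with urlopen(build_history_url(api_key, start, end)) as response:
        return flatten_records(json.load(response))
```

The resulting list of dicts can be passed directly to pandas.DataFrame and written to the datasets folder, for example with DataFrame.to_csv.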
This project is implemented as a Jupyter notebook with the following structure:
datasets folder
Additionally, the project includes:
aqi_api.py: External Python file containing the OpenWeather API key configuration
datasets/: Folder containing the stored AQI data files
Before training the model, the data went through several preprocessing steps:
The project utilizes the seaborn and matplotlib libraries to visualize the data. Various plots and charts are generated to understand the data better and identify any potential correlations or seasonality across the different pollutants.
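A minimal sketch of this kind of exploration is shown below: a correlation heatmap across pollutants and a monthly-mean plot as a simple seasonality check. The data is synthetic and the column names (pm2_5, pm10, o3) are placeholders chosen for the example; the notebook's actual plots may differ.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic daily readings standing in for the real dataset (assumption).
rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-01", periods=90, freq="D")
pm2_5 = 50 + 10 * rng.normal(size=90)
df = pd.DataFrame(
    {
        "pm2_5": pm2_5,
        "pm10": 1.5 * pm2_5 + 5 * rng.normal(size=90),  # built to track pm2_5
        "o3": 30 + 5 * rng.normal(size=90),
    },
    index=dates,
)

# Pairwise Pearson correlation between pollutant columns.
corr = df.corr()

fig, (ax_corr, ax_month) = plt.subplots(1, 2, figsize=(10, 4))
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax_corr)
ax_corr.set_title("Pollutant correlation")

# Mean level per calendar month as a coarse view of seasonality.
monthly = df.groupby(df.index.month).mean()
monthly.plot(ax=ax_month)
ax_month.set_xlabel("Month")
ax_month.set_title("Monthly mean")

fig.tight_layout()
```

In a notebook the figure renders inline once the Agg backend line is dropped; fig.savefig can be used to write it to disk instead.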
The core of the project is scikit-learn's MultiOutputRegressor with XGBoost's XGBRegressor as the base estimator. This approach allows for simultaneous prediction of multiple AQI pollutant values (CO, NO, NO₂, O₃, SO₂, PM2.5, PM10, and NH₃). The model is trained on the preprocessed historical AQI data, with the goal of learning the underlying patterns and relationships between the features and the multiple target variables.
The project uses GridSearchCV for hyperparameter tuning, systematically evaluating combinations of XGBoost parameters to find the configuration in the search grid that minimizes prediction error across all pollutants.
To evaluate the performance of the trained model, the project employs multiple metrics from the sklearn library.
These metrics provide comprehensive insights into the model's accuracy and help assess its suitability for forecasting future AQI values.
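As an illustration of per-pollutant evaluation, the sketch below computes three common regression metrics from sklearn.metrics for each target column. The choice of MAE, RMSE, and R² is an assumption made for this example, as are the helper name per_target_report and the toy values.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def per_target_report(y_true, y_pred, names):
    """Return MAE, RMSE and R^2 for each target column, keyed by name."""
    # multioutput="raw_values" returns one score per column instead of an average.
    mae = mean_absolute_error(y_true, y_pred, multioutput="raw_values")
    rmse = np.sqrt(mean_squared_error(y_true, y_pred, multioutput="raw_values"))
    r2 = r2_score(y_true, y_pred, multioutput="raw_values")
    return {
        name: {"mae": float(m), "rmse": float(r), "r2": float(s)}
        for name, m, r, s in zip(names, mae, rmse, r2)
    }


# Toy values for three targets (assumption, not real measurements).
names = ["co", "no2", "pm2_5"]
y_true = np.array([[1.0, 10.0, 100.0], [2.0, 20.0, 200.0], [3.0, 30.0, 300.0]])
y_pred = np.array([[1.5, 10.0, 110.0], [2.5, 20.0, 190.0], [3.5, 30.0, 310.0]])

report = per_target_report(y_true, y_pred, names)
```

Reporting each pollutant separately matters because the targets sit on very different scales, so an error averaged across columns would be dominated by the largest-valued pollutant.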
Once the model is trained and evaluated, it is used to forecast the AQI values for future periods across all pollutants.
pandas: For data manipulation and preprocessing
numpy: For numerical computations
seaborn and matplotlib: For data visualization
xgboost: For implementing the XGBoost algorithm
sklearn: For evaluation metrics and other machine learning utilities
requests: For API calls to OpenWeather
jupyter: For running the notebook environment
scipy: For statistical functions including Z-score calculation
To run the project, follow these steps:
1. Install the dependencies: pip install pandas numpy seaborn matplotlib xgboost scikit-learn requests jupyter scipy
2. Add your OpenWeather API key to the aqi_api.py file.
3. Run jupyter notebook in your terminal.
The project uses the OpenWeather API to fetch AQI data. You will need to obtain an API key and place it in the aqi_api.py file.