# Abstract
This project explores COVID-19 trends using a dataset covering the worldwide geographic distribution of cases. It focuses on cleaning the dataset, visualizing trends, and performing statistical analyses to gain insights. It also uses machine learning to predict weekly case counts, contributing to an understanding of the pandemic's progression.
COVID-19 has significantly impacted the global population, necessitating data-driven approaches for analysis and decision-making. This project aims to clean and analyze COVID-19 data, visualize key trends, and develop a predictive model to aid in understanding the disease's spread and evolution.
The dataset was first cleaned to remove missing values and ensure consistency. Data visualization techniques were applied to identify trends and patterns in weekly and cumulative cases. Statistical analyses were conducted to summarize data distributions, and a linear regression model was trained to predict weekly case counts based on cumulative counts and 14-day rates.
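The cleaning and regression steps described above could be sketched roughly as follows. This is a minimal illustration and not the project's actual code: it assumes two numeric features (cumulative count and 14-day rate) and a weekly count target, the function names `drop_missing`, `fit_ols`, and `r_squared` are hypothetical, the toy rows are made up, and NumPy's least-squares solver stands in for whichever linear regression implementation the project used.

```python
import numpy as np

def drop_missing(X, y):
    """Remove rows where any feature or the target is NaN."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~(np.isnan(X).any(axis=1) | np.isnan(y))
    return X[keep], y[keep]

def fit_ols(X, y):
    """Ordinary least squares fit; returns (intercept, coefficients)."""
    # A leading column of ones makes the first parameter the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    return params[0], params[1:]

def r_squared(X, y, intercept, coef):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    residuals = y - (intercept + X @ coef)
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

# Made-up toy rows (assumed layout: [cumulative_count, rate_14_day]).
features = np.array([
    [100.0, 5.0],
    [250.0, 12.0],
    [np.nan, 8.0],   # incomplete row, removed by the cleaning step
    [480.0, 20.0],
    [900.0, 31.0],
    [1500.0, 44.0],
])
weekly = np.array([40.0, 95.0, 60.0, 170.0, 300.0, 470.0])

X, y = drop_missing(features, weekly)
intercept, coef = fit_ols(X, y)
print(f"R^2 on the toy rows: {r_squared(X, y, intercept, coef):.3f}")
```

Note that this sketch scores the model on the same rows it was fitted to; an honest estimate of predictive performance would need a held-out test split.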
The project included several experiments:
1. Visualizing weekly COVID-19 cases by continent to track temporal trends.
2. Analyzing cumulative cases to identify the most affected regions.
3. Training and evaluating a linear regression model for predictive analysis.
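The aggregations behind the first two experiments might look something like the sketch below. It is illustrative only: the field names (`country`, `continent`, `year_week`, `weekly_count`, `cumulative_count`) are assumed rather than taken from the dataset, the helper names are hypothetical, and the rows are placeholder values. Plotting is left out; the per-continent totals are the kind of series a line chart would be drawn from.

```python
from collections import defaultdict

def weekly_cases_by_continent(rows):
    """Sum weekly counts per (continent, week), skipping missing counts."""
    totals = defaultdict(int)
    for row in rows:
        if row["weekly_count"] is None:
            continue
        totals[(row["continent"], row["year_week"])] += row["weekly_count"]
    return dict(totals)

def most_affected(rows, top_n=3):
    """Rank countries by their largest recorded cumulative count."""
    peak = {}
    for row in rows:
        count = row["cumulative_count"]
        if count is None:
            continue
        peak[row["country"]] = max(peak.get(row["country"], 0), count)
    return sorted(peak.items(), key=lambda item: item[1], reverse=True)[:top_n]

# Placeholder rows with made-up names and numbers.
rows = [
    {"country": "A", "continent": "X", "year_week": "2020-10",
     "weekly_count": 5, "cumulative_count": 5},
    {"country": "B", "continent": "X", "year_week": "2020-10",
     "weekly_count": 7, "cumulative_count": 7},
    {"country": "A", "continent": "X", "year_week": "2020-11",
     "weekly_count": None, "cumulative_count": 5},
    {"country": "C", "continent": "Y", "year_week": "2020-10",
     "weekly_count": 2, "cumulative_count": 2},
]

print(weekly_cases_by_continent(rows))
print(most_affected(rows, top_n=2))
```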
The data visualizations revealed significant variations in COVID-19 case trends across continents. Statistical analyses showed that average weekly cases were markedly higher in some regions than in others, highlighting disparities in disease spread. The machine learning model achieved an R-squared value indicating moderate predictive performance, suggesting that cumulative counts and 14-day rates carry useful signal for forecasting case trends.
This project provides a comprehensive analysis of COVID-19 trends, combining visualization, statistical insights, and machine learning predictions. The findings emphasize the importance of clean data and robust methodologies in pandemic analysis. The predictive model offers a foundation for further research and more advanced modeling techniques.