This project analyzes Twitter data to uncover differences in how younger and older generations engage with the platform. The task requires participants to perform exploratory data analysis, apply feature engineering, and use natural language processing (NLP) methods to compare follower counts, discussion topics, sentiment, and language use across age groups.
The dataset consists of two main components:
Tweets: This file includes the following fields:
Users: This file contains the following fields:
Data Exploration
Pandas: For data manipulation and analysis.
NumPy: For numerical data processing.
Matplotlib: For creating static, animated, and interactive visualizations in Python.
Seaborn: For making statistical graphics.
NLTK: For natural language processing tasks.
Scikit-learn: For machine learning and predictive data analysis.
TextBlob: For text processing tasks and sentiment analysis.
A pie chart showing the percentage distribution of tweets based on age groups. The chart is divided into two sections:
A bar chart showing the number of tweets posted on each day of the month. Activity is high on days such as the 1st, 5th, 7th, 15th, and 31st, and lower on days such as the 10th, 11th, and 25th.
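The per-day counts behind a chart like this can be sketched as follows. This is a minimal illustration using only the standard library; the timestamp format and the sample values are assumptions, since the source does not list the dataset's fields.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample timestamps; the real column name and format
# are not given in this README.
sample_timestamps = [
    "2009-05-01 08:15:00",
    "2009-05-01 21:40:00",
    "2009-05-05 12:00:00",
    "2009-05-31 23:59:00",
    "2009-06-01 00:30:00",
]


def tweets_per_day_of_month(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Count tweets by calendar day of the month (1-31)."""
    days = (datetime.strptime(ts, fmt).day for ts in timestamps)
    return Counter(days)


day_counts = tweets_per_day_of_month(sample_timestamps)

# Print the counts in day order, ready to pass to a bar-plotting call.
for day in sorted(day_counts):
    print(day, day_counts[day])
```

With Pandas, the same grouping would typically be done on a parsed datetime column; the standard-library version is used here only to keep the sketch self-contained.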
This pie chart displays the distribution of sentiments in the tweets:
A stacked bar chart showing the sentiment distribution (Positive, Neutral, Negative) across two age groups (young and old).
The young group shows more tweets with positive sentiment compared to the old group.
Both groups have a comparable distribution of neutral and negative sentiments, though the young group has higher counts overall.
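The counts behind a stacked chart like this can be sketched as below. The sketch assumes each tweet already has a polarity score in [-1, 1], for example from TextBlob's `sentiment.polarity`, and that scores are mapped to labels with a zero threshold. The source does not state how the labels were actually derived, so both the threshold and the sample rows are hypothetical.

```python
from collections import Counter


def label_sentiment(polarity, threshold=0.0):
    """Map a polarity score in [-1, 1] to a sentiment label.

    The zero threshold is an assumption, not the project's stated rule.
    """
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"


def sentiment_by_group(rows):
    """Count (age_group, sentiment) pairs from (age_group, polarity) rows."""
    counts = Counter()
    for age_group, polarity in rows:
        counts[(age_group, label_sentiment(polarity))] += 1
    return counts


# Hypothetical rows: (age group, polarity score).
sample_rows = [
    ("young", 0.5),
    ("young", 0.0),
    ("young", -0.2),
    ("old", 0.3),
    ("old", 0.0),
]

group_counts = sentiment_by_group(sample_rows)
for (age_group, sentiment), n in sorted(group_counts.items()):
    print(age_group, sentiment, n)
```

Because the young group has more tweets overall, comparing proportions within each group rather than raw counts gives a fairer picture of the sentiment mix.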
This chart lists the top 40 most frequent 4-grams (phrases of 4 consecutive words) used in the tweets. Examples of frequent 4-grams include:
"watch mtv music award"
"tonight show conan brien"
"michael john idol new"
Each bar's length gives the frequency of the corresponding phrase.
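Counting 4-grams of this kind can be sketched with the standard library. The sketch assumes tweets have already been tokenized and cleaned (lowercased, stop words removed), which the example phrases suggest but the source does not confirm. The function names and sample tokens are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n=4):
    """Return all runs of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(token_lists, n=4, k=40):
    """Return the k most frequent n-grams across all token lists."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)


# Hypothetical pre-tokenized tweets.
sample_tweets = [
    ["watch", "mtv", "music", "award", "tonight"],
    ["watch", "mtv", "music", "award"],
    ["love", "spending", "time", "family"],
]

top = top_ngrams(sample_tweets)
for phrase, freq in top:
    print(freq, phrase)
```

Tweets shorter than four tokens contribute no 4-grams, since the index range is empty. Running the same function separately on each age group's tweets would produce one ranked list per group.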
A second bar chart of the top 40 4-grams, containing different phrases from the first. Examples include:
"love spending time family"
"baby cousin graduation party"
"happy family never love"
The emphasis here is more on family and relationships.
These visualizations provide insights into how different age groups interact on Twitter, their sentiment distribution, and common topics of discussion based on word combinations.
This project analyzes how user behavior, sentiment, and interests on Twitter vary across age groups. The findings can help explain social media dynamics and can inform targeted marketing, sentiment analysis, and user engagement studies.