Text Sentiment Analysis using VADER and Readability Metrics

Abstract
This project presents a hybrid rule-based approach for text sentiment analysis combined with readability assessment. By leveraging the VADER (Valence Aware Dictionary and sEntiment Reasoner) algorithm and classic readability metrics like the Flesch Reading Ease Score, we provide a lightweight yet interpretable pipeline for evaluating textual data. The system is suitable for academic and practical use cases where model interpretability, simplicity, and low resource consumption are critical.

Introduction
Natural Language Processing (NLP) has been transformed rapidly by the emergence of large language models. However, rule-based models like VADER remain relevant for lightweight applications that require no training data and must stay interpretable. Readability metrics add an orthogonal perspective: how comprehensible or complex a piece of text is.

This project explores the integration of sentiment polarity scoring with readability analysis to better understand the emotional tone and cognitive load of input text.

Methodology
2.1 Sentiment Analysis with VADER
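VADER is a lexicon- and rule-based sentiment analysis tool specifically attuned to sentiment expressed in social media. For each input text it returns four scores:
Positive (proportion of the text rated positive)
Negative (proportion of the text rated negative)
Neutral (proportion of the text rated neutral)
Compound (the sum of the word valence scores, normalized to the range -1 to +1)
Because it needs only a dictionary lookup and a handful of rules, the tool is particularly useful where large volumes of text must be scored with little computation.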
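As a minimal sketch of how these scores might be obtained and turned into a label, the snippet below assumes the vaderSentiment package (the source does not say whether the notebook uses it or NLTK's bundled copy). The helper names vader_scores and label_from_compound are hypothetical, and the 0.05 cut-off is the threshold conventionally recommended for VADER, not a value confirmed by this project.

```python
def vader_scores(text):
    """Return VADER's score dict with keys 'neg', 'neu', 'pos', 'compound'.

    Assumes the third-party vaderSentiment package is installed; it is
    imported lazily so label_from_compound works without it.
    """
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    return SentimentIntensityAnalyzer().polarity_scores(text)


def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a coarse sentiment label.

    The default threshold is the commonly used convention, treated here
    as an assumption rather than the project's confirmed setting.
    """
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"
```

A typical call would be label_from_compound(vader_scores(text)["compound"]).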

2.2 Custom Lexicon Scoring
In addition to VADER, we use manually curated lists of positive and negative words (positive-words.txt, negative-words.txt) for lightweight keyword-based scoring. These lists were obtained through internal academic sources and serve as a cross-check on VADER's sentiment classification.
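One way such keyword scoring could work is sketched below. It is an illustration, not the notebook's actual code: the file format (one word per line, with optional comment lines starting with ";") and the scoring rule (positive matches minus negative matches) are assumptions, and load_lexicon and lexicon_score are hypothetical names.

```python
import re
from pathlib import Path


def load_lexicon(path):
    """Load a word list, assuming one word per line.

    Blank lines and lines starting with ';' are skipped (assumed format).
    """
    words = set()
    raw = Path(path).read_text(encoding="utf-8", errors="ignore")
    for line in raw.splitlines():
        line = line.strip().lower()
        if line and not line.startswith(";"):
            words.add(line)
    return words


def lexicon_score(text, positive, negative):
    """Return (number of positive tokens) - (number of negative tokens)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos_hits = sum(1 for token in tokens if token in positive)
    neg_hits = sum(1 for token in tokens if token in negative)
    return pos_hits - neg_hits
```

A score above zero would then agree with a positive VADER label, and a score below zero with a negative one; disagreements flag texts worth inspecting by hand.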

2.3 Readability Metrics
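Readability is measured using the Flesch Reading Ease Score, calculated via the textstat library. The score is computed from two quantities: the average sentence length (words per sentence) and the average number of syllables per word. Higher scores indicate text that is easier to read.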
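The standard Flesch formula can be written out directly from those two averages. The sketch below works from precomputed counts to show the arithmetic. In the project itself the score comes from textstat's flesch_reading_ease(text), whose syllable counting may produce slightly different numbers. The function names and the band labels (Flesch's conventional interpretation table) are illustrative.

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Flesch Reading Ease from raw counts (standard published formula)."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def reading_band(score):
    """Map a score to the conventional Flesch difficulty band."""
    bands = [
        (90, "Very easy"),
        (80, "Easy"),
        (70, "Fairly easy"),
        (60, "Standard"),
        (50, "Fairly difficult"),
        (30, "Difficult"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "Very confusing"
```

For example, a passage of 100 words in 5 sentences with 150 syllables scores 206.835 - 1.015 * 20 - 84.6 * 1.5 = 59.635, which falls in the "Fairly difficult" band.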

The combined sentiment and readability profile helps assess not just what a text is saying, but how easily it can be interpreted by a reader.

Dataset
The dataset used in this project consists of text files located in the dict/model/ directory, including:
Sample Texts: 10744.4.txt, 11206.2.txt, 12129.8.txt, 123.0.txt
Lexicons: positive-words.txt, negative-words.txt
Note: These datasets were privately shared by a senior member at Coding Ninjas SRM and are intended solely for internal educational and demonstration purposes.

Implementation
The code is implemented in a single Jupyter notebook, main2.ipynb. The pipeline performs the following steps:
Reading input text files.
Calculating VADER sentiment scores.
Performing word-level matching with custom lexicons.
Computing the Flesch Reading Ease Score.
Outputting overall sentiment and readability metrics.
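The steps above could be wired together roughly as follows. This is a hedged sketch rather than the contents of main2.ipynb: analyze_file and analyze_directory are hypothetical names, and the individual scorers (VADER, lexicon matching, Flesch score) are passed in as plain callables so the loop itself stays independent of any particular library.

```python
from pathlib import Path


def analyze_file(path, scorers):
    """Read one text file and apply every scorer to its contents.

    scorers maps a metric name to a function taking the text and
    returning that metric's value.
    """
    path = Path(path)
    text = path.read_text(encoding="utf-8", errors="ignore")
    report = {"file": path.name}
    for name, scorer in scorers.items():
        report[name] = scorer(text)
    return report


def analyze_directory(directory, scorers, pattern="*.txt", skip=()):
    """Analyze every matching file in a directory, in sorted order.

    skip lists file names to leave out, e.g. the lexicon files that
    live alongside the sample texts.
    """
    reports = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.name in skip:
            continue
        reports.append(analyze_file(path, scorers))
    return reports
```

With the dataset layout described earlier, a call might look like analyze_directory("dict/model", scorers, skip=("positive-words.txt", "negative-words.txt")), where scorers holds one entry per metric.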
Results
Input Text: "This device is absolutely fantastic and exceeded expectations."
VADER Sentiment: Positive
Compound Score: 0.812
Custom Lexicon Score: +2
Readability Score: 78.4 (Fairly easy to read)

Applications
Review sentiment & complexity analysis (e.g., Amazon, Yelp)
Educational content grading
Text classification pre-filter
Social media opinion mining

Limitations
Not context-aware (e.g., sarcasm, irony)
Fixed lexicons may not generalize across domains
Rule-based approach may miss subtle or idiomatic cues

Conclusion
This project shows that lightweight, rule-based sentiment classification can be combined with standard readability metrics in a single interpretable pipeline. It is suited to educational use, early-stage text analysis, and situations where transparency and efficiency take priority over the sophistication of deep learning models.

References
Hutto, C.J., & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Proceedings of ICWSM.

TextStat Library: https://pypi.org/project/textstat/
GitHub Repository: https://github.com/Pranavtiwari30/Text-Sentiment-Analysis

Appendix: Project Metadata
Author: Pranav Tiwari
Institution: B.Tech – AI & ML, SRM University
Organization: Coding Ninjas SRM (AI/ML Domain)
License: Apache 2.0
