This project is a comprehensive text analysis of the Holy Quran using Natural Language Processing (NLP) techniques. Through careful analysis, I uncovered insights including the most frequently occurring words in the Quran, the distribution of verses across its 114 chapters, and important commands of Allah SWT.
The dataset included surah numbers, verse texts in both Arabic and English, the place of revelation, and other features.
By diving into Exploratory Data Analysis, I explored key patterns like the division between Makki and Madni surahs, verse counts in each surah, and the Quran's segmentation into seven manzils.
I also focused on analyzing the frequency of commands and warnings from Allah using NLP techniques. To visualize the data, I created word clouds in both Arabic and English which highlight the frequently occurring words after removing stop words.
I started by exploring and visualizing the dataset, which provided information on surahs, verses, and metadata like the place of revelation (Makki or Madni), surah length, and manzil divisions.
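The counts behind this kind of exploration can be sketched in a few lines. This is a minimal, dependency-free illustration rather than the code I ran on the full dataset: the row layout `(surah_no, ayah_no, place_of_revelation)` and the function names are assumptions, and the toy rows cover only three short surahs.

```python
from collections import Counter

# Toy rows standing in for the real dataset (assumed layout:
# surah number, verse number, place of revelation).
rows = (
    [(1, ayah, "Makki") for ayah in range(1, 8)]      # Al-Fatihah, 7 verses
    + [(110, ayah, "Madni") for ayah in range(1, 4)]  # An-Nasr, 3 verses
    + [(112, ayah, "Makki") for ayah in range(1, 5)]  # Al-Ikhlas, 4 verses
)

def verses_per_surah(rows):
    """Count how many verse rows each surah has."""
    return Counter(surah for surah, _, _ in rows)

def surahs_by_place(rows):
    """Count distinct surahs per place of revelation."""
    place_of = {surah: place for surah, _, place in rows}
    return Counter(place_of.values())

verse_counts = verses_per_surah(rows)
place_counts = surahs_by_place(rows)
print(dict(verse_counts))
print(dict(place_counts))
```

The same grouping applies to manzil divisions by swapping the place column for a manzil column.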
Here are some visualizations from the analysis:
Next, I cleaned the text by removing stop words in both Arabic and English. I created a custom stop-word list, tokenized the text, and used the arabic_reshaper and bidi.algorithm modules so that Arabic text is shaped and ordered correctly when rendered.
    import string
    from nltk.tokenize import word_tokenize

    def English_punctuation_stopwords(text):
        # Tokenize the text into words
        words = word_tokenize(text)
        # Strip marks that string.punctuation does not cover
        punctuation = ['—', '˹', '.', '˺', '“', '”', '’']
        for mark in punctuation:
            words = [word.replace(mark, '') for word in words]
        # Drop empty tokens and tokens that are plain ASCII punctuation
        no_punct = [word for word in words if word and word not in string.punctuation]
        # Load the English stop-word list (one word per line)
        with open("stopwords-en.txt", 'r', encoding='utf-8') as file:
            en_stop_words = {line.strip() for line in file}
        no_stopword = [word for word in no_punct if word.lower() not in en_stop_words]
        # Reconstruct the text
        return ' '.join(no_stopword)
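The Arabic side of the cleaning can be sketched along the same lines. This is an illustration under stated assumptions, not my exact pipeline: the three-word stop list is a stand-in for the custom list, stripping diacritics before matching is my assumption, and the function names are hypothetical. The display step relies on the third-party `arabic-reshaper` and `python-bidi` packages, so it is imported lazily and only needed when text is about to be drawn.

```python
import unicodedata

# Stand-in for the custom Arabic stop-word list (assumed contents).
ARABIC_STOP_WORDS = {"في", "من", "على"}

def strip_diacritics(text):
    """Remove combining marks (harakat) so words match the stop-word list."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

def clean_arabic(text, stop_words=ARABIC_STOP_WORDS):
    """Strip diacritics, split on whitespace, and drop stop words."""
    tokens = strip_diacritics(text).split()
    return " ".join(token for token in tokens if token not in stop_words)

def prepare_for_display(text):
    """Join letters into their contextual forms and reorder right-to-left.

    Only needed for rendering (e.g. in a word cloud), not for counting.
    """
    import arabic_reshaper                   # third-party: arabic-reshaper
    from bidi.algorithm import get_display   # third-party: python-bidi
    return get_display(arabic_reshaper.reshape(text))

sample = "الحمد لله رب العالمين"
cleaned = clean_arabic(sample)
```

Counting should run on the cleaned text; `prepare_for_display` is applied last, because reshaped strings no longer compare equal to the original words.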
With the preprocessed data, I generated word clouds to visualize the most frequently occurring words in the Quran. I used the arial-arabic font to ensure accurate rendering of the Arabic script.
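A word cloud is a drawing of a frequency table, so the step can be split into counting and rendering. The sketch below assumes the third-party `wordcloud` package for rendering; the function names, the output path, and the toy input string are placeholders of mine, and `font_path` would point at whichever Arabic-capable font file is available.

```python
from collections import Counter

def word_frequencies(cleaned_text):
    """Count lowercase, whitespace-separated tokens in already-cleaned text."""
    return Counter(cleaned_text.lower().split())

def render_word_cloud(frequencies, font_path=None, out_path="wordcloud.png"):
    """Render a frequency table to a PNG file (needs the `wordcloud` package)."""
    from wordcloud import WordCloud  # imported lazily so counting works without it
    cloud = WordCloud(
        width=800,
        height=400,
        background_color="white",
        font_path=font_path,  # required for Arabic; the default font lacks the glyphs
    )
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(out_path)
    return out_path

# Toy input, not text from the dataset.
freqs = word_frequencies("mercy guidance mercy patience Mercy")
top_word, top_count = freqs.most_common(1)[0]
```

For Arabic, the keys of the frequency table would be reshaped and reordered for display before being passed to `render_word_cloud`.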
I created a word cloud of the entire Holy Quran, along with word clouds for some of my favorite surahs.
The story of Prophet Yusuf is fully narrated in the Quran. He went through many hardships, all of which he endured with patience and piety. His story begins with him being thrown into a well and ends with him attaining a high position in Egypt.
I also analyzed the most frequent commands and warnings in the Quran. The resulting visualizations draw attention to recurring themes in the instructions and warnings of Allah.
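One simple way to approximate this kind of count is keyword matching over the English translation. The sketch below is an assumption about the approach, not a record of it: the keyword sets are hypothetical, and the sample sentences are made-up placeholders rather than quotations from any translation.

```python
import re
from collections import Counter

# Hypothetical keyword sets; a real analysis would need curated lists.
COMMAND_WORDS = {"establish", "obey", "give"}
WARNING_WORDS = {"beware", "punishment"}

def count_keywords(verses, keywords):
    """Count how often each keyword appears across a list of verse strings."""
    counts = Counter()
    for verse in verses:
        for token in re.findall(r"[a-z']+", verse.lower()):
            if token in keywords:
                counts[token] += 1
    return counts

# Made-up placeholder sentences, not quotations.
sample_verses = [
    "Establish prayer and give charity.",
    "Beware of a severe punishment.",
    "Obey, and establish justice.",
]

command_counts = count_keywords(sample_verses, COMMAND_WORDS)
warning_counts = count_keywords(sample_verses, WARNING_WORDS)
```

Keyword matching misses commands phrased without a listed word and cannot tell an instruction from a narrative use of the same verb, so the counts are best read as a rough signal.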
Investing my data skills in this project has been truly rewarding. It’s deepened my understanding of the Quran’s structure and themes, and I hope this work contributes to further exploration of the Holy Quran.