This project applies sentiment analysis to customer reviews for an appliance insurance company. The goal is to identify which markets in England attract the most reviews and orders, specifically in the home services sector. The data was collected by scraping reviews from the Zestplan website using Selenium, and the sentiment was classified using BERT.
The project is organized into several key folders, reflecting the logical steps in the workflow:
We used Selenium to scrape data from the Zestplan website, which focuses on home services. The scraper extracts review text and stores it in a structured format.
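
    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    # Set up the WebDriver (Selenium 4 takes the driver path through a Service object)
    driver = webdriver.Chrome(service=Service('path_to_chromedriver'))
    try:
        driver.get('https://example.com')
        # Extract review text (find_elements_by_class_name was removed in Selenium 4)
        reviews = driver.find_elements(By.CLASS_NAME, 'review-text')
        data = [review.text for review in reviews]
    finally:
        # Always close the browser, even if scraping fails
        driver.quit()

    # Save the data
    df = pd.DataFrame(data, columns=['review'])
    df.to_csv('reviews.csv', index=False)
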
The next step involved labeling the dataset for sentiment classification. The labeled data is saved in Labeled_Data.xlsx, and the process is detailed in the notebook.
We explored the data, performed feature engineering, and trained our sentiment analysis model in the Analysis.ipynb notebook. Additionally, the ZEST_ANALYSIS.ipynb notebook focuses on specific sentiment extraction using BERT for the home services sector in England.
This notebook details the training and fine-tuning of the BERT model on the labeled dataset.
The following analysis provides insights into user reviews and ratings in the home services sector, based on the collected data.
Most Common Services Provided by Suppliers
The analysis identified the most frequent service categories. Plumber, Cleaning Service, and Home Services are the top categories, with Plumber being the most in-demand service. This reflects the popularity of certain home services in the market.
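A frequency count like this can be sketched with pandas. This is a minimal illustration, not the notebook's actual code: the `category` column name and the sample rows are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset's column names may differ.
reviews = pd.DataFrame({
    "category": [
        "Plumber", "Cleaning Service", "Plumber",
        "Home Services", "Plumber", "Cleaning Service",
    ],
})

# Count how often each service category appears, most frequent first.
category_counts = reviews["category"].value_counts()
print(category_counts.head(3))
```

`value_counts` already sorts in descending order, so `head(3)` gives the three most common categories directly.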
Top 5 Companies with the Most Reviews in Selected Categories
The top 5 companies with the highest number of reviews in each service category were analyzed. Companies such as Plumbworld and Baxi UK dominate the reviews in the plumbing and heating service categories, suggesting strong market positions.
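One way to compute a per-category ranking like this is a grouped count followed by a per-group `head`. The sketch below assumes hypothetical `category` and `company` columns and uses placeholder company names; it is not taken from the project notebooks.

```python
import pandas as pd

# Hypothetical sample rows with placeholder company names.
reviews = pd.DataFrame({
    "category": ["Plumber", "Plumber", "Plumber", "Heating", "Heating", "Plumber"],
    "company": ["Company A", "Company A", "Company B",
                "Company C", "Company C", "Company A"],
})

def top_companies(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Return the n most-reviewed companies within each category."""
    counts = (
        df.groupby(["category", "company"])
        .size()
        .reset_index(name="n_reviews")
        .sort_values(["category", "n_reviews"], ascending=[True, False])
    )
    # head(n) per group keeps the first n rows of each category after sorting.
    return counts.groupby("category").head(n).reset_index(drop=True)

top = top_companies(reviews, n=5)
print(top)
```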
Number of Reviews Over Time
The reviews over time show a significant spike in early 2023, with a peak of 220 reviews in a single month. This indicates increased customer engagement during this period, which could be tied to promotions or seasonal demand.
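A monthly count and its peak can be derived by grouping on the review month. The following is a rough sketch under the assumption that each review has a parsed date in a column called `review_date` (a hypothetical name); the dates are invented for illustration.

```python
import pandas as pd

# Hypothetical review dates; the real column name may differ.
reviews = pd.DataFrame({
    "review_date": pd.to_datetime([
        "2023-01-05", "2023-01-20", "2023-02-11",
        "2023-02-12", "2023-02-28", "2023-03-01",
    ]),
})

# Group by calendar month and count the reviews in each one.
monthly_counts = reviews.groupby(reviews["review_date"].dt.to_period("M")).size()

# The month with the most reviews and its count.
peak_month = monthly_counts.idxmax()
peak_count = monthly_counts.max()
print(peak_month, peak_count)
```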
Number of Reviews by Day of the Week
The distribution of reviews by day shows that Tuesday received the most reviews (734), while Sunday saw the least engagement, with only 7. This trend points to a preference for leaving reviews during the workweek.
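A weekday breakdown of this kind can be sketched by mapping each date to its day name. As before, `review_date` is an assumed column name and the dates are made up for the example.

```python
import pandas as pd

# Hypothetical review dates (2023-01-02 is a Monday, 2023-01-08 a Sunday).
reviews = pd.DataFrame({
    "review_date": pd.to_datetime(
        ["2023-01-02", "2023-01-03", "2023-01-10", "2023-01-08"]
    ),
})

weekday_order = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

# Count reviews per weekday, keeping calendar order and showing empty days as 0.
weekday_counts = (
    reviews["review_date"]
    .dt.day_name()
    .value_counts()
    .reindex(weekday_order, fill_value=0)
)
print(weekday_counts)
```

Reindexing against a fixed weekday list keeps days with no reviews visible, which matters when one day (such as Sunday here) is nearly empty.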
Top Keywords by TF-IDF Value
The TF-IDF analysis highlights the most important words in the context of customer reviews. "Helpful", "service", and "zest" rank the highest, demonstrating the aspects of service that customers most frequently associate with high satisfaction.

    import tensorflow as tf
    from transformers import BertTokenizer, TFBertForSequenceClassification

    # Load BERT tokenizer and model
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

    # Tokenize the reviews (df is the labeled DataFrame with clean_review and label columns)
    encoded_data = tokenizer(df['clean_review'].tolist(), padding=True, truncation=True, return_tensors='tf')

    # The model returns two logits per review, so use sparse categorical
    # cross-entropy on integer labels rather than binary cross-entropy
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=['accuracy'],
    )

    # Pass input_ids together with the attention mask so padding tokens are ignored
    history = model.fit(dict(encoded_data), df['label'].values, epochs=3, batch_size=16)

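Because a sequence classification head with two labels returns raw logits rather than probabilities, predictions need a softmax and an argmax before they can be read as sentiment labels. The helper below is a minimal NumPy sketch of that step; `logits_to_labels` is a hypothetical name, and the mapping of index 0 to "negative" and 1 to "positive" is an assumption about how the dataset was labeled.

```python
import numpy as np

def logits_to_labels(logits, label_names=("negative", "positive")):
    """Convert raw two-class logits into label names and probabilities."""
    logits = np.asarray(logits, dtype=float)
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    predicted = probs.argmax(axis=1)
    return [label_names[i] for i in predicted], probs

# Example logits for two reviews (made-up values).
labels, probs = logits_to_labels([[2.0, -1.0], [0.1, 1.5]])
print(labels)
```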
The results of the BERT model training, evaluation metrics, and analysis can be found in the analysis notebooks. Below are some key metrics:
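For readers who want to recompute such metrics, the sketch below shows how accuracy, precision, recall, and F1 could be derived for a binary sentiment classifier with scikit-learn. The label arrays are toy values invented for illustration and are not results from this project.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Toy ground-truth and predicted labels (1 = positive, 0 = negative); not project results.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

accuracy = accuracy_score(y_true, y_pred)

# average="binary" reports precision, recall, and F1 for the positive class.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary"
)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```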
This project demonstrates how sentiment analysis using BERT can help an appliance insurance company understand market trends in England for home services. By scraping data from the Zestplan website, we gained insights into customer sentiment that can be used to inform business decisions.
You can access the entire project and explore each part in detail in this GitHub repository.