Analyzing Car Reviews with LLMs
Chapter 1: Introduction
Car-ing is sharing, an auto dealership company for car sales and rental, is taking its services to the next level thanks to Large Language Models (LLMs).
The scope of the project is to prototype a chatbot app with multiple functionalities that not only assist customers but also provide support to human agents in the company.
The solution should receive textual prompts and use a variety of pre-trained Hugging Face LLMs to respond to a series of tasks, e.g. classifying the sentiment in a car’s text review, answering a customer question, summarizing or translating text, etc.
The project tasks include:
Use a pre-trained LLM to classify the sentiment of the five car reviews in the car_reviews.csv dataset, and evaluate the classification accuracy and F1 score of predictions.
Store the model outputs in predicted_labels, then extract the labels and map them onto a list of {0,1} integer binary labels called predictions.
Store the calculated metrics in accuracy_result and f1_result.
The company has recently attracted customers from Spain.
Extract and pass the first two sentences of the first review in the dataset to an English-to-Spanish translation LLM.
Calculate the BLEU score to assess translation quality, using the content in reference_translations.txt as references.
Store the translated text generated by the LLM in translated_review.
Store the BLEU score metric result in bleu_score.
The second review in the dataset emphasizes brand aspects.
Load an extractive QA LLM such as "deepset/minilm-uncased-squad2" and ask it the question "What did he like about the brand?" to obtain an answer.
Store the two LLM inputs in variables named question and context.
Store the actual text answer in answer.
Summarize the last review in the dataset in approximately 50-55 tokens. Store the result in the variable summarized_text.
Chapter 2: Technical Implementation
Sentiment Classification
Hugging Face libraries such as transformers and evaluate have been installed for this project.
!pip install transformers
!pip install evaluate==0.4.0
!pip install datasets==2.10.0
!pip install sentencepiece==0.1.97

from transformers import logging
logging.set_verbosity(logging.WARNING)

import pandas as pd
import torch

# Load the car reviews dataset
file_path = "data/car_reviews.csv"
df = pd.read_csv(file_path, delimiter=";")

# Put the car reviews and their associated sentiment labels in two lists
reviews = df['Review'].tolist()
real_labels = df['Class'].tolist()
Load a sentiment analysis LLM into a pipeline
from transformers import pipeline
classifier = pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')
Perform inference on the car reviews and display prediction results
predicted_labels = classifier(reviews)
for review, prediction, label in zip(reviews, predicted_labels, real_labels):
    print(f"Review: {review}\nActual Sentiment: {label}\nPredicted Sentiment: {prediction['label']} (Confidence: {prediction['score']:.4f})\n")
Load accuracy and F1 score metrics
import evaluate
accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")
Map categorical sentiment labels into integer labels
references = [1 if label == "POSITIVE" else 0 for label in real_labels]
predictions = [1 if label['label'] == "POSITIVE" else 0 for label in predicted_labels]
Calculate accuracy and F1 score
accuracy_result_dict = accuracy.compute(references=references, predictions=predictions)
accuracy_result = accuracy_result_dict['accuracy']
f1_result_dict = f1.compute(references=references, predictions=predictions)
f1_result = f1_result_dict['f1']
print(f"Accuracy: {accuracy_result}")
print(f"F1 result: {f1_result}")
Results
Review: I am very satisfied with my 2014 Nissan NV SL. I use this van for my business deliveries and personal use. Camping, road trips, etc. We dont have any children so I store most of the seats in my warehouse. I wanted the passenger van for the rear air conditioning. We drove our van from Florida to California for a Cross Country trip in 2014. We averaged about 18 mpg. We drove thru a lot of rain and It was a very comfortable and stable vehicle. The V8 Nissan Titan engine is a 500k mile engine. It has been tested many times by delivery and trucking companies. This is why Nissan gives you a 5 year or 100k mile bumper to bumper warranty. Many people are scared about driving this van because of its size. But with front and rear sonar sensors, large mirrors and the back up camera. It is easy to drive. The front and rear sensors also monitor the front and rear sides of the bumpers making it easier to park close to objects. Our Nissan NV is a Tow Monster. It pulls our 5000 pound travel trailer like its not even there. I have plenty of power to pass a vehicle if needed. The 5.6 liter engine produces 317 hp. I have owned Chevy and Ford vans and there were not very comfortable and had little cockpit room. The Nissan NV is the only vehicle made that has the engine forward like a pick up truck giving the driver plenty of room and comfort in the cockpit area. I dont have any negatives to say about my NV. This is a wide vehicle. The only modification I would like to see from Nissan is for them to add amber side mirror marker lights.BTW. I now own a 2016 Nissan NVP SL. Love it.
Actual Sentiment: POSITIVE
Predicted Sentiment: POSITIVE (Confidence: 0.9294)
Review: The car is fine. It's a bit loud and not very powerful. On one hand, compared to its peers, the interior is well-built. The transmission failed a few years ago, and the dealer replaced it under warranty with no issues. Now, about 60k miles later, the transmission is failing again. It sounds like a truck, and the issues are well-documented. The dealer tells me it is normal, refusing to do anything to resolve the issue. After owning the car for 4 years, there are many other vehicles I would purchase over this one. Initially, I really liked what the brand is about: ride quality, reliability, etc. But I will not purchase another one. Despite these concerns, I must say, the level of comfort in the car has always been satisfactory, but not worth the rest of issues found.
Actual Sentiment: NEGATIVE
Predicted Sentiment: POSITIVE (Confidence: 0.8654)
Review: My first foreign car. Love it, I would buy another.
Actual Sentiment: POSITIVE
Predicted Sentiment: POSITIVE (Confidence: 0.9995)
Review: I've come across numerous reviews praising the Rogue, and I genuinely feel like I might be missing something. It's only been a week since I got the car, and I am genuinely disappointed. I truly wish I could return it. My main concern revolves around what I see as a significant design flaw (which I believe also exists in the Murano, though that wasn't much better and considerably pricier). The rear windshield is just too small. The headrests in the back seat obstruct the sides of the rearview window. This "Crossover" feels more like a cheaply made compact car. My other vehicle is a Sonata, and it provides a significantly quieter and smoother ride. I did not anticipate this car to ride so roughly; my 2006 Pathfinder had a smoother ride! I would rate this car a 5 all around.
Actual Sentiment: NEGATIVE
Predicted Sentiment: NEGATIVE (Confidence: 0.9935)
Review: I've been dreaming of owning an SUV for quite a while, but I've been driving cars that were already paid for during an extended period. I ultimately made the decision to transition to a brand-new car, which, of course, involved taking on new payments. However, given that I don't drive extensively, I was inclined to avoid a substantial financial commitment. The Nissan Rogue provides me with the desired SUV experience without burdening me with an exorbitant payment; the financial arrangement is quite reasonable. Handling and styling are great; I have hauled 12 bags of mulch in the back with the seats down and could have held more. I am VERY satisfied overall. I find myself needing to exercise extra caution when making lane changes, particularly owing to the blind spots resulting from the small side windows situated towards the rear of the vehicle. To address this concern, I am actively engaged in making adjustments to my mirrors and consciously reducing the frequency of lane changes. The engine delivers strong performance, and the ride is really smooth.
Actual Sentiment: POSITIVE
Predicted Sentiment: POSITIVE (Confidence: 0.9987)
Accuracy: 0.8
F1 result: 0.8571428571428571
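As a sanity check on the reported metrics, the sketch below recomputes accuracy and F1 by hand from the five actual and predicted sentiments listed above. The helper name binary_accuracy_f1 is hypothetical and is not part of the evaluate library; it only assumes the usual binary definitions with POSITIVE as the positive class.

```python
# Integer labels read off the results above (1 = POSITIVE, 0 = NEGATIVE)
y_true = [1, 0, 1, 0, 1]
y_pred = [1, 1, 1, 0, 1]

def binary_accuracy_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels, treating 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1

manual_accuracy, manual_f1 = binary_accuracy_f1(y_true, y_pred)
print(f"Accuracy: {manual_accuracy}")
print(f"F1: {manual_f1:.4f}")
```

The single error is the second review (a false positive), giving 4 correct out of 5, precision 3/4 and recall 3/3, which is consistent with the accuracy of 0.8 and F1 of about 0.857 reported above.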
Translation
Load a translation LLM into a pipeline and translate the car review
first_review = reviews[0]
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")
translated_review = translator(first_review, max_length=27)[0]['translation_text']
print(f"Model translation:\n{translated_review}")
The whole first review is passed to the translator, and max_length=27 caps the generated output, which here covers about the first two sentences. The pipeline therefore emits this warning:
Your input_length: 365 is bigger than 0.9 * max_length: 27. You might consider increasing your max_length manually, e.g. translator('...', max_length=400)
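The task asks for the first two sentences of the first review, while the code above passes the whole review and relies on max_length to cut the output short. A possible alternative is to extract the two sentences explicitly before translating. The sketch below uses a naive regex splitter; first_n_sentences and sample_review are hypothetical names introduced for illustration, and the sample string is just the opening of the first review.

```python
import re

def first_n_sentences(text, n=2):
    """Return the first n sentences using a naive punctuation-based split."""
    # Split after '.', '!' or '?' when followed by whitespace.
    # Abbreviations such as "etc." or "BTW." would be split too.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:n])

# Opening of the first review, copied here so the example is self-contained
sample_review = (
    "I am very satisfied with my 2014 Nissan NV SL. "
    "I use this van for my business deliveries and personal use. "
    "Camping, road trips, etc."
)

two_sentences = first_n_sentences(sample_review)
print(two_sentences)
```

In the project, the result of first_n_sentences(reviews[0]) could be passed to translator in place of first_review, at which point max_length would no longer need to act as the cut-off. The outputs shown in this document were produced with the original code, not with this variant.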
Load reference translations from file
with open("data/reference_translations.txt", 'r') as file:
    lines = file.readlines()
references = [line.strip() for line in lines]
print(f"Spanish translation references:\n{references}")
Load and calculate BLEU score metric
bleu = evaluate.load("bleu")
bleu_score = bleu.compute(predictions=[translated_review], references=[references])
print(bleu_score['bleu'])
Model translation:
Estoy muy satisfecho con mi 2014 Nissan NV SL. Uso esta furgoneta para mis entregas de negocios y uso personal.
Spanish translation references:
['Estoy muy satisfecho con mi Nissan NV SL 2014. Utilizo esta camioneta para mis entregas comerciales y uso personal.', 'Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta furgoneta para mis entregas comerciales y uso personal.']
0.6022774485691839
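BLEU builds on clipped n-gram precision: each n-gram in the candidate is credited at most as many times as it appears in any single reference. The sketch below illustrates only that ingredient, using whitespace tokenization. It is not the full BLEU computed by evaluate (no brevity penalty, no geometric mean over n-gram orders, and a different tokenizer), so its numbers are not expected to match the score above. The helper names are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against several references."""
    cand_counts = Counter(ngrams(candidate.split(), n))
    # For each n-gram, the maximum count observed in any single reference
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref.split(), n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0

# Model translation and references as printed in this document
candidate = ("Estoy muy satisfecho con mi 2014 Nissan NV SL. "
             "Uso esta furgoneta para mis entregas de negocios y uso personal.")
refs = [
    "Estoy muy satisfecho con mi Nissan NV SL 2014. Utilizo esta camioneta para mis entregas comerciales y uso personal.",
    "Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta furgoneta para mis entregas comerciales y uso personal.",
]

unigram_precision = clipped_precision(candidate, refs, 1)
bigram_precision = clipped_precision(candidate, refs, 2)
print(f"Clipped 1-gram precision: {unigram_precision:.3f}")
print(f"Clipped 2-gram precision: {bigram_precision:.3f}")
```

Most of the penalty in this example comes from word-order and wording differences such as "2014 Nissan NV SL" versus "Nissan NV SL 2014" and "de negocios" versus "comerciales", which break otherwise matching n-grams.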
Extractive QA
Import auto classes (optional: can be solved via pipelines too)
from transformers import AutoTokenizer
from transformers import AutoModelForQuestionAnswering
Instantiate model and tokenizer
model_ckp = "deepset/minilm-uncased-squad2"
tokenizer = AutoTokenizer.from_pretrained(model_ckp)
model = AutoModelForQuestionAnswering.from_pretrained(model_ckp)
Define context and question, and tokenize them
context = reviews[1]
print(f"Context:\n{context}")
question = "What did he like about the brand?"
inputs = tokenizer(question, context, return_tensors="pt")
Perform inference and extract answer from raw outputs
with torch.no_grad():
    outputs = model(**inputs)
start_idx = torch.argmax(outputs.start_logits)
end_idx = torch.argmax(outputs.end_logits) + 1
answer_span = inputs["input_ids"][0][start_idx:end_idx]
Decode and show answer
answer = tokenizer.decode(answer_span)
print("Answer: ", answer)
Context:
The car is fine. It's a bit loud and not very powerful. On one hand, compared to its peers, the interior is well-built. The transmission failed a few years ago, and the dealer replaced it under warranty with no issues. Now, about 60k miles later, the transmission is failing again. It sounds like a truck, and the issues are well-documented. The dealer tells me it is normal, refusing to do anything to resolve the issue. After owning the car for 4 years, there are many other vehicles I would purchase over this one. Initially, I really liked what the brand is about: ride quality, reliability, etc. But I will not purchase another one. Despite these concerns, I must say, the level of comfort in the car has always been satisfactory, but not worth the rest of issues found.
Answer: ride quality, reliability
Summarization
Get the car review text to summarize
text_to_summarize = reviews[-1]
print(f"Original text:\n{text_to_summarize}")
Original text:
I've been dreaming of owning an SUV for quite a while, but I've been driving cars that were already paid for during an extended period. I ultimately made the decision to transition to a brand-new car, which, of course, involved taking on new payments. However, given that I don't drive extensively, I was inclined to avoid a substantial financial commitment. The Nissan Rogue provides me with the desired SUV experience without burdening me with an exorbitant payment; the financial arrangement is quite reasonable. Handling and styling are great; I have hauled 12 bags of mulch in the back with the seats down and could have held more. I am VERY satisfied overall. I find myself needing to exercise extra caution when making lane changes, particularly owing to the blind spots resulting from the small side windows situated towards the rear of the vehicle. To address this concern, I am actively engaged in making adjustments to my mirrors and consciously reducing the frequency of lane changes. The engine delivers strong performance, and the ride is really smooth.
Load summarization pipeline and perform inference
model_name = "cnicu/t5-small-booksum"
summarizer = pipeline("summarization", model=model_name)
outputs = summarizer(text_to_summarize, max_length=53)
summarized_text = outputs[0]['summary_text']
print(f"Summarized text:\n{summarized_text}")
Summarized text:
The Nissan Rogue provides me with the desired SUV experience without burdening me with an exorbitant payment; the financial arrangement is quite reasonable. I have hauled 12 bags of mulch in the back with the seats down and could have held more.
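The task asks for a summary of roughly 50-55 tokens, and max_length=53 only sets an upper bound on what the model generates. A quick way to check where a summary actually lands is to count its tokens. The sketch below is a hypothetical helper that accepts any tokenizing function; it defaults to whitespace splitting, which counts words rather than model tokens, so the default is only a rough proxy.

```python
def token_count_in_range(text, tokenize=str.split, low=50, high=55):
    """Return (token_count, within_range) for text under the given tokenizer."""
    n_tokens = len(tokenize(text))
    return n_tokens, low <= n_tokens <= high

# Self-contained demonstration with whitespace tokens and a small range
demo_count, demo_ok = token_count_in_range("the ride is really smooth", low=3, high=6)
print(demo_count, demo_ok)
```

In the project, passing summarizer.tokenizer.tokenize as the tokenize argument together with summarized_text would count tokens as the summarization model sees them.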