The question, in its essence, is: are wine reviews written over time sufficient to 'unconsciously' tell who it was that wrote them?
As a start, I wanted to resist the temptation to consider other factors such as country, region, points, etc., even though they would give us more insight into how a reviewer's region affects their access to a wine, or into how he or she evaluates one.
I believe so, since it's the journey (the explanation) that tells a lot more than the destination (the quantified evaluation).
A little bit of background about myself: I am definitely not a wine taster. All I have ever wondered about alcohol is, does it taste good and does it get me the buzz? What I am trying to say is, I do not really know what it is that a wine reviewer looks for in a wine.
These observations, which I will talk about as we go through the code, led me to believe that the reviews contained sufficient information to determine who the reviewer was.
import numpy as np
import pandas as pd
import nltk
import re
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from wordcloud import WordCloud
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE

nltk.download('punkt')
nltk.download('stopwords')
PATH = "C:\\Users\\Rohit\\Deep Learning\\"
df_raw = pd.read_csv(f'{PATH}winemag-data-130k-v2.csv', low_memory=False)
# generating word frequencies
def gen_freq(text):
    # loop over all the text docs and extract words into word_list
    word_list = []
    for sentence in text:
        for word in sentence.split():
            word_list.append(word)
    # create word frequencies from word_list
    word_freq = pd.Series(word_list).value_counts()
    return word_freq
stop_words = set(stopwords.words('english'))
# NLTK's stop words are all lowercase, so the capitalised 'With' is not in the set
'With' in stop_words
for i in df_raw['taster_name'].dropna().unique():
    newb = df_raw[['description', 'taster_name']][df_raw['taster_name'] == i]
    word_freq_view = gen_freq(newb['description'].tolist())
    # note: WordCloud only applies its stopwords in generate_from_text,
    # so stop words still appear in these first clouds
    wc_normal = WordCloud(width=400, height=330, max_words=50,
                          background_color='white',
                          stopwords=stop_words).generate_from_frequencies(word_freq_view)
    print(newb['taster_name'].unique())
    plt.figure(figsize=(14, 10))
    plt.imshow(wc_normal, interpolation='bilinear')
    plt.axis('off')
    plt.show()
What I saw in those word clouds (with the stop words still in) is that some reviewers look for multiple flavours as well as aromas, and the culmination of the two.
for i in df_raw['taster_name'].dropna().unique():
    newb = df_raw[['description', 'taster_name']][df_raw['taster_name'] == i]
    # drop stop words, short tokens and non-alphabetic tokens before counting
    newb['description'] = newb['description'].apply(
        lambda x: ' '.join([w for w in nltk.word_tokenize(x)
                            if w.isalpha() and len(w) > 3 and w not in stop_words]))
    word_freq_view = gen_freq(newb['description'].tolist())
    wc_normal = WordCloud(width=400, height=330, max_words=50,
                          background_color='white',
                          stopwords=stop_words).generate_from_frequencies(word_freq_view)
    print(newb['taster_name'].unique())
    plt.figure(figsize=(14, 10))
    plt.imshow(wc_normal, interpolation='bilinear')
    plt.axis('off')
    plt.show()
After removing the stop words, the word clouds helped me understand what a reviewer looks for first, and as a whole, while tasting the wine.
So yes: the reviews do contain information that relays what a reviewer looks for first, how they consume the wine, and what the wine makes them feel. I do think we can predict a reviewer based on their reviews.
ax = sns.countplot(x=df_raw['taster_name'])
ax.set_xticklabels(ax.get_xticklabels(), rotation=40, ha="right")
plt.tight_layout()
plt.show()
df_raw['taster_name'].value_counts(normalize=True).plot(kind='bar')
df_raw_2 = df_raw.groupby(['taster_name']).size().reset_index(name='count')
df_raw_2
Reviewers = df_raw['taster_name'].fillna('Unknown_Reviewer')
Reviews = df_raw['description'].dropna()
Reviewers.shape
X_train, X_test, Y_train, Y_test = train_test_split(
    Reviews, Reviewers, stratify=Reviewers, test_size=0.2, random_state=100)
tv = TfidfVectorizer(min_df=0., max_df=1., norm='l2', use_idf=True, smooth_idf=True)
train_tfidf = tv.fit_transform(X_train)
test_tfidf = tv.transform(X_test)
print(train_tfidf.shape)
print(test_tfidf.shape)
My primary intuition for starting here was that the high-frequency words can be considered the dimensions of the manifold the documents create, with the reviews as data points in that manifold. 'Curve fitting' over that space should then do a tremendous job, and as seen, it definitely did, although the F1-score was low for reviewers with fewer than 100 reviews.
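To make the 'words as dimensions' picture concrete, each review becomes a sparse vector whose axes are the learned vocabulary. A quick check on the vectorizer fitted above (get_feature_names_out needs scikit-learn >= 1.0; older versions call it get_feature_names):

feature_names = tv.get_feature_names_out()   # the word axes of the space
print(len(feature_names))                    # dimensionality of the manifold
print(train_tfidf[0].nnz)                    # coordinates review 0 actually touches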
log_model = LogisticRegression(penalty='l2', solver='lbfgs', multi_class='ovr',
                               max_iter=1000, C=1, random_state=100)
log_model.fit(train_tfidf, Y_train)
log_predictions = log_model.predict(test_tfidf)
print(classification_report(Y_test,log_predictions))
SMOTE first selects a minority class instance a at random and finds its k nearest minority class neighbors. The synthetic instance is then created by choosing one of the k nearest neighbors b at random and connecting a and b to form a line segment in the feature space. The synthetic instances are generated as a convex combination of the two chosen instances a and b.
I chose k = 4 nearest neighbours, since Fiona Adams had only 5 of her reviews represented in the training split, so SMOTE cannot look for more than 4 minority-class neighbours.
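As a minimal sketch of that convex combination (my own illustration, not imblearn's internals; smote_synthetic is a hypothetical helper):

import numpy as np

def smote_synthetic(a, b, rng=np.random.default_rng(0)):
    # a: a minority-class sample, b: one of its k nearest minority neighbours
    lam = rng.uniform(0.0, 1.0)   # random position along the segment a -> b
    return a + lam * (b - a)      # the synthetic point, a convex combination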
sm = SMOTE(k_neighbors=4)
X_sm, Y_sm = sm.fit_resample(train_tfidf, Y_train)
log_model = LogisticRegression(penalty='l2', solver='lbfgs', multi_class='ovr',
                               max_iter=1000, C=1, random_state=100)
log_model.fit(X_sm, Y_sm)
log_sm_predictions = log_model.predict(test_tfidf)
print(classification_report(Y_test, log_sm_predictions))
Reviews_No_Stop_Words = df_raw['description'].dropna().apply(
    lambda x: ' '.join([w for w in nltk.word_tokenize(x)
                        if w.isalpha() and len(w) > 3 and w not in stop_words]))
X_train, X_test, Y_train, Y_test = train_test_split(
    Reviews_No_Stop_Words, Reviewers, stratify=Reviewers, test_size=0.2, random_state=100)
tv = TfidfVectorizer(min_df=0., max_df=1., norm='l2', use_idf=True, smooth_idf=True)
train_tfidf = tv.fit_transform(X_train)
test_tfidf = tv.transform(X_test)
print(train_tfidf.shape)
print(test_tfidf.shape)
I chose to stick with SMOTE here as well, in order to include Fiona Adams.
sm = SMOTE(k_neighbors=4)
X_sm, Y_sm = sm.fit_resample(train_tfidf, Y_train)
log_model = LogisticRegression(penalty='l2', solver='lbfgs', multi_class='ovr',
                               max_iter=1000, C=1, random_state=100)
log_model.fit(X_sm, Y_sm)
log_sm_predictions = log_model.predict(test_tfidf)
print(classification_report(Y_test, log_sm_predictions))
LBFGS uses an approximation of the inverse Hessian matrix, which means it saves a lot of memory, although it carries the risk of not converging at all. I believed that, the drawback notwithstanding, it would surely converge on a dataset this small. Which it did.
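For reference, the textbook BFGS update of the inverse Hessian that LBFGS approximates is, with $s_k = x_{k+1} - x_k$, $y_k = \nabla f_{k+1} - \nabla f_k$ and $\rho_k = 1/(y_k^\top s_k)$:

$$H_{k+1} = (I - \rho_k s_k y_k^\top)\, H_k\, (I - \rho_k y_k s_k^\top) + \rho_k s_k s_k^\top$$

LBFGS never stores $H_k$ explicitly; it keeps only the last few $(s_k, y_k)$ pairs, which is where the memory saving comes from.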
log_model_2 = LogisticRegression(penalty='l2', solver='liblinear', multi_class='ovr',
                                 max_iter=1000, C=1, random_state=100)
log_model_2.fit(X_sm, Y_sm)
log_sm_predictions = log_model_2.predict(test_tfidf)
print(classification_report(Y_test, log_sm_predictions))
SAG (Stochastic Average Gradient) incorporates a memory of previous gradient values, leading to a much faster convergence rate.
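A toy sketch of that idea for a binary logistic loss (my own illustration; sag_epoch is a hypothetical helper, and scikit-learn's actual solver is more involved):

import numpy as np

def sag_epoch(w, X, y, grad_mem, avg_grad, lr=0.1, rng=np.random.default_rng(0)):
    # grad_mem[i] remembers the last gradient computed for sample i;
    # avg_grad is the running mean of all stored gradients
    n = len(y)
    for _ in range(n):
        i = rng.integers(n)                      # visit one sample at random
        p = 1.0 / (1.0 + np.exp(-X[i] @ w))      # logistic prediction
        g_new = (p - y[i]) * X[i]                # fresh gradient for sample i
        avg_grad += (g_new - grad_mem[i]) / n    # refresh the average in place
        grad_mem[i] = g_new                      # store for later visits
        w = w - lr * avg_grad                    # step with the averaged gradient
    return w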
log_model_3 = LogisticRegression(penalty='l2', solver='sag', multi_class='ovr',
                                 max_iter=1000, C=1, random_state=100)
log_model_3.fit(X_sm, Y_sm)
log_sm_predictions = log_model_3.predict(test_tfidf)
print(classification_report(Y_test, log_sm_predictions))
# comparing against the string 'nan' matches nothing; NaN entries need isnull()
newb = df_raw[['description', 'taster_name']][df_raw['taster_name'].isnull()]
test_data=df_raw[['description']][df_raw['taster_name'].isnull()]
test_one = [test_data.at[33,'description']]
test_data_see = test_data['description'].apply(
    lambda x: ' '.join([w for w in nltk.word_tokenize(x)
                        if w.isalpha() and len(w) > 3 and w not in stop_words]))
test_data_see.at[33]
test_one_review = tv.transform([test_data_see.at[33]])
log_model_2.predict(test_one_review)
This works well: Unknown_Reviewer, the label I used to replace the NaNs, is, as labeled, unknown and mysterious. I suspect these reviews were taken from a reviewer with an account somewhere that carries no details. And Unknown_Reviewer having quite high recall and precision does suggest it is the same reviewer throughout, which is quite weird, to be honest.
Because of the limitations of my laptop and my time constraints, I didn't use deeper trees. I did test this on Colab, but it still didn't perform better than logistic regression. That said, the minority classes' precision and recall were a tad better than with logistic regression. I used Gini impurity; I suspect that had I used information gain, the precision and recall would have been better still, although the training time would be high (only a CPU was available for this).
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=100, criterion='gini', max_depth=15)
model.fit(X_sm, Y_sm)
Y_predicted = model.predict(test_tfidf)
print(classification_report(Y_test, Y_predicted))
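Had I had the time, the information-gain variant would have been a one-parameter change in scikit-learn (a sketch, not run here; model_ig is a hypothetical name):

# hypothetical variant: split on information gain (entropy) instead of Gini
model_ig = RandomForestClassifier(n_estimators=100, criterion='entropy', max_depth=15)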
I used cosine similarity, suspecting that a similarity based on the angles between data points over the manifold of word dimensions would be a better measure than Euclidean distance.
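Since the vectorizer above used norm='l2', cosine similarity between raw TF-IDF rows reduces to a plain dot product; the SMOTE rows, being interpolations, lose the unit norm, which is why metric='cosine' is passed explicitly below. A quick check on two training rows:

from sklearn.metrics.pairwise import cosine_similarity

print(cosine_similarity(train_tfidf[0], train_tfidf[1]))   # cosine of the angle
print((train_tfidf[0] @ train_tfidf[1].T).toarray())       # equal, thanks to the L2 norm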
from sklearn.neighbors import KNeighborsClassifier

model_2 = KNeighborsClassifier(n_neighbors=30, metric='cosine')
model_2.fit(X_sm, Y_sm)
Y_predicted = model_2.predict(test_tfidf)
print(classification_report(Y_test, Y_predicted))
I always call SVMs the magicians, because the very construction of the algorithm is such that minimising the loss reduces to a dual optimisation problem over a paraboloid, which has a high probability of reaching the global optimum, since constructing the hyperplane only requires the dot products of pairs of points over the manifold.
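For reference, the textbook soft-margin dual this intuition refers to (LinearSVC itself solves a closely related problem through liblinear) is:

$$\max_{\alpha}\ \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j\, x_i \cdot x_j \quad \text{s.t.}\quad 0 \le \alpha_i \le C,\ \ \sum_i \alpha_i y_i = 0$$

Only the dot products $x_i \cdot x_j$ of pairs of points appear, which is exactly the point.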
model_3 = LinearSVC(multi_class='ovr', dual=True)
model_3.fit(X_sm, Y_sm)
Y_predicted = model_3.predict(test_tfidf)
print(classification_report(Y_test, Y_predicted))
All in all, I would go with SVMs.
Try to get more reviews written by Christina, Fiona, and Carrie. I didn't want to oversample the data any further, for fear that I would just end up generating my own data rather than the true distribution of the dataset.
I wouldn't recommend moving to deep learning, since the classical ML algorithms already made pretty good models.
I might also try to increase the weight of the minority classes (reviewers with fewer reviews).
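For example (untested here; log_model_w is a hypothetical name), scikit-learn can reweight classes inversely to their frequency with class_weight='balanced':

# hypothetical re-run with inverse-frequency class weights
log_model_w = LogisticRegression(penalty='l2', solver='lbfgs', multi_class='ovr',
                                 max_iter=1000, C=1, class_weight='balanced',
                                 random_state=100)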