Taking a Bayesian perspective on fake product reviews
Amazon’s review and rating system was originally implemented to help customers make more informed choices, drawing on feedback from trusted customers who had previously purchased the item. But in 2015, Amazon reversed course, ending its tolerance of sponsored reviews and instead placing strict rules against them. The system had been infiltrated by misleading or false reviews, often through merchants reimbursing customers for buying their products in exchange for a 5-star review.
The company claims that it uses artificial intelligence to sift through reviews, asserting that 99% of the reviews that remain are truthful. After suspicious reviews were deleted, a product such as “a pair of wireless headphones from Atgoin dropped its rating from 4.4 stars to 2.6.” Suspicious reviews often share common characteristics: repetitive phrases, a large influx of reviews in a short amount of time, and a guaranteed 5-star rating.
Data privacy consultant John D. Cook writes that sample size and Bayes’ Theorem play a huge part in deciphering which products have legitimate reviews. He compares a product with 90 good reviews out of 100 against a product with 2 good reviews out of 2. If you simply compare the averages (90% versus 100%), the second product has a leg up. Instead, take θA as the probability that a customer is satisfied with the first product and θB as the probability that a customer is satisfied with the second. Starting from a uniform prior, the posterior for θA is a beta(91,11) distribution and the posterior for θB is a beta(3,1) distribution. The probability that a sample of θA is larger than a sample of θB is then 0.713: despite its lower average, the first product is more likely to satisfy the next customer. While it may seem intuitive that more reviews support the legitimacy of a product, this Bayesian perspective is important because fake reviews artificially inflate sample sizes, and the feedback of others has a substantial impact on an individual’s decision-making.
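Cook’s 0.713 figure can be checked with a quick Monte Carlo sketch: draw many samples from each posterior and count how often the first product’s satisfaction probability beats the second’s. (This sketch uses only the Python standard library; the sample count and seed are arbitrary choices, not part of Cook’s post.)

```python
import random

random.seed(42)

N = 100_000  # number of Monte Carlo draws (arbitrary; more draws = tighter estimate)

# Posterior for product A: 90 good reviews out of 100, uniform Beta(1,1) prior
# -> Beta(91, 11). Product B: 2 good reviews out of 2 -> Beta(3, 1).
wins = sum(
    random.betavariate(91, 11) > random.betavariate(3, 1)
    for _ in range(N)
)
p_a_better = wins / N
print(f"P(theta_A > theta_B) ~= {p_a_better:.3f}")  # should land near 0.713
```

With 100,000 draws the estimate typically falls within a few thousandths of the exact value, matching the 0.713 Cook reports.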
We can apply conditional probability to this situation. Ideally, a good review represents a high signal, an indication that the product is satisfactory. The user’s own signal depends on how they personally perceive the product’s potential. Theoretically, seeing multiple high signals will sway a user’s decision, possibly encouraging them to buy the product as well. Thus, even if there is one bad review (a low signal), as long as the conditional probability that the product is good given the observed signals is at least 0.5, the individual may still be inclined to purchase it. Although this is not a perfect example of what was discussed in class, we learned that once the number of “acceptances” differs from the number of “rejections” by at least two, an information cascade forms in which the “acceptances” take precedence over users’ private signals. There is only a narrow window before a cascade forms in which a user is not dominated by the previous reviews, and businesses could potentially use this to their advantage by forging positive reviews when their product is newly released to sway users’ decisions.
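The cascade threshold described above can be sketched as a small simulation, assuming the standard classroom model: each user privately receives a signal that matches the product’s true quality with some probability q, and follows the crowd as soon as prior accepts lead rejects by two (or vice versa). The function name, q value, and user count below are illustrative choices, not from the source.

```python
import random

random.seed(1)

def cascade_decisions(true_good, q=0.75, n_users=20):
    """Sketch of an information cascade: each user gets a private high/low
    signal that matches the truth with probability q, then decides based on
    earlier users' actions plus their own signal. Once accepts lead rejects
    by 2 (or vice versa), a cascade forms and private signals are ignored."""
    accepts = rejects = 0
    decisions = []
    for _ in range(n_users):
        if accepts - rejects >= 2:
            choice = "accept"  # up-cascade: private signal ignored
        elif rejects - accepts >= 2:
            choice = "reject"  # down-cascade: private signal ignored
        else:
            # No cascade yet: follow own signal (a common simplification)
            signal_high = (random.random() < q) == true_good
            choice = "accept" if signal_high else "reject"
        decisions.append(choice)
        accepts += choice == "accept"
        rejects += choice == "reject"
    return decisions

print(cascade_decisions(true_good=True))
```

Running this repeatedly shows why early forged reviews are so effective: two fabricated “accepts” at launch can lock every later user into accepting, regardless of their private signals.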
https://www.johndcook.com/blog/2011/09/27/bayesian-amazon/