Criticism of PageRank – Axis of Evil in the Spam-Crowded Web?
As discussed in class, PageRank, Google’s algorithm for determining which web pages are relevant to a search, is closely tied to how well connected a page is. The more links a page receives from other pages on relevant topics, the more “important” and “relevant” PageRank judges it to be. Moreover, the fact that each page’s PageRank converges to a limiting value within a network suggests that the structure of the network, rather than the semantic content or actual relevance of the page, is the key determinant of its score. This structural character naturally leaves Google prone to spam and other ways of manipulating a page’s PageRank. This weakness is the basis of the linked article’s criticism of PageRank.
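To make the structural point concrete, here is a minimal sketch of the basic PageRank update rule, in which every page splits its current score equally among its outgoing links. The four-page graph, the page names, and the "purchased" links are hypothetical and chosen only for illustration. The sketch also assumes every page has at least one outgoing link.

```python
def pagerank(links, steps=100):
    """Basic PageRank: repeatedly let each page pass its score along its out-links.

    `links` maps each page to the list of pages it links to.
    Assumes every page has at least one outgoing link.
    """
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}  # start with equal scores
    for _ in range(steps):
        new_rank = {p: 0.0 for p in pages}
        for page, outs in links.items():
            share = rank[page] / len(outs)  # split score equally
            for target in outs:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical network: nobody links to D, so D ends up with no PageRank.
before = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["A"]})

# Same pages, same content, but D has "bought" links from B and C.
after = pagerank({"A": ["B", "C"], "B": ["C", "D"], "C": ["A", "D"], "D": ["A"]})

print(before["D"], after["D"])
```

In this toy network, D’s limiting score rises from 0 to roughly 0.22 even though nothing about its content has changed; only the link structure differs.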
The article points out that, because Google relies on a page’s connectivity to determine its PageRank, markets have emerged in which site owners buy and sell links. People can therefore effectively “buy” a higher PageRank score by purchasing links from a link-selling network such as SearchKing. It is not difficult to imagine the detrimental effect such a link-selling economy has on Google as a search engine. Accordingly, Google took action against SearchKing and those who “bought” PageRank from it by “penalizing the site and some of those in the network with PageRank score reductions or actual removal from Google.” This measure can be understood through the concept of “scaling” discussed in class and in the textbook. Although the article doesn’t specify whether the deducted PageRank was redistributed among other web pages, as it is in the textbook’s explanation, deducting a certain amount of PageRank from those who fraudulently inflate their scores seems consistent with the idea of “scaling.”
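For reference, the “scaling” idea can be sketched as follows, assuming it refers to the usual scaled PageRank update: apply the basic update, shrink every score by a factor s, and redistribute the remaining 1 - s equally among all pages. The value s = 0.85 and the small graph are illustrative assumptions, not figures taken from the article.

```python
def scaled_pagerank(links, s=0.85, steps=100):
    """Scaled PageRank: basic update, then scale by s and redistribute 1 - s.

    `links` maps each page to the list of pages it links to.
    Assumes every page has at least one outgoing link.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(steps):
        new_rank = {p: 0.0 for p in pages}
        for page, outs in links.items():
            share = rank[page] / len(outs)
            for target in outs:
                new_rank[target] += share
        # Scale every score down by s, then hand each page an equal
        # share of the leftover (1 - s).
        rank = {p: s * new_rank[p] + (1 - s) / n for p in pages}
    return rank


# Hypothetical network in which no page links to D.
scaled = scaled_pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["A"]})
print(scaled)
```

Under this rule the total PageRank still sums to 1, and a page with no incoming links keeps only its equal share of the redistributed amount, (1 - s) / n.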
Although the article has a point in criticizing PageRank as an algorithm prone to fraudulent score inflation, it does make some logical leaps. One is associating the “visibility” of PageRank scores with link-selling networks and spam activity. The article goes a step further by claiming that once Google makes each page’s PageRank score inaccessible, as it was before, link-selling networks and spam activity will dwindle. This claim lacks logical support: even if the scores are hidden, people still know that the algorithm exists, and link sellers like SearchKing still have the power to manipulate PageRank.
Of course, Google is not a perfect search engine. However, in my personal experience, it is one of the best, or least imperfect, of the many imperfect search engines we currently have. Although I am not an expert in computational linguistics or natural language processing, there is much ongoing research in these fields that could eventually allow the qualitative and semantic aspects of a web page to be incorporated into ranking.
Until then, however, I’ll Google anyway.
RIP Google PageRank score: A retrospective on how it ruined the web