The Holes in Google’s PageRank Search Algorithm

Google’s PageRank algorithm is mostly a popularity contest. It accounts for many practical factors that affect a web page’s place in the list of results returned by a search for one or more keywords. For example, to calculate a page’s importance, Google counts the page’s in-links (each of which is an outgoing link, or recommendation, from another page), the importance of the pages making those recommendations, and the weight of each individual recommendation. There are numerous search engines, but according to a study, more than 76% of searchers worldwide use Google, and 84% of Google searchers never go beyond the second page of search results. In addition, Google’s popularity has led to the coinage of the term “googling”. We can safely conclude that Google is doing quite a good job with its PageRank system.
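To make the idea of weighted recommendations concrete, here is a minimal sketch in Python. It is a simplified illustration, not Google’s actual implementation: the function name `simple_rank`, the three-page web, and the choice to split a page’s importance equally among its out-links are all assumptions made for this example. It also assumes every page has at least one out-link.

```python
def simple_rank(links, iterations=100):
    """Repeatedly pass each page's importance along its out-links.

    `links` maps every page to the list of pages it links to.
    Assumes each page has at least one out-link (hypothetical toy setup).
    """
    pages = list(links)
    # Start with every page equally important.
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: 0.0 for p in pages}
        for page, outs in links.items():
            # A recommendation weighs less when the recommender
            # links to many pages: its importance is shared out.
            share = rank[page] / len(outs)
            for target in outs:
                new_rank[target] += share
        rank = new_rank
    return rank


# Toy web: A recommends B and C, B recommends C, C recommends A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = simple_rank(links)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
```

In this toy web, C ends up more important than B even though both receive a link from A, because C is also recommended by B, and B gives C its entire vote.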

However, because pages in this system rank one another by pointing to one another, there are bound to be holes in the algorithm. Two problems that arise in real webs are “dangling nodes” and subwebs. When you draw out even a small portion of the internet, with nodes representing web pages and edges representing the links between them, it quickly becomes apparent that some nodes have in-links but no outgoing links. These are the dangling nodes. The creators of PageRank proposed that a searcher who reaches a page without out-links chooses another page to visit at random. Though this keeps the searcher from getting stuck at the dangling node, the workaround means the searcher must put in more effort to find a related article of interest. In the case of subwebs, a small, tightly knit group of web pages all link to each other. Though this cycling can be overcome if the searcher goes back to the list of search results and chooses a page outside the subweb, it still poses a problem for ranking: the pages in a subweb rely on one another so heavily that they will likely move up or down the list of search results together, perhaps getting in the way of other pages.
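The dangling-node workaround described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the real PageRank code: the function name `rank_with_dangling_fix`, the `fix_dangling` switch, and the three-page web are hypothetical, and the random jump is modeled by spreading a dangling page’s importance evenly over every page.

```python
def rank_with_dangling_fix(links, iterations=100, fix_dangling=True):
    """Iterate importance scores, optionally handling dangling nodes.

    `links` maps every page to the list of pages it links to;
    a page with an empty list is a dangling node.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: 0.0 for p in pages}
        for page, outs in links.items():
            if outs:
                # Normal page: share importance among its out-links.
                share = rank[page] / len(outs)
                for target in outs:
                    new_rank[target] += share
            elif fix_dangling:
                # Dangling page: the searcher jumps to a random page,
                # so its importance is spread evenly over all pages.
                for target in pages:
                    new_rank[target] += rank[page] / n
            # Without the fix, a dangling page's importance is lost.
        rank = new_rank
    return rank


# Toy web in which C has in-links but no out-links.
web = {"A": ["B", "C"], "B": ["C"], "C": []}
leaky = rank_with_dangling_fix(web, fix_dangling=False)
fixed = rank_with_dangling_fix(web)
print("total without fix:", sum(leaky.values()))
print("total with fix:", round(sum(fixed.values()), 6))
```

Without the random jump, the importance that flows into C never flows back out, so the scores of all pages drain away. With the jump, the total importance stays constant and C still ranks highest.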

http://www.acmsonline.org/conferences/2011/proceedings-2011.pdf#page=134
http://ijettcs.org/Volume2Issue3/IJETTCS-2013-05-28-057.pdf
