


Gaming PageRank

In class, we discussed the PageRank algorithm, which is used on networks of web pages to estimate the relative importance of each page.  A page’s PageRank is computed from the PageRank of its incoming links, that is, the pages that link to it.  The update rule is applied repeatedly, and after enough iterations the numbers converge on each page’s PageRank.  In the end, a page’s PageRank can be read as the probability that a random surfer ends up on that page, and Google ranks pages with higher PageRank higher in its search results.
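To make the repeated update concrete, here is a minimal sketch of the basic rule described above: every page splits its current PageRank evenly among the pages it links to. The three-page web `toy_web` and the function name `pagerank` are made up for illustration, and this is only the unscaled textbook version, not what Google actually runs.

```python
def pagerank(links, iterations=50):
    """Basic PageRank update on a dict mapping page -> list of pages it links to."""
    pages = list(links)
    # Start with the total rank of 1 shared equally among all pages.
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: 0.0 for page in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # A page with no outgoing links keeps its own rank.
                new_rank[page] += rank[page]
                continue
            share = rank[page] / len(outgoing)
            for target in outgoing:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_web)
for page, value in sorted(ranks.items()):
    print(page, round(value, 3))
```

The values always sum to 1, which is what lets them be read as the random surfer’s probabilities.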

However, PageRank in its basic form can easily be gamed.  In 2003, a Google “I’m feeling lucky” search for “french military victories” led to a fake page that looked just like a Google results page, but said that there were no search results and suggested searching for “french military defeats” instead.

One might wonder how this joke page could become the number one hit for “french military victories.”  As described at http://www.google-watch.org/gaming.html, it was mostly due to an influential blogger named Jason Shellen.  Shellen was a Google employee at the time, and his blog was widely read and respected.  As a result, the pages on his website had a very high PageRank.  Shellen archived his blog posts for every month since 2000, each month on a separate page, but every archive page also carried the same, up-to-date “current links” section.  When he put a link to the french military victories page in that section, all 33 of his archived blog pages linked to the page at once.  And because Shellen’s blog had such a high PageRank, the french military victories page got an enormous boost in its own PageRank.  Google did have algorithms to detect identical pages and remove them from consideration when determining PageRank, but because the archived pages differed in most of their content (only the current links section was the same), those algorithms could not detect the duplication.
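The size of that boost can be seen in a toy model. The sketch below compares the joke page’s PageRank with and without the shared sidebar link. Only the 33 archive pages come from the story above; everything else is an assumption for illustration: the 20 `reader_*` pages that link to the blog, the page names, and the use of the scaled PageRank variant with a damping factor of 0.85 (a commonly cited value, chosen here so that the page with no outgoing links is handled cleanly).

```python
def pagerank(links, damping=0.85, iterations=100):
    """Scaled PageRank on a dict mapping page -> list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank sitting on pages with no outgoing links is spread over all pages.
        dangling = sum(rank[page] for page in pages if not links[page])
        new_rank = {page: (1 - damping) / n + damping * dangling / n for page in pages}
        for page, outgoing in links.items():
            for target in outgoing:
                new_rank[target] += damping * rank[page] / len(outgoing)
        rank = new_rank
    return rank


def build_web(link_in_sidebar):
    """Hypothetical web: a popular blog, its 33 archive pages, and a joke page."""
    archives = [f"archive_{i}" for i in range(1, 34)]
    readers = [f"reader_{i}" for i in range(1, 21)]  # assumed sites linking to the blog
    links = {"blog_home": list(archives), "joke_page": []}
    for archive in archives:
        # Every archive page links home; the shared sidebar adds the joke link.
        links[archive] = ["blog_home"] + (["joke_page"] if link_in_sidebar else [])
    for reader in readers:
        links[reader] = ["blog_home"]
    return links


before = pagerank(build_web(link_in_sidebar=False))["joke_page"]
after = pagerank(build_web(link_in_sidebar=True))["joke_page"]
print(f"joke page PageRank without the sidebar link: {before:.4f}")
print(f"joke page PageRank with the sidebar link:    {after:.4f}")
```

In this toy web, one edit to the sidebar creates 33 incoming links from well-ranked pages, so the joke page goes from having almost no PageRank to holding a large share of the total.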

Google has since vastly improved its search engine algorithms to reduce the threat of this kind of PageRank gaming, also known as “Googlebombs.”  For example, Google uses many algorithms other than PageRank to contribute to its overall ranking of a website.  In addition, Google likely added safeguards for cases like this one, where all of the links to a page come from the same domain.  However, search engine optimization (the term for the various strategies used to increase a website’s visibility in search engines) remains a common practice.  As cool as PageRank is, this episode shows that developing a real-world algorithm for ranking websites that is immune to gaming is extremely complex.
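As a rough illustration of the same-domain idea, the sketch below groups a page’s incoming links by the host they come from, so that 33 links from one blog count as a single source. This is purely a guess at what such a safeguard might look like, not Google’s actual method; the URLs and the function name `distinct_linking_hosts` are made up.

```python
from urllib.parse import urlparse


def distinct_linking_hosts(inbound_urls):
    """Return the set of hosts that the inbound links come from."""
    return {urlparse(url).hostname for url in inbound_urls}


# Hypothetical inbound links: 33 archive pages on one blog plus one outside page.
inbound = [f"http://blog.example.com/archive/{i}.html" for i in range(1, 34)]
inbound.append("http://other.example.org/links.html")

hosts = distinct_linking_hosts(inbound)
print(len(inbound), "inbound links from", len(hosts), "distinct hosts")
```

A real safeguard would need to be more careful than this, for instance by treating different subdomains of the same site as one source.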
