


PageRank: The Graph Theory-based Backbone of Google

http://ilpubs.stanford.edu:8090/361/1/1998-8.pdf

In 1998, two Stanford computer science Ph.D. candidates forever changed the World Wide Web as we know it. They created one of the greatest websites in everyday use around the world. Currently, it is the most visited site on the Web. And no, it is not Facebook. Want a hint? These two Stanford students were nicknamed… the “Google boys.” Google is one of the most successful companies in the world. What was the basis for its success? As everyone probably knows, it was the Google Search Engine that initiated Google’s meteoric rise into the record books.

The Google Search Engine is based on one simple algorithm called PageRank. Originally conceived by Larry Page and Sergey Brin in 1998, PageRank is a link-analysis algorithm based on a simple graph. The attached publication is Page and Brin’s original paper, which details the original PageRank algorithm. The PageRank graph is generated by treating every page on the World Wide Web as a node and every hyperlink between pages as a directed edge. The edges are further characterized as weak or strong by weighting them: each page shares its own rank evenly among its outgoing links, so a link from a highly ranked, credible source such as CNN or a USA.gov site carries more weight than a link from an obscure page. Thus, if we compare two sites with the same number of incoming edges, PageRank will give a better rank to the site that is linked from more credible sources. The total number of incoming edges also plays a role, as pages with more of them tend to have better ranks. Finally, when someone searches for a query, the search engine parses the query string and attempts to find the sites that closely match it (another mathematical algorithm for another time). It then orders those sites by PageRank, with the best ranks appearing first. This simple procedure is how the Google Search Engine runs a search query.
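
To make the graph idea concrete, here is a minimal Python sketch of PageRank computed by power iteration on a tiny, made-up four-page web (all page names are hypothetical). It assumes the normalized form of the paper’s update rule, PR(p) = (1 - d)/N + d × (sum of PR(q)/C(q) over every page q linking to p), where C(q) is the number of outgoing links on q, N is the number of pages, and d = 0.85 is the damping factor the paper suggests; the paper itself writes the constant term as (1 - d). Spreading the rank of pages with no outgoing links evenly over all pages is one common convention and is an assumption here. This is an illustration of the published idea, not Google’s production code.

```python
def pagerank(links, d=0.85, iterations=200, tol=1e-12):
    """Compute PageRank scores for a small link graph by power iteration.

    links maps each page name to the list of pages it links to.
    Update rule (normalized so the scores sum to 1):
        PR(p) = (1 - d) / N + d * sum(PR(q) / C(q) for every q linking to p)
    where C(q) is the number of outgoing links on page q.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Assumed convention: pages with no outgoing links spread their rank evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        # Each page splits its current rank evenly among its outgoing links.
        for page, targets in links.items():
            for t in targets:
                new_rank[t] += d * rank[page] / len(targets)
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical four-page web: page -> pages it links to.
web = {
    "news": ["blog", "shop"],
    "blog": ["news"],
    "shop": ["news"],
    "spam": ["shop"],
}

ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page}: {score:.4f}")
# news ranks first; spam, which no page links to, ranks last.
```

Nothing links to the hypothetical "spam" page, so it keeps only the baseline (1 - d)/N share, while "news", which is linked from both "blog" and "shop", ends up on top. Because a link passes along the linking page’s rank divided by its number of outgoing links, this is one way to read the "strong" and "weak" edges described above.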

Clearly, the accuracy of PageRank can be attributed to the weights and categorization of the edges. Thus, in theory, if one knew the true weights used by Google, one could manipulate aspects of a website to earn a better PageRank. This rarely happens in practice, because Google tweaks the weighting values quite frequently while keeping the weighting system as confidential as possible. There have been a few instances in which people researched the ranking and came close to figuring out the actual weighting system. To combat this, Google warns that any website caught manipulating its ranking will be manually devalued. Based on this, it is clear that Google’s policy of keeping the actual edge-weighting algorithm as secret as possible has consistently given Google an edge over other search engine companies.
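
As a toy illustration of the kind of manipulation described above, the sketch below builds a hypothetical "link farm": five throwaway pages whose only purpose is to link to a page called "target". It uses the public 1998 formula (the same normalized variant with d = 0.85, an assumption), not Google’s confidential weights, and it assumes every page has at least one outgoing link, so no dangling-page handling is needed. All page names are made up.

```python
def pagerank(links, d=0.85, iterations=200):
    """Power-iteration PageRank; assumes every page has at least one outgoing link."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Baseline share every page receives, plus rank passed along incoming links.
        new_rank = {p: (1.0 - d) / n for p in pages}
        for page, targets in links.items():
            for t in targets:
                new_rank[t] += d * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical "honest" web: nothing links to target yet.
honest_web = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["a"],
    "target": ["a"],
}

# Hypothetical link farm: five throwaway pages that exist only to link to target.
farmed_web = dict(honest_web)
for i in range(5):
    farmed_web[f"farm{i}"] = ["target"]

before = pagerank(honest_web)["target"]
after = pagerank(farmed_web)["target"]
print(f"target before: {before:.4f}, after: {after:.4f}")
```

Because nothing else changes between the two graphs, the jump in target’s score comes entirely from the farm’s inbound links, which is the kind of manipulation Google says it will devalue by hand.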
