


PageRank: The Web as a Graph

You probably Google things constantly, without thinking about how it works, and trust that it will give you links to relevant websites. But how does it work? You might be tempted to think that Google keeps track of all the web pages and, when you type something into the search box, simply returns the pages that contain your keywords, similar to how you would find a keyword in an article by pressing Ctrl-F. There are numerous problems with this approach. For example, suppose you want to apply to Cornell, so you type in the word Cornell and expect “www.cornell.edu” to be the first link on the results page. However, many other pages on the internet contain the word Cornell, so under pure keyword matching “www.cornell.edu” might not come first. We know this isn’t the case, but why isn’t it?

We can think of the internet as a directed graph, which is intuitive since pages link to each other in a one-directional way: each webpage is a node, and each link is an edge pointing from the linking page to the page it links to. One of the main ideas behind PageRank is that the importance of a web page can be estimated from the number of other web pages that link to it. Consider the following two examples.

A web in which the rankings are obvious. Page 1 wins.

A trickier web to rank. Page 1 appears to win.

In the first example, webpage 1 is the most important since webpages 2-5 point to webpage 1 and webpage 3 is the least important since no webpages point to it. In the second example, the most important webpage is harder to pinpoint, but again it seems to be webpage 1.
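The counting idea behind the first example can be sketched in a few lines of Python. The edge list here is partly an assumption: the text only says that pages 2-5 link to page 1 and that nothing links to page 3, so the remaining edges are made up to complete a small five-page web.

```python
from collections import Counter

# Hypothetical edge list of (source, target) pairs for a five-page web.
# Only "pages 2-5 link to page 1" and "nothing links to page 3" come from
# the text; the other edges are invented for illustration.
edges = [(2, 1), (3, 1), (4, 1), (5, 1), (1, 2), (3, 4), (4, 5)]
pages = [1, 2, 3, 4, 5]


def count_inlinks(pages, edges):
    """Return a dict mapping each page to the number of links pointing at it."""
    counts = Counter(target for _, target in edges)
    return {page: counts.get(page, 0) for page in pages}


inlinks = count_inlinks(pages, edges)

# Rank pages from most to fewest incoming links.
ranking = sorted(pages, key=lambda page: inlinks[page], reverse=True)

print(inlinks)
print(ranking)
```

With this made-up web, page 1 collects four incoming links and page 3 collects none, which matches the ranking described above.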

This is a very simplistic model of how PageRank works, and we have barely scratched the surface. In the examples above, we assumed that every edge is weighted the same. Clearly, this is not the way Google runs its search engine. Since not every link is an equally good endorsement, we could increase the weight of an edge that comes from “cornell.edu” and decrease the weight of an edge that comes from “yahooanswers.com”. We could keep adding layers of complexity to our graph to refine our search engine even further. I will stop here to save you the headache of knowing way more than you’ll ever need to know about the Google search engine, but if you are interested, I have included two links below.
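One way to make the weighting idea concrete is to let a link count for more when the page it comes from is itself important. The sketch below does this with the simplified iteration from the originally published PageRank description, which is not necessarily what Google runs today. The graph, the damping value of 0.85, and the fixed 50 iterations are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeatedly passing scores along links.

    links maps each page to the list of pages it links to.
    Assumption: every page has at least one outgoing link.
    """
    pages = list(links)
    n = len(pages)
    # Start with every page equally important.
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline score...
        new_scores = {page: (1.0 - damping) / n for page in pages}
        # ...plus an equal share of the score of each page linking to it.
        for page, targets in links.items():
            share = damping * scores[page] / len(targets)
            for target in targets:
                new_scores[target] += share
        scores = new_scores
    return scores


# Hypothetical five-page web: pages 2-5 link to page 1, nothing links to page 3.
links = {1: [2], 2: [1], 3: [1, 4], 4: [1, 5], 5: [1]}
scores = pagerank(links)

for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this sketch a link from a high-scoring page passes along more score than a link from a low-scoring one, so the edge weights come out of the graph itself rather than being assigned by hand.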

So in conclusion, how does PageRank work? The basic PageRank idea is public, but no one outside Google really knows exactly how Google ranks pages today. Every search engine (Yahoo!, Bing, Baidu) has its own algorithm, but Google seems to be the fan favorite in search bars. So whatever algorithm Google is using, it’s working.

http://www.smashingmagazine.com/2007/06/google-pagerank-what-do-we-really-know-about-it/

http://jeremykun.com/2011/06/12/googles-pagerank-introduction/
