PageRank: The Web as a Graph
Without even thinking about how it works, you constantly Google things and trust that Google will give you links to relevant websites. But how does it work? You might be tempted to think that Google keeps track of all the web pages, and when you type something into the search box, it simply returns the pages that contain your key words, similar to how you would find a key word in an article by pressing Ctrl-f. There are numerous problems with this approach. For example, suppose you want to apply to Cornell, so you type in the word Cornell and expect “www.cornell.edu” to be the first result. However, many other pages on the internet contain the word Cornell, and keyword matching alone gives no reason to list “www.cornell.edu” first. We know that in practice “www.cornell.edu” does come up first, but why?
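To see the problem concretely, here is a minimal sketch of the naive Ctrl-f style approach. The pages and their text are made up for illustration. Every page containing the word matches, and nothing in the matching step says which result should be listed first.

```python
# Hypothetical pages: URL -> page text (made up for illustration).
pages = {
    "www.cornell.edu": "Cornell University admissions and academics",
    "blog.example.com/cornell-trip": "My weekend trip to visit Cornell",
    "forum.example.com/cornell-vs-penn": "Cornell or Penn? Help me decide",
}


def keyword_search(pages, word):
    """Return every URL whose text contains `word` (case-insensitive).

    Like Ctrl-f, this only checks whether the word appears. It has no
    notion of which matching page is most important, so the results come
    back in whatever order the pages happen to be stored.
    """
    word = word.lower()
    return [url for url, text in pages.items() if word in text.lower()]


print(keyword_search(pages, "Cornell"))  # all three URLs, in storage order
```

If “www.cornell.edu” had been stored last, it would be listed last. Keyword matching finds candidates but leaves the ranking question open.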
We can think of the internet as a directed graph, which is very intuitive since pages link to each other in a one-directional way: each web page is a node, and a link from one page to another is a directed edge. One of the main ideas behind PageRank is that the importance of a web page can be measured by the number of other web pages that link to it. For example, consider two small web graphs.
In the first example, webpage 1 is the most important, since webpages 2 through 5 all point to it, and webpage 3 is the least important, since no webpages point to it. In the second example, the most important webpage is harder to pinpoint, but again it seems to be webpage 1.
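The counting rule used in the first example can be sketched in a few lines. The original figure is not available, so the edge list below is hypothetical: only “2 through 5 link to 1” and “nothing links to 3” come from the text, and the links out of page 1 are assumed so that the remaining pages have at least one incoming link.

```python
from collections import Counter

# Hypothetical edges for the first example: (source, target) means
# "source links to target". The links out of page 1 are assumed.
edges = [(2, 1), (3, 1), (4, 1), (5, 1), (1, 2), (1, 4), (1, 5)]
pages = [1, 2, 3, 4, 5]


def rank_by_in_degree(pages, edges):
    """Rank pages by how many links point to them (most-linked first)."""
    in_degree = Counter(target for _, target in edges)
    # A Counter returns 0 for pages that never appear as a target.
    # sorted() is stable, so pages with equal counts keep their order.
    return sorted(pages, key=lambda p: in_degree[p], reverse=True)


print(rank_by_in_degree(pages, edges))  # [1, 2, 4, 5, 3]
```

Page 1 comes first with four incoming links and page 3 comes last with none, matching the reading of the first example. Every link counts the same here, no matter which page it comes from.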
This is a very simplistic model of how PageRank works, and we have barely scratched the surface. In our earlier examples, we assumed that every edge is weighted the same. Clearly, this is not how Google runs its search engine. Since not all linking pages are equally trustworthy, we could increase the weight of an edge that comes from “cornell.edu” and decrease the weight of an edge that comes from “yahooanswers.com”. We could keep adding layers of complexity to our graph to refine our search engine even further. I will stop here to save you the headache of knowing way more than you’ll ever need to know about the Google search engine, but if you are interested, I included some links below.
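One standard way to make “links from important pages count more” precise is the textbook PageRank recurrence. Each page splits its current score evenly among the pages it links to, so a link from a high-scoring page is worth more. The scores are then recomputed until they settle, which is called power iteration. This is a minimal sketch, not Google's production algorithm. The damping factor of 0.85, the iteration count, and the small graph (the same hypothetical edges consistent with the first example) are all assumptions.

```python
# Hypothetical graph consistent with the first example:
# pages 2-5 link to page 1; the links out of page 1 are assumed.
pages = [1, 2, 3, 4, 5]
edges = [(2, 1), (3, 1), (4, 1), (5, 1), (1, 2), (1, 4), (1, 5)]


def pagerank(pages, edges, damping=0.85, iterations=100):
    """Simplified PageRank computed by power iteration.

    `damping` is the probability that a random surfer follows a link
    instead of jumping to a page chosen uniformly at random.
    """
    n = len(pages)
    out_links = {p: [t for s, t in edges if s == p] for p in pages}
    rank = {p: 1.0 / n for p in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page receives a baseline share from random jumps.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            if out_links[p]:
                # Split this page's score evenly among its outgoing links.
                share = damping * rank[p] / len(out_links[p])
                for target in out_links[p]:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[p] / n
        rank = new_rank
    return rank


scores = pagerank(pages, edges)
print(sorted(scores, key=scores.get, reverse=True))  # [1, 2, 4, 5, 3]
```

The scores always sum to 1. Page 1 ends up with about 0.48, and page 3 keeps only the random-jump baseline of 0.03. Unlike plain link counting, a page's vote here depends on its own score, which is the weighting idea described above.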
So in conclusion, how does PageRank work? No one outside Google really knows exactly how Google calculates PageRank today. Every search engine (Yahoo!, Bing, Baidu) has its own algorithm, but Google seems to be the fan favorite in search bars. So whatever algorithm Google is using, it’s working.
http://www.smashingmagazine.com/2007/06/google-pagerank-what-do-we-really-know-about-it/
http://jeremykun.com/2011/06/12/googles-pagerank-introduction/