Skip to main content

Google’s PageRank Algorithm

In class, we spent about a week discussing an algorithm for ranking the relevance of web pages to search terms. We found that to properly rank pages according to their relevance, it is necessary to examine the pages linking to search results, ranking these “hub” pages according and giving them a score related to how many relevant pages they link to. Using these values we compute a score for the relevancy of our “authority” pages.

However, this algorithm, while a good explanation, is not really how Google’s PageRank works. The algorithm is named after Larry Page, co-founder of Google (not simply the fact that it ranks pages). The true PageRank algorithm, invented by Larry Page and Sergey Brin while students at Stanford, now trademarked by Google, uses applied Linear Algebra — specifically eigenvalues and eigenvectors. In Linear Algebra, any linear transformation between vector spaces can be represented as a matrix. For many linear transformations, however, there are vectors which, under a given transformation, are only scaled rather than being transformed in a more dramatic way. These vectors are known as eigenvectors, (“eigen” is a German word meaning own, inherent, or particular). As a more mathematical definition, an eigenvector x of a square matrix A is a nonzero vector such that Ax=λx for some scalar λ. This λ is called an eigenvalue, and x is the eigenvector corresponding to λ.

So how does all of this relate to PageRank? Consider the results one gets from typing a search term into Google. The initial result from a naive search is a set of pages which may or may not actually relate to the search term, and these pages often have links to each other. Then, we can construct an nxn square matrix A, where each (i, j) entry of A is the proportion of links from page j to page i. For example, if page j has 5 links total, and 2 of them go to page i, the (i,j) entry would be 2/5. In a simple case, we also assume pages do not link to themselves, so the diagonal entries are 0.

Now, since A is a square matrix, we can compute its eigenvalues and eigenvectors. We want to find a way to order the pages by their relative importance. Using Linear Algebra (I won’t go into the mathematical details), it turns out that the best way to do this is to find an eigenvector corresponding to an eigenvalue of λ=1. Since we only care about the relative rankings, it would make our lives easier if the sum of all entries in this eigenvector were 1, and since the eigenvectors are scalar multiples of each other, we can simply scale whatever eigenvector we find so that this is the case.

As it turns out, this eigenvector represents exactly the relative relevancy of our pages to the search term. So, the first entry of the eigenvector corresponds to the rank of the first page in our set, the second entry to the rank of the second page, and so on. Then, we can simply rank the pages by the corresponding values in the eigenvector from highest to lowest.

There is a good example, and much more explanation, in the source linked below.



Leave a Reply

Blogging Calendar

November 2015