Skip to main content



An Improved Computation of the PageRank Algorithm

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.3.3964&rep=rep1&type=pdf

As learned in class, Google uses the ranking method PageRank in order to measure the relative importance of web pages. However, in practice the algorithm can sometimes fail to have the sum of all the PageRank values equal to 1 as the computation process continues. Hence, the PageRank values become smaller than they should be. The algorithm in the above paper computes the PageRank values of the Web pages such that all the PageRank values sum to 1.

Specifically, the paper finds that PageRank exhibits “norm-leak” in which the Rank (vector used to compute PageRank values) loses part of its norm continuously in situations where dangling pages are part of the graph and overall calculation. The implications of the drawback are that the PageRank values are smaller than true values and that the process may not converge to a fixed point as the norms of the Rank calculations are not held constant. The proposed algorithm in the paper that redefines how dangling columns of the graph matrix are represented. In the new calculation, a dangling page will evenly distribute all of its importance to all pages (as apposed to 1-dampening factor in the original PageRank algorithm). Both algorithms were run on a set of 10 million web pages, in which PageRank values summed to 0.52, while the norm-leak phenomenon did not take place in the new algorithm. Overall, the new algorithm solves the norm-leak problem with low amounts of spatial and computational overhead.

Comments

Leave a Reply

Blogging Calendar

October 2017
M T W T F S S
 1
2345678
9101112131415
16171819202122
23242526272829
3031  

Archives