Comparative Study of HITS and PageRank Link based Ranking Algorithms

Web page ranking is an optimization technique used by search engines. Basic page ranking algorithms fall into two classes: content-based page ranking, which is influenced by the number of matched terms, the frequency of terms, and the location of terms; and connectivity-based page ranking, which relies on two well-known link analysis methods, the PageRank algorithm and the HITS algorithm. The PageRank algorithm was developed at Stanford University by Larry Page and Sergey Brin in 1996. A simplified version of PageRank is defined by the equation PR(A) = c ∑ PR(v)/Qv, where the sum runs over every page v that links to page A, Qv is the number of pages that v links to, and c is a normalization factor. In short, the score of a page is divided evenly among the pages it links to, and the updated PageRank of A is the sum of the shares it receives from the pages that link to it. HITS stands for Hypertext Induced Topic Search and is also known as hubs and authorities. It is a link analysis algorithm developed by Jon Kleinberg in 1998. An authority is a page that provides important and trustworthy information on a given topic, while a hub is a page that links to authorities. A page's authority score is the sum of the hub scores of the pages that point to it, and its hub score is the sum of the authority scores of the pages it points to.
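To make the simplified PageRank equation concrete, here is a minimal Python sketch. The function name `simplified_pagerank`, the `links` dictionary format, and the fixed `iterations` count are assumptions made for illustration and do not come from the paper. The normalization factor c is taken here to be whatever rescales the scores so they sum to 1, which is one possible reading of the equation.

```python
def simplified_pagerank(links, iterations=50):
    """Sketch of the simplified update PR(A) = c * sum(PR(v) / Qv).

    `links` (hypothetical format) maps each page to the list of pages it links to.
    """
    # Collect every page, including pages that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Start with the score spread evenly over all pages.
    pr = {p: 1.0 / len(pages) for p in pages}

    for _ in range(iterations):
        new_pr = {p: 0.0 for p in pages}
        for v, targets in links.items():
            if not targets:
                continue  # a page with no out-links passes nothing on
            share = pr[v] / len(targets)  # PR(v) / Qv
            for a in targets:
                new_pr[a] += share
        total = sum(new_pr.values())
        if total == 0:
            break  # no links at all, so nothing to update
        # Assumption: c rescales the scores so that they sum to 1.
        pr = {p: score / total for p, score in new_pr.items()}
    return pr


example = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(simplified_pagerank(example))
```

On this three-page example the scores settle at roughly 0.4 for A, 0.2 for B, and 0.4 for C: B receives only half of A's score, while C receives the other half plus all of B's.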

Both algorithms were also introduced in lecture and in the textbook, so the paper connects directly to what we have learned. PageRank can be thought of as a kind of “fluid” that circulates through the network, passing from node to node across the edges. Initially, PageRank is assigned evenly to all nodes. In each update, every page divides its current PageRank evenly across its outgoing links, and each page's new PageRank is the sum of the shares it receives over its incoming links. After enough updates, the network reaches an equilibrium in which the updated PageRank values are identical to the previous ones. Hubs and Authorities is a link analysis algorithm that rates web pages, and it builds on the idea of voting by in-links. The first step is to initialize auth(p) and hub(p) to 1 for every page p. The authority update rule sets auth(p), for each page p, to the sum of the hub scores of all pages that point to it, and the hub update rule sets hub(p) to the sum of the authority scores of all pages that p points to. These are the same procedures introduced in lecture, applied to a more complicated network.
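The hub and authority update rules described above can be sketched in Python as follows. The function name `hits_scores`, the `links` dictionary format, and the `steps` parameter are hypothetical names chosen for this example. Normalizing the scores so they sum to 1 after each step is an added assumption that keeps the numbers small; it does not change the relative ordering of the pages.

```python
def hits_scores(links, steps=20):
    """Sketch of the hub and authority update rules.

    `links` (hypothetical format) maps each page to the list of pages it points to.
    Returns two dictionaries: authority scores and hub scores.
    """
    # Collect every page, including pages that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Initialize auth(p) and hub(p) to 1 for every page p.
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}

    for _ in range(steps):
        # Authority update: auth(p) = sum of hub scores of pages pointing to p.
        new_auth = {p: 0.0 for p in pages}
        for q, targets in links.items():
            for p in targets:
                new_auth[p] += hub[q]

        # Hub update: hub(p) = sum of the updated authority scores p points to.
        new_hub = {p: sum(new_auth[t] for t in links.get(p, [])) for p in pages}

        # Assumption: rescale so each set of scores sums to 1.
        auth_total = sum(new_auth.values()) or 1.0
        hub_total = sum(new_hub.values()) or 1.0
        auth = {p: s / auth_total for p, s in new_auth.items()}
        hub = {p: s / hub_total for p, s in new_hub.items()}
    return auth, hub


example = {"H1": ["A1", "A2"], "H2": ["A1"]}
auth, hub = hits_scores(example)
print(auth)
print(hub)
```

In this small example A1 ends up with a higher authority score than A2 because two hubs point to it, and H1 ends up with a higher hub score than H2 because it points to both authorities.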

Works Cited

  1. Devi, Pooja, Ashlesha Gupta, and Ashutosh Dixit. “Comparative Study of HITS and PageRank Link based Ranking Algorithms.” International Journal of Advanced Research in Computer and Communication Engineering 3, Issue 2, February 2014.
  2. Easley, David, and Jon Kleinberg. Networks, Crowds, and Markets: Reasoning about a Highly Connected World. New York: Cambridge UP, 2010. Print.
