Promise and Pitfalls of Extending Google’s PageRank Algorithm to Citation Networks

The article I read examines the pitfalls of using citation count as the sole criterion for judging the significance of a research paper. It attempts to correct for some of these pitfalls by extending Google’s PageRank algorithm to rank research papers in order of significance. Raw citation counts have several problems: different fields cite more or less frequently, citations to obscure papers and to groundbreaking work are weighted the same, and the importance of influential older papers is consistently underestimated. The article focuses on publications in the American Physical Society (APS) family of journals. The authors build a graph in which each node represents a publication and a directed edge runs from each citing article to the article it cites.

In Google’s PageRank algorithm, “a random surfer is initially placed at each node of this network and its position is updated as follows: (1) with probability 1 − d, a surfer hops to a neighboring node by following a randomly selected outgoing link from the current node; (2) with probability d, a surfer “gets bored” and starts a new search from a randomly selected node in the entire network. This update is repeated until the number of surfers at each node reaches a steady value, the Google number. These nodal Google numbers are then sorted to determine the Google rank of each node.” A value of d = 0.15 is normally chosen for PageRank because a typical web surfer follows about 6 hyperlinks before getting bored and beginning a new search, which corresponds to d ≈ 1/6.

For citation networks, the PageRank algorithm was slightly modified: the random surfers are initially distributed with a weight that decays exponentially with a paper's age, which favors more recent publications. This new algorithm, CiteRank, is characterized by two parameters: d (the inverse of the average citation depth) and τ (the time constant of the bias toward more recent publications at the start of searches). The optimal values of these parameters were found to be d = 0.5 and τ = 2.6 years, respectively. The article was written as a first step toward showing how PageRank can be used to rank publications.
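
To make the random-surfer update concrete, here is a minimal Python sketch of it on a tiny made-up citation graph. It is my own simplification, not the authors' implementation: the function name `rank_papers`, the toy graph, the made-up ages, and the choice to send surfers at papers with no references straight back to a restart are all assumptions for illustration. Leaving `ages` and `tau` unset gives uniform restarts as in plain PageRank; setting them weights restarts by exp(-age / τ), in the spirit of CiteRank.

```python
import math


def rank_papers(citations, d=0.15, ages=None, tau=None, iters=200):
    """Approximate the steady-state share of surfers on each paper.

    citations: dict mapping a paper to the list of papers it cites.
    d: probability that a surfer gets bored and restarts.
    ages, tau: if both are given, a restart lands on paper p with weight
        exp(-ages[p] / tau), favoring recent papers (CiteRank-style).
        Otherwise restarts are uniform, as in plain PageRank.
    """
    cited = {q for refs in citations.values() for q in refs}
    nodes = sorted(set(citations) | cited)

    # Restart distribution: where a bored surfer begins a new search.
    if ages is not None and tau is not None:
        weights = {p: math.exp(-ages[p] / tau) for p in nodes}
    else:
        weights = {p: 1.0 for p in nodes}
    total = sum(weights.values())
    restart = {p: w / total for p, w in weights.items()}

    score = dict(restart)  # initial placement of surfers
    for _ in range(iters):
        new = {p: 0.0 for p in nodes}
        bored = 0.0  # share of surfers that restart this round
        for p in nodes:
            refs = citations.get(p, [])
            if refs:
                # With probability 1 - d, follow one citation chosen at random.
                share = (1 - d) * score[p] / len(refs)
                for q in refs:
                    new[q] += share
                bored += d * score[p]
            else:
                # Assumption: a paper with no references sends every
                # surfer back to a restart.
                bored += score[p]
        for p in nodes:
            new[p] += bored * restart[p]
        score = new
    return score


# Toy example: papers A and B both cite C. Ages (in years) are made up.
toy = {"A": ["C"], "B": ["C"], "C": []}
toy_ages = {"A": 0.0, "B": 10.0, "C": 20.0}

plain = rank_papers(toy, d=0.15)
recent = rank_papers(toy, d=0.5, ages=toy_ages, tau=2.6)
print(sorted(plain, key=plain.get, reverse=True))
print(sorted(recent, key=recent.get, reverse=True))
```

With uniform restarts, A and B end up with equal scores because the graph treats them symmetrically. With the age bias turned on, the newer paper A collects more of the restarting surfers than B.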

I thought this article was a very interesting follow-up to the lecture on PageRank, because it shows that the idea behind ranking web pages applies to many things beyond the web. It is also interesting to see that the way we model networks in class is very similar to how real researchers model very complex networks. Ranking the importance of research papers seems like a daunting task, but with the material learned in lecture it becomes much more manageable and interesting. Finally, it is cool to see how widely applicable the material from this class is: ranking research papers is an extremely important problem for people trying to find relevant papers on topics they are interested in.


