Extending PageRank to the Scientific Community
PageRank is the foundation of one of the most influential companies of this century, and it heavily influences how likely we are to see or find any given piece of information on the Internet. It is, by any measure, a successful and sophisticated tool, and because it is both pervasive and powerful, many people have tried to extend it to other disciplines. At its core, though, it is just a system for ranking the importance of related items. Within any intellectual discipline there are dozens of relevant hierarchies that give important information about the field and the people within it. In this paper, PageRank is extended to examine the network of citations in the “premier American Physical Society (APS) family of physics journals”.
Being able to rank publications is important. Accurate rankings make content easy to search and give quick access to the most relevant information. Publication (and citation) rank is also used to evaluate the success of researchers, departments, and even entire fields. In fact, this paper was published as part of a series of papers concerning the misevaluation of journal impact and its adverse effect on hiring and promotion decisions. In the past, a researcher’s contribution to the community was evaluated by the number of citations their publications received, on the idea that other researchers would cite important, informative papers the most. (This method is very similar to the way we described PageRank in class.) The rank given to a paper would therefore be determined implicitly, in the aggregate, by the most qualified people in the field.
However, this method has obvious shortcomings. It can overvalue people who publish very frequently or whose publications are popular out of proportion to their value. Also, as the paper points out, only papers that have already been written can be cited, which imposes a distinct time ordering on the network’s topology. The method introduced by the paper attempts to combat these problems by increasing the inverse of the average citation depth (thus assuming that the typical reader of these papers views only two papers, far fewer than the average for the web) and by introducing a constant T that counteracts the time bias mentioned above. I think this is a rather clever way to account for an otherwise unavoidable source of imprecision. The improved accuracy of the adjusted method has significant implications: individual publications and scientists can be assessed without the cost, in computation or time, of evaluating the content of each paper.
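To make the difference between counting citations and ranking them concrete, here is a minimal sketch in Python. It is not the paper's code. The seven-paper citation graph is made up, the function names are my own, and I am assuming that d is the probability of a random jump, so that 1/d = 2 is the average length of a citation chain. Papers that cite nothing inside the set spread their rank evenly over all papers, which is one common convention and also an assumption on my part. The time-bias constant T is not modeled.

```python
def citation_counts(cites):
    """Count incoming citations for each paper."""
    counts = {paper: 0 for paper in cites}
    for refs in cites.values():
        for cited in refs:
            counts[cited] += 1
    return counts


def pagerank(cites, d=0.5, iters=100):
    """Power iteration over a citation graph.

    cites maps each paper to the list of papers it cites.
    d is the random-jump probability (assumed convention: 1/d is the
    average citation-chain length, so d=0.5 means chains of length 2).
    """
    papers = list(cites)
    n = len(papers)
    rank = {paper: 1.0 / n for paper in papers}
    for _ in range(iters):
        # Every paper receives the random-jump share d / n.
        new = {paper: d / n for paper in papers}
        for paper, refs in cites.items():
            if refs:
                # Pass rank along to the cited papers, split evenly.
                share = (1 - d) * rank[paper] / len(refs)
                for cited in refs:
                    new[cited] += share
            else:
                # Assumption: a paper with no references in the set
                # spreads its rank uniformly over all papers.
                share = (1 - d) * rank[paper] / n
                for other in papers:
                    new[other] += share
        rank = new
    return rank


# Hypothetical toy graph: B is cited three times and cites A;
# G is cited once, by the otherwise uncited paper F.
cites = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["B"],
    "E": ["B"],
    "F": ["G"],
    "G": [],
}

counts = citation_counts(cites)
rank = pagerank(cites)

for paper in sorted(cites, key=rank.get, reverse=True):
    print(paper, counts[paper], round(rank[paper], 4))
```

In this toy graph A and G each have exactly one citation, so a plain citation count cannot tell them apart. PageRank places A above G because A's single citation comes from the heavily cited paper B, while G's comes from a paper nobody cites.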
Additionally, the paper mentions some caveats that I would also like to discuss. While applying PageRank does lead to a more accurate ranking than simply counting citations, the methodology still has shortcomings. Citation frequency can reflect a de facto popularity contest within the scientific community, in which some scientists and their publications gain acclaim for reasons other than the merit of their content. PageRank therefore relies on an implicit connection between the popularity of a paper and the intrinsic scientific value of its content. This connection is not always as strong as we might hope, so it must be accounted for, or at least acknowledged, when implementing algorithms of this sort. The relationship can be corrupted by the ulterior motives of scientists (and of people making websites!). When citations are given for reasons other than merit (e.g. friendship, professional alliance, or increased visibility, possibly due to institutional affiliation), the backbone of the algorithm is corrupted and the results will be misleading. Advances in algorithms of this sort will be very useful, but there will never be a perfect substitute for manual evaluation by altruistically motivated individuals.
http://www.jneurosci.org/content/28/44/11103.full