Power Laws and the Rich-Get-Richer Effect in Research Papers
When I read that the popularity of web pages, measured by in-links, follows a power law in which the fraction of pages with k in-links is roughly proportional to 1/k^2, I wondered whether the same applies to published research papers, which cite each other’s ideas and findings to acknowledge a source or inspiration. Citations directly parallel our discussion of in-links: a new research paper either cites an earlier one directly or cites one of its sources, similar to the way websites link to each other. I therefore predicted that citation counts for the world’s most popular research papers would follow a power law, since extreme imbalances were bound to form.
This turned out to be accurate: the top 100 research papers are extreme outliers, given that only about 14,000 papers have more than 1,000 citations. Surprisingly, it takes 12,119 citations to rank in the top 100, and many of the world’s most famous papers do not make the cut, not even Einstein’s theory of relativity.
The graph below reflects the number of citations for the top 100 research papers.
The plot resembles a curve of the form a/k^c, so I used Excel to draw a log-log graph of the referenced online data set. On log-log axes a power law becomes a straight line, and the data closely follow a simple straight line of best fit, so the exponent c can be read off the slope.
The graph above shows that only a handful of research papers have more than 60,000 citations, whilst papers with around 10,000 to 40,000 citations are far more frequent and make up the majority of the scatter plot. From the slope of this graph, I found the equation of the first graph to be 199859/k^0.618, which takes the form a/k^c as predicted, with c = 0.618.
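For anyone who would rather reproduce the fit in code than in Excel, here is a minimal sketch in Python. It assumes that k is a paper's rank in the top-100 list, and it uses synthetic values generated from the fitted curve itself rather than the real Nature data, so it only shows that a straight-line fit on log-log axes recovers a and c. The function name fit_power_law is my own, not something from the source. Taking logs of y = a/k^c gives log y = log a - c log k, so the slope of the line is -c and the intercept is log a.

```python
import numpy as np

def fit_power_law(ranks, citations):
    """Fit citations ~ a / rank**c with a least-squares line on log-log axes."""
    log_k = np.log10(np.asarray(ranks, dtype=float))
    log_y = np.log10(np.asarray(citations, dtype=float))
    # A degree-1 polynomial fit returns (slope, intercept) of the line.
    slope, intercept = np.polyfit(log_k, log_y, 1)
    return 10 ** intercept, -slope  # a, c

# Synthetic stand-in for the top-100 list (assumption: k is the rank).
# These values come from the fitted curve, NOT from the real data set.
ranks = np.arange(1, 101)
citations = 199859 / ranks ** 0.618

a, c = fit_power_law(ranks, citations)
print(f"a = {a:.0f}, c = {c:.3f}")
```

With the real data the points would scatter around the line, so the recovered values would only approximate the ones above.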
Looking at the dataset, only three studies have more than 100,000 citations, and each of them belongs to the “biology lab technique” discipline. This demonstrates a clear rich-get-richer effect: the likelihood that a study is cited is directly proportional to its current popularity, so papers that are already highly cited pull further and further ahead. This phenomenon is also known as preferential attachment, in the sense that links are formed “preferentially” to papers that already have high popularity. The dominance of these three papers is attributable to the high volume of citations in cell and molecular biology, where the techniques they describe remain indispensable tools. At least two of these three studies won Nobel Prizes.
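The rich-get-richer mechanism can be illustrated with a small simulation. This is a rough sketch under my own assumptions, not a model taken from the Nature article: each new paper cites exactly one earlier paper, chosen with probability proportional to that paper's current citations plus one (the plus one is there so that uncited papers can still be picked). The function name simulate_citations is hypothetical.

```python
import random

def simulate_citations(n_papers, seed=0):
    """Each new paper cites one earlier paper, picked with probability
    proportional to (citations already received + 1)."""
    rng = random.Random(seed)
    counts = [0]   # citations received by each paper; start with one paper
    tickets = [0]  # paper i appears counts[i] + 1 times in this list
    for new_paper in range(1, n_papers):
        # A uniform draw over tickets is a proportional draw over papers.
        cited = rng.choice(tickets)
        counts[cited] += 1
        tickets.append(cited)      # one extra ticket for the cited paper
        counts.append(0)
        tickets.append(new_paper)  # baseline ticket for the newcomer
    return counts

counts = simulate_citations(10_000)
ranked = sorted(counts, reverse=True)
print("top 5:", ranked[:5])
print("median:", ranked[len(ranked) // 2])
```

Even though every paper follows the same rule, a few early papers end up with a large share of the citations while the typical paper has almost none, which is the same kind of imbalance seen in the top-100 list.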
Source: https://www.nature.com/news/the-top-100-papers-1.16224