Ranking nodes in growing networks: When PageRank fails
In class, we learned that PageRank is the most popular and well-known means of ranking pages on the web. One of the main features of the PageRank algorithm is its use of links to and from other sites to gauge the importance and relevance of a particular site. I read the paper “Ranking nodes in growing networks: When PageRank fails”, which discusses how PageRank performs on an actively growing network and how it treats the nodes in that network. As the paper puts it, “despite its outstanding popularity and broad use in different areas of science, the relation between the algorithm’s efficacy and properties of the network on which it acts has not yet been fully understood.”
The researchers took two data sets: one with pages from digg.com, a social news aggregator, and one with papers from the American Physical Society. They ran PageRank on each data set and compared each node's PageRank score to a “fitness” measure they developed for that node.
The paper found that PageRank has a temporal bias. That is, it favored the oldest and the most recent nodes, weighing their importance more heavily than that of other nodes. Additionally, more recent nodes tended to link to other recent nodes, whereas old nodes (if they were updated) linked to both older and recent nodes. New nodes would reference newer and more relevant material. More recent nodes were also found to attract clicks, whereas older nodes did not, so visiting a newer node was very unlikely to take you back to an older node, which further favored newer nodes. After pointing out this temporal bias, the paper suggests that a ranking algorithm for a growing network should be built upon the network's temporal linking features.
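To get a feel for how node age alone can skew PageRank, here is a minimal sketch in Python. It is not the paper's model or data: the function names (pagerank, grow_network), the damping factor of 0.85, and the toy growth rule (each new node links to 3 randomly chosen older nodes, like a paper citing earlier papers) are all my own assumptions. It only illustrates one side of the bias, the case where links can only point backward in time and the score piles up on the oldest nodes.

```python
import random


def pagerank(out_links, damping=0.85, iterations=100):
    """Plain power-iteration PageRank on an adjacency list (illustrative)."""
    n = len(out_links)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Every node gets the same baseline from random jumps.
        new_rank = [(1.0 - damping) / n] * n
        for node, targets in enumerate(out_links):
            if targets:
                # Split this node's score evenly among the nodes it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no out-links spreads its score over all nodes.
                share = damping * rank[node] / n
                for target in range(n):
                    new_rank[target] += share
        rank = new_rank
    return rank


def grow_network(n_nodes, links_per_node=3, seed=0):
    """Toy growing network (assumed): node i links only to nodes older than i."""
    rng = random.Random(seed)
    out_links = [[]]  # node 0 is the oldest and has nothing to link to
    for new_node in range(1, n_nodes):
        k = min(links_per_node, new_node)
        out_links.append(rng.sample(range(new_node), k))
    return out_links


out_links = grow_network(200)
rank = pagerank(out_links)

# Compare the average score of the 20 oldest and the 20 newest nodes.
oldest_mean = sum(rank[:20]) / 20
newest_mean = sum(rank[-20:]) / 20
print("mean PageRank, 20 oldest nodes:", oldest_mean)
print("mean PageRank, 20 newest nodes:", newest_mean)
```

In this toy network every node is equally "good" by construction, yet the oldest nodes end up with a higher average score simply because they have had more time to collect links. That is the kind of age effect a time-aware ranking would need to correct for.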
Source: https://www.nature.com/articles/srep16181#results
