Skip to main content



When does PageRank fail?

https://www.nature.com/articles/srep16181

Entitled “Ranking nodes in growing networks: When PageRank fails,” this article by Mariani et al. discusses the limitations of the PageRank algorithm on certain dynamic, time-dependent networks. 

PageRank, which is the cornerstone of the Google search algorithm, is based on the idea that “a node is important if it is pointed [to] by other important nodes” (Mariani et al.). It is known that PageRank does well on static networks, even with varying topology, but not much research has been done in examining PageRank’s performance on networks which have a temporal component. 

The first question is how to model a dynamic, time-varying network, in which nodes appear and disappear and links are formed. In the Relevance Model (RM), one of the models considered by the authors, no nodes (or links) disappear; rather their relevance and probability of linking to new nodes naturally decreases over time. In particular, to each node in the network at time t, we can ascribe a probability of it receiving a link from another node in the network. Qualitatively, this probability is proportional to the importance of the node, so that the most important nodes in the network are most likely to receive links. In particular, the probability that a node i receives a link at time t is a function of 1) the number of inlinks of i at time t, 2) the relevance of node i at time t, which can be expressed as the product of the “fitness” of node i, a constant parameter, and a monotonically decaying function of the age of the node (how long it has been since node i has entered the network). In defining relevance, it makes sense to include the latter function based on the premise that nodes become less relevant over time. To complete the picture, m (a controllable parameter) nodes are selected to initiate links at time t. Node i is chosen to fulfill such a role with probability proportional to its 1) Activity parameter (constant parameter) and 2) a monotonically decaying function of the age of the node. Again, the latter is included based on the assumption that nodes become less active over time. 

Mariani et al. test PageRank on this model, varying the input parameters. They evaluate PageRank’s performance by comparing its ranking of nodes (given by PageRank scores) with the ranking of the nodes by their fitness. They find that in certain regions of the parameter space, the PageRank values have a “temporal bias towards recent or old nodes,” the former occurring when the decay of the relevance function is faster and the latter occurring when the decay of the activity function is faster. This temporal bias results in a ranking of nodes that is worse than the ranking of nodes by their indegree. The authors test PageRank on another model and find similar results, concluding that “PageRank can only perform well if the two system’s timescales (of activity and relevance decay, respectively) are of similar magnitude” (Mariani et al.).

In class, we have discussed PageRank in detail as well as its “correctness” in some sense. However, our study of information networks and ranking algorithms has been rather simplistic. There is added complexity when we consider time-dependent networks. Creating models which capture the chaotic nature of real-world networks is a challenge in and of itself, let alone ranking nodes by their importance. This article sheds light on this fact and illustrates that we may need to develop more sophisticated and nuanced ranking algorithms.

Comments

Leave a Reply

Blogging Calendar

October 2019
M T W T F S S
 123456
78910111213
14151617181920
21222324252627
28293031  

Archives