Skip to main content



Extending the Mathematical Theory of Google’s PageRank Algorithm to Other Models

In class, we described Google’s PageRank algorithm in terms of an iterative process. First, assign arbitrary positive importance values to nodes in a network, and then make the values “flow” throughout the network structure. It is a mathematical fact that no matter what positive starting values we choose, iteration of this algorithm will result in a “steady state” set of importances of the nodes.

This is because we may represent the iteration of the algorithm as a square matrix (dependent on the network structure alone) acting upon a vector representing the scores of the nodes. The Theorem of Perron and Frobenius states that matrices of this type with solely nonnegative entries possesses a leading eigenvector and eigenvalue, and a large number of iterations of the matrix on a positive initial vector will get to the steady-state vector eventually. Essentially, this theorem states that any iterative process represented by a nonnegative matrix will achieve a fixed, steady-state directionality (and a fixed set of values, if we keep normalizing the result or construct the matrix to have leading eigenvalue 1).

This theorem has striking implications besides PageRank. In this post, I discuss some other instances of this theory best understood by random walks. Of course, random walks are deeply related to the network-flow process described above — according to Austin Benson et. al (source below), a simple random walk is a Markov process that can be thought of as a nonnegative matrix—hence, it possesses a leading eigenvector and hence a steady-state result under constant iteration. This is essentially saying that if one keeps carrying out a random walk with a fixed probability of moving to a state, the eventual probability distribution among states can be calculated by simply looking at the transition matrix and performing an eigenvector-finding algorithm known as diagonalization.

This idea of iterating on a graph is what is known as an “existence theorem”: given this structure, does a steady state set of information exist? Google’s PageRank uses this to efficiently gauge how important each node is in a directed network, but there are perhaps more “sinister” implications.

Herschlag et al. from Duke University described how a Markov process combined with “Monte Carlo sampling” (essentially random sampling) can lead to an optimized map that represents district boundaries, such that Republican “power” (i.e. probability to win all the districts, even with an opposition majority) is maximized. This is more commonly known as gerrymandering. While this is not perfectly analogous to Perron-Frobenius on positive matrices, it represents the same type of existence theorem that leads to not only the possibility of a gerrymandered map, but gives an algorithm for computing it. One can see this as analogous to a topological network flow, as population clusters within small distances of each other can be thought of as possessing an edge, and rearranging the nodal values of this network structure (i.e. the population contained in a “gerrymander”) according to certain flow-rules leads to a steady-state optimized solution that depends on the flow-rules. This is the central idea (albeit simplified) that exists in Google’s page-rank: an existence theorem combined with an iterative process leads to very simple computational optimization techniques that has a wide range of uses, from ranking pages and authorities to optimizing gerrymandered structures.

 

http://www.cs.cornell.edu/~arb/papers/spacey-random-walks-sirev-2017.pdf

https://arxiv.org/pdf/1709.01596.pdf

https://www.nytimes.com/2017/10/06/opinion/sunday/computers-gerrymandering-wisconsin.html

Comments

Leave a Reply

Blogging Calendar

October 2017
M T W T F S S
 1
2345678
9101112131415
16171819202122
23242526272829
3031  

Archives