Computing Random Walks
When scaling the PageRank calculation to model the current state of the world wide web, the algorithm must be modified to cope with the sheer size of the input. The number of nodes may be prohibitively large to represent and process as an explicit graph. Instead, we introduce the concept of a random walk, which is the essential component that makes the computation tractable: rather than building a data structure that holds the graph in its entirety, the walk over the graph can be modeled as a Markov chain.
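As a sketch of the idea, a random walk only ever needs the out-links of the current node, not the whole graph at once. The graph and function names below are hypothetical illustrations, not part of any particular PageRank implementation:

```python
import random

# A toy directed graph stored as adjacency lists (hypothetical example data).
graph = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}

def random_walk(graph, start, steps, seed=0):
    """Take `steps` uniformly random steps from `start`; return the visited path."""
    rng = random.Random(seed)
    node = start
    path = [node]
    for _ in range(steps):
        # Only the current node's out-links are needed at each step.
        node = rng.choice(graph[node])
        path.append(node)
    return path

path = random_walk(graph, "A", 5)
```

Because each step touches only one adjacency list, the walk can run against a graph far too large to hold in memory all at once.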
A Markov chain allows us to store the transition probability from each node to every other node. This gives a storage complexity of N×N, where N is the number of nodes in the graph. Since the transition probability Pxy is the probability of moving from node x to node y, we can use these probabilities to model the likelihood of ending at each specific node.
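A minimal sketch of such a transition matrix, using made-up values for a three-node chain: row x holds the probabilities of moving from node x to each other node, so every row must sum to 1, and multiplying a distribution by the matrix advances the walk one step.

```python
import numpy as np

# Toy transition matrix: P[x, y] is the probability of moving
# from node x to node y. Each row sums to 1.
P = np.array([
    [0.0, 0.5, 0.5],   # from node 0
    [0.0, 0.0, 1.0],   # from node 1
    [1.0, 0.0, 0.0],   # from node 2
])

assert np.allclose(P.sum(axis=1), 1.0)

# Distribution after one step when the walk starts at node 0:
p0 = np.array([1.0, 0.0, 0.0])
p1 = p0 @ P
```

Repeatedly applying `P` in this way evolves the distribution over nodes step by step.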
We introduce here the Markov Chain Monte Carlo (MCMC) method, which lets us run our random walk simulation for k iterations, after which the observed transition frequencies correspond very closely to the true stationary probability of ending at any specific node. In a Markov Chain Monte Carlo simulation, we begin at any one node and iterate a random walk, gradually updating our estimates until they closely resemble the stationary distribution of the Markov chain. Stationarity can be thought of as the point at which the choice of starting node has no impact on the resulting probability distribution. Once this stationary distribution is found, it is effectively the desired output of the PageRank algorithm.
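One common Monte Carlo estimate of the stationary distribution, sketched below under the assumption of a small toy graph: run a single long walk and record the fraction of time spent at each node. As the number of steps grows, these visit frequencies converge to the stationary probabilities.

```python
import random
from collections import Counter

# Toy directed graph (hypothetical; node 0 links to 1 and 2, etc.).
graph = {0: [1, 2], 1: [2], 2: [0]}

def estimate_stationary(graph, steps=200_000, seed=1):
    """Estimate the stationary distribution as the fraction of time
    a long random walk spends at each node (a Monte Carlo estimate)."""
    rng = random.Random(seed)
    node = 0
    visits = Counter()
    for _ in range(steps):
        node = rng.choice(graph[node])
        visits[node] += 1
    return {n: c / steps for n, c in visits.items()}

pi = estimate_stationary(graph)
```

For this particular chain, the exact stationary distribution can be solved by hand as (0.4, 0.2, 0.4), and the estimate approaches those values as the walk lengthens regardless of the starting node.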
The core reason this works is the convergence of the simulation process. There are several important proofs to study when trying to understand why these Monte Carlo simulations converge to stationarity. The first is found in chapter 14 of the textbook, and the second is articulated in https://www.cs.cmu.edu/~avrim/598/chap5only.pdf, a Carnegie Mellon computer science textbook. The chapter proves the convergence properties of these Markov chains and is helpful for understanding the mathematics behind convergence.