Skip to main content



Markov Chains and Random Walks in PageRank and Personalized PageRank

https://www.r-bloggers.com/from-random-walks-to-personalized-pagerank/

http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf

With the PageRank algorithm and its Basic Update Rule, we can figure out the relative power ratings of webpages. Instead of the Basic Update Rule, the articles introduce a different perspective to examining the graph of webpages. The first article introduces the random walk model to the web page graph. Suppose we have a random web surfer starting at some node with some given distance to travel. This surfer clicks on a random outlink from that node and continuous to do so until he has traveled that distance. Now imagine a large number of such surfers all starting at the same node chosen at random. Running this experiment multiple times, we should see a stable distribution of the walkers across the graph. This should work, but in the example below, we see that everyone starts going back and forth because the graph is bipartite.

Click on images to see the GIF moving

While we expected a stable distribution, we see a very oscillating set of distributions. While we can’t say the entire graph of web pages is bipartite, we cannot also say some subgraph will not be bipartite. Hence, for these kinds of situations, the article introduces the “lazy” random walker. Let there be a 0.5 probability that a walker will retain his position at his current node for that iteration, counting as one distance traveled. We then see a more stable distribution of walkers after multiple iterations.

Click on images to see the GIF moving

In fact, the nodes with more outlinks (higher degree) will have the most walkers, possibly indicating some relation with the relative power of each web page. Thus, this approach focuses on prioritizing nodes with the most outlinks. However, the article proposes a situation where the node where the surfers initially began should be prioritized.

This introduces the concept of Personalized PageRank. Where instead of having probability 0.5 of staying at a certain node, we have a probability α of the surfer jumping to any random node in the graph. The second article gives commentary on the necessity of such procedure. Consider a graph like below, where there is cycle of nodes with only incoming links and no outgoing links.

If the surfer enters the cycle B-C-D, there is no way of going out. At a certain point, the surfer will be bored looking at web pages B, C, and D, and would like to go out. By making a PageRank give the surfer the probability of a “jump” to a random node for every iteration of movement, we can ensure a stable distribution of walkers across the graph after multiple iterations. The personalization can happen by customizing the probability of the “jump” on each node. That is, customizing the probability to where the surfer will most likely make the “jump.” It can be random (uniform) but in the case like this were we prioritize the origin node of surfers, we can make the person do a “restart jump” at the origin node. Hence, we get something like the image below.

Click on images to see the GIF moving

Anyone with the knowledge of Markov chains should see something familiar. Suppose we have X as the initial distribution of walkers across the graph. Given a transition probability matrix A, the distribution of walkers after one iteration will be x’=Ax. Given n iterations, we have x(n)=(A^n)x. We see here a Markov Chain with transition probability matrix A and initial distribution X. With the lazy walker model, we have x(n)=0.5(A^n+I)x, where I is the identity matrix. For the Personalized PageRank model, we have an additional probability matrix E, that denotes the person’s preferences for jumping around, and of course, probability of jumping, α. Then, the distribution becomes x(n)=(1-α)(A^n)x+αE.

This example shows that we can personalize the PageRank algorithm to prioritize certain factors. While the Basic Update Rule from class looks at the outlinks as a major factor, the case above with the Personalized PageRank considers your starting point a strong factor. According to the first article, “if you set a homepage in your browser or visit the same set of webpages frequently, search engines use this fact and rank web-pages higher which are closer to the set of web-pages you visit often.” Therefore, personalizing your PageRank for your search engine. Whether Google does this or not is unknown, since they stopped supporting PageRank “officially” for a while. However, it interesting to see how PageRank algorithms can be personalized for certain factors and for each person’s search results and that as a side, PageRank algorithm is, in essence, a Markov chain.

Comments

Leave a Reply

Blogging Calendar

October 2018
M T W T F S S
1234567
891011121314
15161718192021
22232425262728
293031  

Archives