Skip to main content



Random Walk Algorithms for Movie Recommendation

http://ids.csom.umn.edu/faculty/gedas/cars2010/Bogers-CARS-2010.pdf

In class, we learned about the PageRank algorithm which revolutionized the way in which search engines rank their results.  Before PageRank, most search engines returned results solely based on how well they matched the query that the user posed and ignored the network structure of the web all together.  PageRank uses the structure of the web to return pages that match the query and are also “highly recommended” by other pages.  The reliance on the network structure has improved search engines dramatically, and now most highly regarded search engines incorporate the network structure into their ranking system.  However, PageRank is only a specific example of a more general random walk algorithm, which can be used rank the importance of certain nodes in a network based on the network structure.  This article provides a novel and interesting application of random walk algorithms to a movie recommendation service which demonstrates a not as well-known example of how network analysis can be used to create fascinating services.
This article provides a brief introduction into the random walk algorithm and describes how the authors used network structure to recommend movies to users.  A random walk over a network places a user at a random node in the network.  The user then follows one of the outgoing links from this node at random bringing the user to a new node in the network.  The user then repeats this process any number of times.  Following the nodes that the user passed through gives a random path through the network.  (As the course textbook explains, PageRank is a specific example of the more general random walk algorithm, with the rank of a given page equal to the probability that a user participating in a random walk is at that page at a given time).  The random walk algorithm allows us to rank the importance of nodes in a network based on the structure of the network.  As we see in PageRank, the most important nodes are those that have many in-links from other important nodes.  With more important in-links, a node becomes more likely to be reached by a random walker, and therefore, is ranked as more important.
In order to apply the random walk algorithm to rank movies in order of importance, the authors simply have to create a network that best represents the data and make slight modifications to the algorithm which encode the important aspects of the problem that they are trying to solve.  In this situation, the authors modeled the network as a contextual graph, where the nodes were movies, users, movie genres, tags, and actors.  Although these are the only things that the authors used, the model can easily be changed to support other types of nodes which may provide other helpful information.  Their random walk algorithm was also slightly modified.  It is intended to model a user browsing IMDB, which means that the user starts at a movie that he/she recently watched and follows links based on what he/she liked about the movie.  This would bring the user to a new movie that had similar qualities at the movie that the user started with.  Unlike the general random walk algorithm, the process is not repeated until the probabilities that a user is at a given node converge to the stationary probabilities, but the random walker only takes a finite number of steps to ensure that the movie chosen is in the same vacinity as the starting movie.  However, this is still a random walk because the links that the user takes from the initial movie is still random.  These modifications allow the authors to create a system that makes movie recommendations based on network structure as well as ratings and while the authors do not provide any statistics to show how well their system is performing, it is nevertheless, a very novel approach to building a recommendation system.
This article is interesting because it shows how useful network structure is in many problems in computer science.  By carefully choosing how to represent their data as a network and by making subtle changes to the random walk algorithm, the authors were able to adapt the random walk algorithm to fit an entirely new problem.  This is exciting because it opens up the possibility of using networks to solve an entirely new range of problems, even those where the underlying network structure is very subtle.

In class, we learned about the PageRank algorithm which revolutionized the way in which search engines rank their results.  Before PageRank, most search engines returned results solely based on how well they matched the query that the user posed and ignored the network structure of the web all together.  PageRank uses the structure of the web to return pages that match the query and are also “highly recommended” by other pages.  The reliance on the network structure has improved search engines dramatically, and now most highly regarded search engines incorporate the network structure into their ranking system.  However, PageRank is only a specific example of a more general random walk algorithm, which can be used rank the importance of certain nodes in a network based on the network structure.  This article provides a novel and interesting application of random walk algorithms to a movie recommendation service which demonstrates a not as well-known example of how network analysis can be used to create fascinating services.

This article provides a brief introduction into the random walk algorithm and describes how the authors used network structure to recommend movies to users.  A random walk over a network places a user at a random node in the network.  The user then follows one of the outgoing links from this node at random bringing the user to a new node in the network.  The user then repeats this process any number of times.  Following the nodes that the user passed through gives a random path through the network.  (As the course textbook explains, PageRank is a specific example of the more general random walk algorithm, with the rank of a given page equal to the probability that a user participating in a random walk is at that page at a given time).  The random walk algorithm allows us to rank the importance of nodes in a network based on the structure of the network.  As we see in PageRank, the most important nodes are those that have many in-links from other important nodes.  With more important in-links, a node becomes more likely to be reached by a random walker, and therefore, is ranked as more important.

In order to apply the random walk algorithm to rank movies in order of importance, the authors simply have to create a network that best represents the data and make slight modifications to the algorithm which encode the important aspects of the problem that they are trying to solve.  In this situation, the authors modeled the network as a contextual graph, where the nodes were movies, users, movie genres, tags, and actors.  Although these are the only things that the authors used, the model can easily be changed to support other types of nodes which may provide other helpful information.  Their random walk algorithm was also slightly modified.  It is intended to model a user browsing IMDB, which means that the user starts at a movie that he/she recently watched and follows links based on what he/she liked about the movie.  This would bring the user to a new movie that had similar qualities at the movie that the user started with.  Unlike the general random walk algorithm, the process is not repeated until the probabilities that a user is at a given node converge to the stationary probabilities, but the random walker only takes a finite number of steps to ensure that the movie chosen is in the same vacinity as the starting movie.  However, this is still a random walk because the links that the user takes from the initial movie is still random.  These modifications allow the authors to create a system that makes movie recommendations based on network structure as well as ratings and while the authors do not provide any statistics to show how well their system is performing, it is nevertheless, a very novel approach to building a recommendation system.

This article is interesting because it shows how useful network structure is in many problems in computer science.  By carefully choosing how to represent their data as a network and by making subtle changes to the random walk algorithm, the authors were able to adapt the random walk algorithm to fit an entirely new problem.  This is exciting because it opens up the possibility of using networks to solve an entirely new range of problems, even those where the underlying network structure is very subtle.

Comments

Leave a Reply

Blogging Calendar

October 2011
M T W T F S S
 12
3456789
10111213141516
17181920212223
24252627282930
31  

Archives