How Facebook Suggests Friends
Ever wondered how Facebook’s People You May Know algorithm works? It’s so accurate it’s almost creepy. At its core, the underlying concept is quite simple, and we can model Facebook’s friend suggestion algorithm as a PageRank problem.
Let’s say we want to find potential friends for a user. Let each of their friends act as a hub, and each of the friends’ friends act as an authority (getting rid of duplicates, as well as the user and their friends). Now, we have a list of people that the user is not friends with, but has mutual friends with. We can assign a score from each of the user’s friends to the potential friends in a variety of ways. We can set each potential friend’s score to the sum of the number of mutual friends that each of the user’s friends has with them. From here, we can back-propagate and repeat the process as we did in class, and assign the hub score for each of the user’s friends based on how “interleaved” they are in the web of the potential friends. After iterating a set number of times or reaching equilibrium, we can suggest the potential friends with the highest scores to the user. While this is a simplified model of how Facebook suggests friends, it provides a good framework for understanding how the system works.
Facebook’s scoring algorithm is quite complex, and certainly does not only rely on mutual friends. In fact, sometimes Facebook recommends friends to you that you have no mutual friends with. So how does this work? Facebook’s scoring algorithm, based on what they have publicly acknowledged, involves mutual friends, interests, work and education information, search history, and mobile contacts. Even then, the accuracy of their algorithm may be quite surprising given these factors, but this is only a testament to the power of the PageRank algorithm.
The idea of recommending items based on some sort of PageRank algorithm is not unique to Facebook. Almost any recommendation system will involve some elements of PageRank, including Facebook’s News Feed sorting algorithm, YouTube’s recommended videos section, Amazon’s recommended items section, and Twitter’s tweet sorting algorithm. Each of these systems are more complex than a simple PageRank algorithm, but they employ some of the core concepts of PageRank to lay their foundation. Ultimately, in a world where targeting information to users is highly lucrative, PageRank is crucial to serve as a basis to jump off of.
https://www.theguardian.com/technology/2016/jun/29/how-does-facebook-suggest-potential-friends-not-location-data-not-now
https://web.stanford.edu/class/msande233/handouts/lecture8.pdf