Facebook SansNet Viral Information Cascade Detection
https://research.fb.com/publications/detecting-large-reshare-cascades-in-social-networks/ (download the paper at that link)
While learning about information cascades in class, I became interested in their applications in social media. As a moderate consumer of various social media platforms, I can imagine that information cascades have a large effect on which content becomes most popular on each platform. The simplest example I could think of is Reddit’s upvote system. It has long been observed that the first few votes (up or down) on a post in any given subreddit (a forum for a particular topic) have significant influence on whether that post becomes popular. As a result, I have always wondered whether it is possible to develop methods that predict whether a post is destined to become popular, ideally before it has even been posted. In searching for research on this topic, I came across a paper by a Facebook research group associated with Virginia Tech that describes SansNet, a network-agnostic approach to modeling large reshare cascades.
In this case, popularity is measured by how often a Facebook post is reshared. The paper first outlines the current challenges in predicting viral cascades. The foremost challenge is the lack of knowledge of the complete network structure through which the information travels. Most networks available to researchers are only “inferred” networks, so the network data is either difficult to obtain or “noisy.” SansNet bypasses these issues by modeling viral cascades without considering the network structure at all. Instead, it models the reshare behavior as a time series, focusing only on what stage of its evolution a post is in. In other words, SansNet doesn’t attempt to predict how large a cascade will grow, only whether or not it will become viral. The natural analog in our class content is the information cascade: the virality of a Facebook post is essentially the result of one. SansNet determining only whether a cascade will go viral is similar to determining only whether an information cascade will end up as an incorrect cascade. It analyzes one property of the cascade without delving into the probabilities that particular individuals in the network succumb to it.
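To make the “time series instead of network” idea concrete, here is a minimal sketch of my own, not code or features from the paper. It assumes that all we know about a cascade is the time of each reshare; the function name, the one-hour bins, and the example timestamps are all hypothetical.

```python
from typing import List


def cumulative_reshares(timestamps: List[float], bin_width: float, n_bins: int) -> List[int]:
    """Cumulative reshare count at the end of each time bin.

    Only reshare times are used; who reshared from whom is never consulted,
    which is the sense in which this representation is network-agnostic.
    """
    counts = [0] * n_bins
    for t in timestamps:
        idx = int(t // bin_width)
        if 0 <= idx < n_bins:  # ignore reshares outside the observed window
            counts[idx] += 1

    series = []
    total = 0
    for c in counts:
        total += c
        series.append(total)
    return series


# Hypothetical cascade: hours since the original post at which reshares occurred.
example_timestamps = [0.2, 0.5, 0.9, 1.1, 1.3, 2.7, 5.0]
series = cumulative_reshares(example_timestamps, bin_width=1.0, n_bins=6)
print(series)
```

A model could then look only at the shape of a series like this one, without ever needing the friendship or follower graph.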
SansNet’s criterion for virality is crossing a certain relative size threshold. It determines this using a survival model, with a random variable that represents the time to the event and a survival function that gives the probability that a cascade has not encountered the event by a given time. The actual math behind this survival model is fairly complicated, and at times too complex for me to follow, but the results and conclusions in the paper are still valuable and insightful. Specifically, the paper reports that SansNet performed better than the alternative virality detection methods it was compared against in terms of F-measure, the harmonic mean of a model’s precision and recall. Ultimately, the paper concludes that SansNet’s network-agnostic approach is superior to current methods of virality detection, especially for relatively young cascades.
I feel that this paper was a great extension of our class material on information cascades. While most of the cascades we studied or modeled were very small in scale (often small groups of students), this paper studies information cascades in a network of hundreds of thousands of individuals (the authors used a data set of 250,000 photos/videos). I also believe that the paper relates to our study of the rich-get-richer phenomenon. While the paper never directly mentions it, I would assume that much of the content in their data that went viral did so because of rich-get-richer effects. It is interesting to see a model that can predict how this phenomenon plays out in a large-scale network. Our class material considered video virality and the rich-get-richer phenomenon only hypothetically (using a power-law distribution), so it was interesting to see an actual real-life example.
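The survival function and F-measure mentioned above can be illustrated with a small sketch. This is my own simplified version, not the paper’s model: it assumes every cascade eventually experiences the event (so there is no censoring), estimates S(t) = P(T > t) as a plain fraction, and uses made-up event times and predictions.

```python
from typing import List


def empirical_survival(event_times: List[float], t: float) -> float:
    """Estimate S(t) = P(T > t) as the fraction of event times greater than t.

    Assumes every cascade's event time was observed (no censoring).
    """
    if not event_times:
        raise ValueError("need at least one event time")
    return sum(1 for x in event_times if x > t) / len(event_times)


def f_measure(predicted: List[bool], actual: List[bool]) -> float:
    """Harmonic mean of precision and recall for binary viral/not-viral labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical hours until each cascade crossed the size threshold.
event_times = [2.0, 3.0, 5.0, 8.0]
s_at_4 = empirical_survival(event_times, 4.0)

# Hypothetical predictions versus what actually happened.
predicted = [True, True, False, True]
actual = [True, False, False, True]
score = f_measure(predicted, actual)
print(s_at_4, score)
```

In this toy data, two of the four cascades have not yet crossed the threshold at hour 4, so the estimated survival probability is 0.5. The predictions have a precision of 2/3 and a recall of 1, which gives an F-measure of 0.8.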
In general, it was interesting to see how information cascades (and possibly the rich-get-richer phenomenon) apply to the real world, and how we can use our knowledge of information cascades to make predictions about how a real network will evolve over time. Knowledge like this can be very powerful when it comes to controlling the information that passes through a network (in this case, a social media platform). Therefore, studying these effects is crucial for interpreting the information we obtain through social media.