Predicting Information Cascades
Over the past few lectures, we have built several mathematical models from our (social) intuitions about how people make decisions when they can observe the decisions of others. First, we looked at the problem in terms of the global behavior of the population and developed a model based on Bayesian inference. However, the problem lends itself naturally to a network structure, which gives a more granular notion of the information available to a person making a decision: namely, knowledge of what their friends have chosen. This gives us the threshold cascade model. Although we know how to analyze simple cascades on networks, real-world network analysis uses more sophisticated models, for which predicting cascades is non-trivial both in prediction accuracy and in computational complexity. For example, some algorithms apply deep learning techniques to the graph structure of social networks to predict information cascades, whereas others represent networks as objects that already have well-founded analyses in artificial intelligence. All in all, these algorithms attempt to approximate the “essence,” or behavior, of the network with existing mathematical structures in order to predict cascades. However, each is only as good as a single approximation can be, which leads Li et al. to DeepCas, described in their paper as an “end-to-end predictor of information cascades.”
Essentially, DeepCas builds on earlier work that adapts machine learning to predict information cascades more accurately. The algorithm takes a cascade graph, i.e. a specially annotated social network graph, and splits it into sequences that roughly correspond to local groups of nodes in which cascades may originate. These sequences are then fed through a recurrent neural network, which is trained to distinguish individual sequences and their role in causing information cascades. The sequences are then analyzed further to recover a graph representation that can be used to output (a mathematical representation of) the areas that are more likely to cause cascades. The algorithm was tested on multiple data sources, including networks from Twitter, and compared against “baseline” methods of predicting cascades; in some cases DeepCas was a more efficient predictor, and it sometimes gave fewer false positives. Overall, the work demonstrates the power and utility of combining multiple machine learning methods rather than relying on any one approximation, which looks to be the future of analyzing network models.
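
To make the sequence-splitting step concrete, here is a minimal sketch of how a cascade graph might be turned into node sequences using random walks, the kind of sampling Li et al. describe. It is an illustration under stated assumptions, not the authors' implementation: the function name `sample_walks`, the toy graph, the uniform choice of the next node, and the rule of ending a walk at a node with no outgoing edges are all hypothetical choices made here for simplicity.

```python
import random


def sample_walks(adjacency, num_walks, walk_length, seed=0):
    """Sample node sequences from a directed cascade graph.

    `adjacency` maps each node to the list of nodes it passed the
    information to. Each walk starts at a uniformly random node and
    follows outgoing edges until it reaches `walk_length` nodes or a
    node with no successors (an assumed stopping rule).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    nodes = sorted(adjacency)
    walks = []
    for _ in range(num_walks):
        current = rng.choice(nodes)
        walk = [current]
        while len(walk) < walk_length:
            successors = adjacency.get(current, [])
            if not successors:
                break  # dead end: this walk is shorter than walk_length
            current = rng.choice(successors)
            walk.append(current)
        walks.append(walk)
    return walks


# Toy cascade: "a" shared with "b" and "c", who both shared with "d".
cascade = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": [],
}

walks = sample_walks(cascade, num_walks=4, walk_length=5)
for walk in walks:
    print(" -> ".join(walk))
```

Each printed line is one sequence that a recurrent network could consume in order. In a full pipeline the sequences would be converted to embeddings and padded to a common length before training, which this sketch leaves out.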