


Research on Information Cascades in Twitter via Linear Threshold Model

In recent years, microblogging has become one of the most prominent forms of online communication, and it hosts a growing number of viral phenomena: breaking news, emergency broadcasts, marketing, public relations, promotion of new music or movie releases, campaigning, activism, and more. One of the most relevant microblogging platforms is Twitter, where users post tweets (short messages or statuses) and can “follow” any other user. This forms a social network that can be represented as a directed graph in which each user is a node and each “follow” relationship is a directed edge between nodes. In the 2010 paper “Outtweeting the Twitterers - Predicting Information Cascades in Microblogs” (URL source: http://static.usenix.org/events/wosn10/tech/full_papers/Galuba.pdf), published four years after Twitter launched, Wojciech Galuba, Karl Aberer, Dipanjan Chakraborty, Zoran Despotovic, and Wolfgang Kellerer seek to better understand information flow within Twitter’s network, particularly through the propagation of URLs. Over a 300-hour period, the researchers tracked 15 million URLs exchanged among 2.7 million users, and their data analysis uncovered statistical regularities in user activity, the social graph, the structure of URL cascades, and communication dynamics. The primary goal of the research was to characterize and model the information cascades formed by individual URL mentions in the Twitter follower graph.
While substantial earlier research examined how far information diffuses in the Twitter network given an initial spread, the purpose of this research was to predict “which users will tweet which URLs given a training set of existing URL mentions.” This connects to topics from earlier this semester, when we studied the significance of the strength of connections (strong or weak ties) between users in social networks. Knowing the tweeting probabilities for each user and URL can enable personalized recommendation of URLs, ranking and filtering of incoming tweets, and early identification of viral URLs. Additionally, this research can help viral marketing campaigns select URL injection points that maximize the spread of a URL.

The researchers found that the Twitter follower graph forms a giant connected component with a mean shortest path length of 3.61. Additionally, the tweeting frequencies across users and across URLs are power-law distributed. This is consistent with what we discussed in class in recent weeks about how power laws describe popularity on the web. Unlike quantities that follow a Normal distribution, the popularity of sites follows a power law: the fraction of sites with popularity k, f(k), is proportional to 1/k^c for some constant c. This power-law distribution can be explained by the “Rich-Get-Richer” model, which ties in directly with the observable consequences of decision-making in the presence of cascades. The researchers also found that information cascades on the social graph tend to be shallow and wide, with exponentially distributed depth, and that the shallow cascade for each URL is composed of sub-cascades whose number and size both follow power-law distributions.
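To make the “Rich-Get-Richer” idea concrete, here is a minimal Python sketch of a copying process of the kind covered in class. It is only an illustration, not the method used in the paper: the function name `rich_get_richer`, the parameter `p_uniform`, and the two-page starting configuration are all assumptions made for this example.

```python
import random
from collections import Counter


def rich_get_richer(num_pages, p_uniform, seed=0):
    """Simulate a simple copying process and return in-link counts per page.

    Hypothetical setup: pages 0 and 1 start out linking to each other.
    Every later page creates exactly one link. With probability p_uniform
    it links to an earlier page chosen uniformly at random; otherwise it
    copies the link target of a uniformly chosen earlier page, which
    favors pages that already have many in-links.
    """
    rng = random.Random(seed)
    targets = [1, 0]  # targets[j] is the page that page j links to
    for new_page in range(2, num_pages):
        earlier = rng.randrange(new_page)
        if rng.random() < p_uniform:
            targets.append(earlier)           # link to a uniformly chosen page
        else:
            targets.append(targets[earlier])  # copy that page's choice
    return Counter(targets)


in_links = rich_get_richer(num_pages=10_000, p_uniform=0.2, seed=1)
# histogram[k] is the number of pages that received exactly k in-links.
histogram = Counter(in_links.values())
for k in sorted(histogram)[:5]:
    print(k, histogram[k])
```

Plotting `histogram` on log-log axes should show the roughly straight line that is characteristic of a power law, although the exact shape depends on `p_uniform` and on the random seed.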

After exploring several models for predicting the propagation of URLs on Twitter, the researchers settled on the linear threshold model, in which each node has an associated threshold: the amount of influence from already “infected” neighbors that is needed for the node itself to become infected (for example, how many of the people you follow must tweet something before you feel compelled to retweet it). Using this model, they could correctly predict more than half of the URL mentions (55% recall) with at most 15% false positives among the predictions (at least 85% precision). The linear threshold model ties in closely with Chapter 19 of our course, where we consider the threshold rule in relation to how a cascade can penetrate and diffuse through a network. In this study, the model generalizes that rule by introducing a per-node threshold that must be exceeded by the cumulative influence from all of a user’s followees before the user tweets. Predictions with at most 15% false positives could prove very useful: the model could serve as a basis for personalized URL recommendations, tweet filtering and spam detection, and marketing or campaigning strategies.
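The threshold rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ implementation: the function names, the toy users, and the influence weights and thresholds are all made up for this example, whereas the paper learns its parameters from training data. The second helper shows how precision and recall would be computed from a set of predicted mentions.

```python
def linear_threshold_cascade(followees, influence, threshold, seeds):
    """Return the set of users who end up tweeting the URL.

    followees[u]      -> users that u follows
    influence[(v, u)] -> hypothetical weight of v's influence on follower u
    threshold[u]      -> cumulative influence that must be exceeded for u to tweet
    seeds             -> users who tweet the URL initially
    """
    active = set(seeds)
    changed = True
    while changed:  # repeat until no additional user becomes active
        changed = False
        for user, sources in followees.items():
            if user in active:
                continue
            # Sum the influence of every followee who has already tweeted.
            total = sum(influence.get((src, user), 0.0)
                        for src in sources if src in active)
            if total > threshold[user]:
                active.add(user)
                changed = True
    return active


def precision_recall(predicted, actual):
    """Precision and recall of predicted mentions against observed ones."""
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


# Toy follower graph with made-up weights and thresholds.
followees = {"bob": ["alice"], "carol": ["alice", "bob"], "dave": ["carol"]}
influence = {("alice", "bob"): 0.6, ("alice", "carol"): 0.3,
             ("bob", "carol"): 0.3, ("carol", "dave"): 0.2}
threshold = {"bob": 0.5, "carol": 0.5, "dave": 0.5}

tweeters = linear_threshold_cascade(followees, influence, threshold, {"alice"})
print(sorted(tweeters))  # ['alice', 'bob', 'carol']
```

In this toy graph, carol is not persuaded by alice alone (0.3) but tweets once bob has also tweeted (0.3 + 0.3), while dave never crosses his threshold. That is the cumulative-influence behavior the model is meant to capture.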

 
