Predicting Microblog Information Cascades

http://static.usenix.org/events/wosn10/tech/full_papers/Galuba.pdf

This paper attempts to establish a highly accurate method of predicting viral content on microblogs such as Twitter. To do so, a 300-hour chunk of data was studied. During this time, more than 15 million unique URLs were shared among all Twitter users in 27 million tweets. This study occurred in 2009, so the user graph consisted of only 2.7 million nodes (users). Each user had an average of about 80 followers, giving a total of 218 million relationships.

The study then modelled the usage. The majority of the links were posted by a very small subset of users. A single post rarely contained more than two links, even with URL shortening. The chance of a tweet containing a URL being retweeted depended strongly on the number of followers the posting user had, with the probability reaching about 0.5 when the user had 500 followers. Models were devised to represent the number of retweets that a tweet would get, given certain environmental factors (equations 1, 2, and 3 in the article, which appear to be impossible to type in this blog).

To test the predictive power of this algorithm, the data was split into two halves, with the first 150 hours used to train the algorithm and the second 150 hours used to test the results. In general, the linear threshold model described in the paper appeared to offer the best predictions, with a precision of 0.8, a recall of 0.55, and an F-score of 0.65.
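As a quick sanity check on those numbers, the F-score is the harmonic mean of precision and recall, so the three reported values should agree with each other. The sketch below is not from the paper; the function names and the toy sets of URLs are made up for illustration, and it assumes the standard (balanced) F1 definition.

```python
def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(predicted, actual):
    """Score a set of predicted items against the set actually observed.

    Both arguments are hypothetical: e.g. the URLs a model predicted a user
    would share in the test window vs. the URLs the user really shared.
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall, f_score(precision, recall)


# The figures reported above are mutually consistent:
print(round(f_score(0.8, 0.55), 2))  # 0.65
```

High precision with lower recall means that when the model predicts a share it is usually right, but it misses a fair number of shares that actually happen.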

In addition to the primary aim of the research, predicting viral URL sharing, the study also discovered some interesting data on the nature of viral link sharing. The amount of user activity generated around a URL is power-law distributed, and the cascades are very shallow. In addition, each cascade is very frequently composed of multiple subcascades, each of which is very shallow itself.
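To make "shallow cascades made of multiple subcascades" concrete, here is a minimal sketch of how one might measure it for a single URL. It is not the paper's method: the `(user, source)` edge format, the function name, and the toy data are all assumptions, and it presumes each user appears once and the retweet edges contain no cycles.

```python
from collections import defaultdict


def cascade_stats(edges):
    """Return (number of subcascades, depth of each subcascade) for one URL.

    edges: iterable of (user, source) pairs, where source is the user the
    URL was picked up from, or None if the user posted it independently.
    Each independent poster is the root of its own subcascade.
    """
    children = defaultdict(list)
    roots = []
    for user, source in edges:
        if source is None:
            roots.append(user)
        else:
            children[source].append(user)

    depths = []
    for root in roots:
        depth = 0
        frontier = [root]
        # Breadth-first walk: each non-empty next level adds one hop of depth.
        while frontier:
            frontier = [c for u in frontier for c in children[u]]
            if frontier:
                depth += 1
        depths.append(depth)
    return len(roots), depths


# Toy example: two independent posters, so two subcascades.
edges = [("a", None), ("b", "a"), ("c", "a"), ("d", "b"),
         ("e", None), ("f", "e")]
print(cascade_stats(edges))  # (2, [2, 1])
```

On real data, the finding above would show up as many roots per URL and depths that rarely exceed a couple of hops.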
