The Distribution and Structure of Retweet Cascades
Presentation: Contagion on Social Networks
Paper (1): Everyone’s an Influencer: Quantifying Influence on Twitter
Paper (2): The Structural Virality of Online Diffusion
Last year, I attended a guest lecture by Duncan Watts, a former Cornell PhD student. The talk gave an overview of his research on contagion on social networks, and one section focused on the distribution and structure of retweet cascades on Twitter. The relevant papers are linked above, along with a presentation similar to the one I attended. Today, I would like to share some of his results and their relevance to our recent units on information cascades and the power law.
First, let us define what we mean by a “cascade” on Twitter. We can imagine the original tweet as the root of a tree. When someone retweets it, their retweet becomes a child of whichever tweet exposed them to the content: either the original tweet or another user’s retweet. We define the “size” of the cascade as the number of nodes in the tree (the original poster plus everyone who retweeted) and the “depth” as the largest number of steps between the original tweet and any retweet. Consider the following example (a recreation of images from Watts’s presentation):
In the left cascade, a tweet is posted (node 1) and retweeted by one of the poster’s followers (node 2). In the center cascade, two followers retweet it. In the right cascade, node 3 retweets the content but was exposed to it through node 2’s retweet, not the original tweet from node 1. Both the center and right cascades have size 3, but the center cascade has a depth of 1 while the right cascade has a depth of 2.
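The size and depth definitions can be made concrete with a small sketch. It assumes a cascade is stored as a dictionary mapping each node to the node that exposed it, with the original poster mapped to None. This representation and the helper names are illustrative choices, not something taken from the papers. Running it on the three example cascades reproduces the sizes and depths described above.

```python
# A minimal sketch (assumed representation): each node maps to the node whose
# tweet or retweet exposed it; the original poster (root) maps to None.

def cascade_size(parent):
    """Size = number of nodes in the tree (original poster plus every retweeter)."""
    return len(parent)


def node_depth(node, parent):
    """Number of hops from `node` back to the original tweet."""
    hops = 0
    while parent[node] is not None:
        node = parent[node]
        hops += 1
    return hops


def cascade_depth(parent):
    """Depth = largest number of hops from the original tweet to any retweet."""
    return max(node_depth(n, parent) for n in parent)


left = {1: None, 2: 1}           # node 2 retweets node 1
center = {1: None, 2: 1, 3: 1}   # nodes 2 and 3 both retweet node 1 directly
right = {1: None, 2: 1, 3: 2}    # node 3 saw node 2's retweet, not the original

for name, tree in [("left", left), ("center", center), ("right", right)]:
    print(name, "size:", cascade_size(tree), "depth:", cascade_depth(tree))
# prints:
# left size: 2 depth: 1
# center size: 3 depth: 1
# right size: 3 depth: 2
```

A parent map is enough here because every retweet has exactly one exposing tweet, so the cascade is always a tree.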
The first result he shared concerned the most common cascade structures. Watts considered every tweet containing a piece of content (a picture, video, etc.) over a year-long period, giving a dataset of over one billion observations. He found that 93% of tweets had no retweets, while 5%, 0.9%, and 0.3% were retweeted by one, two, and three friends, respectively. Another 0.3% of the time, a friend would retweet and then a friend of theirs (who did not follow the original poster) would also retweet. Watts notes that these five structures together account for over 99% of all tweets, yet we often see tweets with tens of thousands of retweets. In class, we have seen many areas with large skews in popularity (websites, apps, etc.), and we found that these popularity distributions often follow a power law. Indeed, Figure 4 of Paper (1) shows that both cascade size and cascade depth approximately follow a power-law distribution. Given the nature of Twitter, where a small number of accounts reach very large audiences, this heavy-tailed distribution is not unexpected.
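As a quick check of the percentages above, and as one way to attach a number to the power-law claim, here is a Python sketch. The exponent estimator is the standard approximate maximum-likelihood formula for discrete power laws described by Clauset, Shalizi, and Newman. Paper (1) presents its distributions as a figure, and this is not necessarily how the authors analyzed them. The function name and the `x_min` cutoff are illustrative assumptions.

```python
import math

# Share of tweets (percent) for each of the five structures reported in the talk.
structure_share = {
    "no retweets": 93.0,
    "retweeted by one friend": 5.0,
    "retweeted by two friends": 0.9,
    "retweeted by three friends": 0.3,
    "two-step chain (friend of a friend)": 0.3,
}
covered = sum(structure_share.values())
print(f"Share covered by these five structures: {covered:.1f}%")  # prints 99.5%


def estimate_power_law_alpha(sizes, x_min=1):
    """Approximate maximum-likelihood exponent for a discrete power law.

    Assumes P(x) is proportional to x**(-alpha) for integers x >= x_min and uses
    the continuous approximation alpha ~ 1 + n / sum(ln(x / (x_min - 0.5))).
    Observations below x_min are ignored.
    """
    if x_min < 1:
        raise ValueError("x_min must be at least 1")
    tail = [x for x in sizes if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / (x_min - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum
```

Given a list of observed cascade sizes (a hypothetical `observed_sizes` here), `estimate_power_law_alpha(observed_sizes)` returns the fitted exponent. The approximation is most reliable when `x_min` is not too small. A fuller analysis would also choose `x_min` from the data and test the goodness of fit, rather than relying on a straight line on a log-log plot.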
The other result I would like to share is his analysis of cascade structure. Specifically, he compared a tweet’s popularity with how viral its cascade structure was. Broadly speaking, a cascade can be a “broadcast,” where everyone who retweets saw the content from the same source (left), or “viral,” where each retweet leads to only a few more but, after many steps, the tweet still reaches a large group of people (right). The following figure is from Paper (2), The Structural Virality of Online Diffusion:
To quantify whether a cascade was viral or broadcast, they used the average shortest-path length between all pairs of nodes in the cascade tree, which Paper (2) calls “structural virality.” In a pure broadcast, this measure is close to 2: any two retweeters are connected through the original poster, so the shortest path between them has length 2 (only pairs that include the original poster are at distance 1). In a pure viral spread with two retweets at each step, the measure grows on the order of log(n), where n is the cascade size. Because the measure stays near 2 for broadcasts but keeps growing with size for viral trees, it places every cascade on a continuum from broadcast to viral. I highly recommend checking out the sample of cascades provided in Paper (2) to see the wide variety in their structure. They found that a tweet’s popularity showed little correlation with how viral its structure was. Instead, popularity depended mostly on the size of the largest broadcast in the cascade.
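The average pairwise shortest-path measure can be sketched in a few lines of Python. The sketch treats the cascade tree as an undirected graph, runs a breadth-first search from every node, and averages the shortest-path lengths over all pairs. The `star` and `binary_tree` generators are hypothetical stand-ins for a pure broadcast and for viral spreading with two retweets per step. They are not data from Paper (2). The all-pairs search is quadratic in the cascade size, which is fine for illustration but would be slow for very large cascades.

```python
from collections import deque
from itertools import combinations


def to_adjacency(edges):
    """Build an undirected adjacency list from (parent, child) retweet edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def distances_from(source, adj):
    """Breadth-first search: hop counts from `source` to every node in the tree."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def structural_virality(edges):
    """Average shortest-path length over all unordered pairs of nodes."""
    adj = to_adjacency(edges)
    dist = {u: distances_from(u, adj) for u in adj}
    pairs = list(combinations(adj, 2))
    return sum(dist[u][v] for u, v in pairs) / len(pairs)


def star(n):
    """Hypothetical pure broadcast: node 0 is the root and all others retweet it."""
    return [(0, i) for i in range(1, n)]


def binary_tree(n):
    """Hypothetical viral spread with two retweets per node (heap numbering)."""
    return [((i - 1) // 2, i) for i in range(1, n)]


for n in (15, 127, 511):
    print(n,
          "broadcast:", round(structural_virality(star(n)), 2),
          "viral:", round(structural_virality(binary_tree(n)), 2))
```

For a star with n nodes, the average is exactly 2 − 2/n: the n − 1 pairs involving the root are at distance 1 and every other pair is at distance 2, so the value approaches 2 from below. For the binary trees, the printed values keep growing as n increases, roughly logarithmically, which is the separation the measure relies on.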
It was exciting to make predictions about the distribution of Twitter cascades using the intuition from class and then see that intuition confirmed when I read through the original paper. I also found it clever that they used graph measures from our earlier introductory graph theory units to define exactly what they meant by “broadcast” and “viral.” Most of all, I was fascinated by how much they were able to learn using the fundamental ideas we have discussed in our course.