Can cascades be predicted?

As an active Facebook user, I have always been curious about how, when, and where the 50k likes on popular posts began, and how those posts spread through social media to become so popular. There must have been a starting point where a post had 0 likes and 0 shares, so how in the world did it suddenly reach 50k likes and thousands of shares? While trying to understand this cascade phenomenon in social media, I found an interesting piece of research, from Facebook itself, that tackles exactly the problem I was wondering about.

In this research by Facebook Data Science, the researchers tracked the growth of cascades to find out whether cascades on social media can be predicted. As certain photos went viral and others died off, the researchers tracked these photos in real time and tried to understand how well they could predict a cascade’s future growth as they continued to observe it. They pose two main questions: 1) if we have seen 5 reshares of a photo, what’s the probability that a further 5 reshares will happen? 2) How does this compare with the case when we’ve seen 100 reshares and want to know whether a further 100 reshares will happen? More generally: if we have seen k reshares, what’s the probability that we will see another k reshares? We have seen similar problems in lecture and in the problem sets, such as solving for the expected number of downloads of a song using power laws and the concept of the network effect. The mechanics and formulation are different, but I think they share a similar point: the larger the initial starting point, the greater and faster the spread. Similarly, as the fraction of neighbors who have adopted grows, the network effect becomes more likely to lead to further spread. In this case, that would mean the cascade grows faster and larger as the number of reshares increases.
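To make the "k reshares, then another k" question concrete, here is a minimal sketch of how one could estimate that probability empirically. It assumes we already know the final size of each cascade; the function name `doubling_probability` and the numbers in `sizes` are made up for illustration and are not from the Facebook study.

```python
def doubling_probability(final_sizes, k):
    """Among cascades that reached at least k reshares,
    return the fraction that went on to reach at least 2k."""
    reached_k = [s for s in final_sizes if s >= k]
    if not reached_k:
        return None  # no cascade ever reached k, so nothing to estimate
    reached_2k = [s for s in reached_k if s >= 2 * k]
    return len(reached_2k) / len(reached_k)


# Hypothetical final reshare counts for eight photos (invented data).
sizes = [3, 5, 7, 10, 12, 40, 100, 250]

p_small = doubling_probability(sizes, 5)    # saw 5, will we see 5 more?
p_large = doubling_probability(sizes, 100)  # saw 100, will we see 100 more?
```

With real data, comparing the estimate for small k against the estimate for large k is one way to see how the two questions in the study differ.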

The result was that the more of a cascade we observe (the larger k is), the more accurately we can predict whether it will receive a further k reshares. I think this result could also have been expected, since a larger sample generally gives a more accurate statistical result: by the law of large numbers, the sample mean converges to the theoretical mean, so the more reshares we observe, the more reliable the estimate. In addition, the features that matter most for the prediction change as well. The actual photo and the user who posted it start to matter less, while information about how fast a photo is being reshared remains relevant. To predict whether a cascade will grow large, several factors were considered. These included content features (e.g. whether a photo was taken indoors, or contained food), user features (e.g. number of friends), structural features (e.g. how deep a cascade tree grew), and temporal features (e.g. how fast a photo spread).
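As one possible illustration of a temporal feature, the sketch below computes the average gap between consecutive reshares. This is my own simple stand-in for "how fast a photo spread"; the study's actual feature definitions may differ, and the function name `mean_reshare_interval` and the timestamps are hypothetical.

```python
def mean_reshare_interval(timestamps):
    """Average time between consecutive reshares (same unit as the input).
    A smaller value means the photo is spreading faster."""
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        return None  # need at least two reshares to measure a gap
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


# Hypothetical reshare times in minutes after the original post.
times = [0, 10, 30, 60]
avg_gap = mean_reshare_interval(times)
```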

Further, how “deep” a cascade travels is also a good indicator of a photo’s longevity. For example, if at least one of a photo’s five initial reshares was by a friend of a friend (i.e. if a photo you shared has spread beyond just your friends), that is a strong indicator that it will continue to spread further. In other words, if users “far away” from you share a photo, that suggests that the content is likely to be generally interesting, rather than specific to just you and your friends. I think this can also be linked to the strength of weak ties. As we have seen many times in problem sets and lectures, exchanges of information that are less costly and less risky travel faster and more effectively through weak ties. The finding that photos reshared by “far away” users spread further matches our understanding of the strength of weak ties.
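The "friend of a friend" check can be sketched in code if we treat a cascade as a tree in which each resharer points to the person they reshared from. This is only an illustration under that assumption: the names `reshare_depths` and `spread_beyond_friends` and the example users are hypothetical, and I am reading "friend of a friend" as depth 2 or more in the reshare tree.

```python
def reshare_depths(parent_of, root):
    """Map each user to their distance from the original poster.
    parent_of[u] is the user that u reshared from."""
    depths = {root: 0}

    def depth(node):
        if node not in depths:
            depths[node] = depth(parent_of[node]) + 1
        return depths[node]

    for node in parent_of:
        depth(node)
    return depths


def spread_beyond_friends(reshares_in_order, parent_of, root, first_n=5):
    """True if any of the first `first_n` reshares came from depth >= 2."""
    depths = reshare_depths(parent_of, root)
    return any(depths[user] >= 2 for user in reshares_in_order[:first_n])


# Hypothetical cascade: "c" reshared from "a", and "e" reshared from "c".
parent_of = {"a": "you", "b": "you", "c": "a", "d": "you", "e": "c"}
order = ["a", "b", "c", "d", "e"]
went_deep = spread_beyond_friends(order, parent_of, "you")
```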

In addition to the speed and size of a cascade, the researchers were also able to predict the shape of the cascade and the outcome associated with each type of shape. Focusing on the initial structure of a cascade reveals interesting insights. As the study illustrates, a “star” configuration results in the largest cascade sizes, while a “path”, or straight-line configuration, results in smaller cascade sizes. I think this makes sense because in a “path”-like configuration each resharer passes the photo on to only one other resharer, so the width of the cascade stays constant at 1, while in a “star” configuration many people reshare at once, so the cascade is wide and a larger fraction of neighbors is exposed to it.
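The difference between a “star” and a “path” can be made concrete with a small sketch. It again assumes the cascade is a valid tree stored as a mapping from each resharer to the user they reshared from; the function name `classify_initial_shape` and the labels it returns are my own, not the study's.

```python
from collections import Counter


def classify_initial_shape(parent_of, root):
    """Label the first few reshares as 'star', 'path', or 'mixed'.
    Assumes parent_of describes a valid tree rooted at `root`."""
    n = len(parent_of)
    if n == 0:
        return "empty"
    children = Counter(parent_of.values())  # number of reshares from each user
    if children[root] == n:
        return "star"  # every reshare came straight from the root
    if all(count == 1 for count in children.values()):
        return "path"  # each user was reshared from exactly once
    return "mixed"


star = {"a": "root", "b": "root", "c": "root"}
path = {"a": "root", "b": "a", "c": "b"}
mixed = {"a": "root", "b": "root", "c": "a"}
```

Note that a cascade with a single reshare fits both definitions; this sketch labels it a star.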

In conclusion, while there are no absolute criteria for creating a photo that will achieve a large number of reshares, the study shows that it is possible to observe a photo being reshared and figure out, with increasing confidence, how large the cascade will grow. I think this finding can lead to a richer understanding of how information spreads online, and pave the way towards better management of socially shared content and applications that can identify trending content in its early stages. It was interesting to see that cascades can actually be predicted, and to understand more deeply how closely the concepts I learned in the course, such as the network effect, the strength of weak ties, and information cascades, are interrelated.

reference: https://www.facebook.com/notes/facebook-data-science/can-cascades-be-predicted/10152056491448859/
