Upvote All the Things: Power Law Analysis of reddit
While the concept of a social news website isn’t a new one, reddit is slowly becoming one of the most popular websites (Alexa). Founded in 2005, as Digg (another social news website) was beginning to gain popularity, reddit has grown steadily in user base ever since. Unlike Digg, reddit lets users downvote submissions (described below), features an abundance of simple images and videos alongside articles, and has more consolidated communities. As a result, stories, links, and internet memes originating from reddit have not only become a major source of content for other websites but have also been steadily seeping into everyday culture.
Differences between reddit and Digg aside, it’s useful to briefly describe how reddit works. A registered user can submit either a link or a text post (self-post) to a subreddit, a community within reddit focused on a specific topic. The submission appears under the “new” tab of both that subreddit and the main combined new page. Other users can approve the submission with an “upvote” or disapprove of it with a “downvote.” Based on the submission’s age and its aggregate score (upvotes – downvotes), reddit’s algorithm determines where the submission is shown on the website. Usually, relatively new submissions with high scores are shown on the front pages of the subreddit and of reddit itself.
With enough users and submissions, the score on each link is essentially (and is intended to be) a measure of popularity. Within the 24 hours that most submissions are usually visible, good submissions are hopefully upvoted by the users, and the top submissions are shown on the main page. Because most users view only the front page and upvote only the content there (which was already the most popular to begin with), the rich-get-richer phenomenon is quite obvious in reddit. As such, the distribution of scores across links in reddit is expected to follow a power law rather than a normal distribution. Using the histogram of score per link provided by reddit (Ketralnis) and GraphClick to extract the values from the graphs, it was possible to perform a power law analysis on the distribution of scores within reddit.
Figure 1. A linear fit to the logarithm of the number of submissions against the logarithm of the corresponding score gives the power law f(k) = 87,684 / k^0.801.
Not surprisingly, a linear fit after taking the logarithm of both axes gives a power law for reddit’s submissions: f(k) = 87,684 / k^0.801. Compared to Web sites as a whole, with an exponent c of 2, and book sales, with an exponent of 3, this is a relatively small exponent. This is most likely due to the constantly refreshing nature of reddit, where regardless of how popular a link is, it cannot maintain its position on the front page for more than a day. Additionally, the subreddit structure of reddit allows a wider variety of topics to be explored than a traditional Web site structure does.
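The fitting step can be sketched in a few lines of Python. A power law f(k) = a / k^c becomes a straight line on log-log axes, log f(k) = log a – c log k, so an ordinary least-squares line through the logged points gives c from the slope and a from the intercept. This is only a minimal sketch, not the exact procedure used for Figure 1: the function name fit_power_law is hypothetical, and the sample points are synthetic values generated from the fitted formula itself, not the values digitized from reddit’s histogram.

```python
import math

def fit_power_law(ks, fs):
    """Fit f(k) = a / k**c by least squares on log-log axes; return (a, c)."""
    xs = [math.log10(k) for k in ks]
    ys = [math.log10(f) for f in fs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope and intercept of log f against log k.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log f = log a - c * log k, so a = 10**intercept and c = -slope.
    return 10 ** intercept, -slope

# Synthetic (score, count) points generated from the fitted formula;
# these are NOT the values read off reddit's histogram.
scores = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]
counts = [87684 / k ** 0.801 for k in scores]

a, c = fit_power_law(scores, counts)
print(f"a = {a:.0f}, c = {c:.3f}")
```

Because the synthetic points lie exactly on the curve, the fit simply recovers the coefficient and exponent it was given; with digitized data the points would scatter around the line, and the quality of the fit would be worth reporting alongside the exponent.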
Overall, it is interesting how a simple power law applies to such a wide variety of popularity-based distributions. The combination of the popularity vote system with the constant refreshing of its front page leads to dynamics different from those of the rest of the Web. Given the amazing pace at which reddit is becoming popular, further analysis of different systems of popularity voting in social media could help determine the key to success in this area.
References
- Ketralnis, David. “Nerd talk: The tale of the life of a link on reddit, told in graph porn.” the reddit blog. reddit, 17 July 2011. Web. 13 Nov. 2011. <http://blog.reddit.com/2011/07/nerd-talk-tale-of-life-of-link-on.html>.
- Alexa. “Reddit.com Site Info.” Alexa. Amazon.com, 13 Nov 2011. Web. 13 Nov 2011. <http://www.alexa.com/siteinfo/reddit.com>.