Reddit Subreddit-Network Visualization

This post builds on the project at http://redditstuff.github.io/sna/. Reddit is a collection of forum-style communities (subreddits) in which users create threads around either a link or a body of text and post comments inside those threads. In 2013, it had 731 million unique visitors, with an average visit duration of 15 minutes and 55 seconds [http://www.redditblog.com/2013/12/top-posts-of-2013-stats-and-snoo-years.html]. Its structure as a network of communities, combined with its popularity, makes it an interesting subject of study for a Networks class.

One user saw the opportunity and attempted to “[map] the relationships between subreddits, with associations defined by a redditor posting a link to one subreddit in another subreddit.” This is hosted at http://redditstuff.github.io/sna/selfposts.html. As detailed in the link, the author enforced several criteria: “Removed edges between subreddits that have less than eight occurrances [sic]. Removed nodes with a degree greater than 75 (this was enough to get rid of every sub in the top 20 subreddits (by subscriber). Since these subs are likely to link to a wide variety of topics, an association with one of these subs is not particularly interesting to us. Remove any remaining nodes that are now orphaned (i.e. no edges link to them).” In the resulting visualization, there is a huge central cluster of subreddits that are all heavily interlinked. On the fringe are quite a few isolated groups, such as one two-subreddit grouping: “r/fishtank” and “r/aquarist”. These structures align closely with terms from the class. The central cluster is what the class calls a “giant component”: “a deliberately informal term for a connected component that contains a significant fraction of all the nodes.” The visualization also matches the class's observation that “when a network contains a giant component, it almost always contains only one.” Given the author's criteria, which I would argue do not invalidate conclusions one draws about Reddit in general, Reddit's subreddits fit these concepts well. All the outlying groups are simply smaller connected components of the graph.
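
The filtering criteria and the component structure described above can be sketched in a few lines of Python. This is only an illustration and not the author's actual code: the function names, the choice to treat links as undirected, the order in which the three steps are applied, and every count in the toy data are assumptions made for the example.

```python
from collections import defaultdict


def filter_graph(edge_counts, min_count=8, max_degree=75):
    """Apply the three quoted criteria to a dict {(sub_a, sub_b): link_count}.

    Assumption: links are treated as undirected and the steps run in the
    order they are listed in the quote.
    """
    # Step 1: keep only edges with at least `min_count` occurrences.
    adj = defaultdict(set)
    for (a, b), count in edge_counts.items():
        if count >= min_count and a != b:
            adj[a].add(b)
            adj[b].add(a)

    # Step 2: mark nodes whose degree is greater than `max_degree`.
    hubs = {node for node, nbrs in adj.items() if len(nbrs) > max_degree}

    # Step 3: drop the hubs, then drop any node left without edges.
    filtered = {}
    for node, nbrs in adj.items():
        if node in hubs:
            continue
        kept = nbrs - hubs
        if kept:
            filtered[node] = kept
    return filtered


def connected_components(adj):
    """Return the connected components of `adj`, largest first."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Toy data: every count here is invented for illustration.
edge_counts = {
    ("r/fishtank", "r/aquarist"): 12,
    ("r/python", "r/learnprogramming"): 20,
    ("r/learnprogramming", "r/compsci"): 9,
    ("r/compsci", "r/python"): 3,   # below the threshold, so dropped
    ("r/python", "r/cats"): 2,      # dropped, which leaves r/cats orphaned
}
components = connected_components(filter_graph(edge_counts))
```

On real data, the first entry of `components` would play the role of the giant component, and small groups such as the “r/fishtank” and “r/aquarist” pair would appear further down the list.
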
The author's definition of an edge also shows that Reddit is not only a community of people discussing outside content, but also a community that discusses itself: users post threads whose specific purpose is to discuss threads elsewhere on the site. In this network, the edges are exactly those threads that reference other subreddits. They provide what the Networks class would call “bridges.” One of the biggest creators of edges is the subreddit “r/DepthHub,” a community where people post links to in-depth discussions held in all kinds of Reddit communities. This constantly creates opportunities for bridges to form between members of different communities, based not on a strong shared interest in one area but simply on wanting to read in-depth discussion of assorted subjects, especially ones that people don't normally interact with.

However, the author also states that they were “unsatisfied with how little of Reddit was represented in this dataset.” So the author went on to create a different visualization, with the links defined as “the number of times subreddits contain links to the same websites (so not just other sections of reddit).” This is hosted at http://redditstuff.github.io/sna/vizit (click on a node to see its connections to other nodes). The vast number of edges, and the large number of nodes represented, show how naturally overlaps occur between different communities. This visualization also fits nicely with the idea of Six Degrees of Separation in large networks. Here, clicking on one of the big nodes (such as “r/videos” or “r/funny”) suggests that it is probably more like Three Degrees of Separation for the vast majority, if not all, of the subreddits. Starting from a node on the periphery, one only has to take one hop to reach a central “hub” among the big nodes, and from there one can travel across all sorts of communities. This level of interconnectedness also reinforces the point about there being a giant central component among the subreddits of Reddit. The visualization uses “red = many, blue = few” to show the number of connections between two nodes. I think a good improvement would be for the pop-up on the right side, which lists the connected subreddits when a node is clicked, to show a number or color indicating the number of connections with each individual subreddit. In this way, or by working with the raw data directly, one could get a better interpretation of the strength of ties between subreddits. For example, one could examine how triadic closure plays into the strength of a subreddit's ties, especially whether smaller subreddits become connected through mutual strong ties.
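
As a rough sketch of how this second kind of network could be explored from raw data, the Python below builds weighted edges from shared websites, counts hops outward from one subreddit, and ranks a subreddit's neighbors by tie strength. None of this is the author's code. The function names are hypothetical, counting each shared domain once is an assumption about how “the number of times subreddits contain links to the same websites” might be measured, and the domain sets are invented.

```python
from collections import defaultdict, deque
from itertools import combinations


def shared_domain_weights(domains_by_sub):
    """Weight each pair of subreddits by the number of domains both link to.

    Assumption: each shared domain counts once, however often it is posted.
    """
    weights = {}
    for a, b in combinations(sorted(domains_by_sub), 2):
        shared = len(domains_by_sub[a] & domains_by_sub[b])
        if shared:
            weights[(a, b)] = shared
    return weights


def hops_from(weights, source):
    """Breadth-first search: fewest hops from `source` to each reachable node."""
    adj = defaultdict(set)
    for a, b in weights:
        adj[a].add(b)
        adj[b].add(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def neighbours_by_strength(weights, sub):
    """List the subreddits tied to `sub`, strongest tie first."""
    ranked = [
        (b if a == sub else a, count)
        for (a, b), count in weights.items()
        if sub in (a, b)
    ]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Toy data: these domain sets are invented for illustration.
domains_by_sub = {
    "r/videos": {"youtube.com", "vimeo.com", "imgur.com"},
    "r/funny": {"imgur.com", "youtube.com"},
    "r/fishtank": {"imgur.com", "aquaforum.example"},
    "r/woodworking": {"youtube.com", "lumber.example"},
}
weights = shared_domain_weights(domains_by_sub)
distances = hops_from(weights, "r/fishtank")
strongest = neighbours_by_strength(weights, "r/funny")
```

In this toy graph the two small subreddits share no website directly but are two hops apart through either large one, which mirrors the “one hop to a hub” pattern described above. The ranked list from `neighbours_by_strength` is the kind of per-subreddit count the pop-up could display.
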
