Reddit’s Flawed Ranking Algorithm

The social news forum giant, Reddit, has come under fire recently due to a flaw in its story ranking algorithm. Its open-sourced Pyrex code contains a simple algorithm designed to rate stories according to their “hotness” and gradually bubble the popular ones up to the front page. Submission time is a key part of this ranking, with the algorithm ranking newer stories higher than older ones. This makes sense intuitively, as people naturally want to read the most recent stories. However, the algorithm also depends on users’ upvotes and downvotes, and a corner case was found in how the two interact: the sign of a story’s score (upvotes minus downvotes) multiplied the time term. As a result, among stories with negative scores, older stories ranked higher than newer ones, and a new story that drew an early downvote or two sank far below every story with even a small positive score. This became problematic for users, as many started noticing the staleness of “the front page”, with old but popular articles remaining at the top for days. Breaking news such as the recent Oregon shooting was filtered out because the front page was clogged with older posts. This incident has caused Reddit’s admins to modify the algorithm so that older posts fall off the front page at a faster rate.
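To make the corner case concrete, here is a minimal Python sketch of the “hot” score as it is described in the referenced articles, next to the correction proposed there. The function names `hot_flawed` and `hot_fixed` and the example timestamps are my own, and the constants are taken from memory of Reddit’s open-sourced code, so treat this as an illustration rather than Reddit’s exact implementation.

```python
from math import log10

# Offset (in seconds) subtracted from the submission time; assumed to match
# the constant in Reddit's open-sourced ranking code.
EPOCH_OFFSET = 1134028003


def _parts(ups, downs, posted_at):
    """Return (order, sign, seconds) shared by both versions of the score."""
    score = ups - downs
    order = log10(max(abs(score), 1))      # vote magnitude on a log scale
    sign = (score > 0) - (score < 0)       # 1, 0, or -1
    seconds = posted_at - EPOCH_OFFSET     # age term: larger means newer
    return order, sign, seconds


def hot_flawed(ups, downs, posted_at):
    """Flawed version: the sign multiplies the time term."""
    order, sign, seconds = _parts(ups, downs, posted_at)
    return round(order + sign * seconds / 45000, 7)


def hot_fixed(ups, downs, posted_at):
    """Proposed correction: the sign multiplies the vote term instead."""
    order, sign, seconds = _parts(ups, downs, posted_at)
    return round(sign * order + seconds / 45000, 7)


# Two hypothetical posts, one day apart, each with a single downvote.
older = 1_400_000_000
newer = older + 86_400

flawed_prefers_older = hot_flawed(0, 1, older) > hot_flawed(0, 1, newer)
fixed_prefers_newer = hot_fixed(0, 1, newer) > hot_fixed(0, 1, older)
```

In this sketch, a single downvote flips the whole time term negative in the flawed version, so the older of two downvoted posts ranks higher. In the corrected version, time always counts in a post’s favor and the downvote only subtracts a small amount.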

Even though Reddit’s algorithm has no relation to Google’s famous PageRank algorithm, Reddit could use this controversy as an opportunity to incorporate PageRank into its own story ranking. Beyond improving Reddit’s horrendous search, PageRank could contribute additional weight to Reddit’s current story scoring algorithm. Posts that link to external sources would be the primary target, as it would be easy to create hubs and endorsements out of the web pages that users link to as submissions. Other links that cite a previously submitted story or article could be classified as authorities as well.

Specific subreddits could have their own PageRank system, which confines each subreddit’s submissions to their own network and PageRank computation, so that they have no influence on other subreddits’ pages. Thus, an e-sports article on the League of Legends subreddit might gain endorsement and influence from other websites and articles posted in the same subreddit, but it would not gain outside influence from articles in an unrelated area, such as a subreddit for cars. Indeed, it would be easy to create a network of nodes out of the user submissions.

Although PageRank might seem likely to keep older articles and stories at the top for longer, it can be a small factor in Reddit’s overall story ranking algorithm, since the one Reddit currently has is quite simplistic and even flawed. Reddit faces a challenge in which it must balance up-and-coming breakout stories with slightly older but still popular stories that people in different time zones might not have seen. Although challenging, integrating PageRank into Reddit’s story ranking could be an innovative solution that tackles both problems.
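As a rough illustration of the per-subreddit idea, the sketch below runs a basic PageRank power iteration separately for each subreddit, so that links in one subreddit cannot affect ranks in another. Everything here is hypothetical: the submission names, the `links_by_subreddit` structure, and the damping value of 0.85 are assumptions made for the example, not anything Reddit uses.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Basic PageRank by power iteration.

    `links` maps each node to the list of nodes it links to.
    Returns a dict of node -> rank, with ranks summing to 1.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its rank evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


def rank_by_subreddit(links_by_subreddit):
    """Run PageRank independently inside each subreddit's link network."""
    return {sub: pagerank(links) for sub, links in links_by_subreddit.items()}


# Hypothetical submissions and the submissions they cite.
links_by_subreddit = {
    "leagueoflegends": {
        "match_recap": ["patch_notes"],
        "roster_news": ["patch_notes"],
        "patch_notes": [],
    },
    "cars": {
        "ev_review": ["battery_explainer"],
        "battery_explainer": [],
    },
}

ranks = rank_by_subreddit(links_by_subreddit)
top_lol = max(ranks["leagueoflegends"], key=ranks["leagueoflegends"].get)
```

Here `patch_notes` ends up with the highest rank in its subreddit because two other submissions cite it, while nothing in the cars network contributes to it. The resulting rank could then be added to the existing hot score with a small weight, which is the "small factor" role suggested above.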

 

Articles Referenced:

http://technotes.iangreenleaf.com/posts/2013-12-09-reddits-empire-is-built-on-a-flawed-algorithm.html

http://www.outofscope.com/reddits-empire-no-longer-founded-on-a-flawed-algorithm/

For more information about Reddit’s ranking system and how it works:

https://coderwall.com/p/cacyhw/an-introduction-to-ranking-algorithms-seen-on-social-news-aggregators

http://amix.dk/blog/post/19588
