


Applying the PageRank Update Rule to the AP Poll in College Athletics

Rankings are a familiar concept to college sports fans. They have a large impact on seeding, matchups and seasonal outcomes. Rankings exist in many college sports, but they are most prominent in basketball and football. The most widely followed list is the AP Top 25, which is published both pre-season and in-season. Constructed by the Associated Press, the rankings list the top 25 teams in Division I. The rankings are often criticized because voters are alleged to re-rank all teams from scratch each week instead of moving teams up and down from the prior week. Given these concerns about the voting panel and the AP Top 25, a PageRank-based algorithm has been suggested as a more clearly defined and transparent ranking process.

The AP Top 25 is one of the most prominent ranking lists in the United States, if not the most prominent. It is derived from the AP Poll, a panel of sports writers and broadcasters. Each voter composes their own ranking of the top 25 teams, and the ballots are compiled into one set of rankings that is released weekly during the season. In football, it shapes the debate over which four teams reach the College Football Playoff, although the playoff field itself is chosen by a separate selection committee. The rankings in basketball likewise feed into the discussion around selecting teams for the NCAA March Madness tournament. Many seasonal outcomes are therefore tied, at least indirectly, to the AP Top 25. It is essential that this process is consistent and fair. We will explore how the concept of PageRank can add validity and transparency to that process.

PageRank is an algorithm used by search engines to rank web pages in search results. It analyzes the links between pages and treats each link as a vote for the page it points to. Votes are essentially endorsements that move a page higher in the search results. The PageRank update rule is a concept discussed frequently in this course. Each page is a node with an associated PageRank value, and in each step that value is divided equally over all of the page's outgoing links; a page's new value is the sum of the shares it receives. This process is repeated until the values stabilize. As one may expect, pages with a larger PageRank value tend to be the higher-quality pages.
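To make the update rule concrete, here is a minimal Python sketch of the basic version described above. The function name, the three-page example graph, and the fixed number of steps are all assumptions made for illustration; they do not come from either source article. The sketch also assumes that a page with no outgoing links simply keeps its own value.

```python
def basic_pagerank(links, steps=50):
    """Run the basic PageRank update rule for a fixed number of steps.

    links maps each page to the list of pages it links to.
    """
    pages = list(links)
    # Every page starts with an equal share of the total value (1.0).
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(steps):
        new_rank = {page: 0.0 for page in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # Assumption: a page with no outgoing links keeps its value.
                new_rank[page] += rank[page]
                continue
            # Split the page's current value equally over its outgoing links.
            share = rank[page] / len(outgoing)
            for target in outgoing:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
rank = basic_pagerank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this toy graph the values settle near 0.4 for A, 0.2 for B and 0.4 for C: B has only one incoming link, and that link carries just half of A's value.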

As it pertains to college sports, it has been suggested that applying a PageRank algorithm to the rankings process could be a better method for determining the top 25. Teams with a higher PageRank value are rated higher, and each team is endorsed by the teams it has beaten. It is an intricate process, but it is worth exploring because of the abundance of disputes surrounding existing ranking methods.

Both sources listed at the top of the blog post are articles about PageRank experiments in athletics. Each explores PageRank as a ranking method and analyzes its efficacy. One source, an article by Patrick Wilson, dives into NCAA football rankings and PageRank adaptations. The other source, by data scientist Jen Liu, demonstrates a methodology for using PageRank to produce college football rankings and compares them to the existing rankings.

Both articles investigate PageRank adaptations for the AP Top 25. They analyze a college football season and introduce a PageRank update rule based on the wins and losses in that season. In terms of the course materials, the PageRank update rule consists of nodes whose PageRank values are divided among their outgoing links. A key component is that the resulting list places web pages with high PageRank values at the top and those with low values at the bottom. The more incoming links a page has, and the higher the values of the pages linking to it, the higher its PageRank value and, consequently, the higher the page ranks. This concept can be carried over to the college football season with a few important analogies.

To start, teams are the nodes (web pages) and games played are the edges (links). The model does not account for any statistic besides wins and losses, because the only information on an edge is the game result. There is an edge between every pair of teams that have played each other, directed from the losing team to the winning team. For example, if Florida plays Georgia and wins, there is an edge from Georgia to Florida. Thus, a team with many wins has many incoming links and will tend to have a higher PageRank value and a higher overall ranking. This mirrors web pages, where top search results rank highly because of their incoming links.
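The loser-to-winner construction can be sketched in a few lines of Python. Only the Florida over Georgia result comes from the example above; the other game results, the function names, and the data layout are made up for illustration and are not taken from either source article.

```python
# (winner, loser) pairs. Only Florida over Georgia comes from the example
# in the text; the remaining results are hypothetical.
games = [
    ("Florida", "Georgia"),
    ("Georgia", "Tennessee"),
    ("Florida", "Tennessee"),
    ("Tennessee", "Kentucky"),
]


def build_loss_graph(games):
    """Map each team to the list of teams that beat it (its outgoing links)."""
    graph = {}
    for winner, loser in games:
        # The edge points from the losing team to the winning team.
        graph.setdefault(loser, []).append(winner)
        # Make sure the winner is a node even if it never loses.
        graph.setdefault(winner, [])
    return graph


def incoming_counts(graph):
    """Count incoming links (wins) for every team in the graph."""
    counts = {team: 0 for team in graph}
    for winners in graph.values():
        for winner in winners:
            counts[winner] += 1
    return counts


graph = build_loss_graph(games)
counts = incoming_counts(graph)
print(graph["Georgia"])   # teams Georgia lost to
print(counts)             # number of wins per team
```

With this toy schedule, Florida has two incoming links, Georgia and Tennessee have one each, and Kentucky has none. Florida also has no outgoing links, because it is unbeaten in the made-up data.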

In her experiment, Jen Liu displays the AP Poll at the time of the experiment alongside the same poll re-sorted by her PageRank update rule algorithm.

These are the rankings from the AP Poll.

[Image: AP Poll rankings]

These are the rankings using the PageRank rule.

[Image: PageRank rankings]

What you notice is that many of the discrepancies exist because PageRank places heavy emphasis on wins and losses. In contrast, the AP Top 25 reflects other metrics and voters' opinions. Something discussed in class that could be implemented is a scaling factor. The basic algorithm doesn't account well for PageRank values at the edges of the network, such as teams with few games connecting them to the rest of the graph. This arises because of how scheduling and conference play work out in college football. With the scaled PageRank update rule, every value is scaled down by a common factor after each step and the remainder is divided equally among all teams, which could help offset scheduling disparities.
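As a rough illustration of the scaling factor idea, the sketch below applies the scaled PageRank update rule to a small loser-to-winner graph. The graph, the function name, and the value s = 0.85 are assumptions chosen for the example; neither source article is known to use these exact settings. The sketch also assumes that an unbeaten team, which has no outgoing links, keeps its own value before scaling.

```python
def scaled_pagerank(links, s=0.85, steps=100):
    """Scaled PageRank update rule on a loser-to-winner graph.

    links maps each team to the list of teams that beat it.
    s is the scaling factor; (1 - s) is shared equally by all teams.
    """
    teams = list(links)
    n = len(teams)
    rank = {team: 1.0 / n for team in teams}
    for _ in range(steps):
        # Step 1: the basic update rule.
        new_rank = {team: 0.0 for team in teams}
        for team, beaten_by in links.items():
            if not beaten_by:
                # Assumption: an unbeaten team keeps its own value.
                new_rank[team] += rank[team]
            else:
                share = rank[team] / len(beaten_by)
                for winner in beaten_by:
                    new_rank[winner] += share
        # Step 2: scale every value down by s, then hand out the
        # leftover (1 - s) equally so the total stays at 1.
        rank = {team: s * new_rank[team] + (1 - s) / n for team in teams}
    return rank


# Hypothetical results: every team lists the teams it lost to.
links = {
    "Georgia": ["Florida"],
    "Florida": [],
    "Tennessee": ["Georgia", "Florida"],
    "Kentucky": ["Tennessee"],
}
rank = scaled_pagerank(links)
for team in sorted(rank, key=rank.get, reverse=True):
    print(team, round(rank[team], 4))
```

In this toy example unbeaten Florida ends up with the largest value and winless Kentucky with the smallest, but every team keeps at least (1 - s) / n, so no team's value drops to zero.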

PageRank is so widely associated with search engines that its applications have been explored in other realms of ranking. Adapting such concepts to the AP Top 25 rankings in college athletics could provide a more transparent and direct way of ordering teams. Examples from both sources showed major differences in the rankings when the PageRank rule was applied. Additionally, many of the concepts discussed in the PageRank and web search portions of the class go hand in hand with potential college ranking modifications. I hope that as time goes on, the AP Poll can gravitate towards a more mathematically defined way of ranking teams instead of polling based on the human eye test.
