


Directed Graphs, PageRank, and Soccer

https://arxiv.org/pdf/1206.6904v1.pdf

Recently in class, we have been discussing directed graphs and how they help us understand the World Wide Web, including how link-analysis methods for web search such as Hubs and Authorities and PageRank operate. However, directed graphs and their properties can be used for other analyses as well. This blog post discusses one such use: analyzing the performance of soccer teams. Although the networks describing soccer teams have nothing to do with web links or searches, the same analyses can still judge how “good” certain nodes in a network are, with “good” naturally defined by the context of the network in question. In the paper “A network theory analysis of football strategies”, Javier López Peña and Hugo Touchette show how networks built from the players of national soccer teams during the 2010 FIFA World Cup provide interesting insights into the teams’ strategies during the tournament. To dissect these strategies, the authors employ several measures of “goodness”, one of which is PageRank.

The network that the paper describes is called a passing network. Its nodes are players, and a directed edge from one player to another represents successful passes from the first player to the second. Each edge is weighted by the number of successful passes made along it; the heavier an edge, the darker and thicker it is drawn. The figure below shows two such networks, the one on the left for the Netherlands and the one on the right for Spain. In the Netherlands passing network, for example, nodes 3 and 12 completed few passes between them in either direction, while nodes 4 and 5 completed many, particularly from 4 to 5.

Because the network is weighted, path lengths are defined differently than in an unweighted graph: the length of an edge between two neighbors is the inverse of its weight, and the distance between two nodes is the length of the shortest path between them. A heavily used passing lane therefore counts as a short connection.

The performance of an individual player is assessed with measures of popularity, that is, of how often the player is involved in the team’s passing. The first measure the authors use is closeness, which roughly indicates how easy it is to reach a player in the network along a path. The second, betweenness, roughly measures how important a node is to the flow of the ball between two other players: a player with high betweenness is a crucial junction on the paths between teammates. The last measure is PageRank, the very same one discussed in lecture. Going through the tournament data, the authors found that the two finalists, Spain and the Netherlands, had on average higher popularity scores than the other teams on all three measures.
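To make the weighted distances and the first two measures more concrete, here is a minimal Python sketch. It is not the authors’ code: the function names, the toy passing network, and the exact normalizations are assumptions for illustration. Edge length is taken as 1/(number of passes), as described above. Closeness is computed here as (n - 1) divided by the total distance from every other player to a given player, which assumes every player can reach every other one. Betweenness sums, over ordered pairs of other players, the fraction of shortest paths between them that pass through the given player.

```python
import heapq

EPS = 1e-9  # tolerance for comparing floating-point path lengths


def shortest_paths_from(graph, source):
    """Dijkstra from `source`; an edge carrying w passes has length 1/w.

    Returns (dist, sigma): the shortest distance to each reachable player
    and the number of distinct shortest paths to that player.
    """
    dist, sigma = {source: 0.0}, {source: 1}
    heap, done = [(0.0, source)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, w in graph.get(u, {}).items():
            nd = d + 1.0 / w
            if v not in dist or nd < dist[v] - EPS:
                dist[v], sigma[v] = nd, sigma[u]  # strictly shorter path found
                heapq.heappush(heap, (nd, v))
            elif abs(nd - dist[v]) <= EPS:
                sigma[v] += sigma[u]  # another shortest path of equal length
    return dist, sigma


def all_pairs(graph):
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    return nodes, {s: shortest_paths_from(graph, s) for s in nodes}


def closeness(graph):
    """(n - 1) / total distance from all other players to v.

    Assumes the passing network is strongly connected.
    """
    nodes, sp = all_pairs(graph)
    n = len(nodes)
    return {v: (n - 1) / sum(sp[u][0][v] for u in nodes if u != v) for v in nodes}


def betweenness(graph):
    """Sum over ordered pairs (s, t) of the share of shortest s->t paths through v."""
    nodes, sp = all_pairs(graph)
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        dist_s, sigma_s = sp[s]
        for t in nodes:
            if t == s or t not in dist_s:
                continue
            for v in nodes:
                if v in (s, t) or v not in dist_s:
                    continue
                dist_v, sigma_v = sp[v]
                # v lies on a shortest s->t path iff d(s,v) + d(v,t) == d(s,t)
                if t in dist_v and abs(dist_s[v] + dist_v[t] - dist_s[t]) <= EPS:
                    score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score


if __name__ == "__main__":
    # Hypothetical toy passing network: passes[a][b] = successful passes from a to b.
    passes = {
        "GK": {"DF": 6, "MF": 1},
        "DF": {"GK": 2, "MF": 8, "FW": 1},
        "MF": {"DF": 3, "FW": 5},
        "FW": {"MF": 2},
    }
    print(closeness(passes))
    print(betweenness(passes))
```

Because lengths are inverse pass counts, two short hops along busy lanes can beat one direct but rarely used pass, which is why Dijkstra rather than plain breadth-first search is used here.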

It is generally believed that the more a team passes, the more likely it is to succeed. The authors’ analysis supports this, at least for this particular tournament. Intuitively, in the context of soccer, a player’s PageRank shows how engaged he is in the match: a highly engaged player naturally receives more “endorsements” in the form of passes to him. PageRank is also a better way to evaluate the nodes here than Hubs and Authorities, because every player both gives and receives endorsements through passes; there is no division into nodes that solely give endorsements and nodes that solely receive them. Spain’s passing network is clearly very well connected and features many heavily weighted edges, so its players endorse each other a lot, which is consistent with their high PageRank scores. Curiously, the authors noted that although Spain and the Netherlands had similar PageRank scores, the Dutch scores were more uniformly distributed. This indicates that no single Dutch player had a dominant passing role, unlike the Spanish team, where two players did. The way PageRank scores are distributed across a network can therefore give further insight into how the network functions.
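PageRank on a passing network can be sketched the same way. In the minimal Python sketch below, each player shares his score with teammates in proportion to the number of passes he makes to each of them, with a damping factor of 0.85. The damping value, the even redistribution for a player who completes no passes, and the `evenness` score (normalized Shannon entropy, where 1.0 means perfectly uniform) are assumptions chosen for illustration; they are not the exact formulation or uniformity measure used in the paper.

```python
import math


def pagerank(graph, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration.

    graph[u][v] = number of successful passes from u to v; u's score is
    split among his targets in proportion to those pass counts.
    """
    nodes = sorted(set(graph) | {v for nbrs in graph.values() for v in nbrs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            out = graph.get(u, {})
            total = sum(out.values())
            if total == 0:
                # Player with no completed passes: spread his score evenly (an assumption).
                for v in nodes:
                    new[v] += damping * rank[u] / n
            else:
                for v, w in out.items():
                    new[v] += damping * rank[u] * w / total
        rank = new
    return rank


def evenness(rank):
    """Normalized Shannon entropy of the scores: 1.0 means perfectly uniform."""
    total = sum(rank.values())
    probs = [r / total for r in rank.values() if r > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(len(rank))


if __name__ == "__main__":
    # Hypothetical teams: in "hub", every pass goes through one playmaker H.
    circulating = {"A": {"B": 3}, "B": {"C": 3}, "C": {"D": 3}, "D": {"A": 3}}
    hub = {"H": {"A": 2, "B": 2, "C": 2}, "A": {"H": 3}, "B": {"H": 3}, "C": {"H": 3}}
    for name, team in (("circulating", circulating), ("hub", hub)):
        ranks = pagerank(team)
        print(name, {k: round(v, 3) for k, v in ranks.items()}, round(evenness(ranks), 3))
```

In the hypothetical "hub" team the playmaker collects most of the score and the evenness drops below 1.0, while the team in which the ball circulates evenly keeps identical scores. This mirrors, in a toy setting, the contrast the authors drew between the dominant Spanish passers and the more uniform Dutch distribution.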

 
