


Applying Networks and Graph Theory to Last Year’s World Cup 2010

During last year’s World Cup I came across an interesting article that claimed to predict the winners of soccer matches by analyzing statistical data organized into a network. The group, from Queen Mary, University of London, used passing data from various teams in the World Cup to create a network in which each node was a player on the team and each edge (and its strength) corresponded to the frequency with which one player passed to another.
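To make the construction concrete, here is a minimal sketch of how such a passing network could be built. It is not the group’s actual method or data: the pass log is made up for illustration, and the function names `build_pass_network` and `passes_received` are my own.

```python
from collections import Counter, defaultdict

# Hypothetical pass log of (passer, receiver) shirt numbers.
# These are invented for illustration and are NOT real match data.
passes = [(8, 7), (8, 7), (6, 8), (8, 9), (6, 7), (7, 8)]


def build_pass_network(pass_log):
    """Return a weighted directed graph as {passer: {receiver: count}}."""
    weights = Counter(pass_log)  # edge weight = number of passes on that edge
    graph = defaultdict(dict)
    for (passer, receiver), count in weights.items():
        graph[passer][receiver] = count
    return dict(graph)


def passes_received(graph):
    """Weighted in-degree: total number of passes each player received."""
    received = Counter()
    for targets in graph.values():
        for receiver, count in targets.items():
            received[receiver] += count
    return dict(received)


network = build_pass_network(passes)
print(network)
print(passes_received(network))
```

The weighted in-degree is the “passes received” figure that the group’s striker comparisons rely on.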

When I first started reading about this analysis I thought the idea was fantastic, and I eagerly read the group’s analysis and conclusions. By the end, however, I felt a bit disappointed. It seemed the group had merely cherry-picked pieces of data to support conclusions about how various teams functioned, rather than using the data to create verifiable conclusions (the basic format of the scientific method).

For example, the group analyzes the passing networks of both the English and Spanish soccer teams. Both teams play roughly (this ‘roughness’ will be discussed later) a two-striker system, with England having Rooney and Defoe (10 and 19) and Spain playing Villa and Torres (7 and 9) in the striking positions. In one of the analyses, the group states: “Villa’s performance has been impressive compared with Fernando Torres, who has not scored any goals this tournament. This was reflected in the successful Spanish tactics, with Torres only receiving an average of 13 passes per match, and 37 to Villa.” These successful Spanish tactics, where one striker receives more passes than the other, take on a very different tone when the English team tries something similar. The group states: “The good midfield work of Frank Lampard, Steven Gerrard and Gareth Barry doesn’t appear to transfer very well to the forwards, with Wayne Rooney receiving on average three times more passes than Jermain Defoe. This makes the English attack very predictable and easily stoppable by blocking Rooney, who is usually forced to give the ball back to Gerrard.” So when Spain focus their play through Villa, it’s an example of successful tactics, but when the English do it, it makes them predictable.

This kind of assessment makes sense if you are analyzing a match after the fact. It’s quite obvious that a team can apply tactics successfully in one case and see the same tactics fall apart in another. However, the group is attempting to predict the outcomes of matches, which is a different beast entirely. Without saying anything about the other team’s defensive tactics, it’s impossible to predict whether passing more to one striker will be a help or a detriment.
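A quick calculation shows how similar the two situations are. This sketch uses only the figures quoted above. I am assuming that “three times more passes” means a 3:1 ratio between Rooney and Defoe, and the helper name `share_to_primary` is my own.

```python
def share_to_primary(primary, secondary):
    """Fraction of the two strikers' received passes that go to the primary one."""
    return primary / (primary + secondary)


# Spain: quoted averages of 37 passes per match to Villa and 13 to Torres.
spain_share = share_to_primary(37, 13)

# England: only a ratio is quoted; ASSUMING "three times more" means 3:1.
england_share = share_to_primary(3, 1)

print(f"Spain: {spain_share:.0%} of striker passes go to Villa")
print(f"England: {england_share:.0%} of striker passes go to Rooney")
```

Under that assumption the two shares come out at 74% and 75%, so the passing data alone gives no basis for calling one pattern successful and the other predictable.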

There are several other places where the group’s analysis seems confusing to me. For example, the way they construct each of their graphs places the players according to their spatial positioning on the field. However, the listed formation for Spain is very different from their actual on-paper formation, and even more different from the way these players actually play. David Villa is actually not a striker, he’s a winger, and Spain do not play a 2-3-3-2 (that’s not even a real formation, but it’s close to a 4-1-3-2, which Spain are not playing either); they play a lopsided 4-3-3. The group’s conclusion that Villa scored more than Torres because he saw more of the ball therefore becomes much less clear-cut once one realizes that he’s not playing the same position as Torres (forward), but rather the same position as Iniesta (winger), who sees the ball even more than Villa. The question then becomes: why didn’t Iniesta score more than Villa?

Another questionable conclusion the group comes to has to do with the Dutch team. The group says that De Zeeuw (#14) is involved in only 11 passes on average (completed and received); however, the FIFA website (which the group credits as their source of information) has De Zeeuw completing 29 passes over the two matches that he played in, a clear discrepancy. On top of that, De Zeeuw was only on the field for 48 minutes, meaning he played only a fraction of the two games he participated in, so why he is on the graph at all is a mystery to me.
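The De Zeeuw discrepancy can be checked with simple arithmetic. This sketch uses only the numbers cited above (29 completed passes, two matches, 48 minutes on the field, and the group’s figure of 11); the variable names are my own, and the per-90-minutes rate is a rough extrapolation rather than anything reported by FIFA or the group.

```python
# Figures as cited in the post above.
completed_passes = 29      # completed passes over both matches (FIFA site)
matches_played = 2
minutes_played = 48
claimed_involvement = 11   # group's average, completed AND received

# Completed passes alone, averaged per match appearance.
completed_per_match = completed_passes / matches_played

# Rough rate scaled to a full 90-minute match (an extrapolation).
completed_per_90 = completed_passes / minutes_played * 90

print(f"Completed per match: {completed_per_match}")
print(f"Completed per 90 minutes: {completed_per_90:.1f}")
print(f"Exceeds the group's figure: {completed_per_match > claimed_involvement}")
```

Even before counting received passes, the completed passes per match already exceed the group’s figure of 11.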

In class we have used networks to analyze and simplify complicated data sets, which is exactly what the group in the article is attempting to do. However, at the level of analysis that the group performed, I do not think it is possible to predict soccer matches using this technique. The group’s predictions seem to follow a very simple pattern: general conclusions are drawn about each team’s network (for example, that the attack revolves around one player), and then this characteristic is construed as good or bad depending on whether or not the team is the favorite. For this reason I would have liked to see a network representation of Serbia versus Germany, where the underdogs Serbia managed to beat the highly favored Germany. I do believe that this kind of network information can be used as an analytical tool. I believe it is possible to grasp the general strategy of a team from its network, and to a coach looking to counter the opposing team this could be extremely useful. However, among other things, I feel that the variability in defensive tactics, which can change from game to game and is not represented at all in the network, makes it impossible to predict the outcomes of soccer matches with this method.

References:
http://www.qmul.ac.uk/media/news/items/se/31612.html
http://www.fifa.com/worldcup/archive/southafrica2010/statistics/players/player=254097/passingdistribution.html
