
Cascading Networks in MLB Free Agency

Cascading behavior, as we learned in class, shows up in many different kinds of networks, and the model behind it is relatively simple. Each node can be in one of two states. All nodes start in the first state (A), except for a small set of initial adopters that start in the other state (B). In each subsequent time step, a node in state A switches to state B if at least a fraction q of its neighbors are already in state B. The process ends in one of two ways: either the whole graph converts to state B (a complete cascade), or it reaches a point where no remaining node will switch from A to B. Which outcome occurs is determined by the connections in the network, the initial state of each node, and the threshold q, so on paper we can work out the behavior of any network from those three things. I, however, was more interested in how (or if) this phenomenon works in the real world.
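To make the rule concrete, here is a minimal Python sketch of the threshold model described above. The function name `run_cascade`, the edge-list input, and the four-node path graph are illustrative choices of mine, not anything taken from lecture or from the free agency data, and I am assuming a node switches when the fraction of its neighbors in state B is at least q.

```python
from collections import defaultdict


def run_cascade(edges, initial_adopters, q):
    """Run the threshold model; return (nodes in state B, rounds taken).

    edges: iterable of (u, v) pairs describing an undirected graph.
    initial_adopters: nodes that start in state B.
    q: fraction of neighbors in state B needed for a node to switch.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    adopted = set(initial_adopters)
    rounds = 0
    while True:
        # Every node decides based on the state at the start of the round.
        switching = {
            node
            for node in neighbors
            if node not in adopted
            and len(neighbors[node] & adopted) / len(neighbors[node]) >= q
        }
        if not switching:
            return adopted, rounds
        adopted |= switching
        rounds += 1


# Hypothetical example graph: a path a - b - c - d, with "a" starting in B.
path = [("a", "b"), ("b", "c"), ("c", "d")]
full, rounds_needed = run_cascade(path, {"a"}, q=0.5)
stalled, _ = run_cascade(path, {"a"}, q=0.6)
```

With q = 0.5 on this path, each node in turn sees at least half of its neighbors in state B, so the whole graph converts. With q = 0.6, the first neighbor never reaches the threshold and the process stops immediately.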

As an avid sports fan, I immediately thought of free agency as a situation where cascades seem to occur. More often than not, there are more teams looking to sign a player at a certain position than there are free agents available at that position. Therefore, once a player or two at a certain position sign, it seems as though the rest of the players at that position sign with teams rather quickly. As the number of players available at a position decreases, teams in need of those players become more likely to sign somebody so that they don’t end up with nothing, thus creating a sort of behavioral cascade.

I wanted to see if this phenomenon could truly be captured by the model we discussed in lecture, so I took data from the 2014-2015 Major League Baseball free agency period, spanning from 7 days after the end of the World Series until the beginning of spring training. During this period, 47 players signed contracts for over $5 million, so to keep the data set manageable, I focused only on these 47 players (as opposed to the over 100 who signed major league contracts of any amount). For each player, I collected the date they signed with their new team, their contract information, their agent, and the position they played. This data is shown below:

Screen Shot 2015-12-03 at 2.26.15 AM

Screen Shot 2015-12-03 at 2.26.57 AM

Screen Shot 2015-12-03 at 2.27.12 AM

I then decided that connections would be determined simply by position: two players are connected if they play the same position. While many other factors go into determining when each player will sign, I figured that for the sake of the experiment, I would consider only position. Because some players play multiple positions, there ended up being five main clusters: the starting pitchers, the relief pitchers, the catchers, the middle infielders, and the corner infielders/outfielders. Two of the clusters were so small (catchers: 2 players, middle infielders: 3 players) that I decided they wouldn’t be very helpful in looking at the results, and thus ignored them.
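As a rough sketch of what "connected by position" means as a graph, the snippet below links every pair of players in the same position group, so each cluster becomes a fully connected group. The function name `position_edges` and the group labels "SP" and "RP" are my own, and the five players listed are only a small illustrative subset, not the full 47-player data set.

```python
from collections import defaultdict
from itertools import combinations


def position_edges(positions):
    """Connect every pair of players who share a position group.

    positions: dict mapping player name -> position group label.
    Returns a list of (player, player) edges.
    """
    groups = defaultdict(list)
    for player, group in positions.items():
        groups[group].append(player)

    edges = []
    for members in groups.values():
        # Every pair within a group is linked, so each group is a clique.
        edges.extend(combinations(sorted(members), 2))
    return edges


# Illustrative subset only; "SP" = starting pitcher, "RP" = relief pitcher.
sample = {
    "A.J. Burnett": "SP",
    "Jon Lester": "SP",
    "Francisco Liriano": "SP",
    "Zach Duke": "RP",
    "David Robertson": "RP",
}
edges = position_edges(sample)
```

Because players in different groups share no edges, each position cluster behaves as its own separate network under this assumption.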

I split the time into 10 periods, each lasting around 10 days (I know I called them weeks in the spreadsheet), and then listed the players who signed during each period:
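For anyone who wants to repeat the binning, here is a small sketch of one way to assign a signing date to a period. It assumes fixed 10-day periods counted from 11/10/2014, with the initial period numbered 0; the names `SEASON_START`, `PERIOD_DAYS`, and `period_index` are hypothetical. My actual periods follow calendar boundaries (for example 11/10 – 11/20 and 12/11 – 12/20), so a date right at the edge of a period may land one bin over.

```python
from datetime import date

SEASON_START = date(2014, 11, 10)  # first day of the initial signing period
PERIOD_DAYS = 10                   # assumption: every period is exactly 10 days


def period_index(signing_date, start=SEASON_START, length=PERIOD_DAYS):
    """Return the 0-based period a signing date falls into."""
    offset = (signing_date - start).days
    if offset < 0:
        raise ValueError("signing date is before the start of the window")
    return offset // length


# Example dates chosen to sit inside the periods mentioned in the post.
first_period = period_index(date(2014, 11, 15))
later_period = period_index(date(2014, 12, 15))
```

Under these assumptions, a date in the middle of the initial period maps to 0 and a date in the middle of the 12/11 – 12/20 window maps to 3.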

Screen Shot 2015-12-03 at 9.55.06 PM

Let’s first look at the starting pitching cluster. The first player to sign is A.J. Burnett during the initial signing period (11/10 – 11/20). The signings then die off for two periods before Justin Masterson, Jason Hammel, Jon Lester, Francisco Liriano, Ervin Santana, and Brandon McCarthy all sign during period 3 (12/11 – 12/20). Based on pure theory, the fact that nobody else signed right after Burnett should mean that no other starting pitcher would ever sign: each remaining node has only one neighbor who has signed, so the fraction of its neighbors who have signed is below q unless q is very small, and once nobody switches in a round, nothing changes in any later round. This of course is not the case in the real world. Eventually, most if not all of the free agents will sign contracts, so in this case the theoretical model that we discussed does not work. A similar trend can be seen for the other position groups.
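The argument above comes down to one fraction. If everyone in a cluster is connected to everyone else, an unsigned player in a cluster of n players has n - 1 neighbors, so after a single signing the fraction of signed neighbors is 1 / (n - 1). The sketch below checks that against a threshold; the cluster size of 10 and the values of q are hypothetical numbers for illustration, not values from my data, and the function names are my own.

```python
from fractions import Fraction


def signed_fraction(cluster_size, signed_count):
    """Fraction of an unsigned player's neighbors who have already signed,
    assuming the cluster is fully connected."""
    if cluster_size < 2:
        raise ValueError("need at least two players in the cluster")
    return Fraction(signed_count, cluster_size - 1)


def would_switch(cluster_size, signed_count, q):
    """True if an unsigned player meets the threshold q and signs next."""
    return signed_fraction(cluster_size, signed_count) >= q


# Hypothetical cluster of 10 starting pitchers with one signing so far.
after_one_signing = signed_fraction(10, 1)
stalls = not would_switch(10, 1, q=Fraction(1, 4))
spreads = would_switch(10, 1, q=Fraction(1, 10))
```

In a fully connected cluster every unsigned player sees the same fraction, so the model is all-or-nothing: either the threshold is low enough that everyone signs in the very next round, or nobody ever does.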

That said, there are a number of outside factors that play into when various free agents sign. For example, the Winter Meetings, a conference for all people in the baseball industry, occurred during period 3 (12/11 – 12/20). Because all teams and nearly all free agents are in the same city for an entire week, this period is prone to more signings, which could throw off the data somewhat. In addition, teams tend to sign free agents in bunches. For example, the Chicago White Sox were active early on, signing Zach Duke, Adam LaRoche, David Robertson, and Melky Cabrera all before Christmas Day, while the Washington Nationals were more patient, waiting until after New Year’s to sign Max Scherzer and Casey Janssen. I also had the idea early on that a player’s agency had an impact on when they signed (e.g. all clients of Octagon would sign early in the winter, while those represented by ACES would sign later on), although when looking at the data, this doesn’t seem to be the case. Instead, there looks to be a smattering of agencies during each period.

I would assume that performing a similar experiment on other data would give a similar result: while the behavioral cascade model makes sense in theory, it doesn’t always work when applied to real data. Perhaps my assumption that this situation could be modeled by what we learned in class was wrong, since nearly every free agent eventually signs and it’s therefore somewhat impossible for a cascade not to occur. Perhaps the way I conducted the experiment was incorrect, especially in the way I determined which nodes would be connected. Perhaps I simply didn’t have enough data. But all in all, I found that it’s incredibly difficult to predict when free agents will sign, especially when using only their position.


All data gathered from:

