Using DAGs to Understand Relational Data in MLB Salary Decision Making
https://math.montana.edu/grad_students/writing-projects/2012/12thornton.pdf
In his paper, Causal Inference and Major League Baseball, Jamie Thornton explained how directed acyclic graphs (DAGs) can be used to model the relationships among variables that bear on a Major League Baseball team’s player salary decisions and revenue. While players are focused on playing well, teams are businesses that must be profitable to survive. A team must balance the cost of being successful, namely its players’ salaries, against its revenue. Players’ salaries are usually a function of their skills, so the higher a team’s payroll, the more skillful and successful the team should be. Likewise, as players perform better, their pay will increase over time to reflect their skills. If a team does not spend enough on its players, it will have a low winning percentage, making fans less likely to pay to come to a game. Thornton discussed three variables of interest: winning percentage, team payroll, and on-field performance. He used a DAG to simplify the relationships among them into a visual representation. The information contained in the following table can be represented sufficiently, or clarified, by a graph.
The DAG is set up with variables as nodes and their statistical relationships as edges. Edges can be directed, bi-directed, undirected, or partially directed. In the following graph, the directed edge from node A to node D indicates that A affects D. The bi-directed edge between nodes B and C indicates that B and C affect one another. Finally, an edge can be undirected, meaning that it has no arrow and that neither variable has an effect on the other.
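The edge types described above can be written down as a small data structure. The following is a minimal sketch, not taken from Thornton's paper: the edge symbols ("->", "<->", "--") and the helper names describe and parents are assumptions chosen for illustration, and only the A-to-D and B-to-C edges named in the text are included.

```python
# Hypothetical edge list for the generic example in the text.
# "->"  is a directed edge, "<->" a bi-directed edge, "--" an undirected edge.
EDGES = [
    ("A", "->", "D"),   # A affects D
    ("B", "<->", "C"),  # B and C affect one another
]


def describe(edge):
    """Return a plain-language reading of one edge."""
    src, kind, dst = edge
    if kind == "->":
        return f"{src} affects {dst}"
    if kind == "<->":
        return f"{src} and {dst} affect one another"
    if kind == "--":
        return f"no direction is drawn between {src} and {dst}"
    raise ValueError(f"unknown edge type: {kind!r}")


def parents(edges, node):
    """Return the nodes that have a directed edge pointing into `node`."""
    return sorted(src for src, kind, dst in edges if kind == "->" and dst == node)


for edge in EDGES:
    print(describe(edge))
print("Parents of D:", parents(EDGES, "D"))
```

Storing the edge type next to each pair of nodes keeps the reading of the graph explicit: only "->" edges contribute to a node's parents, while bi-directed and undirected edges are recorded but not treated as one-way effects.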
In a more contextualized setting, such as MLB teams deciding how much money to spend on their payroll, the DAG looks like this:
This graph also serves as a visualization of the regression function for each variable. For example, winning percentage (WINPCT) can be predicted using the following regression function, which can be read from the graph above:
DAGs allow us to see relational data more concretely, making it easier to understand how variables may affect decisions. For example, when applied to baseball teams deciding where to spend money, the graph above, with the relevant variables and edges labeled, shows that while improved team performance may increase payroll, increasing payroll does not directly cause winning percentage to increase. Instead, payroll increases winning percentage indirectly, by way of improved on-field performance.
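The difference between a direct and an indirect effect can also be checked mechanically once the edges are listed. The sketch below is an illustration only: it assumes just the two directed edges described in the paragraph above (payroll to on-field performance, and on-field performance to winning percentage) rather than the full graph from the paper, and the labels PAYROLL and FIELD_PERF, like the function names, are hypothetical. Only WINPCT appears in the text.

```python
# Assumed directed edges, read off the prose rather than the paper's figure.
# PAYROLL and FIELD_PERF are hypothetical labels; WINPCT is from the text.
EDGES = [
    ("PAYROLL", "FIELD_PERF"),
    ("FIELD_PERF", "WINPCT"),
]


def has_direct_edge(edges, src, dst):
    """True if there is a single directed edge src -> dst."""
    return (src, dst) in edges


def directed_paths(edges, src, dst, path=None):
    """Return every directed path from src to dst as a list of node names."""
    path = (path or []) + [src]
    if src == dst:
        return [path]
    found = []
    for a, b in edges:
        # Follow each outgoing edge, skipping nodes already on the path.
        if a == src and b not in path:
            found.extend(directed_paths(edges, b, dst, path))
    return found


print("Direct edge PAYROLL -> WINPCT:", has_direct_edge(EDGES, "PAYROLL", "WINPCT"))
for p in directed_paths(EDGES, "PAYROLL", "WINPCT"):
    print(" -> ".join(p))
```

Under these assumed edges there is no direct edge from payroll to winning percentage, and the only directed path between them passes through on-field performance, which matches the reading of the graph given above.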



