In an earlier post, I discussed the San Francisco Giants’ vaunted home field advantage and came to the conclusion that, while a home field advantage exists, it’s not related to the Giants scoring more runs at home than on the road. That was done with about 90 games’ worth of data. To come up with a more robust measure of home field advantage, I grabbed game-by-game data for the National League from the first half of the 2011 season and crunched some numbers.
I have three questions:
- Is there a statistically significant increase in winning probability while playing at home?
- Is that effect statistically distinct from any effect due to attendance?
- If it exists, does that effect differ from team to team? (I’ll attack this in a future post.)
Methodology: Using data with, among other things, per-game run totals, win-loss results, and attendance, I’ll run three regressions. The first will be a linear probability model of winning on three regressors: Home, a binary variable for playing at home; Attendance, the announced attendance at the game; and AttH, which equals announced attendance if the team is at home and 0 if the team is on the road. I expect a negative coefficient on Attendance and a positive coefficient on AttH, so that a team on the road suffers from a larger crowd but a team at home reaps a larger benefit from a larger crowd. The linear probability model is easy to interpret, but it is not very rigorous and is subject to some well-known problems, such as fitted probabilities that can fall outside the range from 0 to 1.
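Written out, a specification consistent with this description might look like the following. This is a sketch, not a recovered original: the coefficient names are illustrative labels, and the squared attendance terms are an assumption based on the squared terms mentioned in the results.

```latex
\text{Win}_i = \beta_0 + \beta_H\,\text{Home}_i + \beta_A\,\text{Attendance}_i + \beta_{AH}\,\text{AttH}_i
             + \gamma_A\,\text{Attendance}_i^2 + \gamma_{AH}\,\text{AttH}_i^2 + \varepsilon_i,
\qquad \text{AttH}_i = \text{Home}_i \times \text{Attendance}_i
```

Under that labeling, the expectation stated above amounts to $\beta_A < 0$ and $\beta_{AH} > 0$.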
As such, I’ll also run a Probit model of the same equation to avoid problems caused by the simplicity of the linear probability model.
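To make the difference between the two models concrete, here is a minimal Python sketch of how each one turns a linear index (the sum of coefficients times regressors) into a win probability. The index values are arbitrary illustrations, not estimates from the data, and the function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written with the error function from the stdlib."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lpm_probability(index):
    """Linear probability model: the fitted 'probability' is the index itself."""
    return index


def probit_probability(index):
    """Probit: the index is passed through the normal CDF, so it stays in (0, 1)."""
    return normal_cdf(index)


# Arbitrary illustrative index values (NOT estimates from the post's data).
for index in (-0.4, 0.0, 0.5, 1.2):
    print(
        f"index={index:+.1f}  "
        f"LPM={lpm_probability(index):.3f}  "
        f"probit={probit_probability(index):.3f}"
    )
```

The linear model happily returns values below 0 or above 1, while the probit cannot. Note that the two models put their coefficients on different scales, so the same index value does not mean the same thing in both; the comparison here is only about the shape of the mapping.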
Finally, just as a sanity check, I’ll run the same regression for runs instead of win probability. Since runs aren’t binary, I’ll use ordinary least squares, and I’ll also control for the possibility that games played in American League parks lead to higher run totals by adding an indicator for the designated hitter.
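As a rough illustration of the runs regression, the sketch below builds the regressors described above and fits them by ordinary least squares using only the Python standard library. Everything numeric here is made up: the data are synthetic and noiseless, attendance is measured in tens of thousands, the coefficients are arbitrary, and the squared terms and the 0/1 coding of the designated hitter are assumptions. The function names are hypothetical.

```python
def design_row(home, attendance, dh):
    """One row of regressors: constant, Home, Attendance, AttH, squares, DH."""
    att_h = attendance * home  # attendance counts only when the team is at home
    return [1.0, float(home), attendance, att_h,
            attendance ** 2, att_h ** 2, float(dh)]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up coefficients used only to generate synthetic runs (NOT real results).
TRUE_BETAS = [4.0, 0.5, -0.3, 0.4, 0.02, -0.03, 0.6]

# Synthetic games: every combination of home/road, four crowd sizes, DH on/off.
rows = [design_row(h, a, d)
        for h in (0, 1)
        for a in (1.5, 2.5, 3.5, 4.2)
        for d in (0, 1)]
runs = [sum(b * x for b, x in zip(TRUE_BETAS, row)) for row in rows]

estimates = ols(rows, runs)
print([round(e, 4) for e in estimates])
```

Because the synthetic runs contain no noise, the fit should recover the made-up coefficients up to rounding error, which is a handy check that the regressors are set up the way the text describes. With real data, a statistics package would also report the standard errors needed to judge significance.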
Since runs are a factor in winning, I have the same expectations about the signs of the beta values as above.
Results:
Regression 1 (Linear Probability Model):
So, my prediction about the attendance betas was incorrect, but only because I failed to account for the squared terms. The effect from home attendance increases as we approach full attendance; the effect from road attendance decreases at about the same rate. There’s still a net positive effect.
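To see why the squared terms change the picture, suppose the model includes both Attendance and AttH along with their squares (an assumption here; the coefficient labels are illustrative). Then the effect of a slightly larger crowd depends on how large the crowd already is:

```latex
% Road team (Home = 0, so AttH = 0):
\frac{\partial\,\text{Win}}{\partial\,\text{Attendance}} = \beta_A + 2\gamma_A\,\text{Attendance}

% Home team (Home = 1, so AttH = Attendance):
\frac{\partial\,\text{Win}}{\partial\,\text{Attendance}} = (\beta_A + \beta_{AH}) + 2(\gamma_A + \gamma_{AH})\,\text{Attendance}
```

Here $\beta_A$ and $\beta_{AH}$ are the coefficients on Attendance and AttH, and $\gamma_A$ and $\gamma_{AH}$ are the coefficients on their squares. The sign of a linear coefficient alone therefore doesn't settle the direction of the effect at a given attendance level.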
Regression 2 (Probit Model):
Note that in both cases, there’s a statistically significant coefficient on Home, meaning that teams are more likely to win at home, and that for large values of attendance, the Home effect outweighs the attendance effect entirely. That indicates that the attendance effect is probably spurious.
Finally, the regression on runs:
Regression 3 (Predicted Runs):
Again, with runs, there is a statistically significant effect from being at home, and a variety of possible attendance effects. For low attendance values, the Home effect is probably swamped by the negative attendance effect, but for high attendance games, the Home effect probably outweighs the attendance effect or the attendance effect becomes positive.
Overall, the Home effect is statistically significant no matter which model we use, so at least in the National League, there is a noticeable home field advantage.