Tuesday, July 30, 2013

Home Runs, Foxes, and Hedgehogs

Since 1985, one of the most fun events surrounding the Major League All-Star Game has been the Home Run Derby. Last season it was won by Prince Fielder of the Detroit Tigers, and this year by Yoenis Cespedes. Each contestant is pitched to by someone from his own team, which makes it easier to hit a home run than in a normal game. A player keeps batting until he has 10 outs (an out is charged for not hitting a home run), and then the derby moves on to the next round.

I was wondering how strong the correlation is between how players do in the competition and how many home runs they hit during the regular season. I compared the number of home runs each player hit in the first round with his total for that season. I used the first round because it has the largest number of players (8) and they all have about the same number of pitches thrown to them. I looked at 2011 and 2012; the 2013 season is not finished yet. The results are summarized in the table at the bottom.

It is hard to tell just by looking at those numbers whether there is any correlation between the derby and season totals. This is why we run correlational studies and draw scatterplots with a best-fit straight line to see the overall trend. The plot below, for 2011, suggests a negative association between the first-round derby totals and the season HR totals. However, the R-squared statistic shows that the fitted line accounts for only 6.5% of the variability in the data, and the correlation is not statistically significant (p greater than 0.05).
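For readers who want to check the 2011 figure, here is a minimal sketch in plain Python. It assumes the 2011 rows of the table at the bottom of this post (first-round HR and season HR, in the same player order); the function names `pearson_r` and `best_fit_line` are my own for illustration, and this is not necessarily the software that produced the original plot.

```python
from math import sqrt

# First-round derby HR and season HR for the 2011 field, copied from the
# table at the bottom of the post (same player order in both lists).
derby_2011 = [8, 9, 5, 5, 5, 4, 3, 2]
season_2011 = [29, 27, 38, 40, 22, 43, 20, 39]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def best_fit_line(xs, ys):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


r = pearson_r(derby_2011, season_2011)
slope, intercept = best_fit_line(derby_2011, season_2011)
print(f"r = {r:.3f}, R-squared = {r * r:.3f}")
print(f"season HR = {slope:.2f} * derby HR + {intercept:.2f}")
```

With one predictor, R-squared is just the square of the correlation coefficient, so the sign of the relationship comes from r (or the slope) and the share of variability explained comes from r squared.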

I then looked at the numbers from the first round of the 2012 HR derby. This graph suggests an even weaker correlation, this time positive, between the derby and season totals. It accounts for 0.7% of the variability and likewise is not significant. Four batters appeared in both years' derbies: Canó (2011 winner), Fielder (2012 winner), Bautista, and Kemp.

Yes, the sample sizes are small for these correlations, but combining the data for the two years does not yield a significant result either. With the percentage of variability accounted for being so low, it is unlikely that a meaningful relationship would be found. Perhaps a different era will look different.
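One way to check "not significant" by hand is the usual t statistic for a correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2), compared against a two-sided 5% critical value. The sketch below assumes the 2011 and 2012 rows of the table at the bottom of the post and hard-codes the critical values from a standard t table (2.447 for 6 degrees of freedom, 2.145 for 14). The helper names are hypothetical, and this may not be the exact test behind the results above.

```python
from math import sqrt

# First-round derby HR and season HR, copied from the table at the bottom.
derby_2011 = [8, 9, 5, 5, 5, 4, 3, 2]
season_2011 = [29, 27, 38, 40, 22, 43, 20, 39]
derby_2012 = [5, 11, 7, 7, 4, 4, 1, 0]
season_2012 = [30, 27, 32, 32, 22, 31, 23, 33]

# Two-sided 5% critical values of Student's t, keyed by degrees of freedom
# (taken from a standard t table, not computed here).
T_CRITICAL_05 = {6: 2.447, 14: 2.145}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing 'true correlation is zero', with n - 2 df."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


samples = {
    "2011": (derby_2011, season_2011),
    "2012": (derby_2012, season_2012),
    "2011+2012": (derby_2011 + derby_2012, season_2011 + season_2012),
}

results = {}
for label, (xs, ys) in samples.items():
    n = len(xs)
    r = pearson_r(xs, ys)
    t = t_statistic(r, n)
    significant = abs(t) > T_CRITICAL_05[n - 2]
    results[label] = (r, t, significant)
    print(f"{label}: n={n}, r={r:.3f}, t={t:.2f}, significant={significant}")
```

With only 8 players, the correlation would have to be about 0.71 in absolute value to clear the 5% threshold, which is why such small correlations come nowhere close.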

I looked at 1998, the year Mark McGwire broke Roger Maris' single-season home run record with 70 and a year when many players were taking steroids, as Jose Canseco (who was not in that derby) later revealed. That home run derby had 10 players, with Ken Griffey Jr. winning with 19 HR and Mark McGwire hitting 4 HR. The graph for 1998 shows a slight positive relationship accounting for 3.8% of the variability. This relationship is still not statistically significant. A much larger sample size would be needed to show that an effect this small exists. Combining all three years produces almost no correlation: it accounts for 0.6% of the variance, about the same as the 2012 correlation. There does not appear to be a real relationship between the number of home run derby HRs and regular season HRs. Either the derby conditions are too different from a regular game or the sample of players is biased. Players' performances also fluctuate from day to day. It's impossible to tell the impact of performance-enhancing drugs from this analysis.
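To put a rough number on "a much larger sample size," here is a sketch using the standard Fisher z approximation for the sample size needed to detect a correlation: n = ((z_alpha/2 + z_power) / atanh(r))^2 + 3. The 5% significance level and 80% power are my assumptions, not something from the analysis above, and `sample_size_for_correlation` is a hypothetical helper name.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist


def sample_size_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate n needed to detect a true correlation r (two-sided test).

    Fisher z approximation: n = ((z_alpha/2 + z_power) / atanh(|r|))**2 + 3.
    alpha and power are assumed defaults, not values from the post.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(((z_alpha + z_power) / atanh(abs(r))) ** 2 + 3)


# Convert the reported shares of variability (R-squared) back to |r|.
r_1998 = sqrt(0.038)    # 1998 derby: 3.8% of variability
r_pooled = sqrt(0.006)  # all three years combined: 0.6%

n_1998 = sample_size_for_correlation(r_1998)
n_pooled = sample_size_for_correlation(r_pooled)
print(f"|r| = {r_1998:.3f} needs roughly n = {n_1998}")
print(f"|r| = {r_pooled:.3f} needs roughly n = {n_pooled}")
```

Under those assumptions, a correlation the size of the 1998 one would need on the order of 200 players, and one the size of the pooled correlation well over 1,000, compared with the 8 to 10 players in any single derby.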

In Latin their name is Ericii

Nate Silver of fivethirtyeight.com worked with baseball statistics before he began modeling poll data to predict elections. After working at the New York Times, where he stumped the pundits on the election results, he was hired by ESPN/ABC to do statistical modeling in politics and sports. He says that it's better to act like a fox than like a hedgehog, because the hedgehog does the same thing over and over again while the fox is more clever.

Silver's analogy caught my attention because the plural Latin word for hedgehog is ericii. It has derivatives in Spanish (erizos) and French (hérissons). In Italian the word is ricci (I know Christina's publicist might not be happy about me revealing this). I feel that I cover a wide variety of topics on this blog and approach them from a variety of angles, which is the fox's way. Alex Rodriguez was exactly in the middle of the graph in 1998, before he became baseball's highest-paid player at $25 million a year, and he is now facing a big suspension for substance abuse. He may be more of a hedgehog.

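| Year | Rank | Player | Team | Round 1 HR | Derby total HR | Season HR |
|------|------|--------|------|------------|----------------|-----------|
| 1998 | 1 | Ken Griffey, Jr. | Seattle | 8 | 19 | 56 |
| 1998 | 2 | Jim Thome | Cleveland | 7 | 17 | 30 |
| 1998 | 3 | Vinny Castilla | Colorado | 7 | 12 | 46 |
| 1998 | 4 | Rafael Palmeiro | Baltimore | 7 | 10 | 43 |
| 1998 | 5 | Moisés Alou | Houston | 7 | 7 | 38 |
| 1998 | 6 | Javy López | Atlanta | 5 | 5 | 34 |
| 1998 | 7 | Alex Rodriguez | Seattle | 5 | 5 | 42 |
| 1998 | 8 | Mark McGwire | St. Louis | 4 | 4 | 70 |
| 1998 | 9 | Damion Easley | Detroit | 3 | 3 | 27 |
| 1998 | 10 | Chipper Jones | Atlanta | 2 | 2 | 34 |
| 2011 | 1 | Robinson Canó | Yankees | 8 | 32 | 29 |
| 2011 | 2 | Adrian Gonzalez | Red Sox | 9 | 31 | 27 |
| 2011 | 3 | Prince Fielder | Brewers | 5 | 9 | 38 |
| 2011 | 4 | David Ortiz | Red Sox | 5 | 9 | 40 |
| 2011 | 5 | Matt Holliday | Cardinals | 5 | 5 | 22 |
| 2011 | 6 | José Bautista | Blue Jays | 4 | 4 | 43 |
| 2011 | 7 | Rickie Weeks | Brewers | 3 | 3 | 20 |
| 2011 | 8 | Matt Kemp | Dodgers | 2 | 2 | 39 |
| 2012 | 1 | Prince Fielder | Tigers | 5 | 28 | 30 |
| 2012 | 2 | José Bautista | Blue Jays | 11 | 20 | 27 |
| 2012 | 3 | Mark Trumbo | Angels | 7 | 13 | 32 |
| 2012 | 4 | Carlos Beltrán | Cardinals | 7 | 12 | 32 |
| 2012 | 5 | Carlos González | Rockies | 4 | 4 | 22 |
| 2012 | 6 | Andrew McCutchen | Pirates | 4 | 4 | 31 |
| 2012 | 7 | Matt Kemp | Dodgers | 1 | 1 | 23 |
| 2012 | 8 | Robinson Canó | Yankees | 0 | 0 | 33 |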


Stephen Colbert gives a funny discussion of the Alex Rodriguez scandal. A-Rod sounds more like an erizo (hedgehog) than a zorro (fox), with his suspension running until the end of the 2014 season. Colbert uses analogies more like a zorro.

**Related Posts**

Sports Stats Boring?


Draft Logic


Lance Armstrong's Doping Claim: A Probabilistic Calculation


Income and Life Expectancy. What does it Tell Us About US?