## Tuesday, July 30, 2013

### Home Runs, Foxes, and Hedgehogs

Since 1985, one of the most entertaining events surrounding the Major League All-Star Game has been the Home Run Derby. Last season it was won by Prince Fielder of the Detroit Tigers, and this year by Yoenis Cespedes. Each contestant is pitched to by someone from his own team, which makes it easier to hit a home run than in a normal game. A batter keeps swinging until he records 10 outs (an out being any failure to hit a home run), and then the competition moves on to the next round.

I wondered how strong the correlation is between how players do in the competition and how many home runs they hit during the regular season. I looked at how many home runs each player hit in the first round (because that round has the largest number of players (8), and they all have about the same number of pitches thrown to them) and compared that figure to his total number of home runs for the year. I looked at 2011 and 2012, since the 2013 season is not finished yet. The results are summarized in the table at the bottom.

It is hard to tell just by looking at those numbers whether there is any correlation between the derby and season totals. This is why we do correlational studies and draw scatterplots: the best-fit straight line shows the overall trend between the two variables. The plot below, for 2011, seems to suggest a negative association between first-round derby totals and season HR totals. However, the R-squared statistic shows that the fitted line accounts for only 6.5% of the variability in the data, and the relationship is not statistically significant (p greater than 0.05).
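For readers who want to check the arithmetic, here is a minimal sketch that recomputes the 2011 correlation from the first-round and season columns of the table at the bottom. I am assuming a plain Pearson correlation with the usual t test for a zero correlation; the function name `pearson_r` is just an illustrative helper, and the critical value 2.447 is the standard two-sided 5% cutoff for 6 degrees of freedom.

```python
import math

# First-round derby home runs and season home runs for the 2011 field,
# copied from the summary table in this post.
round1_2011 = [8, 9, 5, 5, 5, 4, 3, 2]
season_2011 = [29, 27, 38, 40, 22, 43, 20, 39]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(round1_2011, season_2011)
r_squared = r ** 2

# t statistic for testing r = 0, with n - 2 degrees of freedom.
n = len(round1_2011)
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)

# Two-sided 5% critical value of Student's t with 6 degrees of freedom.
T_CRIT_DF6 = 2.447

print(f"r = {r:.3f}, R^2 = {r_squared:.3f}, t = {t_stat:.3f}")
print("significant at 0.05:", abs(t_stat) > T_CRIT_DF6)
```

With only eight players, the t statistic would have to exceed 2.447 in absolute value to count as significant, which is a high bar for a correlation this weak.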

I then looked at the numbers from the first round of the HR derby from 2012. This graph suggests an even weaker correlation, this time positive, between the derby and season totals. It accounts for 0.7% of the variability and likewise was not significant. Four batters appeared in both years' derbies: Canó (2011 winner), Fielder (2012 winner), Bautista, and Kemp.

Yes, the sample sizes are small for these correlations, but combining the data for the two years does not yield a significant result either. With the percentage of variability accounted for being so low, it is unlikely that a meaningful relationship would be found. Perhaps a different era will look different.
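The pooling step can be sketched the same way. The snippet below stacks the 2011 and 2012 rows from the table at the bottom and fits an ordinary least-squares line to all 16 points. This is my reconstruction under the assumption that "combining" means simply pooling the rows; `fit_line` is a hypothetical helper, not the output of any particular statistics package.

```python
# (round 1 derby HR, season HR) pairs copied from the summary table.
pairs_2011 = [(8, 29), (9, 27), (5, 38), (5, 40), (5, 22), (4, 43), (3, 20), (2, 39)]
pairs_2012 = [(5, 30), (11, 27), (7, 32), (7, 32), (4, 22), (4, 31), (1, 23), (0, 33)]


def fit_line(pairs):
    """Least-squares fit y = intercept + slope * x; returns (slope, intercept, R^2)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Pool both years into one sample of 16 players.
pooled = pairs_2011 + pairs_2012
slope, intercept, r_squared = fit_line(pooled)

print(f"season HR = {intercept:.2f} + {slope:.3f} * round-1 HR")
print(f"R^2 = {r_squared:.4f} on n = {len(pooled)}")
```

A nearly flat slope and an R-squared under 1% mean that knowing a player's first-round total tells you almost nothing about his season total.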

I also looked at 1998, the year Mark McGwire broke Roger Maris' single-season home run record with 70, and a year when, as Jose Canseco (who was not in the derby) later revealed, many players were taking steroids. That derby had 10 players; Ken Griffey Jr. won with 19 HR, while McGwire hit 4. The graph for 1998 shows a slight positive relationship that accounts for 3.8% of the variability, which is still not statistically significant. A much larger sample would be needed to detect an effect this small. Combining all three years produces almost no correlation: it accounts for 0.6% of the variance, about the same as the 2012 correlation. There does not appear to be a real relationship between first-round Home Run Derby totals and regular-season home run totals. The conditions may be too different, or the sample may be biased, and players' performances fluctuate from day to day. It is impossible to tell the impact of performance-enhancing drugs from this analysis.
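To put a rough number on "much larger," here is a back-of-the-envelope sketch using the standard Fisher z approximation for the sample size needed to detect a correlation. The 5% significance level and 80% power are my own assumptions, not part of the original analysis, and `sample_size_for_correlation` is an illustrative helper name.

```python
import math
from statistics import NormalDist


def sample_size_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate n needed to detect correlation r (two-sided test) via Fisher's z."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # cutoff for the two-sided test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    fisher_z = math.atanh(abs(r))                  # Fisher z-transform of r
    return math.ceil(((z_alpha + z_power) / fisher_z) ** 2 + 3)


# The 1998 fit accounted for 3.8% of the variability, so |r| = sqrt(0.038).
r_1998 = math.sqrt(0.038)
needed = sample_size_for_correlation(r_1998)

print(f"|r| = {r_1998:.3f}; approximate players needed: {needed}")
```

Under those assumptions the answer comes out on the order of 200 players, against the 10 who actually took part in 1998.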

Nate Silver of fivethirtyeight.com worked with baseball statistics before he began modeling poll data to predict elections. After working at the New York Times, where he stumped the pundits on the election results, he has been hired by ESPN/ABC to do statistical modeling in politics and sports. He says that it is better to act as a fox than as a hedgehog, because the hedgehog does the same thing over and over again while the fox is more clever.

Silver's analogy caught my attention because the plural Latin word for hedgehog is ericii. It has derivatives in Spanish (erizos) and French (hérissons). In Italian the word is ricci (I know Christina's publicist might not be happy about me revealing this). I feel that I cover a wide variety of topics on this blog and approach them from a variety of angles. Alex Rodriguez was exactly in the middle of the graph in 1998, before he became baseball's highest-paid player at \$25 million a year, and he is now facing a big suspension for substance abuse. He may be more of a hedgehog.

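| Year | # | Name | Team | Round 1 | Derby total | Season home runs |
|------|---|------|------|---------|-------------|------------------|
| 1998 | 1 | Ken Griffey, Jr. | Seattle | 8 | 19 | 56 |
| 1998 | 2 | Jim Thome | Cleveland | 7 | 17 | 30 |
| 1998 | 3 | Vinny Castilla | Colorado | 7 | 12 | 46 |
| 1998 | 4 | Rafael Palmeiro | Baltimore | 7 | 10 | 43 |
| 1998 | 5 | Moisés Alou | Houston | 7 | 7 | 38 |
| 1998 | 6 | Javy López | Atlanta | 5 | 5 | 34 |
| 1998 | 7 | Alex Rodriguez | Seattle | 5 | 5 | 42 |
| 1998 | 8 | Mark McGwire | St. Louis | 4 | 4 | 70 |
| 1998 | 9 | Damion Easley | Detroit | 3 | 3 | 27 |
| 1998 | 10 | Chipper Jones | Atlanta | 2 | 2 | 34 |
| 2011 | 1 | Robinson Canó | Yankees | 8 | 32 | 29 |
| 2011 | 2 | Adrian Gonzalez | Red Sox | 9 | 31 | 27 |
| 2011 | 3 | Prince Fielder | Brewers | 5 | 9 | 38 |
| 2011 | 4 | David Ortiz | Red Sox | 5 | 9 | 40 |
| 2011 | 5 | Matt Holliday | Cardinals | 5 | 5 | 22 |
| 2011 | 6 | José Bautista | Blue Jays | 4 | 4 | 43 |
| 2011 | 7 | Rickie Weeks | Brewers | 3 | 3 | 20 |
| 2011 | 8 | Matt Kemp | Dodgers | 2 | 2 | 39 |
| 2012 | 1 | Prince Fielder | Tigers | 5 | 28 | 30 |
| 2012 | 2 | José Bautista | Blue Jays | 11 | 20 | 27 |
| 2012 | 3 | Mark Trumbo | Angels | 7 | 13 | 32 |
| 2012 | 4 | Carlos Beltrán | Cardinals | 7 | 12 | 32 |
| 2012 | 5 | Carlos González | Rockies | 4 | 4 | 22 |
| 2012 | 6 | Andrew McCutchen | Pirates | 4 | 4 | 31 |
| 2012 | 7 | Matt Kemp | Dodgers | 1 | 1 | 23 |
| 2012 | 8 | Robinson Canó | Yankees | 0 | 0 | 33 |

N = 10 for 1998, 8 for 2011, and 8 for 2012 (26 players in total).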