Tuesday, July 30, 2013

Home Runs, Foxes, and Hedgehogs

Since 1985, one of the most fun events surrounding the Major League All-Star Game has been the Home Run Derby. Last season it was won by Prince Fielder of the Detroit Tigers and this year by Yoenis Cespedes.  Each contestant is pitched to by someone from his own team, which makes it easier to hit a home run than in a normal game. Each player bats until he has 10 outs (any swing that is not a home run), and then the competition moves on to the next round.

I wondered how strong the correlation is between how the players do in the competition and how many home runs they hit during the regular season.  I took the number of home runs each player hit in the first round, because that round has the largest number of players (8) and each player sees about the same number of pitches, and compared it to his total number of home runs that year.  I looked at the years 2011 and 2012; the 2013 season is not finished yet.  The results are summarized in the table at the bottom.

It is hard to tell just by looking at those numbers whether there is any correlation between the derby and season totals.  This is why we do correlational studies and draw scatterplots with a best-fit straight line to see the overall trend between the two.  The plot below, for 2011, seems to suggest a negative association between the first round derby totals and the season HR totals.  However, the R-squared statistic says that the fitted line accounts for only 6.5% of the variability in the data, and the relationship is not statistically significant (p greater than 0.05).

I then looked at the numbers from the first round of the HR derby from 2012.  This graph suggests an even weaker correlation, this time positive, between the derby and season totals: it accounts for 0.7% of the variability and likewise is not significant.  Three batters were in both years' derbies: Cano (2011 winner), Fielder (2012 winner), and Kemp.
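As a rough check on the two R-squared figures, here is a minimal Python sketch. It assumes the plotted variables are the Round 1 and Season Home Runs columns of the table at the bottom; the software used for the original plots is not stated, and `pearson_r` is a helper name introduced here for illustration. For a straight-line fit with one predictor, R-squared is simply the square of the Pearson correlation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Round 1 derby home runs and season home runs, copied from the table
# at the bottom of the post (same player order as the table).
round1_2011 = [8, 9, 5, 5, 5, 4, 3, 2]
season_2011 = [29, 27, 38, 40, 22, 43, 20, 39]
round1_2012 = [5, 11, 7, 7, 4, 4, 1, 0]
season_2012 = [30, 27, 32, 32, 22, 31, 23, 33]

r_2011 = pearson_r(round1_2011, season_2011)
r_2012 = pearson_r(round1_2012, season_2012)

print(f"2011: r = {r_2011:.3f}, R-squared = {r_2011 ** 2:.3f}")
print(f"2012: r = {r_2012:.3f}, R-squared = {r_2012 ** 2:.3f}")
```

Under these assumptions the 2011 correlation is negative and the 2012 correlation is positive, with R-squared values of roughly 0.065 and 0.007, in line with the 6.5% and 0.7% quoted above.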

Yes, the sample sizes are small for these correlations, but combining the data for the two years does not yield a significant result either.  With the percentage of variability accounted for being so low, it is unlikely that a meaningful relationship would be found.  Perhaps a different era will look different.

I looked at 1998, the year Mark McGwire broke Roger Maris' season home run record with 70 and, as Jose Canseco (not in that derby) later revealed, many players were taking steroids.  In this home run derby there were 10 players, with Ken Griffey Jr. winning with 19 HR and Mark McGwire hitting 4 HR.  The graph for 1998 shows a slight positive relationship accounting for 3.8% of the variability.  This relationship is still not statistically significant.  A much larger sample size would be needed to show that an effect this small exists.  Combining all three years produces almost no correlation, accounting for 0.6% of the variance, much like the 2012 correlation.  There does not appear to be a real relationship between home run derby totals and regular season HR totals.  The conditions may be too different, or the sample may be biased.  Players' performances also fluctuate from day to day.  It's impossible to tell the impact of performance enhancing drugs from this analysis.
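
The 1998 and three-year figures, and the claim that neither is significant, can be sketched the same way. This is only an illustration under stated assumptions: the data are the Round 1 and Season Home Runs columns of the table at the bottom, `pearson_r` and `t_statistic` are helper names introduced here, and the critical values are the standard two-sided 5% values from a t-table (2.306 for 8 degrees of freedom, 2.064 for 24).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing whether a correlation r from n pairs is zero."""
    return r * sqrt((n - 2) / (1 - r ** 2))

# (Round 1 derby HR, season HR) per player, copied from the table below.
data = {
    1998: ([8, 7, 7, 7, 7, 5, 5, 4, 3, 2],
           [56, 30, 46, 43, 38, 34, 42, 70, 27, 34]),
    2011: ([8, 9, 5, 5, 5, 4, 3, 2],
           [29, 27, 38, 40, 22, 43, 20, 39]),
    2012: ([5, 11, 7, 7, 4, 4, 1, 0],
           [30, 27, 32, 32, 22, 31, 23, 33]),
}

# 1998 on its own (10 players, so 8 degrees of freedom).
r_1998 = pearson_r(*data[1998])
t_1998 = t_statistic(r_1998, len(data[1998][0]))

# All three years pooled (26 players, so 24 degrees of freedom).
all_round1 = [x for xs, _ in data.values() for x in xs]
all_season = [y for _, ys in data.values() for y in ys]
r_all = pearson_r(all_round1, all_season)
t_all = t_statistic(r_all, len(all_round1))

# Two-sided 5% critical values taken from a standard t-table (assumption).
T_CRIT_DF8 = 2.306
T_CRIT_DF24 = 2.064

print(f"1998:   R-squared = {r_1998 ** 2:.3f}, t = {t_1998:.2f}, "
      f"significant: {abs(t_1998) > T_CRIT_DF8}")
print(f"pooled: R-squared = {r_all ** 2:.3f}, t = {t_all:.2f}, "
      f"significant: {abs(t_all) > T_CRIT_DF24}")
```

Both t statistics fall well short of their critical values, which is what "not statistically significant" means here.
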

Nate Silver of fivethirtyeight.com began working with baseball statistics before modeling poll data to predict elections.  After working at the New York Times, where he stumped the pundits on the election results, he was hired by ESPN/ABC to do statistical modeling in politics and sports.  He says that it's better to act as the fox than as the hedgehog, because the hedgehog does the same thing over and over again while the fox is more clever.

This analogy of Silver's caught my attention because the plural Latin word for hedgehog is ericii.  It has derivatives in Spanish (Erizos) and French (Herissons).  In Italian the word is Ricci (I know Christina's publicist might not be happy about me revealing this).  I feel that I cover a wide variety of topics on this blog and approach them from a variety of angles.  Alex Rodriguez was exactly in the middle of the graph in 1998, before becoming baseball's highest paid player at $25 million a year, and is now facing a big suspension for substance abuse.  He may be more of a hedgehog.

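| Year | # | Name | Team | Round 1 | Total | Season Home Runs |
|------|---|------|------|---------|-------|------------------|
| 1998 | 1 | Ken Griffey, Jr. | Seattle | 8 | 19 | 56 |
| 1998 | 2 | Jim Thome | Cleveland | 7 | 17 | 30 |
| 1998 | 3 | Vinny Castilla | Colorado | 7 | 12 | 46 |
| 1998 | 4 | Rafael Palmeiro | Baltimore | 7 | 10 | 43 |
| 1998 | 5 | Moisés Alou | Houston | 7 | 7 | 38 |
| 1998 | 6 | Javy López | Atlanta | 5 | 5 | 34 |
| 1998 | 7 | Alex Rodriguez | Seattle | 5 | 5 | 42 |
| 1998 | 8 | Mark McGwire | St. Louis | 4 | 4 | 70 |
| 1998 | 9 | Damion Easley | Detroit | 3 | 3 | 27 |
| 1998 | 10 | Chipper Jones | Atlanta | 2 | 2 | 34 |
| 2011 | 1 | Robinson Canó | Yankees | 8 | 32 | 29 |
| 2011 | 2 | Adrian Gonzalez | Red Sox | 9 | 31 | 27 |
| 2011 | 3 | Prince Fielder | Brewers | 5 | 9 | 38 |
| 2011 | 4 | David Ortiz | Red Sox | 5 | 9 | 40 |
| 2011 | 5 | Matt Holliday | Cardinals | 5 | 5 | 22 |
| 2011 | 6 | José Bautista | Blue Jays | 4 | 4 | 43 |
| 2011 | 7 | Rickie Weeks | Brewers | 3 | 3 | 20 |
| 2011 | 8 | Matt Kemp | Dodgers | 2 | 2 | 39 |
| 2012 | 1 | Prince Fielder | Tigers | 5 | 28 | 30 |
| 2012 | 2 | José Bautista | Blue Jays | 11 | 20 | 27 |
| 2012 | 3 | Mark Trumbo | Angels | 7 | 13 | 32 |
| 2012 | 4 | Carlos Beltrán | Cardinals | 7 | 12 | 32 |
| 2012 | 5 | Carlos González | Rockies | 4 | 4 | 22 |
| 2012 | 6 | Andrew McCutchen | Pirates | 4 | 4 | 31 |
| 2012 | 7 | Matt Kemp | Dodgers | 1 | 1 | 23 |
| 2012 | 8 | Robinson Canó | Yankees | 0 | 0 | 33 |

Total N: 10 (1998), 8 (2011), 8 (2012), 26 (all years).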

Tuesday, May 28, 2013

'Tis the Season for Women in Racing and the Military

Memorial Day has many traditions to remember our war dead: picnics, parades, fireworks, and the like.  One of the most prominent traditions since 1909 is the Indianapolis 500.  Much was made of the fact that Tony Kanaan finally won the race after years of trying.  Less noted was that, even after Danica Patrick left the Indy series for NASCAR, four of the 33 drivers were women (all international).  Two of them were among the 20 cars that completed all 200 laps, finishing 15th and 17th.

Danica Patrick did race on Memorial Day, however, in the Coca-Cola 600 in the NASCAR series, which was won by Kevin Harvick.  Patrick was the only woman in the predominantly white male field and finished 29th after her boyfriend bumped her into a wreck on lap 311.  She completed 385 of 400 laps.  The chart below shows she lost about 10 places after the crash.  Her best finish this year was 8th in the Daytona 500.

Also at this time of year is horse racing's Triple Crown: the Kentucky Derby, the Preakness, and the Belmont Stakes.  The Derby winner, Orb, finished fourth in the Preakness, meaning another year will pass without a Triple Crown winner; the last was in 1978.  The Preakness also featured a rare female jockey: Rosie Napravnik finished third ("show" in horse racing terminology), becoming the first woman to do so in any Triple Crown race.  She will ride a different horse in the Belmont on June 8.

As these sports become more inclusive and more mechanized, the talent pool to choose from becomes larger, leveling the playing field.  When the color line was breached in Major League Baseball in 1947, the talent level there was raised.  It is now very unlikely that the 0.400 batting average level will ever be eclipsed, and Roger Maris' and Hank Aaron's home run records were only eclipsed by Barry Bonds with the help of drugs.  The women horse and auto racers don't need to use drugs because they're not doing the bulk of the work.  Just like the male racers, they're telling the cars and horses where to go.  Women in team and individual sports where they cannot compete against the men still struggle for more attention.  The Olympics are now the best showcase for women's other sports.

Just as sports have become more mechanized, so has the military, and women have played a more prominent role in it, with the costs becoming more evident as well, though not necessarily equally distributed.  Before stepping down, Defense Secretary Leon Panetta lifted restrictions on women in combat. Congressional hearings are being held to ensure that they are treated fairly.

**Update**

The Thomas Merton Center in Pittsburgh is holding screenings of the film Band of Sisters from May 31 to June 6, and one of The Invisible War (trailer seen above) at the Friends Meeting House, 4836 Ellsworth Avenue, Pittsburgh, PA 15213, on Monday, June 3 at 5:30.

Monday, January 21, 2013

Winning?

This is the catchphrase that Charlie Sheen popularized when he was fired from his sitcom Two and a Half Men.  We all want to back a winner, but we never ask what the price of winning is. Today Barack Obama was inaugurated for a second term after garnering 51% of the vote over Mitt Romney last fall. He has many challenges coming up on gun control, the economy, Afghanistan, and who knows what else.  He had to raise billions of dollars for his campaign and millions more to pay for the ceremony today.  How many favors will be expected in return?

In April, the Thomas Merton Center in Pittsburgh will be honoring Sheen's father Martin for his peace activism.  I'm not saying he doesn't deserve the recognition but please don't ask him about Charlie.  His activism has not come without a price just like it has for the rest of us.

Lance Armstrong has finally come clean to Oprah Winfrey (but not yet under oath) about using performance enhancing drugs.  I have written before about how abuse of these drugs goes far beyond Armstrong. He is an extreme case of gaming the system.  If he had raced clean and finished in the top 20 seven years in a row in the Tour de France, would that have been any less heroic? Sure, he wouldn't have had all the money or fame which he so desired, but we all wanted a hero for cancer survivors everywhere who suffer with the disease and get no recognition.  I still wear the Livestrong rubber bracelet, which he first popularized in 2005, and people ask me why.  Yes, Lance is flawed, but who isn't?  He can still become a hero even without winning bike races or heading a foundation.  Is the Livestrong Foundation bad because of his extreme bullying to preserve his titles?  Life is full of contradictions that we must negotiate.

So much energy, legal and illegal, is invested in winning.  We need to ask ourselves when the price is worth it.

Saturday, May 21, 2011

Lance Armstrong's Doping Claim: A Probabilistic Calculation

This Sunday CBS' 60 Minutes did an expose on seven-time Tour de France winner Lance Armstrong in which his former teammate and close friend Tyler Hamilton accused him on camera of using the performance enhancing drug erythropoietin, or EPO (the piece also says that teammate George Hincapie testified to this under oath, in secret, to a grand jury).  Part one of the 60 Minutes interview can be seen above.  This is just the latest of many accusations made against him over the years.  His response has always been that he has been tested many times but no drugs have been found.  His most recent quote on his Facebook page is "20+ year career. 500 drug controls worldwide, in and out of competition. Never a failed test. I rest my case."

Using probability theory, it is possible to compute the chance of him never testing positive, assuming that he was using EPO.  First we look at the probability of testing positive for the drug when you really are taking EPO.  This is determined by the test maker.  Finding this information is difficult.  The World Anti-Doping Agency, or WADA, which oversees the testing of athletes, does not readily provide this data on its webpage.  It does provide positive-test rates for each of its labs worldwide.  The probability equation for one test is given by:

P(-EPO test when using EPO) = 1 - P(+EPO test when using EPO)

In 2008 a group of researchers in Denmark gave EPO to eight male non-athlete volunteers, put them through exercise tests, took urine and blood samples from them, and sent the samples to two WADA labs.  Post exercise, one lab reported no positive tests and the other reported 8 positives out of 40 samples, or 20%.  Take the better case scenario lab (WADA does not agree with these results).  If its 20% detection probability held across his 20 year career, and the outcome of each test was independent of the others, the probability of testing negative on every test over this period while he was using EPO equals

P(-EPO test when using EPO on one test)^(Number of tests)

where ^ means raised to the power of the number of tests.  The logic is the same as for tossing a coin: toss it once and the chance of it coming up heads is 0.5.  Toss it twice and the chance of it coming up heads both times is
0.5 x 0.5 = (0.5)^2 = 0.25 = 1/4.

Plugging the Danish probability into the equations above, there is an 80% chance of him testing negative on one test while using EPO, and the chance of him testing negative on all 500 tests is (0.80)^500.  You can plug that into a calculator to see that this is a really small probability of a doping rider passing every test (about 3.5*10^-49).

This example is of course an oversimplification.  The accuracy of screening tests changes over time, as do doping drugs and masking agents.  This does not prove conclusively that Lance Armstrong never used EPO, but it does illustrate how hard it would be to hide if WADA were doing an adequate job, even with a test that detects only 20% of users.  Part II of Hamilton's interview with 60 Minutes (abbreviated version below) can be seen here, where Hamilton claims that Armstrong did fail a test in 2001 at the Tour of Switzerland. The report does raise questions about the integrity of WADA.

In the worst case scenario, where there is only a 0.1% chance of getting caught on any one test while doping, and thus a 99.9% chance of not getting caught, there would be a 61% chance of never getting caught across 500 tests.  If Tyler Hamilton is right, then Lance Armstrong falls into the other 39%: those who did get caught, though in his case it was covered up.
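
Both scenarios can be reproduced in a few lines of Python. This is a minimal sketch under the same assumptions as the text: every test has the same detection probability and the tests are independent of one another. The function name `prob_never_positive` is introduced here for illustration.

```python
def prob_never_positive(p_detect, n_tests):
    """Chance that a doping athlete passes all n_tests, assuming each test
    independently catches him with probability p_detect."""
    return (1.0 - p_detect) ** n_tests

N_TESTS = 500  # the number of drug controls Armstrong cites

# Danish study, better-case lab: 8 positives out of 40 samples = 20%.
p_danish = prob_never_positive(0.20, N_TESTS)

# Worst case scenario from the text: 0.1% chance of detection per test.
p_worst = prob_never_positive(0.001, N_TESTS)

print(f"20% detection per test:  {p_danish:.2e}")
print(f"0.1% detection per test: {p_worst:.2f}")
```

The first value is on the order of 10^-49 and the second rounds to 0.61, matching the figures in the text.
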

**Update**

The news that Armstrong has been stripped of his seven tour titles and all of his other titles dating back to 1998 shouldn't be that surprising.  The interviews that I posted last year are available in abbreviated version above (they were originally available in full format) from CBS with no mention of his alleged failed 2001 drug test.  The full transcript of the 60 Minutes interview with Hamilton can be read here.  On CBS This Morning there was a discussion of the charges also with no mention of the 2001 failed drug test.  It seems strange that there is no mention of this as this allegation should not be hard to check out.  Armstrong was cleared of criminal wrongdoing.

