Tuesday, July 10, 2018

Managing Predictions with Bayes Theorem

I have been reading Nate Silver's book The Signal and the Noise, about how hard it is to make predictions even when we have complete data. I tend to be a slow reader, so I've been reading it for a while. Finally I got to the chapter on his years as a professional poker player. He talked about how players like him get overconfident if they start out winning. Poker is a game of skill and luck. The great players are lucky and good. He argues that this is something that can be managed according to Bayes theorem.

In layman's terms, Bayes theorem says that people bring their preconceived notions to a situation and then update their notions based on what happens in that situation.  The image above shows the mathematical formula for the theorem.  If we know the probability of event B occurring given that event A has occurred, and we know the overall (or marginal) probabilities of events A and B occurring separately from each other, we can calculate the probability of event A occurring given that B has occurred.  An example of this theorem is shown below.
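For reference, the relationship described in the paragraph above can be written out in the standard form of the theorem, using the same event names A and B:

$$
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
$$

Here $P(B \mid A)$ is the probability of B given that A has occurred, and $P(A)$ and $P(B)$ are the overall (marginal) probabilities of each event.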
The table above (that I used in my first post on data driven journalism) shows the 2016 primary wins for Clinton, Trump, and Sanders.  The probability of Trump winning a state given that Clinton has won in the other party's contest in that state is 25 out of 29 or 86%.  We can use Bayes Theorem to find the probability that Clinton wins the state given that Trump has won.

First we need the overall probability of Trump winning a state, which is 37 out of 51 contests (DC is included) or 73%.  Next we need the overall probability of Clinton winning a state, which is 29 out of 51 contests or 57%.  We can now plug those numbers into Bayes theorem as in the above image.  The percentages were converted to decimals for the sake of computation.
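As a sketch of that substitution, using the rounded decimals quoted in this post (0.86, 0.57 and 0.73) and writing "Trump" and "Clinton" as shorthand for each candidate winning a state:

$$
P(\text{Clinton} \mid \text{Trump}) = \frac{P(\text{Trump} \mid \text{Clinton})\, P(\text{Clinton})}{P(\text{Trump})} = \frac{0.86 \times 0.57}{0.73} \approx 0.67
$$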

We update our knowledge of the probability of Trump winning given that Clinton won in the other party with the overall probabilities of Trump and Clinton winning to find the probability of Clinton winning given that Trump has won.  That number is 67%, which is considerably lower than the original 86%.  This is because Trump was more likely to win on his party's side than Clinton was in her party.
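The same calculation can be checked with a few lines of Python. This is only a minimal sketch: the function name `bayes_posterior` and the variable names are my own, and the counts are the ones quoted in this post. It also shows that working from the raw counts instead of the rounded decimals gives a slightly higher figure (25 out of 37, or about 68%), so the 67% above partly reflects rounding.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be nonzero")
    return p_b_given_a * p_a / p_b


# Counts quoted in the post (illustrative variable names)
total_contests = 51   # 50 states plus DC
trump_wins = 37       # contests Trump won
clinton_wins = 29     # contests Clinton won
both_win = 25         # contests Trump won among those Clinton won

p_trump_given_clinton = both_win / clinton_wins   # about 0.86
p_clinton = clinton_wins / total_contests         # about 0.57
p_trump = trump_wins / total_contests             # about 0.73

# Posterior from the unrounded fractions
exact = bayes_posterior(p_trump_given_clinton, p_clinton, p_trump)

# Posterior from the rounded decimals used in the post
rounded = bayes_posterior(0.86, 0.57, 0.73)

print(f"From raw counts:       {exact:.3f}")
print(f"From rounded decimals: {rounded:.3f}")
```

With the raw counts the marginal probability of Clinton winning cancels out, leaving 25 divided by 37.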

We can calculate these probabilities easily after the contests are over, but making these predictions beforehand is a different matter.  Nate Silver was wrong about how many primaries Bernie Sanders would win and who would win the general election after correctly predicting the winner in 2012.  How will he update his predicting model for the 2018 and 2020 elections?  Time will tell.
