Showing posts with label Probability Theory. Show all posts

Thursday, March 14, 2019

Happy 3.14159..... Day

I have an interview published on the website Manufacturing Chemist on healthcare analytics which you are welcome to check out.  

Today is Pi Day (March 14, or 3/14), which is now more celebrated than March 15, the Ides of March, the day Julius Caesar was killed.  Of course, it's not as celebrated as St. Patrick's Day is everywhere but in Ireland itself.  

Pi is the ratio of the circumference of a circle to its diameter.  It is used to calculate the area and circumference of a circle and the volume of a sphere.  It also appears in the famous normal distribution in statistics, aka the bell-shaped curve, as seen in the formula below.  Here the probability density at x depends on two parameters, the mean mu and the variance sigma squared.  Pi is a constant in this equation.
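For reference, the standard textbook form of the normal density, written with the mean $\mu$ and variance $\sigma^2$ named above (this is the usual formula, not a reproduction of the original image):

```latex
f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```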
There are many other uses for pi in math, many of which I am not aware of, so I will leave it here.

**Related Posts**

Tuesday, July 10, 2018

Managing Predictions with Bayes Theorem

I have been reading Nate Silver's book The Signal and the Noise, about how hard it is to make predictions even when we have complete data.  I tend to be a slow reader, so I've been reading it for a while.  Finally I got to the chapter on his years as a professional poker player.  He talked about how players like him get overconfident if they start out winning.  Poker is a game of skill and luck.  The great players are lucky and good.  He argues that this overconfidence is a thing that can be managed according to Bayes theorem.

In layman's terms, Bayes theorem says that people bring their preconceived notions to a situation and then update those notions based on what happens in that situation.  The image above shows the mathematical formula for the theorem.  If we know the probability of event B occurring given that event A has occurred, and we know the overall (or marginal) probabilities of events A and B occurring separately from each other, we can calculate the probability of event A occurring given that B has occurred.  An example of this theorem is shown below.
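In symbols, using the same events A and B as in the paragraph above, the standard statement of the theorem is:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```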
The table above (that I used in my first post on data driven journalism) shows the 2016 primary wins for Clinton, Trump, and Sanders.  The probability of Trump winning a state given that Clinton has won in the other party's contest in that state is 25 out of 29 or 86%.  We can use Bayes Theorem to find the probability that Clinton wins the state given that Trump has won.

First we need the overall probability of Trump winning a state, which is 37 out of 51 contests (DC is included) or 73%.  Next we need the overall probability of Clinton winning a state, which is 29 out of 51 contests or 57%.  We can now plug those numbers into Bayes theorem as in the above image.  The percentages were converted to decimals for computation's sake.  

We update our knowledge of the probability of Trump winning given that Clinton won in the other party with the overall probabilities of Trump and Clinton winning to find the probability of Clinton winning given that Trump has won.  That number is 67%, which is considerably lower than the original 86%.  This is because Trump was more likely to win on his party's side than Clinton was on hers.
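The arithmetic above can be checked with a short Python sketch. The counts are the ones quoted in the post; the function and variable names are only illustrative.

```python
# Counts quoted in the post (2016 primaries, 51 contests including DC).
contests = 51
clinton_wins = 29
trump_wins = 37
trump_wins_where_clinton_won = 25


def bayes(p_b_given_a, p_a, p_b):
    """Return P(A | B) from P(B | A), P(A), and P(B)."""
    return p_b_given_a * p_a / p_b


# Inputs computed from the raw counts.
p_trump_given_clinton = trump_wins_where_clinton_won / clinton_wins
p_clinton = clinton_wins / contests
p_trump = trump_wins / contests

# P(Clinton wins | Trump wins) from the unrounded inputs.
p_clinton_given_trump = bayes(p_trump_given_clinton, p_clinton, p_trump)

# The same calculation with the rounded decimals used in the post.
p_rounded_inputs = bayes(0.86, 0.57, 0.73)

print(f"From raw counts:     {p_clinton_given_trump:.3f}")
print(f"From rounded inputs: {p_rounded_inputs:.3f}")
```

With the raw counts the answer simplifies to 25/37, about 0.676; with the rounded decimals it comes out near 0.67, which matches the figure in the text.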

We can calculate these probabilities easily after the contests are over, but making these predictions beforehand is a different matter.  Nate Silver was wrong about how many primaries Bernie Sanders would win and who would win the general election after correctly predicting the winner in 2012.  How will he update his prediction model for the 2018 and 2020 elections?  Time will tell.

**Related Posts**

Don’t test me: Using Fisher’s exact test to unearth stories about statistical relationships (Repost)

Monday, February 8, 2016

Iowa Coin Toss Math Part 2

I received a lot of comments on my post, which put the probability of Hillary Clinton winning all six coin tosses at tied caucus sites, as the Des Moines Register reported, at 1 in 64 (1.56%, or 0.0156).  NPR cited an unnamed Democratic Party official saying that there were a dozen coin tosses and Sanders won "at least a handful," while other media outlets and the Sanders campaign have repeated the claim that there were six.  Someone sent me the above video, which appears to show Sanders winning a coin toss.

Let's be conservative and say that there were seven coin tosses with Sanders winning one.  What is the probability of this outcome?  According to the binomial distribution, the probability of exactly one success out of seven coin tosses (with the chance of success on one toss being 0.5) is 0.0547 or 1 in 18.28.  The probability of 1 or fewer successes is 0.0625 or 1/16.  

If we take the Democratic official at his word that Sanders won "a handful" out of a dozen tosses (we will call a handful 5), the probability of exactly 5 successes in 12 tosses is 0.1934 or about 1 in 5.  The probability of 5 or fewer successes in 12 tosses is 0.387, or 38.7%, or 1 in 2.58.  
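The binomial figures for both scenarios can be reproduced with Python's standard library. This is a minimal sketch assuming a fair coin (p = 0.5) and independent tosses; the helper names are illustrative.

```python
from math import comb


def binom_pmf(k, n, p=0.5):
    """P(exactly k successes in n independent tosses)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p=0.5):
    """P(k or fewer successes in n independent tosses)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


# Scenario 1: seven tosses, Sanders wins one.
print(f"Exactly 1 of 7:   {binom_pmf(1, 7):.4f}")
print(f"1 or fewer of 7:  {binom_cdf(1, 7):.4f}")

# Scenario 2: twelve tosses, Sanders wins a "handful" (taken as 5).
print(f"Exactly 5 of 12:  {binom_pmf(5, 12):.4f}")
print(f"5 or fewer of 12: {binom_cdf(5, 12):.4f}")
```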

The probabilities in the first two scenarios are low, but these outcomes are still more likely than dying in a plane crash (1 in 5.4 million), being struck by lightning in one's lifetime (1 in 12,000), or winning the Powerball jackpot (1 in 175,223,510).  The Des Moines Register still says that there were irregularities in the caucus process and that the gap between Clinton and Sanders has narrowed to 700.47 state delegates for Clinton and 696.92 for Sanders.  

Hopefully New Hampshire's primary won't have the same issues.  Other states do have caucuses similar to Iowa's.  Voting or caucusing irregularities are usually noticed when the results are as close as Iowa this year or Florida in 2000.  Independently verifiable results are the key.

**Related Posts**

Iowa Caucus Coin Toss Math


Adjusting Exit Polls? Assumptions Make All The Difference (Response to Charmin)


The Need for Exactness

Wednesday, February 3, 2016

Iowa Caucus Coin Toss Math

The 2016 Iowa Caucuses had a fantastic photo finish on Monday night, with Hillary Clinton outlasting Bernie Sanders by 4 delegates to the state's county conventions.  The delegates to the county conventions are allocated to the candidates based on the percentage of the caucus goers supporting a viable candidate.  A candidate is viable if he or she has more than 15% of the caucus goers present.  In the caucus room there is a scramble for the caucus goers of unviable candidates (in this case Martin O'Malley and uncommitted supporters) to go into either Clinton's or Sanders' corner.  

For caucuses the networks do entrance polls of caucus goers rather than the exit polls used for primary and general elections.  The entrance poll, a random sample of 1660 out of over 100,000 total goers, tells us who the goers are and what they were thinking as they entered the caucus room.  The table below shows the breakdown of Democratic caucus goers by gender.  Multiplying the marginal percentages for gender by the cell percentages for the candidates and then summing across columns gives the preferences of goers as they entered the room.  For example, the overall % for Sanders is found by the formula 0.50*0.43 + 0.42*0.57 = 0.4544 or 45.44%.  The final delegate total suggests that the Sanders people did a better job than Clinton's of attracting O'Malley and uncommitted caucus goers into their corner once they were in the room.  The rest of the entrance poll shows that Sanders was preferred overwhelmingly by younger goers, by those with some college or a college degree, by single goers, and by low-income caucus goers.

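| Candidate | Men (43%) | Women (57%) | % Caucus Goers on Entrance | Final Delegates (%) |
|---|---|---|---|---|
| Clinton | | | | 701 (49.86%) |
| O'Malley | | | | 8 (0.6%) |
| Sanders | 50% | 42% | 45.44% | 697 (49.57%) |
| Uncommitted | | | | 0 (0%) |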
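The weighted sum described above for Sanders is an application of the law of total probability, and can be sketched in a few lines of Python. Only the Sanders percentages quoted in the text are used; the dictionary names are illustrative.

```python
# Share of entrance-poll respondents by gender (the marginal percentages).
gender_share = {"men": 0.43, "women": 0.57}

# Share of each gender supporting Sanders on entrance (the cell percentages).
sanders_support = {"men": 0.50, "women": 0.42}

# Overall support = sum over genders of P(gender) * P(Sanders | gender).
sanders_overall = sum(
    gender_share[g] * sanders_support[g] for g in gender_share
)

print(f"Sanders overall on entrance: {sanders_overall:.4f}")
```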

In cases where there was a tie in the caucus room, the winner is determined by a coin toss.  For example, in the case of a tie for a room with 5 delegates, 2 delegates would be awarded to Sanders, 2 to Clinton, and the 5th delegate awarded to a candidate by a coin toss.  On Monday night there were 6 caucus rooms decided by a coin toss, with all 6 delegates being awarded to Clinton.  Given that the margin of victory was 4 delegates for Clinton, it seems that the coin tosses determined the outcome.  The Nightly Show with Larry Wilmore had a funny take on this.
Who wins this coin toss?
What exactly is the probability of Clinton winning all six tosses given that a fair coin is used?  When a fair coin is used, the probability of winning one toss is 1/2 or 50%.  The chance of winning two tosses is 1/4 or 25%.  In the case of 6 coin tosses, the chance of Clinton winning all 6 tosses is 1/64 or 1.6%.  This outcome is possible by pure chance, but its likelihood is very small.
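The halving pattern above is just repeated multiplication of 1/2. Here is a small sketch using exact fractions, assuming a fair coin and independent tosses; the function name is illustrative.

```python
from fractions import Fraction


def prob_all_wins(n_tosses, p_win=Fraction(1, 2)):
    """Probability of winning every one of n independent tosses."""
    return p_win**n_tosses


for n in (1, 2, 6):
    p = prob_all_wins(n)
    print(f"{n} toss(es): {p} = {float(p):.4f}")
```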

NPR has reported that there were in fact a dozen coin tosses and that Bernie Sanders won "a handful" of them according to an unnamed Democratic Party official.  How many is a handful?  Can this number be independently verified?  Truth is a slippery thing indeed.  There is apparently this one video where Sanders did win a coin toss, so he has won at least one toss.

**Related Posts**

The Facebook and Twitter Primary in Iowa


Multiple Comparisons, Margins of Error, and the Affordable Care Act Census Data




Lance Armstrong's Doping Claim: A Probabilistic Calculation

Thursday, October 31, 2013

Multiple Comparisons, Margins of Error, and the Affordable Care Act Census Data

In my last post I looked at changes in the Census uninsured rates in 51 states (50 plus DC) since parts of the act have come into effect.  I found 14 with a significant change from 2010 to 2011 (the most recent years available since the ACA was passed).  Each individual state was classified as changed if the difference in its rates was outside of the 95% probability margin of error (MOE) for both years.  That means that we are 95% certain that the actual rate is between an upper and lower limit.

For example, Texas had a MOE of +/- 0.2%.  That means its estimated rate for 2010 of 26.3% is between 26.1% and 26.5% with 95% probability, while its estimated rate for 2011 of 25.7% is between 25.5% and 25.9% with 95% probability.  Because the intervals for the two years do not overlap, we can be confident that the change in the rate across the years is real.
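The overlap rule used for Texas can be written as a small check. This is a sketch of the rule as described in the post, with the Texas figures quoted above; the function names are illustrative.

```python
def interval(estimate, moe):
    """Return (lower, upper) bounds for an estimate and its margin of error."""
    return (estimate - moe, estimate + moe)


def overlaps(a, b):
    """True if two (lower, upper) intervals share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Texas uninsured rates (percent), each with a MOE of +/- 0.2.
tx_2010 = interval(26.3, 0.2)
tx_2011 = interval(25.7, 0.2)

# A state counts as "changed" when the two intervals do not overlap.
changed = not overlaps(tx_2010, tx_2011)
print(f"Texas changed between 2010 and 2011: {changed}")
```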

Contrary to his claims, the results suggest that Ted Cruz's Texas so far has had a real but small decrease in the uninsured rate since the ACA, or Obamacare, was enacted.  In the graph above, California and Vermont have had significant decreases while Missouri was the only one that increased.  Massachusetts and Pennsylvania stayed the same.  The other states are summarized in my previous post.  

Statistician critics may argue that repeating 51 comparisons inflates the chance that at least one state appears significantly different by pure chance.  The 95% confidence interval means that there is a 5% chance, or 0.05 probability, that each individual comparison is significant by pure chance.  Repeated 51 times, that means the expected number of chance differences is 51(0.05)=2.55.  Because there were 14 significant differences, which is well above the expected number of chance differences, I can be confident that almost all of the changes in the rates are real.
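The multiple-comparisons argument can be made a little more concrete with a short sketch. It assumes, purely for illustration, that the 51 comparisons are independent and that each has a 0.05 chance of a false positive; the binomial tail calculation is an addition here and was not part of the original analysis.

```python
from math import comb

n_states = 51   # 50 states plus DC
alpha = 0.05    # chance of a spurious "significant" result per comparison
observed = 14   # significant changes actually found

# Expected number of chance findings across all comparisons.
expected_false_positives = n_states * alpha

# Chance that at least one comparison is significant by luck alone.
p_at_least_one = 1 - (1 - alpha) ** n_states


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance of seeing 14 or more significant results if all were due to luck.
p_14_or_more = binom_tail(observed, n_states, alpha)

print(f"Expected chance findings: {expected_false_positives:.2f}")
print(f"P(at least one by chance): {p_at_least_one:.3f}")
print(f"P(14 or more by chance):   {p_14_or_more:.2e}")
```

Under these assumptions, at least one chance finding is very likely, but 14 or more would be extremely unlikely, which is consistent with the conclusion that most of the observed changes are real.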

Looking at the county-level rates for Pennsylvania, there were zero significant changes, either positive or negative, out of the 67 counties.  Counties with small populations have very large MOEs, however.

**Related Posts**

The Affordable Care Act (ACA) Having Little Effect on PA's Uninsured Rate So Far (Repost with PUSH)

Sunday, July 10, 2011

Casey Anthony's "CSI" Effect

The shock waves still continue to emanate from the Casey Anthony trial, where the young Florida mother was acquitted of murdering her 3-year-old daughter but convicted of the lesser offense of lying to police.  Rendering a verdict in court cases is much like testing hypotheses in statistics.  The researcher (or the jury) makes a decision based on the data (aka evidence) that is available and based on the likelihood of this data (or evidence) being observed when the research hypothesis is assumed false (i.e., presumed innocent until proven guilty).  This decision is later compared to some objective truth (which, if you're religious, only God and maybe the perpetrator would know for certain).  Criminal trials are seldom as clear on guilt or innocence as that of Richard Poplawski.  There are two types of errors that can be made: the first is to put an innocent person in prison and the second is to let a guilty person go free.  The founding fathers considered the first error to be the more serious, which is why Ms. Anthony can now never be tried again for this crime.
Criminologists have talked about a CSI effect where juries expect police and prosecutors to have detailed physical evidence tying the accused to the crime scene and the victim.  This may or may not have happened in this case; I haven't followed it that closely.  Of course, in scientific research, the news media and the public are often far less critical of published research findings, often latching on to them unquestioningly (especially when they support what one already believes).  In both of these cases one's emotions can override one's rational faculties when things like the death of a child are involved.

Jack the Ripper was never caught in the Victorian era at least partly because CSI tools like fingerprinting or DNA analysis were not available in the 19th century.  Arthur Conan Doyle started writing the Sherlock Holmes novels and short stories about the same time as Jack was active.  It may be a coincidence that Doyle's success with Holmes coincided with the public's fear and frustration with Jack the Ripper, but I doubt it.  Just as modern police hate shows like CSI, I wonder what real Victorian detectives thought of Doyle's Holmes.  My hypothesis is that they sympathized with Dr. Watson.  Look for Holmes solving the Casey Anthony case soon on CSI.
**Related Posts**

Lance Armstrong's Doping Claim: A Probabilistic Calculation




ADHD, Genetics, and Causality: A Chicken-Egg Problem


Cause and Effect, Slip Slidin' Away





Monday, May 30, 2011


My recent post on Lance Armstrong received a high number of pageviews in one week (it's already my second all time most read post behind Income and Life Expectancy. What does it Tell Us About US) but a low number of comments and a relatively short average visit time.  That suggests to me that while readers might be interested in the topic of whether he doped or not, they have a hard time understanding the mathematical calculations that I did.  

Probability theory is a tricky concept.  It is what gives students of statistics the hardest time, myself included.  It had its beginnings in games of chance.  Casinos do such big business because most people are such poor judges of their real chance of winning.  The ones who really got rich off of the California gold rush were the ones who sold the shovels.

Taking performance-enhancing drugs and then taking the drug test is a form of gambling where the athlete's career is being put on the line.  One commenter on Twitter thought I proved nothing, another on Facebook thought I should've used a more sophisticated analysis like Bayes theorem, and still another thought I should have considered the time from injection to testing.  The rest said nothing, so they leave me to make a probabilistic guess about what they're thinking.  

Werner Heisenberg's uncertainty principle states that you can be absolutely certain of a particle's position or its momentum but never both at the same time.  Merely the act of observing a phenomenon introduces uncertainty into it.  The only way we can know the world is through probability.  Our prior judgments do cloud our decisions and it is important to be aware of them when considering new information.

**Related Posts**

Cause & Effect, Slip Slidin' Away 

The Joy of Stats -

Making Sense of the Pat Toomey-Joe Sestak Senate Race