## Saturday, April 8, 2017

### New Uninsured Estimates Improve Precision of Trump-Hate Group Model (but there's multicollinearity)

The Census Bureau just released its annual Small Area Health Insurance Estimates (SAHIE) for every state and county in the US for 2015. With all of the discussion of repealing and replacing the Affordable Care Act, I thought I would look at how state-level uninsured rates correlate with the % of the vote Trump received in 2016. The graph below shows a statistically significant positive relationship, with 23% of the variability (R² = 0.23) accounted for by this relationship. If 100% of the variability were accounted for, every state would fall on a perfect straight line sloping upward, like the red dots on the graph below. The red dots represent the predicted values from the regression equation:

Trump % = %Uninsured * 1.34 + 36.1

This equation says that in a state with 0% uninsured (which doesn't exist), Trump would receive 36.1% of the vote, and each 1-percentage-point increase in the uninsured rate would add 1.34 percentage points to his share. This univariate model predicted some states' results better than others. As a next step, I checked whether this relationship would hold up if I added % uninsured to the model I created for hate group rates and Trump's % of the vote.
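For readers who want to reproduce a one-predictor fit like this, here is a minimal sketch of ordinary least squares in plain Python. The function name `fit_line` is my own, not from any library, and the test values are toy numbers rather than the state data. R², the "share of variability accounted for," is computed as one minus the ratio of the residual sum of squares to the total sum of squares.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept.

    Returns (slope, intercept, r_squared), where r_squared is the share of
    the variance in y accounted for by the fitted line.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Reading the post's equation: a state with 10% uninsured is predicted
# at 1.34 * 10 + 36.1 = 49.5% of the vote for Trump.
predicted = 1.34 * 10 + 36.1
```

Passing the table's `% uninsured` column as `x` and the `Trump %` column as `y` gives the univariate fit; the slope is read in percentage points of Trump vote per percentage point of uninsured.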

When I added % uninsured, I looked at how it performed in a model with the hate group rate, % in poverty, and % with a bachelor's degree or higher. The full model gives the following estimates:

Trump % = %Uninsured * 0.6 - %Poverty * 1.4 + Hate Groups * 1.4 - %Bachelors * 1.6 + 105.6

The intercept for this model suggests that in a state with a value of zero for every predictor, Trump would receive 105.6% of the vote, which is impossible. The negative slope for % poverty also contradicts the univariate analysis for this variable. Both point to multicollinearity: when the predictors are strongly correlated with each other, the regression coefficients become unstable, with inflated standard errors and estimates that can flip sign. At the same time, the change in the % of variance explained suggests that the predictors are statistically significant. I tried transforming the predictors and omitting some of them, with little success. To get more realistic coefficients, I turned to ridge regression, a method that is less susceptible to multicollinearity.
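The post does not report a formal multicollinearity check. A common one is the variance inflation factor (VIF): regress each predictor on all the others and compute 1 / (1 - R²). Values above roughly 5 to 10 are usually read as a warning sign. Below is a minimal pure-Python sketch; the helper names (`solve`, `ols`, `r_squared`, `vif`) are my own, and in practice a statistics package would do this for you.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(X, y):
    """Least squares with an intercept via the normal equations (X'X) b = X'y.

    X is a list of rows (one per state); returns [intercept, b1, ..., bk].
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(X, y, coef):
    """Share of the variance in y accounted for by the fitted coefficients."""
    preds = [coef[0] + sum(b * v for b, v in zip(coef[1:], r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def vif(X):
    """Variance inflation factor for each predictor (column) of X."""
    factors = []
    for j in range(len(X[0])):
        target = [r[j] for r in X]
        others = [[v for i, v in enumerate(r) if i != j] for r in X]
        r2 = r_squared(others, target, ols(others, target))
        factors.append(1 / (1 - r2))
    return factors
```

Applied to the four predictor columns in the data table (% uninsured, % in poverty, hate groups per million, % bachelor's degree or higher), large VIFs would support the multicollinearity suspected here.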

Ridge regression is a biased regression method: it adds a penalty term, lambda (λ), that shrinks the coefficient estimates toward zero, trading a little bias for a reduction in their variability. Through an iterative process, I chose a value of 40 for lambda. This method gave the following estimates for the regression slopes:
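Here is a minimal sketch of the closed-form ridge estimate, slopes = (X'X + λI)⁻¹ X'y, in plain Python. It rests on two assumptions of mine: the data are centered first so the intercept is not penalized (a common convention), and the predictors stay in their original units. The post doesn't say whether its predictors were standardized or what software produced λ = 40, and λ's effect depends on the predictors' scale, so this sketch will not necessarily reproduce the post's numbers. The `solve` helper is repeated so the block runs on its own, and the data in it are toy values.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ridge(X, y, lam):
    """Ridge regression: minimize squared residuals + lam * sum(slope ** 2).

    Returns (intercept, slopes). Data are centered first so the intercept
    is left unpenalized.
    """
    n, k = len(X), len(X[0])
    x_means = [sum(r[j] for r in X) / n for j in range(k)]
    y_mean = sum(y) / n
    Xc = [[r[j] - x_means[j] for j in range(k)] for r in X]
    yc = [yi - y_mean for yi in y]
    # Penalized normal equations: (Xc'Xc + lam * I) slopes = Xc'yc
    a = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(Xc, yc)) for i in range(k)]
    slopes = solve(a, b)
    intercept = y_mean - sum(s * m for s, m in zip(slopes, x_means))
    return intercept, slopes


# A ridge trace on toy data: the slopes shrink as lambda grows.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [3, 7, 7, 11, 12]
for lam in (0, 1, 10, 40):
    print(lam, ridge(X, y, lam))
```

With λ = 0 this reduces to ordinary least squares; printing the coefficients over a grid of λ values (a ridge trace) is one way to carry out an iterative choice of λ, though the post does not say which procedure it used.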

Trump % = %Uninsured * 0.4 + %Poverty * 0.02 + Hate Groups * 0.9 - %Bachelors * 0.6 + 60.8
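To read the ridge equation, plug in one row of the data table at the end of the post. The sketch below uses Ohio's values and assumes that "Hate Groups" means hate groups per million residents (the rate column in the table). Because the published coefficients are rounded, the prediction is only approximate. The names `RIDGE` and `predict_trump_pct` are illustrative.

```python
# Rounded ridge coefficients as reported in the post.
RIDGE = {"uninsured": 0.4, "poverty": 0.02, "hate_groups": 0.9, "bachelors": -0.6}
INTERCEPT = 60.8


def predict_trump_pct(row):
    """Predicted Trump % (percentage points) from the post's ridge equation."""
    return INTERCEPT + sum(coef * row[name] for name, coef in RIDGE.items())


# Ohio, from the data table; hate_groups is assumed to be the per-million rate.
ohio = {"uninsured": 7.7, "poverty": 14.8, "hate_groups": 3.01, "bachelors": 26.8}
print(round(predict_trump_pct(ohio), 1))  # prints 50.8; the table lists 52.1
```

Term by term: 60.8 + 3.08 + 0.296 + 2.709 - 16.08 = 50.805, about 1.3 points below Ohio's actual result.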

These estimates are closer to what I found univariately, and the model accounts for 68% of the variability in Trump's % of the vote. The chart above shows the relationship between % uninsured and Trump's % of the vote for the regular regression model. The predicted values are closer to the actual values for this model. The ridge coefficient estimates suggest that the concentration of hate groups has the strongest predictive effect on Trump's % of the vote, followed by the % of the population with a bachelor's degree, the % uninsured, and the % in poverty.

**Update**

Below is the raw data used in this analysis.

| State | Hate groups (2016) | Population (2016) | Hate groups per million (2016) | % in poverty | % uninsured | % with bachelor's degree or higher | Trump % |
|---|---|---|---|---|---|---|---|
| Alabama | 27 | 4863300 | 5.55 | 18.5 | 11.9 | 15.4 | 62.9 |
| Alaska | 0 | 741894 | 0 | 10.4 | 16.3 | 29.7 | 52.9 |
| Arizona | 18 | 6931071 | 2.6 | 17.4 | 12.8 | 27.7 | 49.5 |
| Arkansas | 16 | 2988248 | 5.35 | 18.7 | 11.1 | 21.8 | 60.4 |
| California | 79 | 39250017 | 2.01 | 15.4 | 9.7 | 32.3 | 32.7 |
| Colorado | 16 | 5540545 | 2.89 | 11.5 | 9.2 | 39.2 | 44.4 |
| Connecticut | 5 | 3576452 | 1.4 | 10.6 | 6.9 | 38.3 | 41.2 |
| Delaware | 4 | 952065 | 4.2 | 12.6 | 6.9 | 30.9 | 41.9 |
| Florida | 63 | 20612439 | 3.06 | 15.8 | 16.3 | 28.4 | 49.1 |
| Georgia | 32 | 10310371 | 3.1 | 17.2 | 15.8 | 29.9 | 51.3 |
| Hawaii | 0 | 1428557 | 0 | 10.7 | 4.8 | 31.4 | 30 |
| Idaho | 12 | 1683140 | 7.13 | 14.7 | 12.8 | 26 | 59.2 |
| Illinois | 32 | 12801539 | 2.5 | 13.6 | 8.2 | 32.9 | 39.4 |
| Indiana | 26 | 6633053 | 3.92 | 14.4 | 11.3 | 24.9 | 57.2 |
| Iowa | 4 | 3134693 | 1.28 | 12.1 | 5.9 | 26.8 | 51.8 |
| Kansas | 7 | 2907289 | 2.41 | 12.9 | 10.5 | 31.7 | 57.2 |
| Kentucky | 23 | 4436974 | 5.18 | 18.3 | 7.1 | 23.3 | 62.5 |
| Louisiana | 14 | 4681666 | 2.99 | 19.5 | 13.8 | 23.2 | 58.1 |
| Maine | 3 | 1331479 | 2.25 | 13.2 | 10.3 | 30.1 | 45.2 |
| Maryland | 18 | 6016447 | 2.99 | 9.9 | 7.4 | 38.8 | 35.3 |
| Massachusetts | 12 | 6811779 | 1.76 | 11.5 | 3.2 | 41.5 | 33.5 |
| Michigan | 28 | 9928300 | 2.82 | 15.7 | 7.2 | 27.8 | 47.6 |
| Minnesota | 10 | 5519952 | 1.81 | 10.2 | 5.2 | 34.7 | 45.4 |
| Mississippi | 18 | 2988726 | 6.02 | 22.1 | 14.8 | 20.8 | 58.3 |
| Missouri | 24 | 6093000 | 3.94 | 14.8 | 11.5 | 27.8 | 57.1 |
| Montana | 10 | 1042520 | 9.59 | 14.4 | 14.2 | 30.6 | 56.5 |
| Nebraska | 5 | 1907116 | 2.62 | 12.2 | 9.4 | 30.2 | 60.3 |
| Nevada | 4 | 2940058 | 1.36 | 14.9 | 14.1 | 23.6 | 45.5 |
| New Hampshire | 6 | 1334795 | 4.5 | 8.4 | 7.8 | 35.7 | 47.2 |
| New Jersey | 15 | 8944469 | 1.68 | 10.8 | 10 | 37.6 | 41.8 |
| New Mexico | 2 | 2081015 | 0.96 | 19.8 | 13.1 | 26.5 | 40 |
| New York | 47 | 19745289 | 2.38 | 15.5 | 8.2 | 35 | 37.5 |
| North Carolina | 31 | 10146788 | 3.06 | 16.4 | 13 | 29.4 | 50.5 |
| North Dakota | 1 | 757952 | 1.32 | 10.7 | 8.7 | 29.1 | 64.1 |
| Ohio | 35 | 11614373 | 3.01 | 14.8 | 7.7 | 26.8 | 52.1 |
| Oklahoma | 6 | 3923561 | 1.53 | 16 | 16.1 | 24.6 | 65.3 |
| Oregon | 11 | 4093465 | 2.69 | 15.2 | 8.4 | 32.2 | 41.1 |
| Pennsylvania | 40 | 12784227 | 3.13 | 13.1 | 7.6 | 29.7 | 48.8 |
| Rhode Island | 1 | 1056426 | 0.95 | 14.1 | 6.7 | 32.7 | 39.8 |
| South Carolina | 12 | 4961119 | 2.42 | 16.8 | 13 | 26.8 | 54.9 |
| South Dakota | 7 | 865454 | 8.09 | 13.5 | 11.8 | 27.5 | 61.5 |
| Tennessee | 38 | 6651194 | 5.71 | 16.7 | 12 | 25.7 | 61.1 |
| Texas | 55 | 27862596 | 1.97 | 15.9 | 19.2 | 28.4 | 52.6 |
| Utah | 3 | 3051217 | 0.98 | 11.2 | 11.6 | 31.8 | 45.9 |
| Vermont | 1 | 624594 | 1.6 | 10.4 | 4.7 | 36.9 | 32.6 |
| Virginia | 39 | 8411808 | 4.64 | 11.2 | 10.4 | 37 | 45 |
| Washington | 21 | 7288000 | 2.88 | 12.2 | 7.6 | 34.2 | 38.2 |
| West Virginia | 4 | 1831102 | 2.18 | 18 | 7.3 | 19.6 | 68.7 |
| Wisconsin | 9 | 5778708 | 1.56 | 12.1 | 6.6 | 28.4 | 47.9 |
| Wyoming | 2 | 585501 | 3.42 | 10.6 | 13.4 | 26.2 | 70.1 |
