## Friday, December 10, 2021

### PA Health Behavior Measures Correlate More with Trump % of the vote than with COVID Case Mortality

Last week I posted on COVID case mortality and County Health Rankings (CHR) measures. I also noted that the health behavior measures correlated more strongly with Trump's % of the vote than with COVID case mortality. This week I look at which individual county statistics used by CHR best predict Trump's % of the vote.

Thirty-one of the 48 County Health Rankings statistics were significantly correlated with Trump's % of the vote across Pennsylvania's 67 counties. The variable with the strongest univariate correlation was the % of the population in each county who smoke. This relationship accounted for 54% of the variability in Trump's 2020 vote (R² = 0.5452) and can be seen in the graph above. Philadelphia is an outlier in this relationship.
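The "54% of the variability" figure is the R² of a one-predictor regression. For simple linear regression, R² is exactly the square of the Pearson correlation r, which is why the Update below can recover r by taking a square root. The sketch below checks this on six made-up counties; the numbers are hypothetical and are not the real CHR or election data.

```python
import numpy as np

def simple_r_squared(x, y):
    """Fit y = a + b*x by least squares and return R^2 = 1 - SS_res / SS_tot."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b, a = np.polyfit(x, y, 1)  # polyfit returns [slope, intercept] for degree 1
    residuals = y - (a + b * x)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Made-up values for six hypothetical counties: % smokers and Trump %.
smokers = [14.0, 17.5, 19.0, 21.0, 22.5, 24.0]
trump = [35.0, 55.0, 60.0, 68.0, 66.0, 75.0]

r = np.corrcoef(smokers, trump)[0, 1]
r2 = simple_r_squared(smokers, trump)
print(f"r = {r:.3f}, r^2 = {r**2:.4f}, regression R^2 = {r2:.4f}")  # last two agree
```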

*Trump % of the vote = 38.29 + 2.03 × (% Smokers) - 0.24 × (% access to exercise) + 0.55 × (Social Assn) - 0.03 × (Chlamydia) - 2.16 × (Housing problems)*

Next, I entered the county statistics with the strongest correlations into a multiple regression model and kept only the variables that remained significant. I settled on five variables, which together account for 91.9% of the variability in Trump's vote. The model is summarized in the equation above in italics.
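The post does not say exactly how variables were added or dropped, so the sketch below assumes one common variant, backward elimination: fit all candidates, drop the least significant one, and refit until every remaining predictor is significant. As a simplification it uses |t| ≥ 2 instead of an exact p-value; with 67 counties and six parameters there are 61 residual degrees of freedom, where the two-sided 5% critical t value is about 2.0. The helper names and the synthetic data are hypothetical.

```python
import numpy as np

def ols_t_stats(X, y):
    """Ordinary least squares with an intercept; returns t statistics (intercept first)."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma2 = resid @ resid / (n - k - 1)  # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return coef / se

def backward_eliminate(X, y, names, t_crit=2.0):
    """Drop the least significant predictor until every remaining |t| >= t_crit."""
    keep = list(range(X.shape[1]))
    while keep:
        t = np.abs(ols_t_stats(X[:, keep], y)[1:])  # skip the intercept
        worst = int(np.argmin(t))
        if t[worst] >= t_crit:
            break
        keep.pop(worst)
    return [names[i] for i in keep]

# Synthetic data for 67 hypothetical counties: only "a" and "b" truly matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(67, 4))
y = 40 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=67)
selected = backward_eliminate(X, y, ["a", "b", "noise1", "noise2"])
print(selected)
```

Forward selection (adding the strongest remaining variable one at a time) is an equally plausible reading of the procedure and can settle on a different set when predictors overlap.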

The intercept, 38.29, is the predicted Trump vote % for a Pennsylvania county in which all five predictors equal zero. Percent smokers has a regression coefficient of 2.03: holding the other four variables constant, each 1 percentage point increase in the % of smokers predicts a 2.03 percentage point increase in Trump's vote. The other predictors are summarized below.
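To make the coefficient interpretation concrete, here is a minimal sketch that plugs a county profile into the five-variable equation. The coefficients are the ones reported in the post; the dictionary keys and the county values are hypothetical, chosen only for illustration. Raising % smokers by one point while holding the other predictors fixed changes the prediction by exactly the smokers coefficient.

```python
# Coefficients from the five-variable model in the post.
# The key names are hypothetical labels chosen for this sketch.
INTERCEPT = 38.29
COEFFICIENTS = {
    "pct_smokers": 2.03,
    "pct_exercise_access": -0.24,
    "social_assn_rate": 0.55,
    "chlamydia_rate": -0.03,
    "pct_housing_problems": -2.16,
}

def predicted_trump_pct(county):
    """Predicted Trump % of the vote: intercept plus sum of coefficient * value."""
    return INTERCEPT + sum(b * county[name] for name, b in COEFFICIENTS.items())

# A made-up county profile, not a real Pennsylvania county.
county = {
    "pct_smokers": 20.0,
    "pct_exercise_access": 70.0,
    "social_assn_rate": 13.0,
    "chlamydia_rate": 250.0,
    "pct_housing_problems": 11.0,
}
base = predicted_trump_pct(county)

# Raise % smokers by one point and hold the other four predictors fixed.
bumped = dict(county, pct_smokers=county["pct_smokers"] + 1)
change = predicted_trump_pct(bumped) - base
print(f"baseline {base:.2f}%, change {change:.2f} points")  # change 2.03 points
```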

The graph above shows the univariate relationship between the % of the county with easy access to exercise opportunities and Trump's vote. In the multiple regression model, each 1 percentage point increase in access to exercise opportunities predicts a 0.24 percentage point decrease in Trump's vote. Philadelphia County is less of an outlier for this relationship. Univariately, access to exercise opportunities accounts for 45.1% of the variability in Trump's vote.

The graph above compares the rate of social membership associations for each PA county with Trump's % of the vote. In the multiple regression model, each additional association per 10,000 residents predicts a 0.55 percentage point increase in Trump's vote. Univariately, this relationship accounts for 42.6% of the variability in Trump's vote. Dauphin County is an outlier for this relationship: it has a relatively high social association rate (18.5 per 10,000) but gave Trump only 45% of the vote.

The chlamydia rate per 100,000 for each PA county is negatively associated with Trump's % of the vote, accounting for 47% of the variance. In the multiple regression model, each unit increase in the chlamydia rate (one additional case per 100,000) predicts a 0.03 percentage point decrease in Trump's vote. Philadelphia County is influential for this relationship but not an outlier.

The last variable in the model is the % of each county's households with housing problems. In the multiple regression model, each 1 percentage point increase in this variable predicts a 2.16 percentage point decrease in Trump's vote. Univariately, housing problems account for 48.2% of the variability in Trump's vote. Bedford County may be an outlier.

Combined in a multiple regression model, these five variables account for 91.9% of the variance in Trump's vote. One should always be careful about assuming cause and effect between correlated variables: unmeasured variables could always explain these relationships. These five variables were the most robust when entered into the model, and they do not have a strong association with each other. Because each remains significant with the other four held constant, none of the five can simply be explained away by the others.
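One standard way to check that predictors "do not have a strong association with each other" is the variance inflation factor (VIF): regress each predictor on the others and compute 1 / (1 - R²). Values near 1 mean little overlap; a common rule of thumb treats values above 5 or 10 as a warning sign. The post does not report VIFs, so this is a hedged sketch on synthetic data for 67 hypothetical counties, not a result for the actual model.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (rows = counties)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    factors = []
    for j in range(k):
        # Regress predictor j on the other predictors plus an intercept.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        ss_tot = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2_j = 1.0 - (resid @ resid) / ss_tot
        factors.append(1.0 / (1.0 - r2_j))  # VIF_j = 1 / (1 - R_j^2)
    return factors

# Synthetic predictors for 67 hypothetical counties.
rng = np.random.default_rng(1)
independent = rng.normal(size=(67, 3))
# Add a fourth column that is nearly a copy of the first.
collinear = np.column_stack([independent, independent[:, 0] + 0.05 * rng.normal(size=67)])
print([round(v, 2) for v in vif(independent)])  # all close to 1
print([round(v, 1) for v in vif(collinear)])    # columns 0 and 3 are inflated
```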

### Update

Some readers have asked about the correlation coefficients for the univariate relationships in this model. Taking the square root of the r-squared statistic for % smokers (0.5452) gives a correlation coefficient of 0.738, which is much stronger than any of the correlations with COVID case mortality. For access to exercise opportunities, the coefficient is -0.671; for social association rates, 0.673; for chlamydia rates, -0.685; and for housing problems, 0.694.
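Squaring removes the sign of r, so the square root of r-squared only gives the magnitude; the sign has to come from the direction of the univariate slope. A minimal sketch, where the slope argument is a placeholder whose sign is all that matters:

```python
import math

def r_from_r_squared(r_squared, slope):
    """Pearson r for a one-predictor regression: sqrt(R^2), signed like the slope."""
    return math.copysign(math.sqrt(r_squared), slope)

# % smokers: R^2 = 0.5452 with a positive slope, as in the post.
print(round(r_from_r_squared(0.5452, slope=1.0), 3))   # 0.738
# The same R^2 with a negative slope would give r = -0.738.
print(round(r_from_r_squared(0.5452, slope=-1.0), 3))  # -0.738
```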