Showing posts with label Regression. Show all posts

Friday, May 27, 2022

County Health Rankings predict COVID Mortality in the 10 County Area

 

Last year I looked at COVID mortality and County Health Rankings (CHR) numbers for the whole state of PA. This year, the state does not make cumulative mortality numbers readily available, which makes updating the numbers subject to copying error. For this year, I thought I would focus on the ten-county region surrounding Cambria County.

The 10-county area is listed in the table at the top. The table below presents the univariate correlation coefficients between the COVID measures above and the CHR rankings for health outcomes and health factors. Health outcomes is a composite of length of life and quality of life measures. Health factors is a composite of health behaviors, clinical care, social and economic, and physical environment factors. Because of the small sample size (10 counties), correlation coefficients of 0.632 or higher or -0.632 or lower were flagged as significant and presented in bold below.

 

| | % COVID Fully Vaccinated | Case Mortality % | COVID Case Rate /100,000 | COVID Mortality /100,000 | Hosp /100,000 | County Health Outcomes Rank | Health Factors Rank |
|---|---|---|---|---|---|---|---|
| % COVID Fully Vaccinated | 1.000 | | | | | | |
| Case Mortality % | **-0.738** | 1.000 | | | | | |
| COVID Case Rate /100,000 | 0.158 | 0.311 | 1.000 | | | | |
| COVID Mortality /100,000 | -0.579 | **0.953** | 0.583 | 1.000 | | | |
| Hosp /100,000 | 0.360 | -0.440 | 0.134 | -0.339 | 1.000 | | |
| County Health Outcomes Rank | -0.289 | **0.632** | 0.344 | **0.642** | -0.254 | 1.000 | |
| Health Factors Rank | -0.462 | **0.730** | 0.398 | **0.747** | -0.409 | **0.850** | 1.000 |
| Length of Life Rank | -0.297 | **0.696** | 0.232 | **0.659** | -0.242 | **0.896** | **0.683** |
| Quality of Life Rank | -0.160 | 0.428 | 0.368 | 0.477 | -0.254 | **0.925** | **0.826** |
| Health Behaviors Rank | -0.363 | **0.678** | 0.477 | **0.726** | -0.227 | **0.676** | **0.913** |
| Clinical Care Rank | -0.548 | **0.803** | 0.232 | **0.764** | -0.420 | **0.828** | **0.785** |
| Soc & Econ Factors Rank | -0.357 | 0.598 | 0.318 | 0.608 | -0.414 | **0.843** | **0.950** |
| Phys Env Rank | 0.031 | 0.135 | -0.381 | 0.003 | -0.242 | 0.385 | 0.116 |
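The 0.632 cutoff is consistent with the usual two-tailed 5% significance test for a correlation based on 10 observations. The sketch below assumes that this is the test that was used (the post does not say so explicitly) and takes the critical t value for 8 degrees of freedom from a standard t-table.

```python
import math

def critical_r(t_crit: float, n: int) -> float:
    """Smallest |r| that reaches significance for n paired observations.

    Uses the identity r = t / sqrt(t^2 + df) with df = n - 2.
    """
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

# Assumption: two-tailed test at the 5% level.
# 2.306 is the standard t-table critical value for 8 degrees of freedom.
T_CRIT_DF8 = 2.306

threshold = critical_r(T_CRIT_DF8, n=10)
print(round(threshold, 3))
```

With only 10 counties, a correlation has to be quite large before it clears this bar, which is why many moderate coefficients in the table are not flagged.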

The COVID case mortality and population-adjusted mortality rates were most strongly associated with the CHR rankings of length of life, health behaviors, and clinical care. A positive correlation with the rankings suggests that the worse a county's ranking (that is, the higher its rank number), the higher its COVID mortality. Case mortality is simply the number of COVID deaths divided by the number of COVID cases.
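As a small illustration of the two mortality measures, here is a sketch of how case mortality and population-adjusted mortality can be computed from raw counts. The function names and the county counts are hypothetical and are not taken from the data in this post.

```python
def case_mortality_pct(deaths: int, cases: int) -> float:
    """COVID deaths as a percentage of COVID cases."""
    return 100.0 * deaths / cases

def rate_per_100k(events: int, population: int) -> float:
    """Events (cases, deaths, hospitalizations) per 100,000 residents."""
    return 100_000.0 * events / population

# Hypothetical county: 25 deaths, 1,000 cases, 50,000 residents.
print(case_mortality_pct(25, 1_000))   # deaths per 100 cases
print(rate_per_100k(25, 50_000))       # deaths per 100,000 residents
```

The two measures answer different questions: case mortality depends on how many cases were detected, while the per-100,000 rate depends only on the size of the population.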


[Figure: scatter plot of COVID case mortality versus length of life rank]

The above graph shows the scatter plot for the length of life rank, which has a linear association with COVID case mortality. The R squared statistic of 0.485 means that 48.5% of the variability in case mortality is accounted for by the length of life ranking. The regression equation says that for every unit a county is ranked lower, there is a predicted increase of 0.02% in the case mortality. There were no strong outliers in this plot.
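For a regression with a single predictor, the R squared statistic is the square of the correlation coefficient. A quick check against the length of life coefficient from the table (0.696) is sketched below; the small gap from the reported 0.485 is presumably due to the coefficient being rounded to three decimals.

```python
def r_squared_from_r(r: float) -> float:
    """R squared for a one-predictor regression is the squared correlation."""
    return r ** 2

# Correlation between length of life rank and case mortality, from the table.
r_length_of_life = 0.696
print(round(r_squared_from_r(r_length_of_life), 3))
```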


[Figure: scatter plot of COVID case mortality versus clinical care rank]

The strongest correlation among the health factors sub-rankings was for clinical care at 0.803. The scatter plot above shows a stronger correlation between this ranking and case mortality, accounting for 64.5% of its variability. If it accounted for 100% of the variability, all of the counties would fall on the regression line. As with length of life, the regression equation predicts a 0.02% increase in case mortality for every unit lower that a county is ranked.
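To make the slope concrete, the sketch below turns the 0.02 per rank figure into a predicted difference between two counties. It reads the slope as percentage points of case mortality per rank position, which is an assumption. The intercept of the regression line is not given in the post, so only differences between counties can be computed, and the two ranks used are hypothetical.

```python
SLOPE_PER_RANK = 0.02  # predicted change in case mortality (%) per rank position

def predicted_difference(rank_a: int, rank_b: int, slope: float = SLOPE_PER_RANK) -> float:
    """Predicted case mortality of county B minus county A, in percentage points."""
    return slope * (rank_b - rank_a)

# Hypothetical comparison: a county ranked 10th versus one ranked 40th.
print(round(predicted_difference(10, 40), 2))
```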

These sub-rankings are themselves composites of dozens of statistics. The individual statistics should shed more light on which variables may be driving COVID vaccination, case, mortality, and hospitalization rates. The devil is always in the details. This will be the next step.


Friday, January 14, 2022

PA County Factors that Predict Full COVID Vaccination Rates

Last month I posted on which county health ranking variables were most strongly correlated with COVID case mortality rates. Granted, those posts were made just as the omicron variant was arriving on the scene. This month, I thought I would take a look at which variables were the best predictors of full vaccination rates. Full vaccination means receiving the first two shots. As of today, only 51.8% of the state population has received the first two shots. This includes Philadelphia County and individuals who are ineligible to receive the shots.

Univariately, the average number of mentally unhealthy days had the strongest negative correlation with the full vaccination rate. Not surprisingly, the flu vaccination rate at the county level had the highest positive correlation. These and other variables were entered into a multiple regression model to find the most robust predictors. The flu vaccination rate was not significant in the presence of other variables, but four others were and are presented below. These variables accounted for 62% of the variability in the full vaccination rate. Philadelphia County was a problematic outlier and was excluded from the data set.

Full vaccination rate = 0.585 - 0.006(social assoc rate) - 0.044(avg # mentally unhealthy days) + 0.003(% with access to exercise opportunities) + 0.001(Primary Care Physician Rate)
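As a sketch of how the equation above produces a prediction, the function below plugs county values into the published coefficients. The coefficients come from the equation; the function name, the input units (percent for exercise access, the CHR rate units for the other measures), and the example county values are assumptions made for illustration.

```python
def predicted_full_vax_rate(
    social_assoc_rate: float,
    mentally_unhealthy_days: float,
    pct_exercise_access: float,
    pcp_rate: float,
) -> float:
    """Predicted full vaccination rate as a proportion (0.585 means 58.5%)."""
    return (
        0.585
        - 0.006 * social_assoc_rate
        - 0.044 * mentally_unhealthy_days
        + 0.003 * pct_exercise_access
        + 0.001 * pcp_rate
    )

# Hypothetical county values, not taken from the data set.
rate = predicted_full_vax_rate(
    social_assoc_rate=10,
    mentally_unhealthy_days=4.5,
    pct_exercise_access=70,
    pcp_rate=60,
)
print(f"{rate:.1%}")
```

Raising the mentally unhealthy days input by one day while holding the others fixed lowers the prediction by 0.044, which is the 4.4% decline described for that variable.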


The rate of social associations in a county was negatively associated with the full vaccination rate. In the model, for every unit increase in the association rate there is a 0.6% decrease in the full vaccination rate. Univariately, this relationship accounts for 22.8% of the variability in the full vaccination rate. Montour County, with the highest vaccination rate in the state, is an outlier for this relationship as well as for other variables.

The average number of mentally unhealthy days in the last month is also negatively associated with the full vaccination rate.  For every increase of one day in this variable there is a predicted 4.4% decline in the full vaccination rate.  Univariately this relationship accounts for 34.9% of the variability in the rate.  


Access to exercise opportunities is positively associated with the full vaccination rate.  For every 1% increase in the rate, there is a predicted 0.3% increase in the full vaccination rate.  Univariately, this variable accounts for 33.9% of the variability in the full vaccination rate.


The fourth variable is the primary care physician rate in the county, which is positively associated with the full vaccination rate. For every unit increase in the physician rate, there is a predicted 0.1% increase in the vaccination rate. Univariately, this variable accounts for 20.8% of the variability in the vaccination rate. Montour County (where Geisinger Hospital is located) is influential in this relationship but not poorly fit.

When I looked at which County Health Rankings variables predicted the COVID case mortality rate, the social association rate, the rate of mentally unhealthy days, and access to exercise opportunities were significantly associated with the outcome. The PCP rate was not significant for case mortality, and the % smoking did not hold up in variable selection for the vaccination rate.

The correspondence between the models for these two dependent variables is high. These predictors may provide clues for crafting strategies to improve vaccination rates and thus decrease COVID mortality.

**Related Posts**

Rates of Smoking and Social Associations Predict PA County COVID Case Mortality



Friday, December 10, 2021

PA Health Behavior Measures Correlate More with Trump % of the vote than with COVID Case Mortality

 

Last week I posted on COVID Case Mortality and County Health Ranking (CHR) measures.  I also posted that there was a stronger correlation between the health behaviors and Trump's % of the vote than with COVID case mortality.  This week I will look at which individual county statistics used by CHR are the best predictors of Trump's % of the vote.


[Figure: scatter plot of Trump's % of the vote versus % smokers for PA counties]

Thirty-one out of the 48 county health rankings statistics were significantly correlated with Trump's % of the vote in the 67 Pennsylvania Counties.  The variable with the strongest univariate correlation was the % of the population who were smokers in each county.  This relationship accounted for 54% of the variability in Trump's vote in 2020. This relationship can be seen in the graph above.  Philadelphia is an outlier in this model.

Trump % of the vote = 38.29 + 2.03*(% Smokers) - 0.24*(% access to exercise) + 0.55*(Social Assn) - 0.03*(Chlamydia) - 2.16*(Housing problems)

For the other county-level statistics, I entered the variables with the strongest correlations into a multiple regression model. Variables that were significant were kept in the model. Five variables were settled on, accounting for 91.9% of the variability in Trump's vote. The model is summarized in the equation above.

The 38.29 value is the predicted value for Trump's vote % in PA if all of the predictor variables have a value of zero.  Percent smokers has a regression coefficient of 2.03.  This means that for every 1% increase in the percent of smokers, there is a predicted 2.03% increase in Trump's vote.  The other predictor variables will be summarized below.
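To show how the equation combines the five predictors, here is a sketch that evaluates it for one county. The coefficients are the ones in the equation above; the function name and the example county values are hypothetical.

```python
def predicted_trump_pct(
    pct_smokers: float,
    pct_exercise_access: float,
    social_assoc_rate: float,
    chlamydia_rate: float,
    pct_housing_problems: float,
) -> float:
    """Predicted Trump % of the vote from the five-variable model."""
    return (
        38.29
        + 2.03 * pct_smokers
        - 0.24 * pct_exercise_access
        + 0.55 * social_assoc_rate
        - 0.03 * chlamydia_rate
        - 2.16 * pct_housing_problems
    )

# Hypothetical county values, not taken from the data set.
pct = predicted_trump_pct(
    pct_smokers=20,
    pct_exercise_access=70,
    social_assoc_rate=12,
    chlamydia_rate=300,
    pct_housing_problems=12,
)
print(round(pct, 2))
```

Setting every input to zero returns the intercept of 38.29, and raising the smoking input by one point while holding the rest fixed raises the prediction by 2.03.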


[Figure: scatter plot of Trump's % of the vote versus % with access to exercise opportunities]

The univariate relationship between the % of the county with easy access to exercise opportunities and Trump's vote is presented in the above graph.  In the multiple model there is a predicted decrease of 0.24% in Trump's vote for every 1% increase in access to exercise opportunities.  Philadelphia county is less of an outlier in this model.  Univariately, this relationship accounts for 45.1% of the variability in Trump's vote.


[Figure: scatter plot of Trump's % of the vote versus social association rate]

The number of social membership organizations per 100,000 for each PA county is compared to Trump's % of the vote in the graph above. In the multiple regression model, there is a predicted 0.55% increase in Trump's vote for each unit increase in the social association rate. Univariately, this relationship accounts for 42.6% of the variability in Trump's vote. Dauphin County is an outlier for this relationship, with a relatively high social association rate of 18.5/100,000 but only 45% of the vote for Trump.


The rate of chlamydia per 100,000 for each PA county is negatively associated with Trump's % of the vote, accounting for 47% of the variance. In the multiple regression model, for every unit increase in the chlamydia rate, there is a predicted 0.03% decrease in Trump's vote. Philadelphia County is an influential county for this relationship but not an outlier.
  


The last variable in the model is the % of each county's households with housing problems. In the multiple regression model, for every 1% increase in this variable, there is a predicted 2.16% decrease in Trump's % of the vote. Univariately, housing problems account for 48.2% of the variability in Trump's vote. Bedford County may be an outlier.

These five variables account for 91.9% of the variance in Trump's vote when combined in a multiple regression model.  One should always be careful about assuming a cause and effect relationship between variables that are correlated.  There are always potential unknown variables which can explain this relationship.  These variables were the most robust when entered into the model and do not have a strong association with each other.  The five variables in this model can be ruled out as alternative explanations for each other.

**Update**

Some have asked me about the correlation coefficients for the univariate relationships in this model. Taking the square root of the r-squared statistic for % smokers (0.5452) gives a correlation coefficient of 0.738, which is much stronger than any of the correlations for COVID case mortality. For access to exercise opportunities the coefficient is -0.671. For social association rates, it is 0.673. For chlamydia rate, it is -0.685. For housing problems, it is 0.694.
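The square root of r-squared only gives the size of the correlation, so the sign has to be taken from the direction of the univariate relationship. A minimal sketch of that conversion follows, using the % smokers figure quoted above; the function name is made up for illustration.

```python
import math

def signed_r(r_squared: float, slope_is_positive: bool) -> float:
    """Correlation coefficient recovered from r-squared and the slope's direction."""
    magnitude = math.sqrt(r_squared)
    return magnitude if slope_is_positive else -magnitude

# % smokers: r-squared of 0.5452 with a positive univariate relationship.
print(round(signed_r(0.5452, slope_is_positive=True), 3))
```

Because published r-squared values are themselves rounded, a coefficient recovered this way can differ from the directly computed one in the third decimal place.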
