Showing posts with label poisson regression. Show all posts

Sunday, April 19, 2020

COVID-19 and County Health Rankings in PA: Which Variables Predict Cases and Deaths

The trend in Covid-19 Cases in Cambria County

Last week I posted on how the overall County Health Rankings measures predicted the number of cases and deaths in each Pennsylvania county.  Those measures are composites of dozens of more specific county-level measures.  There are too many univariate correlations to summarize here.  The case counts used in this analysis are from April 18. Variables were added to the Poisson regression model one at a time and kept if they were statistically significant.

The final Poisson model for the number of cases is:

ln(cases) = -1.033 + 0.000002*(population) + 0.045*(% with access to exercise opportunities) - 0.070*(Social Association Rate) + 0.13*(% who Drive Alone to Work) + 0.41*(% not proficient in English) + 0.14*(% with severe housing problems)

  • Access to exercise opportunities is a component of health behaviors and is positively associated with the number of cases in each county.
  • The social association rate is the number of membership organizations per 100,000 people and is negatively associated with the number of cases.  This measure is a component of the social and economic z-score.
  • The percent who drive alone to work is also a component of the social and economic z-score and is positively associated with the number of cases.
  • The percent with severe housing problems is a component of the physical environment z-score and is positively associated with the number of cases.  It is correlated with poor length of life outcomes.
  • The percent not proficient in English is a demographic variable that is not a component of the rankings.  It has the strongest association with the number of cases.
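Because the model is on the log scale, each coefficient can be read as a multiplier on the expected number of cases. The sketch below is only an illustration of that reading: it exponentiates the coefficients from the equation above, and the names `case_coefficients` and `rate_ratio` are made up for this example.

```python
import math

# Coefficients copied from the case-count equation above (log scale).
case_coefficients = {
    "population (per person)": 0.000002,
    "% with access to exercise opportunities": 0.045,
    "social association rate": -0.070,
    "% who drive alone to work": 0.13,
    "% not proficient in English": 0.41,
    "% with severe housing problems": 0.14,
}


def rate_ratio(coefficient, change=1.0):
    """Multiplier on the expected count when a predictor rises by `change` units."""
    return math.exp(coefficient * change)


# One-unit rate ratio for every predictor, holding the others fixed.
rate_ratios = {name: rate_ratio(b) for name, b in case_coefficients.items()}

for name, ratio in rate_ratios.items():
    print(f"{name}: expected cases multiplied by {ratio:.3f} per one-unit increase")
```

Read this way, a one percentage point increase in the percent not proficient in English multiplies the expected number of cases by about 1.51, while a one-unit increase in the social association rate multiplies it by a number below 1.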
The model for deaths last week was:

ln(number of deaths) = -0.14 - 7.97*(health behavior z-score) + 2.83*(social economic z score) + 1.62*(quality of life z score) + 0.000003*(population)

The final model for deaths using the submeasures is:

ln(number of deaths) = -4.94 + 0.000001*(population) + 0.07*(% with access to exercise opportunities) + 0.78*(% Unemployed) - 0.16*(% of Children in poverty) + 0.46*(% not proficient in English)
  • Access to exercise opportunities is a component of health behaviors and is positively associated with the number of deaths in each county.
  • The percent unemployed is a component of the social and economic factors z-score and is positively associated with the number of deaths.
  • The percent of children in poverty is also a component of the social and economic factors z-score and is negatively associated with the number of deaths.  This seems counterintuitive, but counties with higher rates of child poverty may have less social interaction with more susceptible populations such as the elderly.
  • As with the number of cases, the percent not proficient in English was positively associated with the number of deaths.  This variable and the percent unemployed could be positively correlated with poor quality of life.
These variables would be better studied at the individual level than at the county level; I used county-level data because it was readily available.  Even so, the analysis does provide some clues as to what factors may be exacerbating this pandemic.  The graph above shows the trend in cases in my home county, Cambria County.



Friday, April 10, 2020

The Number of Corona Virus Cases in Cambria County has Grown Exponentially While Health Behaviors Predict Cases in PA

 

The number of corona virus cases has grown exponentially in Cambria County.  I have been keeping track of the number of cases in a Google Sheet, as can be seen above.  The cumulative case line has been following a quadratic trend with the polynomial y = 0.0347x^2 - 3051.6x + 7E+07.  This equation accounts for 98.5% of the variability in the solid cumulative case line.
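For readers who want to reproduce this kind of trend line outside a spreadsheet, here is a minimal sketch of a least-squares second-degree polynomial fit and its R-squared. It is not the Google Sheets calculation itself; the day index and the counts below are made up for illustration and are not the Cambria County data.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c using the normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]                    # s[k] = sum of x^k
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]  # t[k] = sum of x^k * y
    matrix = [[s[4], s[3], s[2]],
              [s[3], s[2], s[1]],
              [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(matrix)
    coefficients = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side.
        m = [row[:] for row in matrix]
        for r in range(3):
            m[r][col] = rhs[r]
        coefficients.append(det3(m) / d)
    return tuple(coefficients)  # (a, b, c)


def r_squared(xs, ys, a, b, c):
    """Share of the variability in ys explained by the fitted curve."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (a * x * x + b * x + c)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / ss_tot


# Made-up cumulative counts for ten days (illustration only).
days = list(range(10))
cumulative = [1, 1, 2, 4, 7, 9, 14, 18, 25, 31]
a_hat, b_hat, c_hat = fit_quadratic(days, cumulative)
fit_r2 = r_squared(days, cumulative, a_hat, b_hat, c_hat)
print(f"y = {a_hat:.3f}x^2 + {b_hat:.3f}x + {c_hat:.3f}, R^2 = {fit_r2:.3f}")
```

Using a small day index rather than a spreadsheet date value keeps the coefficients at a readable size.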

Two weeks ago I correlated the number of COVID-19 cases at the county level in Pennsylvania with the county health ranking for that county using Poisson regression.  This week I thought I would take a look at the submeasures for the rankings with the case and death numbers from April 8.  Population numbers for each county were added so that Philadelphia County could be included.

| Measure | Number of Corona Cases | Corona Deaths |
|---|---|---|
| Length of Life Z-Score | 0.046 | 0.067 |
| Quality of Life Z-Score | 0.286 | 0.284 |
| Health Behavior Z-Score | -0.038 | 0.065 |
| Clinical Care Z-Score | -0.059 | 0.114 |
| Social Economic Z-Score | 0.301 | 0.412 |
| Physical Environment Z-Score | 0.062 | -0.449 |
| Number of Corona Cases | 1.000 | 0.957 |
| Corona Deaths | 0.957 | 1.000 |
| Population | 0.841 | 0.792 |


The table above shows the univariate correlations of the submeasures with Philadelphia included.  For the number of cases, the quality of life z score (part of the health outcomes ranking) and the social economic z score (part of the health factors ranking) were correlated.  For the number of deaths, the quality of life, social economic, and physical environment (part of health factors) z scores were correlated. Z scores are numbers scaled so that the mean is zero and the standard deviation is one.
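As a rough sketch of how a univariate correlation and a z score can be computed, the functions below use only the Python standard library. The function names are made up for this example, and using the population standard deviation in `z_scores` is an assumption, not necessarily how the County Health Rankings scale their scores.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def z_scores(values):
    """Rescale values to mean 0 and standard deviation 1 (population SD assumed)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]
```

Rescaling a variable to z scores does not change its correlation with another variable, so `pearson_r(z_scores(a), b)` matches `pearson_r(a, b)` up to rounding error.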

For the case numbers, three of the county health ranking submeasures were significantly associated with the outcome, along with population.  The Poisson regression equation is given by:

ln(number of cases) = 4.15 - 5.91*(health behavior z-score) + 4.31*(social economic z score) - 0.74*(length of life z score) + 0.000002*(population)

This means that the number of cases increases as the health behavior and length of life z scores improve (a negative score is better).  The number of cases decreases as the social economic z score improves.  Here ln is the natural logarithm.

For the number of deaths in each county as of April 8, three submeasures were significantly associated with the number of deaths, along with population.  The Poisson regression equation is given by:

ln(number of deaths) = -0.14 - 7.97*(health behavior z-score) + 2.83*(social economic z score) + 1.62*(quality of life z score) + 0.000003*(population)

As with the number of cases, the natural logarithm of the predicted number of deaths at the county level increases as the health behavior z score decreases.  The predicted number of deaths decreases as the social economic z score, the quality of life z score, and population decrease.




Adding multiple predictors often makes variables that were not significant univariately become significant in a multiple regression model, especially after adjusting for population.  In the graphs above we see that Philadelphia County is an extreme outlier.  This is mostly due to its population.  Adding population to the model helps to offset its outlier effect.

These submeasures are themselves composites of dozens of county-level statistics.  The next step is to look at these individual measures and the up-to-date counts of COVID-19 cases and deaths.


Saturday, March 28, 2020

County Health Rankings and Corona Virus Cases: Lower ranked counties have fewer cases in PA (except for Philly)

I thought I would take a break from Corona Virus numbers to talk about other numbers that I have covered over the years: County Health Rankings and the Southern Poverty Law Center's annual count of hate groups in the US.  On the surface they are unrelated, but because both happen in Pennsylvania and the rest of the U.S., they are tangentially related.


Above are the maps showing the rankings for Pennsylvania.  On the left are the rankings for health outcomes which are a composite of length of life and quality of life data.  Darker green means lower ranked.  

On the right are the rankings for health factors which contribute to the health outcomes. Likewise this ranking is a composite of health behaviors, clinical care, social and economic, and physical environment factors.  Darker blue counties are lower ranked. 



Philadelphia County was ranked last in both measures.  Union County was ranked first on health outcomes and Montgomery County was first on health factors. Ironically, some of the lowest ranked counties (except for Philadelphia) are the ones that have the fewest Corona Virus cases. I thought I would take a look at how the rankings correlate with the number of cases so far.

| Measure | Number of Cases with Philly | Number of Cases w/o Philly |
|---|---|---|
| Health Outcome Z-Score | 0.068 | -0.349 |
| Health Outcome Rank | -0.021 | -0.300 |
| Health Factor Z-Score | 0.016 | -0.464 |
| Health Factor Rank | -0.069 | -0.378 |

The correlations between the rankings and the overall number of cases in each county are provided above.  With Philadelphia County included there is negligible correlation because it has a low ranking and a high number of cases.  The z scores are used to determine the ranking.  A high positive z score gives a low ranking.

With Philadelphia County excluded, there are fairly strong negative correlations with the number of COVID-19 cases.  The strongest negative correlation is with the health factor z score (-0.464, or 21.5% of the variability).  I will look into this one further with a Poisson regression analysis, which is used to model count data.
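For readers curious about what a Poisson regression with a single predictor does under the hood, here is a minimal sketch that fits ln(expected count) = b0 + b1*x by Newton's method. It is not the software used for this post; the function name, the z values, and the coefficients 3.0 and -2.0 used to generate the example data are all made up for illustration.

```python
import math


def fit_poisson_simple(xs, ys, max_iter=50, tol=1e-10):
    """Fit ln(mu) = b0 + b1*x by Newton's method on the Poisson log-likelihood."""
    b0 = math.log(sum(ys) / len(ys))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(max_iter):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Gradient of the log-likelihood.
        g0 = sum(y - mu for y, mu in zip(ys, mus))
        g1 = sum(x * (y - mu) for x, y, mu in zip(xs, ys, mus))
        # Information matrix (negative second derivatives).
        h00 = sum(mus)
        h01 = sum(x * mu for x, mu in zip(xs, mus))
        h11 = sum(x * x * mu for x, mu in zip(xs, mus))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system for the coefficient update.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Made-up example: noise-free expected counts generated from b0 = 3.0, b1 = -2.0.
z_values = [-1.0, -0.5, 0.0, 0.5, 1.0]
counts = [math.exp(3.0 - 2.0 * z) for z in z_values]
b0_hat, b1_hat = fit_poisson_simple(z_values, counts)
print(f"ln(count) = {b0_hat:.3f} + ({b1_hat:.3f})*z")
```

With real county counts the fitted line would not pass through every point, and a statistics package would also report standard errors and significance tests, which this sketch leaves out.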

The regression equation for health factors is ln(number of cases) = 2.984 - 2.001*(health factor z score) and was statistically significant.  This means that a county with a z score of zero would be expected to have around 20 cases (2.718 raised to the 2.984 power).  For every one-unit increase in the z score, the expected number of cases is multiplied by 2.718 raised to the -2.001 power, or about 0.135, which is a decrease of roughly 86%.
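The arithmetic behind that interpretation can be checked with a few lines of Python. This is only a sketch using the two coefficients quoted above; the names `INTERCEPT`, `SLOPE`, and `expected_cases` are made up for the example.

```python
import math

INTERCEPT = 2.984  # from the health factors equation above
SLOPE = -2.001     # coefficient on the health factor z score


def expected_cases(z_score):
    """Expected number of cases for a county with the given health factor z score."""
    return math.exp(INTERCEPT + SLOPE * z_score)


baseline = expected_cases(0.0)    # expected cases at a z score of zero
ratio_per_unit = math.exp(SLOPE)  # multiplier for each one-unit increase in z

for z in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(f"z = {z:+.1f}: about {expected_cases(z):.1f} expected cases")
```

Because the model is linear on the log scale, the ratio `expected_cases(z + 1) / expected_cases(z)` is the same for any z.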



The graph above is different from my usual regression plots because the y axis is on a logarithmic scale.  It indicates a good fit, with Montgomery, Fayette, and Montour counties as outliers.  More research will be needed to see if this pattern holds up elsewhere.

As previously mentioned, health factors is a composite of different sub measures.  These sub measures are determined by dozens of county level statistics. Next I will look at which of these sub measures are most closely associated with Corona Virus cases.  Later I will look at the Southern Poverty Law Center's new hate group numbers.

