Showing posts with label Regression. Show all posts

Tuesday, November 16, 2021

County Health Ranking Factors that Predict COVID Case Mortality in PA

This is the third installment of my posts on case mortality in Pennsylvania.  Previously I explained case mortality and looked at the counties with the highest rates.  This time I'm looking at which County Health Ranking factors predict COVID case mortality at the county level.  The goal is to see which health issues present in these counties before the pandemic predict mortality rates.

| County Health Ranking Variable | Correlation with Case Mortality (* = significant) |
|---|---|
| Length of Life | 0.30* |
| Quality of Life | 0.17 |
| Health Behaviors | 0.29* |
| Clinical Care | 0.16 |
| Social and Economic | 0.28* |
| Physical Environment | -0.10 |
| Trump % | 0.33* |


The above table presents Spearman correlation coefficients between county case mortality rates and each of the 6 county health ranking sub-measures, plus Trump's % of the vote in 2020.  The coefficients for length of life, health behaviors, social and economic factors, and Trump's % of the vote were statistically significant and positive.  Scatterplots for these relationships are presented below.
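As a rough illustration of how a Spearman coefficient like the ones in the table is computed, here is a minimal Python sketch.  It ranks both variables (tied values share the average rank) and then takes the ordinary Pearson correlation of the ranks.  The function names and the five toy data points are my own placeholders, not the actual county data.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x = math.fsum(x) / len(x)
    mean_y = math.fsum(y) / len(y)
    sxy = math.fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = math.fsum((a - mean_x) ** 2 for a in x)
    syy = math.fsum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical z scores and case mortality rates for five made-up counties.
z_scores = [-1.2, -0.4, 0.0, 0.7, 1.5]
mortality = [1.1, 1.9, 1.6, 2.4, 3.0]
print(round(spearman(z_scores, mortality), 2))  # prints 0.9
```

Because only the ranks matter, a single extreme county moves a Spearman coefficient much less than it would move a Pearson coefficient.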




























The above plot shows the relationship between the length of life z score (lower is better) for all 67 counties in Pennsylvania and COVID case mortality.  This relationship accounts for 5.3% of the variability in case mortality.  Sullivan and Juniata counties stand out with high case mortality rates but average length of life scores.




























The next graph shows the relationship of health behaviors (lower is better) with COVID case mortality.  This relationship accounts for 6.7% of the variability in case mortality.  As can be seen there is considerable spread in the data.



Social and economic factors (a smaller score is better) were a significant predictor of case mortality but only accounted for 3.1% of the variability in case mortality.  Sullivan and Juniata counties are still outliers on each measure.




























Trump's % of the vote is not a factor in County Health Rankings, but there is a lot of data showing that counties where Trump is popular have higher COVID rates.  This relationship is stronger than that of any of the county health ranking factors, accounting for 6.9% of the variability in case mortality.


In a multiple regression model, only the health behaviors measure remains a significant predictor of case mortality.  These county health ranking sub-measures are themselves composites of over 60 individual statistics.  The next step will be to see which of these statistics are significant predictors of case mortality.


Saturday, March 28, 2020

County Health Rankings and Corona Virus Cases: Lower ranked counties have fewer cases in PA (except for Philly)

I thought I would take a break from Corona Virus numbers to talk about other numbers that I have covered over the years: County Health Rankings and the Southern Poverty Law Center's annual count of hate groups in the US.  On the surface they are unrelated, but because both happen in Pennsylvania and the rest of the U.S. they are tangentially related.


Above are the maps showing the rankings for Pennsylvania.  On the left are the rankings for health outcomes which are a composite of length of life and quality of life data.  Darker green means lower ranked.  

On the right are the rankings for health factors which contribute to the health outcomes. Likewise this ranking is a composite of health behaviors, clinical care, social and economic, and physical environment factors.  Darker blue counties are lower ranked. 



Philadelphia County was ranked last in both measures.  Union County was ranked first on health outcomes and Montgomery County was first on health factors.  Ironically, some of the lowest ranked counties (except for Philadelphia) are the ones that have the fewest Corona Virus cases.  I thought I would take a look at how the rankings correlate with the number of cases so far.

| Measure | Correlation with Number of Cases (with Philly) | Correlation with Number of Cases (w/o Philly) |
|---|---|---|
| Health Outcome Z-Score | 0.068 | -0.349 |
| Health Outcome Rank | -0.021 | -0.300 |
| Health Factor Z-Score | 0.016 | -0.464 |
| Health Factor Rank | -0.069 | -0.378 |

The correlations between each ranking measure and the overall number of cases in each county are provided above.  With Philadelphia County included there is negligible correlation because it has a low ranking and a high number of cases.  The z scores are used to determine the ranking.  A high positive z score gives a low ranking.  

With Philadelphia County excluded, there are fairly strong negative correlations with the number of COVID-19 cases.  The strongest negative correlation is with the health factor z score (-0.464, which squared gives 21.5% of the variability).  I will look into this one further with a Poisson regression analysis, which is used to model count data.

The regression equation for health factors is log(expected cases) = 2.984 - 2.001(z score) and was statistically significant.  This means that counties with a z score of zero would be expected to have around 20 cases (2.718 raised to the 2.984 power).  For every unit increase in the score, the expected number of cases is multiplied by 2.718 raised to the -2.001 power, or about 0.135, which is a decrease of roughly 86%.
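To make the interpretation of those two coefficients concrete, here is a small Python sketch.  It assumes the equation is on the log scale, which is the standard form for Poisson regression and is consistent with the figure of around 20 cases at a z score of zero.  The function and variable names are mine.

```python
import math

INTERCEPT = 2.984  # intercept from the fitted equation in this post
SLOPE = -2.001     # coefficient on the health factor z score

def expected_cases(z_score):
    """Expected case count, assuming a log-link Poisson model."""
    return math.exp(INTERCEPT + SLOPE * z_score)

# Multiplicative change in expected cases for a one-unit increase in z.
rate_ratio = math.exp(SLOPE)

print(f"z = 0: {expected_cases(0.0):.1f} expected cases")
print(f"z = 1: {expected_cases(1.0):.1f} expected cases")
print(f"rate ratio per unit of z: {rate_ratio:.3f}")
print(f"percent change per unit of z: {(rate_ratio - 1) * 100:.1f}%")
```

The key point is that the change is multiplicative: each one-unit step in the z score scales the expected count by the same ratio, rather than subtracting a fixed number of cases.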



The graph above is different from my usual regression plots because the y axis is on a logarithmic scale.  It indicates a good fit, with Montgomery, Fayette, and Montour counties as outliers.  More research will be needed to see if this pattern holds up elsewhere.

As previously mentioned, health factors is a composite of different sub measures.  These sub measures are determined by dozens of county level statistics.  Next I will look at which of these sub measures are most closely associated with Corona Virus cases.  Later I will look at the Southern Poverty Law Center's new hate group numbers.





Monday, September 2, 2019

Facebook Primary: Page Likes Predict Democrat Support (Except for Joe Biden)

Four years ago I showed that a candidate's following on Facebook predicted their support in the polls at this point in the race.  It accounted for 70.6% of the variability for Republicans and 75.6% of it for Democrats.  I thought I would take a look at how the candidates fare with Facebook and Twitter in the Democratic primary this time around.  The 13 candidates that appear in the RCP poll average are summarized in the graph below.



The graph above shows the number of Facebook page likes for the candidates on the x axis and the Real Clear Politics (RCP) national poll average % for each candidate on the y axis.  The linear regression model accounted for 42.1% of the variability in the RCP average.  The fit line plot above shows a good fit for each of the candidates except for Joe Biden, who has the highest RCP poll average at 28.9% but only 1,487,733 Facebook page likes.  Biden has name recognition because he was Obama's Vice President.



If the model is rerun with Biden excluded, Facebook page likes now account for 87.1% of the variability in the RCP poll average.  The model predicts that for every increase of one million page likes on the candidate's official Facebook page, the RCP poll average is expected to increase by 3.62 percentage points.  If 100% of the variability were accounted for, all of the candidates would form a perfect straight line on the graph.  
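For anyone who wants to check these numbers, here is a minimal Python sketch of a simple least squares fit, run with and without Biden.  The poll averages and page likes are copied from the summary table in this post; the function names and the choice to measure likes in millions are mine.

```python
import math

# (candidate, RCP poll average %, Facebook page likes) from the table in this post.
CANDIDATES = [
    ("Biden", 28.9, 1_487_733), ("Sanders", 17.1, 5_104_561),
    ("Warren", 16.5, 3_281_315), ("Harris", 7.0, 1_148_762),
    ("Buttigieg", 4.6, 441_495), ("Yang", 2.5, 178_036),
    ("Booker", 2.4, 1_192_725), ("O'Rourke", 2.4, 916_711),
    ("Gabbard", 1.4, 377_434), ("Castro", 1.1, 141_257),
    ("Klobuchar", 0.9, 258_582), ("Bullock", 0.8, 32_231),
    ("Williamson", 0.8, 814_883),
]

def likes_vs_polls(exclude=()):
    """Return (likes in millions, poll %) pairs, skipping excluded names."""
    return [(likes / 1_000_000, poll)
            for name, poll, likes in CANDIDATES if name not in exclude]

def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x = math.fsum(xs) / len(xs)
    mean_y = math.fsum(ys) / len(ys)
    sxy = math.fsum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = math.fsum((x - mean_x) ** 2 for x in xs)
    syy = math.fsum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy ** 2 / (sxx * syy)

for label, excluded in (("all 13 candidates", ()), ("without Biden", ("Biden",))):
    slope, intercept, r2 = fit_line(likes_vs_polls(excluded))
    print(f"{label}: slope = {slope:.2f} points per million likes, R^2 = {r2:.3f}")
```

With a single predictor, R squared is just the squared correlation between likes and poll average, which is why dropping one far-off point like Biden can change it so much.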

I also looked at the candidates' Twitter followings and their poll averages.  A similar pattern was found but the relationship was not as strong as it was for Facebook.  The model with Biden included accounted for 34.3% of the variability and without him it accounted for 55.4%.  

One should be careful not to assume a strong Facebook or Twitter following causes a candidate to have high poll numbers.  Strong poll numbers could cause a high Facebook following.  The Twitter and Facebook followings were highly correlated with a coefficient of 0.9018 accounting for 81.3% of the variability.
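The link between the coefficient of 0.9018 and the 81.3% figure is simply that the share of variability is the correlation squared.  Here is a short Python sketch that computes the correlation from the follower counts in the table in this post and then squares it; the variable names are mine, and the lists follow the table's candidate order.

```python
import math

# Follower counts in table order (Biden first, Williamson last).
facebook = [1_487_733, 5_104_561, 3_281_315, 1_148_762, 441_495, 178_036,
            1_192_725, 916_711, 377_434, 141_257, 258_582, 32_231, 814_883]
twitter = [3_706_982, 9_619_000, 3_087_005, 3_090_636, 1_396_792, 747_425,
           4_344_745, 1_565_273, 542_790, 379_882, 755_517, 185_238, 2_759_880]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = math.fsum(x) / len(x)
    mean_y = math.fsum(y) / len(y)
    sxy = math.fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = math.fsum((a - mean_x) ** 2 for a in x)
    syy = math.fsum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson(facebook, twitter)
print(f"r = {r:.4f}, r squared = {r * r:.3f}")
```

A correlation this high also means the two followings carry mostly the same information, so putting both in one model would add little over using either alone.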

Kirsten Gillibrand and Jay Inslee dropped out of the race with 381,476 and 75,202 page likes respectively.  Michael Bennet, Bill de Blasio, and others remain in the race despite not registering in the polls and not qualifying for the Sept 12 debate.  Bennet has 103,933 page likes and de Blasio has 66,070.  Tulsi Gabbard just missed the debate because she did not have enough individual donors.  

The candidates' poll numbers and Facebook and Twitter followings are summarized in the table below.  Facebook has taken steps to curb foreign influence in the upcoming election.  It will remain a force in the election.

| Candidate | RCP Poll Avg % | FB Following | Twitter Following |
|---|---|---|---|
| Biden | 28.9 | 1,487,733 | 3,706,982 |
| Sanders | 17.1 | 5,104,561 | 9,619,000 |
| Warren | 16.5 | 3,281,315 | 3,087,005 |
| Harris | 7.0 | 1,148,762 | 3,090,636 |
| Buttigieg | 4.6 | 441,495 | 1,396,792 |
| Yang | 2.5 | 178,036 | 747,425 |
| Booker | 2.4 | 1,192,725 | 4,344,745 |
| O'Rourke | 2.4 | 916,711 | 1,565,273 |
| Gabbard | 1.4 | 377,434 | 542,790 |
| Castro | 1.1 | 141,257 | 379,882 |
| Klobuchar | 0.9 | 258,582 | 755,517 |
| Bullock | 0.8 | 32,231 | 185,238 |
| Williamson | 0.8 | 814,883 | 2,759,880 |





Saturday, October 28, 2017

Veterans, The Elderly, and Living Wage Cities/Counties

I asked others in the field of demographics about my last post, which found that the percentage of veterans was a negative predictor of the living wage amount enacted in the 38 cities/counties that have passed living wage ordinances.  One expert in the field suggested a variable that I hadn't considered.  




Chris Briem over at the blog Nullspace suggested I look at age as a possible variable that could mediate this relationship.  He stated that there are higher concentrations of veterans among the elderly.  This makes sense as the draft existed before 1970.  I obtained the % of the population over the age of 65 for these cities from the 2010 census and added it to the model seen below.


| Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 14.70 | 1.58 | 9.30 | 0.00 | 11.49 | 17.91 |
| % veteran | -0.60 | 0.17 | -3.61 | 0.00 | -0.94 | -0.26 |
| % over 65 | 0.03 | 0.13 | 0.22 | 0.83 | -0.24 | 0.30 |

The % of veterans in the city/county still significantly negatively predicted the amount of the living wage passed, while the % over the age of 65 did not predict it in either direction.  These cities had a lower percentage of veterans (mean=4.95%) than the US (6.22%).  Likewise these cities had a lower percentage of those over 65 (mean=11.77%) than the US (13.00%).  
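For readers wondering how the columns of the regression table fit together, here is a small Python sketch.  The t statistic is the coefficient divided by its standard error, and the 95% interval is the coefficient plus or minus a critical t value times the standard error.  I am assuming all 38 cities/counties were in the model, which leaves 35 degrees of freedom and a critical value of about 2.030; the names in the code are mine.

```python
# (term, coefficient, standard error) as rounded in the regression table.
TERMS = [
    ("Intercept", 14.70, 1.58),
    ("% veteran", -0.60, 0.17),
    ("% over 65", 0.03, 0.13),
]

# Assumption: n = 38 and 3 estimated terms, so df = 35; the two-sided
# 95% critical t value for 35 degrees of freedom is about 2.030.
T_CRITICAL = 2.030

def summarize(coef, se, t_crit=T_CRITICAL):
    """Return (t statistic, lower 95% bound, upper 95% bound)."""
    margin = t_crit * se
    return coef / se, coef - margin, coef + margin

for term, coef, se in TERMS:
    t_stat, lower, upper = summarize(coef, se)
    print(f"{term:10s} t = {t_stat:6.2f}  95% CI = ({lower:.2f}, {upper:.2f})")
```

Because the inputs here are already rounded to two decimals, the results can differ from the table in the last digit.  The interval for % veteran stays below zero, while the one for % over 65 straddles zero, which is what the significant and non-significant results amount to.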

I looked at the correlation between the % of veterans and the % over 65.  There was a non-significant positive correlation between the variables, as can be seen in the graph below.  Only 8% of the variability in the % over 65 was accounted for by the % of veterans for these cities.  There are cities with high elderly populations and low veteran populations, such as Palo Alto and El Cerrito, CA.



It may be more informative to look at the % of elderly veterans vs. younger veterans as a predictor of the amount of the living wage.  I'm not sure where that data is available but it is a good area of inquiry.

**Related Posts**


What do Living Wage Cities Have in Common?





Veterans, the Living Wage, and the McNamara Fallacy