Showing posts with label Graphics. Show all posts

Wednesday, July 15, 2020

A Look at Positive COVID-19 Testing Rates in Cambria, PA, and the US

There have been recent media reports about positive testing rates increasing in Pennsylvania. I thought I would take a closer look at positive test rates since the state began reporting them on April 17. The positive testing rate is simply the number of positive tests divided by the total number of tests. The graph above shows the cumulative positive test rates for the US (orange line), PA (red line), and Cambria County (blue dotted line). The black line shows the daily positive testing rate for Cambria County.
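As a minimal sketch of the calculation described above, the Python below computes daily and cumulative positive rates from daily counts. The counts are made-up numbers for illustration, not the state's data, and the function names are hypothetical.

```python
def positive_rate(positives, total_tests):
    """Return positives / total_tests as a percentage (None if no tests)."""
    if total_tests == 0:
        return None
    return 100.0 * positives / total_tests


def cumulative_positive_rates(daily_positives, daily_tests):
    """Running positive rate: all positives so far / all tests so far."""
    rates = []
    pos_so_far = 0
    tests_so_far = 0
    for pos, tests in zip(daily_positives, daily_tests):
        pos_so_far += pos
        tests_so_far += tests
        rates.append(positive_rate(pos_so_far, tests_so_far))
    return rates


# Hypothetical daily counts for illustration only (not real data).
daily_positives = [1, 0, 2, 1]
daily_tests = [20, 30, 25, 25]

daily_rates = [positive_rate(p, t) for p, t in zip(daily_positives, daily_tests)]
cumulative_rates = cumulative_positive_rates(daily_positives, daily_tests)
```

The daily rate jumps around when only a few tests are run on a given day, while the cumulative rate moves slowly because every earlier test stays in the denominator.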

The positive rate for the state has been consistently higher than the US and county rates. Both the state and US rates have been decreasing as testing has become more readily available. Cambria's positivity rate has been consistently lower than the state and US rates. The daily positive rate for the county on April 13 was high because on that day there were three positive tests out of 16 total tests (18.75%).

Since June 24, the county's cumulative positive test rate has increased from 1.02% to 1.28% as of today. This rise may not sound like much, but the solid black line shows daily positive rates that were consistently at or above the cumulative rate for this period, with one day, July 11, exceeding 5%.

The graph above is from the Johns Hopkins University site tracking the coronavirus pandemic. It shows the trend in testing for the US during the pandemic. The blue line shows the seven-day average of the positive test percentage, with a steady decrease from early April (21.9%) until the middle of June (4.4%), followed by an increase to 8.7% today.

This graph shows the trend in testing for the state of Pennsylvania for the same period. Here, we see a corresponding peak in the positive rate of 27.8% in mid-April, followed by a decline to 3.4% around June 21. This decrease was followed by an increase to 5.5% as of today. We can see that testing has risen at a slower rate in the state than in the US as a whole.

The black line in the graph at the top was replaced with the 7-day moving average in testing, which does show a rise in positive tests after June 24. I did not have the same access to testing data that Johns Hopkins had. I used publicly available data that the PA health department provided beginning on April 17.
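A 7-day moving average can be computed in more than one way. The sketch below pools positives and tests over a trailing 7-day window; that choice is an assumption for illustration, not a description of how Johns Hopkins or the graph above computes it, and the counts are made up.

```python
def trailing_positive_rate(daily_positives, daily_tests, window=7):
    """Pooled positive rate (%) over a trailing window of days.

    Returns None for days before a full window is available or when
    the window contains no tests.
    """
    rates = []
    for i in range(len(daily_tests)):
        if i + 1 < window:
            rates.append(None)
            continue
        pos = sum(daily_positives[i + 1 - window : i + 1])
        tests = sum(daily_tests[i + 1 - window : i + 1])
        rates.append(100.0 * pos / tests if tests else None)
    return rates


# Hypothetical counts for illustration only (not real data).
daily_positives = [0, 1, 0, 2, 1, 0, 1, 3]
daily_tests = [40, 50, 30, 60, 50, 20, 50, 100]
smoothed = trailing_positive_rate(daily_positives, daily_tests)
```

Pooling the counts weights each day by how many tests were run, so a day with only a handful of tests cannot swing the smoothed line the way it swings the raw daily rate.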

The graph below compares the positive testing rate (the red line) to the testing rate as a percentage of the population for six Johnstown zip codes, along with the overall rates for Greater Johnstown, Cambria County, Pennsylvania, and the US. These rates are cumulative. The numbers at the top of each bar are the cumulative testing rates as a percentage of the population. It's interesting that the state has a higher cumulative positive rate (10.08%) than its testing rate (7.57%). The Johns Hopkins testing tracker has the state ranked 47th in testing rate over the last two weeks (1.1 per 1,000) while it ranks 37th in the two-week average positive testing rate (5.5%). At the bottom is a summary of testing data for Pennsylvania. How one frames the statistics makes all the difference.

**Related Posts**

Friday, April 10, 2020

The Number of Corona Virus Cases in Cambria County has Grown Exponentially While Health Behaviors Predict Cases in PA


The number of coronavirus cases has grown exponentially in Cambria County. I have been keeping track of the number of cases in a Google Sheet, as can be seen above. The cumulative case line has been following a quadratic trend with the polynomial y = 0.0347x² - 3051.6x + 7E+07. This equation accounts for 98.5% of the variability in the solid trend line.
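For readers who want to reproduce this kind of trend line outside a spreadsheet, here is a minimal sketch that fits a polynomial with the same terms as the equation above (x squared, x, and a constant) by least squares and reports the share of variability it accounts for. The case counts are made up, counting days from zero is my own choice, and the function names are hypothetical.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the normal equations."""
    # Sums of powers of x and of x^k * y needed by the normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x3 system for the unknowns (a, b, c).
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        known = sum(m[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (m[i][3] - known) / m[i][i]
    return tuple(coef)  # (a, b, c)


def r_squared(xs, ys, a, b, c):
    """Share of the variability in ys accounted for by the fitted curve."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a * x * x + b * x + c)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical cumulative case counts by day number (not the real data).
days = [0, 1, 2, 3, 4, 5, 6]
cases = [1, 1, 2, 4, 7, 11, 14]
a, b, c = fit_quadratic(days, cases)
r2 = r_squared(days, cases, a, b, c)
```

Multiplying `r2` by 100 gives a percentage comparable to the "variability accounted for" figure quoted for the trend line.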

Two weeks ago I correlated the number of COVID-19 cases at the county level in Pennsylvania with the county health ranking for that county using Poisson regression. This week I thought I would take a look at how the submeasures for the rankings relate to the case and death numbers from April 8. Population numbers for each county were added so that Philadelphia County could be included.

[Table: univariate correlations of the Length of Life, Quality of Life, Health Behavior, Clinical Care, Social Economic, and Physical Environment z-scores with the Number of Corona Cases and Corona Deaths]

The table above shows the univariate correlations of the submeasures with Philadelphia included. For the number of cases, the quality of life z score (part of the health outcomes ranking) and the social economic z score (part of the health factors ranking) were correlated. For the number of deaths, quality of life, social economic, and physical environment (part of health factors) were correlated. Z scores are numbers scaled so that the mean is zero and the standard deviation is one.
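As a small illustration of what a z score is, the sketch below rescales a list of values so that the mean is zero and the standard deviation is one. The numbers are made up, and using the population standard deviation is an assumption here, not necessarily how the county health rankings compute their z scores.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Rescale values to have mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    return [(v - mu) / sigma for v in values]


# Hypothetical county measurements for illustration only.
raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = z_scores(raw)
```

A value equal to the average gets a z score of 0, and a value one standard deviation below the average gets -1.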

For the case numbers, three of the county health ranking submeasures were significantly associated with the outcome, along with population. The Poisson regression equation is given by:

ln(number of cases) = 4.15 - 5.91*(health behavior z score) + 4.31*(social economic z score) - 0.74*(length of life z score) + 0.000002*(population)

This means that the number of cases increases as the health behavior and length of life z scores improve (a negative score is better). The number of cases decreases as the social economic z score improves. Here ln is the natural logarithm of the number of cases.
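Because the equation is on the log scale, a predicted count comes from exponentiating the right-hand side. The sketch below does that with the coefficients quoted above; the example county (z scores of zero, 100,000 residents) is hypothetical, and the function name is my own.

```python
import math

# Coefficients copied from the case-count equation in the post.
INTERCEPT = 4.15
B_HEALTH_BEHAVIOR = -5.91
B_SOCIAL_ECONOMIC = 4.31
B_LENGTH_OF_LIFE = -0.74
B_POPULATION = 0.000002


def predicted_cases(health_behavior_z, social_economic_z, length_of_life_z, population):
    """Predicted case count: exponentiate the linear predictor ln(cases)."""
    log_cases = (
        INTERCEPT
        + B_HEALTH_BEHAVIOR * health_behavior_z
        + B_SOCIAL_ECONOMIC * social_economic_z
        + B_LENGTH_OF_LIFE * length_of_life_z
        + B_POPULATION * population
    )
    return math.exp(log_cases)


# Hypothetical county: all z scores at 0 and 100,000 residents.
baseline = predicted_cases(0.0, 0.0, 0.0, 100_000)

# Same county with a health behavior z score 0.1 lower (that is, better).
better_behavior = predicted_cases(-0.1, 0.0, 0.0, 100_000)

# On the log scale the change is additive, so the ratio is exp(-5.91 * -0.1).
ratio = better_behavior / baseline
```

For this made-up county the baseline works out to about 77 predicted cases, and lowering the health behavior z score by 0.1 multiplies the prediction by about 1.8, which matches the direction described above.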

For the number of deaths in each county as of April 8, three submeasures were significantly associated with the number of deaths, along with population. The Poisson regression equation is given by:

ln(number of deaths) = -0.14 - 7.97*(health behavior z-score) + 2.83*(social economic z score) + 1.62*(quality of life z score) + 0.000003*(population)

Like the number of cases, the natural logarithm of the predicted number of deaths at the county level increases as the health behavior z score decreases. The predicted number of deaths decreases as the social economic z score, the quality of life z score, and population decrease.

Adding multiple predictors often leads to variables that were not significant univariately becoming significant in a multiple regression model, especially after population is adjusted for. In the graphs above we see that Philadelphia County is an extreme outlier. This is mostly due to its population. Adding population to the model helps to negate its outlier effect.

These submeasures are themselves composites of dozens of county-level statistics. The next step is to look at these individual measures and the up-to-date counts of COVID-19 cases and deaths.

**Related Posts**

Friday, December 19, 2014

My 15 Minutes (5 really) of Fame at the Warhol

I did a screen test at the Warhol Museum. Warhol said "everyone has their 15 minutes of Fame." The screen test was five minutes with no sound. There is no point to the video, just like those of the countless others who have posted. He did make a point with this image of Nixon, which got him audited by the IRS, but that is as close to offering a solution in his work as he got.

Warhol's art pokes fun at our culture and its obsession with stuff. He offered no solutions, which is why I present the video below, which does.

**Related Posts**

Podcamp Session Feedback Part 2-The Video of My Session Has Been Posted

PodCamp 6 Interview


A Kinder, Gentler Looney Tunes


What is Sanity?

Sunday, September 28, 2014

The Fourth Year of CSI wo DB

It's that time of year again: time to review the top posts of the last year and how the traffic has changed according to Google Analytics. The total number of sessions has decreased by 4.55% but the number of users of this site has increased by 11.03%. The top posts from the last year are listed below.

[Google Analytics chart comparing Sep 29, 2013 - Sep 28, 2014 with Sep 29, 2012 - Sep 28, 2013. New visitors: 2,513 sessions (70.6%); returning visitors: 29.4%.]

10. First Time I Heard Multivariate Analysis and Multicollinearity on Mentioned on TV

A humorous look at statistical issues thanks to The Daily Show.

9. Two Years Ago in Stanton Heights

This post from 2011 still gets some traffic on the shooting of three police officers in my neighborhood by a right wing radical.

8. Income and Life Expectancy. What does it Tell Us About US?

My all time most read post on life expectancy and income thanks to a link on the BBC website for the program The Joy of Stats.  The link is not there anymore but it still gets some traffic.

7. Global Warming, Wikileaks, and Statistics: What Barry Sanders Can Teach Us

The second all-time most-read post, which uses sports statistics to explain a complicated phenomenon like global warming, received a few more views than the first all-time most-read post.

6. Hitler, Napoleon, and Stalin: Outsider Despots

This post is from this year on the history of three outsiders who exploited power vacuums to become absolute rulers of their countries.  Their similarities and differences are described.

5.  The World Wars and Today's Wars

This post is related to the number 6 post on this list as it is the 100th anniversary of the First World War and many of today's problems in the Middle East are tied to what happened 100 years ago.

4.  Bullying & Society

A post from 2010 where I argued that bullying is a reflection of society's greater ills. 

3. A Geographical Represenation of the Mode and Ethnicity

A post from last November on ethnicity in the United States and how it corresponds to other regional differences.

2. Correlation with the Number of Hate Groups per Million, Poor Health Suggests More Hate 

A look at the concentration of hate groups in each state and health outcomes.

1. A Wave of Hate Groups in California? No in Washington, DC

This post managed to make the all time most viewed list.  The number of hate groups in the US in each state is standardized by the size of each state's population.  The results are surprising.

**Related Posts**

Three Years of CSI Without Dead Bodies


The Second Year of CSI without Dead Bodies


One Year of CSI Without Dead Bodies


My (Quarter Year of) Blogging in Review

CSI senza cadavere (my first post)


Friday, March 21, 2014

Correlation with the Number of Hate Groups per Million, Poor Health Suggests More Hate

This is a follow-up to the last post on the number of hate groups (such as the Ku Klux Klan and the Westboro Baptist Church) in each state that are being watched by the Southern Poverty Law Center. Some may not agree with the inclusion of African American separatists like the Nation of Islam. If these groups are excluded from the national total (115 out of 939), the population-adjusted rate is 2.62 groups per million for the US.

The state with the highest previous rate, 23.72 groups per million, was the District of Columbia. One possible criticism is that it has a large African American population and that it is not technically a state. If the four black separatist groups in DC are excluded from its total of 15, it still has a rate of 17.40 groups per million, which is well above the national rate. I decided to look at which other state-level variables are correlated with the rate of hate groups in each state.
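The population-adjusted rate is just the number of groups divided by the population, scaled to one million residents. As a rough check of the DC figures above, the sketch below backs out the population implied by 15 groups at 23.72 per million and recomputes the rate with four groups removed. The implied population is an inference from the post's own numbers, not a census figure, and the function name is hypothetical.

```python
def rate_per_million(groups, population):
    """Number of groups per one million residents."""
    return groups / population * 1_000_000


# DC figures from the post: 15 groups at 23.72 per million.
dc_groups = 15
dc_rate = 23.72

# Population implied by those two numbers (an inference, not a census figure).
dc_implied_population = dc_groups / dc_rate * 1_000_000

# Rate after excluding the four black separatist groups.
dc_rate_excluding = rate_per_million(dc_groups - 4, dc_implied_population)
```

This gives about 17.39 groups per million, which agrees with the 17.40 quoted above up to rounding of the inputs.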

I combined this data set with a state-level health and income data set, and several of the health measures are significantly correlated with the hate group rate. The strongest of these relationships was the one between infant mortality and hate groups per million, which accounted for 40.9% of the variability. In the chart on the left, DC is an outlier on both variables.

The correlation was rerun with DC excluded. The relationship was still significant but with 12.3% of the variability accounted for. This indicates that the relationship is weaker with DC excluded but still present.

The relationship between hate groups and state-level life expectancy was also significant, with 29.4% of the variability accounted for in a negative relationship: as the number of hate groups increases, the state's life expectancy decreases. As in the previous graph, DC is an outlier on hate groups per million. When DC is removed from the graph, 30.2% of the variability is accounted for in a relationship that is still negative. This suggests that DC has high leverage but is not poorly fit by the model.
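For a simple two-variable correlation, the percentage of variability accounted for is the square of the correlation coefficient, so the percentages above can be converted back to correlations. The sketch below does this; the sign has to be supplied separately, and treating the infant mortality correlations as positive is my assumption since only the life expectancy relationship is described as negative.

```python
import math


def correlation_from_r_squared(percent_explained, negative=False):
    """Convert 'percent of variability accounted for' to a correlation r.

    For a simple two-variable correlation, |r| is the square root of R^2;
    the sign must be supplied because squaring discards it.
    """
    r = math.sqrt(percent_explained / 100.0)
    return -r if negative else r


# Percentages reported in the post (signs for infant mortality are assumed).
r_infant_mortality = correlation_from_r_squared(40.9)        # with DC
r_infant_mortality_no_dc = correlation_from_r_squared(12.3)  # without DC
r_life_expectancy = correlation_from_r_squared(29.4, negative=True)
```

These work out to roughly 0.64, 0.35, and -0.54, which gives a sense of how much the infant mortality relationship weakens once DC is removed.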

There was no significant correlation between state level per capita income and the rate of hate groups.  Other health related outcomes were significantly associated.  These individual correlations are not described in detail here for space considerations.

There is a more advanced method, called factor analysis, that can identify clusters of highly correlated variables. Two factors were extracted, which together account for 68.8% of the variability. They are presented in the table below.

Rotated Factor Matrix (Factor 1: 46% of variance explained; Factor 2: 22% of variance explained)

Variables: Infant Mortality 2007 Deaths/1000; Life Expectancy; % Low Birthweight Babies; Hate Groups per million; Percent under age 65 in 200% of Poverty; Percent Uninsured in Demographic Group for All Income Levels; Expanding medicaid

Extraction Method: Principal Axis Factoring. Rotation Method: Varimax with Kaiser Normalization. Rotation converged in 3 iterations.

The first factor extracted has the health related variables loading on it and accounts for 46% of the total variance.  Infant mortality, life expectancy, % low birth weight babies, and the rate of hate groups load most strongly on this factor.  Percent within 200% of poverty, income, and % uninsured load most strongly on the second extracted factor (called an income factor) while accounting for 22% of the variability.  

The hate group rate does not load on the income factor but it does on the health factor, suggesting an association with health-related outcomes. One must always be careful about inferring a cause-and-effect relationship based on correlational data. When DC was removed, the factor analysis did not run.


Mark Potok of the Southern Poverty Law Center discusses the rise in hate groups and the prominence of Overland Park, Kansas, shooter Frazier Glenn Miller. Missouri, where Miller was living, had a rate of 3.82 hate groups per million and has a life expectancy of 76.8 years, ranking 38th. Kansas had a rate of 1.73 hate groups per million and ranks 27th in life expectancy.

**Related Posts**


A Wave of Hate Groups in California? No in Washington, DC


How do the States Stack Up on Infant Mortality? (Cross Post with PUSH)

A Statistical Profile of the Uninsured in Washington, DC, New Mexico, and Texas