Showing posts with label Big Data. Show all posts

Saturday, June 13, 2020

Update on Johnstown Zip Code Testing




























Last week I reported that my most viewed post on COVID-19 was on testing in Johnstown zip codes compared to the county, the state, and the U.S. I reported that the testing rate for the downtown Johnstown zip code (15901) was nearly identical to the U.S. rate, while the rates for the other zip codes and the county lagged behind the state and U.S. rates. I reposted the chart from that post above.

 

As I was tracking the testing rates, I noticed that the pattern was changing. You can see the change in the bar chart above from the Google Sheet I put together for the county. The rate for 15901 is still the highest in the city, but it is falling behind the U.S. rate. The 15902 zip code has been inching upwards, while the 15909 zip code lags behind the others and the county.


I updated the line chart at the top of the post with the trend in testing up to the current date.  I got rid of the data table to make it more readable. In the chart we see that the testing rate for 15901 started to fall behind the U.S. rate on May 29. The post for the chart at the top was on May 17.  

The 15902 zip code (red dotted line) testing rate has nudged ahead of the state rate (solid black line). Finally, we see that on May 26 the 15909 zip code fell further behind the other zip codes, the county, and the state in testing rates.

The number of COVID-19 cases in the county on May 17 was 54.  Today (27 days later) it is 61.  The 15902 zip code (Hornerstown and Moxham) now has a cluster of five confirmed cases and between 1 and 4 probable cases (the exact number is not released for privacy concerns) while 15901 has between 1 and 4 confirmed cases according to the PA Department of Health.  
Does the lag in testing account for the decrease in the number of new cases for the county? I suppose only God knows for sure. The COVIDcast website shows a decrease, from May 17 to May 31, in the indicators that it uses to forecast future cases. These indicators include doctor visits with COVID-19 symptoms, Google search data, and Facebook survey data. After May 31 the indicators have leveled off, which suggests that a surge in cases is not imminent. Dr. Fauci has warned that a surge could happen as the states reopen. So far it has yet to materialize in Cambria County. Time will tell.


**Related Posts**

Friday, May 1, 2020

Extreme Outliers in Pennsylvania: Beaver and Montour Counties

COVIDcast map for Beaver County

Today Governor Wolf announced that 24 counties will have restrictions eased on May 8. The state uses the COVIDcast analytics tool from Carnegie Mellon University to predict future cases in each county, according to state health secretary Rachel Levine. The tool uses doctor visit data and data from Google and Facebook to make this prediction. One county that is not included in the map at the top is Beaver County, which is northwest of Pittsburgh.

Beaver County

According to the health department, Beaver County currently has 426 cases and 67 deaths. This gives a case mortality rate of 15.73%, which is 6% higher than the next highest counties (Susquehanna and Wyoming). The health department has another table showing that 297 out of 426 cases (69.7%) are located in three nursing homes in the county. Also, 60 out of the 67 deaths (89.6%) in the county are in these nursing homes. These cases and deaths include staff and residents at these homes. A report by WTAE states that the Brighton Rehabilitation Facility is one of the homes with a large number of cases.
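
The percentages in this section are simple ratios of the health department counts quoted above. Here is a minimal Python sketch of that arithmetic (the helper name `pct` is my own and not part of any official tool):

```python
def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole

# Counts quoted in the post for Beaver County
cases, deaths = 426, 67
nursing_home_cases, nursing_home_deaths = 297, 60

case_mortality = pct(deaths, cases)                        # deaths per 100 confirmed cases
nursing_home_case_share = pct(nursing_home_cases, cases)   # share of cases in the three homes
nursing_home_death_share = pct(nursing_home_deaths, deaths)

print(f"{case_mortality:.2f}% {nursing_home_case_share:.1f}% {nursing_home_death_share:.1f}%")
```

Note that the case mortality rate only counts confirmed cases in the denominator, so it depends heavily on how much testing is being done.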

The health department's zip code map shows that 276 of the 426 cases (64.8%) are located in zip code 15009, which is in Beaver, PA. It does not state how many deaths are in Beaver. The Brighton rehab center is located in Beaver, PA. The COVIDcast website predicts that there will be a steady number of future cases. To date, 2,511 tests have been conducted in the county for a rate of 1.52%. In County Health Rankings, Beaver County ranked 44th in health outcomes and 56th in health factors in the state in 2019.

Montour County

Included in the Governor's easing of restrictions is Montour County.  It currently has 48 cases and zero deaths.  The statistic that jumps out at me is that 3,006 tests have been conducted there in a county with about 18,240 inhabitants.  This gives a testing rate of 16.5% which is by far the highest in the state (a person can be tested more than once).  All other counties have testing rates less than 3%.  In County Health Rankings, the county ranked 10th in health outcomes and sixth in health factors in the state in 2019.

The county has no cases in nursing homes. Although it has only 11% of Beaver County's population, 495 more tests for the virus have been conducted there. I don't have a good explanation for their testing numbers. It would be a good question to ask Secretary Rachel Levine.
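
The testing comparison between the two counties can be checked with a few lines of Python. This is only a sketch: the post does not give Beaver County's population, so the code backs it out from the rounded 1.52% testing rate, which makes `implied_beaver_pop` an approximation rather than an official figure.

```python
# Counts quoted in the post
montour_tests, montour_pop = 3006, 18240
beaver_tests, beaver_rate_pct = 2511, 1.52

# Tests per 100 residents (a person can be tested more than once)
montour_rate = 100 * montour_tests / montour_pop

# Assumption: Beaver County's population is inferred from its rounded testing rate
implied_beaver_pop = beaver_tests / (beaver_rate_pct / 100)

pop_share = 100 * montour_pop / implied_beaver_pop   # Montour as a percent of Beaver
extra_tests = montour_tests - beaver_tests

print(f"{montour_rate:.1f}% tested, {pop_share:.0f}% of the population, {extra_tests} more tests")
```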



The increased testing for the virus does allow authorities to pinpoint where the cases are and better enables them to contain the spread. The field of epidemiology began with John Snow in London during a cholera outbreak in 1854. Snow was a physician who was looking at the number of cholera cases. He found that most of them were coming from those who got their drinking water from a well that was drawing water from a part of the Thames river that was contaminated with raw sewage. He took the handle off of the well pump and the outbreak eventually ended, even though no one yet knew how germs caused the disease. We have better methods for tracking the disease now, but we still need to place facts over fears. I have been tracking the cases for Cambria County and other counties for comparison.

**Related Posts**

Friday, April 10, 2020

The Number of Corona Virus Cases in Cambria County has Grown Exponentially While Health Behaviors Predict Cases in PA

 

The number of corona virus cases has grown exponentially in Cambria County. I have been keeping track of the number of cases in a Google Sheet, as can be seen above. The cumulative case line has been following a quadratic trend with the polynomial y = 0.0347x^2 - 3051.6x + 7E+07. This equation accounts for 98.5% of the variability in the cumulative case counts.
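
A "percent of variability accounted for" figure like the 98.5% above is usually the R-squared of the trend line. As a rough sketch of how that number is computed, assuming the usual definition (one minus the residual sum of squares over the total sum of squares), here is a small Python function. The numbers fed to it are toy values for illustration only and are not the Cambria County counts.

```python
def r_squared(observed, fitted):
    """Share of the variability in `observed` that is explained by `fitted`."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    ss_resid = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return 1 - ss_resid / ss_total

# Toy numbers for illustration only, NOT the Cambria County data
toy_observed = [1, 4, 9, 16]
perfect_fit = [1, 4, 9, 16]          # a curve that passes through every point
flat_fit = [7.5, 7.5, 7.5, 7.5]      # predicting the mean every time

print(r_squared(toy_observed, perfect_fit), r_squared(toy_observed, flat_fit))
```

A trend line that hits every point scores 1, and one that does no better than the average scores 0, so 98.5% means the curve tracks the counts very closely.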

Two weeks ago I correlated the number of COVID-19 cases at the county level in Pennsylvania with the county health ranking for that county using Poisson regression. This week I thought I would take a look at the submeasures for the rankings with the case and death numbers from April 8. Population numbers for each county were added so that Philadelphia County could be included.

| Measure | Number of Corona Cases | Corona Deaths |
|---|---|---|
| Length of Life Z-Score | 0.046 | 0.067 |
| Quality of Life Z-Score | 0.286 | 0.284 |
| Health Behavior Z-Score | -0.038 | 0.065 |
| Clinical Care Z-Score | -0.059 | 0.114 |
| Social Economic Z-Score | 0.301 | 0.412 |
| Physical Environment Z-Score | 0.062 | -0.449 |
| Number of Corona Cases | 1.000 | 0.957 |
| Corona Deaths | 0.957 | 1.000 |
| Population | 0.841 | 0.792 |


The table above shows the univariate correlations of the submeasures with Philadelphia included. For the number of cases, the quality of life z score (part of the health outcomes ranking) and the social economic z score (part of the health factors ranking) were correlated. For the number of deaths, quality of life, social economic, and physical environment (part of health factors) were correlated. Z scores are numbers scaled so that the mean is zero and the standard deviation is one.
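
For readers who want to see how z scores and a correlation are related, here is a minimal Python sketch. It assumes the population standard deviation and the ordinary Pearson correlation; the post does not say exactly which formulas were used, and the input lists are toy values, not the county data.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Rescale values to have mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def pearson_r(xs, ys):
    """Pearson correlation as the average product of paired z scores."""
    return mean([zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))])

# Toy values for illustration only
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 6, 8, 10]

print(z_scores(toy_x))
print(pearson_r(toy_x, toy_y))
```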

For the case numbers, three of the county health ranking submeasures were significantly associated with the outcome along with population. The Poisson regression equation is given by:

ln(number of cases) = 4.15 - 5.91*(health behavior z score) + 4.31*(social economic z score) - 0.74*(length of life z score) + 0.000002*(population)

This means that the predicted number of cases increases as the health behavior and length of life z scores improve (a negative score is better). The predicted number of cases decreases as the social economic z score improves. Here ln is the natural logarithm.
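
To turn the equation into a predicted count, exponentiate the right-hand side. The Python sketch below does that with the coefficients quoted above. The county it plugs in is hypothetical (z scores of zero and a made-up population of 130,000), chosen only to show how the pieces fit together.

```python
import math

def predicted_cases(health_behavior_z, social_economic_z, length_of_life_z, population):
    """Predicted case count from the Poisson regression equation in the post."""
    log_cases = (4.15
                 - 5.91 * health_behavior_z
                 + 4.31 * social_economic_z
                 - 0.74 * length_of_life_z
                 + 0.000002 * population)
    return math.exp(log_cases)

# Hypothetical county: every z score at the state average (0), 130,000 residents
baseline = predicted_cases(0.0, 0.0, 0.0, 130_000)

# Same hypothetical county with a worse (higher) social economic z score
worse_social = predicted_cases(0.0, 0.1, 0.0, 130_000)

print(round(baseline, 1), round(worse_social / baseline, 2))
```

Because the model is on the log scale, a change in a predictor multiplies the predicted count rather than adding to it.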

For the number of deaths in each county as of April 8, three submeasures were significantly associated with the number of deaths. The Poisson regression equation is given by:

ln(number of deaths) = -0.14 - 7.97*(health behavior z score) + 2.83*(social economic z score) + 1.62*(quality of life z score) + 0.000003*(population)

As with the number of cases, the natural logarithm of the predicted number of deaths at the county level increases as the health behavior z score decreases. The predicted number of deaths decreases as the social economic z score, the quality of life z score, and population decrease.
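
One common way to read Poisson regression coefficients is as rate ratios: exponentiating a coefficient times a change in the predictor gives the factor by which the predicted count is multiplied. The sketch below applies that to the deaths equation. The 0.1 step size is my own choice for illustration, not something from the analysis.

```python
import math

# Coefficients copied from the deaths equation in the post
death_coefs = {
    "health behavior z score": -7.97,
    "social economic z score": 2.83,
    "quality of life z score": 1.62,
}

def rate_ratio(coef, change=0.1):
    """Factor by which predicted deaths are multiplied when a predictor rises by `change`."""
    return math.exp(coef * change)

for name, coef in death_coefs.items():
    print(f"{name}: x{rate_ratio(coef):.2f} per +0.1")
```

A ratio below 1 means predicted deaths fall as that z score rises, and a ratio above 1 means they rise.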




Adding multiple predictors often makes variables that were not significant on their own become significant in a multiple regression model, especially after population is adjusted for. In the graphs above we see that Philadelphia County is an extreme outlier. This is mostly due to its population. Adding population to the model helps to negate its outlier effect.

These submeasures are themselves composites of dozens of county-level statistics. The next step is to look at these individual measures and the up-to-date counts of COVID-19 cases and deaths.

**Related Posts**

Saturday, March 21, 2020

Corona Numbers from the WHO and JHU

Week 2 of the state of emergency is upon us. Last week I posted numbers from the World Health Organization's (WHO) Corona Virus dashboard showing the progression in the number of cases worldwide. Above is the cumulative frequency graph and overall numbers from their dashboard last week.

At the left is the WHO's worldwide cumulative frequency chart from today. Over the past week the number of cases increased by 123,535, or 87%. The number of deaths increased by 5,793, or 107%.

The two curves show that after the curve was starting to flatten out (mostly in China), it began to increase exponentially in early March everywhere but China.  The mortality rate for the total number of cases a week ago was 3.8%.  Currently the rate is 4.2% with it being the highest in Italy where it is 8.6%.
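
The weekly increases and the mortality rates quoted here hang together, which can be checked by backing out the totals. This is only a sketch: the percentages in the post are rounded, so the implied totals are approximate, and the function name `implied_totals` is my own.

```python
def implied_totals(increase, pct_increase):
    """Back out last week's and this week's totals from a rise and its percentage."""
    previous = increase / (pct_increase / 100)
    return previous, previous + increase

# Weekly increases quoted in the post (WHO dashboard)
prev_cases, cur_cases = implied_totals(123_535, 87)
prev_deaths, cur_deaths = implied_totals(5_793, 107)

prev_rate = 100 * prev_deaths / prev_cases   # mortality rate a week ago
cur_rate = 100 * cur_deaths / cur_cases      # mortality rate today

print(f"{prev_rate:.1f}% -> {cur_rate:.1f}%")
```

Deaths more than doubling while cases did not quite double is what pushes the mortality rate up over the week.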

Johns Hopkins University (JHU) also has a dashboard that is frequently cited in the news media. Their reported numbers are higher than the WHO's. They report 287,238 cases worldwide with 11,942 deaths. The mortality rate according to their numbers is also 4.2%.




The JHU dashboard also provides the number of people who have recovered from the virus. According to them, 89,899 people have recovered worldwide, or 31% of the total cases. The graph above shows the cumulative number of cases in mainland China (orange line) and the cumulative number of cases everywhere else (yellow line). We can see that the yellow line is still growing while China's line has flattened. The number of recovered cases has been growing steadily.


Top 10 Countries with Cases according to WHO and JHU and the differences between numbers

| World Health Organization | Cases | Johns Hopkins University | Cases | Country | Difference (WHO minus JHU) |
|---|---|---|---|---|---|
| China | 81416 | China | 81304 | China | 112 |
| Italy | 47021 | Italy | 47021 | Italy | 0 |
| Spain | 19980 | Spain | 25374 | Spain | -5394 |
| Iran | 19644 | Germany | 21652 | Iran | -966 |
| Germany | 18323 | Iran | 20610 | Germany | -3329 |
| US | 15219 | US | 19931 | US | -4712 |
| France | 12475 | France | 12483 | France | -8 |
| S Korea | 8799 | S Korea | 8799 | S Korea | 0 |
| Switzerland | 4840 | Switzerland | 6113 | Switzerland | -1273 |
| United Kingdom | 3983 | United Kingdom | 4014 | United Kingdom | -31 |

The numbers for each country differ between the two dashboards. The top 10 countries with cases are presented above with the number of cases reported. The two dashboards agree on the countries in the top 10. The rank order of the countries also agrees, except for Iran and Germany, which are flipped for fourth and fifth. The two dashboards agree on the number of cases only for Italy and South Korea. The greatest discrepancies between the two dashboards are for Spain (-5,394), the US (-4,712), and Germany (-3,329), with JHU having more cases.
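
The comparison of the two dashboards can be reproduced with a short Python sketch using the case counts from the table. The variable names are my own; the numbers are the ones reported above.

```python
# Case counts from the table in the post
who = {"China": 81416, "Italy": 47021, "Spain": 19980, "Iran": 19644, "Germany": 18323,
       "US": 15219, "France": 12475, "S Korea": 8799, "Switzerland": 4840,
       "United Kingdom": 3983}
jhu = {"China": 81304, "Italy": 47021, "Spain": 25374, "Germany": 21652, "Iran": 20610,
       "US": 19931, "France": 12483, "S Korea": 8799, "Switzerland": 6113,
       "United Kingdom": 4014}

# WHO minus JHU, so a negative number means JHU reports more cases
diff = {country: who[country] - jhu[country] for country in who}

agree = [country for country, d in diff.items() if d == 0]
largest_gaps = sorted(diff, key=lambda c: abs(diff[c]), reverse=True)[:3]

# Rank the countries on each dashboard and list the positions where they differ
who_rank = sorted(who, key=who.get, reverse=True)
jhu_rank = sorted(jhu, key=jhu.get, reverse=True)
rank_mismatches = [(w, j) for w, j in zip(who_rank, jhu_rank) if w != j]

print(agree, largest_gaps, rank_mismatches)
```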

Certainly there is confusion keeping track of all the cases in a world with more than 7 billion inhabitants.  The countries most affected are some of the richest and most powerful in Asia, Europe, and North America with the exception of Iran.  Until the curve of new cases flattens out the state of emergency is likely to continue.  The video below gives a good summary of how epidemics and pandemics progress.



**Related Posts**

Cambridge Analytica was Behind the Results I Found on Facebook and the 2016 Election


Saturday, September 28, 2019

9th Anniversary Post: Evaluating the Domain Name

The 9th anniversary of CSI without Dead Bodies is upon us/me. It has been a hectic year in which I purchased a domain name for this blog. The number of users per month for the past year, when the domain name was bought, is summarized in the blue line in the graph below. The number of users for the previous year is summarized in the orange line.


Overall there was a 128.8% increase in the number of users in the past year (mostly in the first three months of last year).  Likewise there was a 130.21% and a 129.06% increase in the number of new users and sessions respectively.  There was a slight increase in the number of sessions per user (0.11%).  However there was only a 20.65% increase in the number of pageviews and a 47.33% decrease in the number of pages per session.  There was a smaller decrease in the average session duration but a larger increase in the bounce rate.  This suggests that while the number of users has increased, the level of engagement hasn't.  
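
The per-user and per-session figures follow from the other percentages, since each is a ratio of two quantities that both changed. Here is a minimal Python sketch of that relationship, assuming the reported percentages are all year-over-year changes (the helper name `ratio_change` is my own):

```python
def ratio_change(numerator_pct, denominator_pct):
    """Percent change in a ratio, given percent changes in its numerator and denominator."""
    return 100 * ((1 + numerator_pct / 100) / (1 + denominator_pct / 100) - 1)

# Year-over-year percent increases quoted in the post
users, sessions, pageviews = 128.8, 129.06, 20.65

sessions_per_user = ratio_change(sessions, users)
pages_per_session = ratio_change(pageviews, sessions)

print(f"{sessions_per_user:.2f}% {pages_per_session:.2f}%")
```

Sessions more than doubled while pageviews grew only about a fifth, which is why pages per session fell by nearly half.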

Looking at the countries where the users came from, there was a 4,426.09% increase from India, a 2,400% increase from Nigeria, and a 2,000% increase from Bangladesh, but only an 85.13% increase from the United States. Looking at the United States, there was a one second increase in the average session duration. There was a decrease in the pages per session in the US, but it was higher there (1.85 pages per session) than it was for the site overall (1.71 pages per session). This suggests that the low engagement comes from outside the US, which makes sense as most of my posts are about the US.

I counted 50 posts to the blog in the last year. With 415 total posts on the blog over the last nine years, that averages out to 46.1 posts per year, so it has been a productive year. I will keep the domain name from Google to promote my blog. On to the tenth year.


**Related Posts**


CSI senza cadaveri (First Post)


The 5th Anniversary of CSI wo DB: Top 25 All Time Posts