COVID-19 In Kentucky - The Data / by Andrew Wyllie

NOTE: This analysis is a work in progress. This first post discusses the potential data sources that could be used to build a model of the COVID-19 outbreak.

NOTE 2: The most up-to-date version of the map animation below can be found on our covid-19 project page.

Most people are following the news about the COVID-19 virus as it makes its impact all over the world. While it’s fairly easy to find the latest counts of confirmed cases and deaths for most countries and, in the US, the totals for individual states, it’s still hard to get an idea of how the virus is spreading across a particular region and how much risk individuals in those areas might face. This article is about tracking the spread of the disease in the state of Kentucky which, as of right now, has one of the lowest per capita rates of infection in the US.

The animation below shows how the virus is spreading across the state by showing the number of confirmed cases in each county. From March 22, 2020 to April 7, 2020 (16 days) the number of confirmed cases has gone from 64 to 933, doubling roughly four times or once every four days.

ky-covid.gif
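The doubling estimate quoted above can be checked with a few lines of arithmetic. This is only a minimal sketch: it assumes growth was steadily exponential over the 16 days, which real case counts will not follow exactly, and the function name `doubling_time` is ours rather than part of any library.

```python
import math

def doubling_time(start_cases, end_cases, days):
    """Return (number of doublings, days per doubling).

    Assumes steady exponential growth between the two counts.
    """
    doublings = math.log2(end_cases / start_cases)
    return doublings, days / doublings

# Confirmed case counts quoted in the article: 64 on March 22, 933 on April 7.
doublings, days_per_doubling = doubling_time(64, 933, 16)
print(f"{doublings:.2f} doublings, one every {days_per_doubling:.2f} days")
```

With the counts above this works out to about 3.9 doublings, or one roughly every 4.1 days, which is where the "four times, once every four days" figure comes from.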

By understanding how the virus spreads and how rapidly new cases are appearing, we can get a sense of where and when extra resources are needed. From a public policy perspective, government officials need to determine when to shut things down and, maybe even harder, when to open things up again.

The Data

Unfortunately, the data available to make these decisions is sparse for a multitude of reasons. We don’t have enough testing capacity to test a wide cross section of the population due to a lack of test kits. This means that in some cases we are only testing people whose symptoms are severe enough that, if they test positive, they would be considered for hospital admission(1). Even then, tests will not necessarily be run on people arriving at the hospital with severe symptoms unless the results will change the course of treatment for the patient. To make matters worse from a data perspective, while the accuracy of the current PCR test is well understood, these types of tests only tell us if a patient is currently contagious. While this information is critical to the patient, it’s less useful from a data modeling perspective because we are more interested in estimating the total number of cases, which would include people who have already recovered from the virus or who are currently infected but asymptomatic.

We don’t necessarily have an accurate count of the number of deaths either. The disease can exacerbate other conditions that cause death, but a test will not necessarily be run on the deceased because tests are in limited supply(2).

There are other reasons that our data is messy. Overburdened hospital staff are focused on treatments and saving lives, so recording accurate data for every single patient becomes a secondary concern. This is by no means a new problem, though. For example, the number of yearly deaths due to the flu in the US is not really known or completely understood until the flu season is over(3). The general idea is that the CDC looks at a number of different sources to build models that estimate the impact of the disease. While this is important information, these models do not provide much help when trying to identify what is happening during an ongoing outbreak of a new disease.

One reliable data point we do have is the number of tests that have been administered and the results of those tests. This is potentially helpful if we make the weak assumption that the test criteria are uniform across the country. The idea is that people self-identify whether they need to be tested, and medical professionals then determine whether the symptoms are severe enough to warrant testing. There is still a lot of bias, but this gives us an idea of what part of the population feels that they are sick with the virus and what percentage of that group actually has the disease. There are still some issues, though. Some people who are very nervous about having the disease will be able to convince a healthcare professional that they should be tested. There is also the issue of multiple tests on the same patient. For example, a hospitalized patient would have tested positive to be admitted, and would then be tested a number of times afterwards to make sure they are no longer contagious when they recover.
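To make the repeat-testing issue concrete, here is a small sketch comparing the positive rate counted per test with the rate counted per patient. Everything in it is an assumption for illustration: the record layout (a patient identifier plus a result), the function name `positivity_rates`, and the sample records, which are made up and not Kentucky data. Published test counts may not include a patient identifier at all, in which case this correction cannot be applied.

```python
from collections import defaultdict

def positivity_rates(tests):
    """Compute (per_test_rate, per_patient_rate).

    tests: iterable of (patient_id, is_positive) pairs, one per test run.
    """
    tests = list(tests)
    if not tests:
        raise ValueError("no tests supplied")

    # Naive rate: every test counts, including repeats on one patient.
    per_test = sum(1 for _, positive in tests if positive) / len(tests)

    # Per-patient rate: a patient counts once, and is a case if any
    # of their tests came back positive.
    ever_positive = defaultdict(bool)
    for patient_id, positive in tests:
        ever_positive[patient_id] = ever_positive[patient_id] or positive
    per_patient = sum(ever_positive.values()) / len(ever_positive)

    return per_test, per_patient

# Made-up records: patient "A" is hospitalized and tested three times.
sample = [("A", True), ("A", True), ("A", False),
          ("B", False), ("C", False), ("D", False)]
per_test, per_patient = positivity_rates(sample)
print(f"per test: {per_test:.0%}, per patient: {per_patient:.0%}")
```

In this toy sample the repeated tests on one hospitalized patient push the per-test rate above the per-patient rate, which is the kind of distortion described above.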

Other Considerations

We do have data from other countries that are ahead of us, where the number of daily new cases is already decreasing. In the same way, within the U.S. we will see some areas hit hard initially (like NYC) while other parts of the country, like Kentucky, are still waiting for the outbreak to peak. We also have demographic information, including age groups, health status, smoking rates, etc., that may help create a risk profile. We do have some Kentucky-specific issues, though, like black lung disease in coal mining areas, high obesity rates, and the political factor, where some percentage of the population is not convinced that the outbreak is worth worrying about and is therefore not respecting the social distancing and lockdown rules.

Next Steps

There are a number of directions to go with the data we have:

  • build a model that predicts how fast the outbreak is moving across the state, specifically looking at resource constraints at the peak of infection in different areas

  • see if we can identify the effects of social distancing and lockdown directives

  • create a Kentucky-specific risk profile

References

  1. “Evaluating and Testing Persons for Coronavirus Disease 2019 (COVID-19)”. U.S. Department of Health & Human Services, CDC

  2. Sarah Kliff, Julie Bosman (7 April 2020). “Official Counts Understate the U.S. Coronavirus Death Toll”. The New York Times

  3. “How CDC Estimates the Burden of Seasonal Influenza in the U.S.”. U.S. Department of Health & Human Services, CDC