It seems like a pretty simple question. But the answer is not so simple. As I discussed in my earlier post on Case Fatality Rates, getting an accurate figure is made much more complicated by the high rate of mild and asymptomatic cases of the Coronavirus that often go undetected. So what are some of the ways we might go about calculating the number of Americans who have been infected by the Coronavirus? We will explore three possibilities.
- Count of detected cases resulting from COVID-19 tests
- Sample-based antibody tests
- Extrapolation from number of deaths using an estimated Infection Fatality Rate
Count of Positive COVID-19 Tests
A little over a week ago, multiple news outlets reported that the number of known Coronavirus infections in the United States had passed the 10 million mark (see figure 1). Of course, in a period of exponential growth in infections, any news is old news. More recently we crossed eleven million positive cases. Eleven million is certainly a large number and represents around 3% of the entire US population. But most public health experts will say that this number vastly underestimates the number of people who have been infected.
The data in figure 1 shows over 10 million cases as of November 8th. This data is from Worldometer.com. Worldometer defines cases as the cumulative count of detected and laboratory confirmed positive cases, though the count may also include presumptive or probable cases.
The problem with the count of these cases is twofold. During the initial phase of the outbreak in the United States there was far too little testing going on. Probably due to the lack of available tests, the CDC provided guidelines for who should be tested that were severely restrictive. In the early weeks of the pandemic, you had to have been stuck in an elevator with 20 COVID-19 patients in Wuhan, China to meet testing guidelines. Even people with unexplained flu-like symptoms could not get tested unless they had recently travelled in international hotspots.
Later, as the availability of testing increased, it became much easier for symptomatic patients to be tested. But we know that a large percentage of people who get infected have very mild symptoms or are entirely asymptomatic. In my home state of Massachusetts, free testing for asymptomatic members of the community only became widespread in August, and that was still limited to communities that had relatively high infection rates.
Even with free testing of asymptomatic individuals, our current rate of testing, over 1 million tests per day, cannot possibly detect all the asymptomatic cases occurring among the 331 million people in our country.
So as shocking as the daily increases in detected infections are, they still underrepresent the true number of infections occurring in the community. This underrepresentation is probably quite large. Could we just apply some sort of multiplier to the daily cases to better estimate the true number of infections? For example, maybe we decide that for every confirmed case detected there are 4 other cases that were not detected, so we could multiply the daily cases by 5 to get a more accurate number. The problem here is: how would we calculate what that multiplier should be? Also, it's very likely that the multiplier would change over time, being very high in the early days of the pandemic and gradually declining as testing has expanded.
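To make the multiplier idea concrete, here is a minimal Python sketch. The daily counts and multipliers below are hypothetical placeholder numbers, not real data, and the function name is my own. It only shows the mechanics of applying a multiplier that shrinks as testing expands; it does nothing to answer the hard question of what the multipliers should be.

```python
def estimate_true_infections(daily_cases, multipliers):
    """Scale each day's detected cases by that day's assumed multiplier."""
    if len(daily_cases) != len(multipliers):
        raise ValueError("need one multiplier per daily count")
    return [cases * m for cases, m in zip(daily_cases, multipliers)]


# Hypothetical placeholder numbers, chosen only to illustrate the arithmetic.
daily_cases = [1_000, 2_000, 4_000]
multipliers = [10, 7, 5]  # assumed to fall over time as testing expands

estimates = estimate_true_infections(daily_cases, multipliers)
print(estimates)       # [10000, 14000, 20000]
print(sum(estimates))  # 44000
```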
The daily case count is a useful number, as it does allow short-term predictability. Because new cases precede hospitalizations and deaths by fairly predictable intervals, the new case count helps government and health care officials plan for changes in health system demands. But it does not tell us with much precision the total number of infections going on in the community.
Sample Based Antibody Tests
Early on it was thought that it would be possible to conduct antibody tests on population samples to estimate the number of people who had been infected. Typically, when a person’s immune system encounters a new viral infection, the body produces virus-specific antibodies. These then help the immune system find, disable and eventually remove viruses from our bodies. Typically, once we have recovered, these virus-specific antibodies remain in our system and can be tested for using an “antibody” test.
In April, New York State conducted sample-based antibody tests on 3000 people and found nearly 14% of those tested were positive. When applied to the entire population of New York State, this indicated that about 10 times more people in New York had been infected than the count of positive COVID-19 tests suggested. For more details about New York State’s antibody testing results see my blog. However, antibody tests have been plagued with low reliability, making them of little use to individuals facing important decisions. If you took such a test and decided to visit your elderly parents because the test indicated you had already had the coronavirus, there is a significant chance that your test result was wrong, which could impact the lives of your elderly parents.
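The extrapolation from the New York sample to the whole state can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: I treat "nearly 14%" as exactly 14% (420 positives out of 3000), use the 19.5 million population figure that appears later in this post, and add a standard Wilson score interval to show the sampling uncertainty. The interval only reflects sample size. It says nothing about test reliability or about whether the sample was truly representative, which are the bigger problems discussed here.

```python
import math


def wilson_interval(positives, n, z=1.96):
    """Approximate 95% Wilson score interval for a sample proportion."""
    p = positives / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


sample_size = 3000
positives = 420          # assumption: "nearly 14%" treated as exactly 14%
population = 19_500_000  # New York State population figure used later in this post

low, high = wilson_interval(positives, sample_size)
point = positives / sample_size * population

print(f"point estimate: {point:,.0f} people infected")
print(f"sampling range: {low * population:,.0f} to {high * population:,.0f}")
```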
Sweden, which was at least initially pursuing a herd immunity strategy for dealing with the pandemic (see my post for more discussion of herd immunity), also conducted antibody tests in Stockholm, looking for evidence of widespread immunity. They were disappointed, as only around 14% of the population tested came back positive. This was far short of the 40 to 50% they were hoping for.
The CDC conducted a series of antibody prevalence studies starting in April. Four waves of data were collected in 10 locations in the United States. The Commercial Laboratory Seroprevalence Survey Data found that estimates of the number of people infected based on antibody tests were often 10 or more times higher than estimates based on counts of positive virus tests.
If antibody immunity is long lasting, we would expect the 4 time periods in the 10 locations to always show an increase in the number of people with antibodies. But there are a number of examples in the CDC study where the seroprevalence of antibodies declined between testing time periods. Figure 2 from the CDC study shows how, at the Missouri site, estimated seroprevalence at time periods 1 and 2 was actually higher than at periods 3 and 4. The graph shows each estimate as a range in light and dark blue, with the point estimate at the point where light blue turns to dark blue. So, for example, the second time period in late May had an estimate of 2.8% (range 1.7 to 4.1%), while the third time period in mid June had an estimate of only 0.8% (range 0.6 to 1.8%).
This lack of a consistent increase in the prevalence of antibodies, over a period when more and more people were testing positive for the virus, suggests a couple of possible causes: either the antibody test is very unreliable, or the antibodies generated by infection are very short-lived, or both. A short-lived antibody response could be bad news for the ability of vaccines to provide long-term protection.
Sadly the CDC did not continue this study beyond the 4 time periods shown in figure 2.
In more recent months there has been a growing discussion of two different types of immunity: cell-mediated immunity and antibody-mediated immunity. Here is a good pre-COVID article on the issue, while this article looks specifically at COVID. I also highly recommend the Medcram video series on COVID-19. They have an excellent episode on cell vs antibody mediated immunity. The take-home message is that not all COVID-19 infections may be generating an antibody response, or perhaps the antibody response is short-lived. But people can still show a cell-mediated immune response to COVID-19 without having any antibodies. It has also been suggested that people who have milder or asymptomatic infections are less likely to show a strong antibody response, but still have a cell-mediated response which can protect them from future infection.
Testing for cell-mediated immunity is more complex, time consuming and expensive than testing for antibodies. All of this undermines the idea that we could figure out how many people have had the disease simply by doing antibody tests on representative population samples.
Extrapolation based on Infection Fatality Rate
Another approach to estimating the number of infections in the US is to work backwards from the number of deaths using the Infection Fatality Rate (IFR). The IFR is the proportion of people infected with the Coronavirus who die from it. This is often confused with the Case Fatality Rate (CFR), but there is an important difference. The CFR only counts people who have tested positive for the virus in the denominator, a number that has changed dramatically over time, as discussed in the first section of this post as well as in my post on the subject. The IFR uses the “true” number of infections in the denominator. Of course, without everyone being tested every day we don’t have a true number of infections in the country, and many infections go undetected because they are mild or asymptomatic. But scientists have used various methods to attempt to estimate the IFR. In September the CDC updated its estimate of the IFR in the US to 0.65%, while in August the WHO put the IFR in the range of 0.5 to 1.0%. Since much of the data for these estimates came from the early days of the pandemic, and we know in-hospital deaths have declined with improved approaches to treating COVID-19 patients, I will use the low end of the WHO estimate. I also chose to use 0.5% because it makes the math easier! An IFR of 0.5% means that for every two hundred infections we can expect one death. The logic can also be run backwards to estimate the number of infections that likely happened to cause a specific number of deaths.
Deaths x 200 = probable number of infections in the community
Since the start of the pandemic the US has seen 250,000 deaths. Therefore it is likely that 250,000 * 200 = 50,000,000 people in the United States have been infected by the Coronavirus, while the cumulative count of positive PCR tests for the virus stands at about 11.5 million. This indicates that we have had more than 4 times the number of infections indicated by the cumulative number of positive PCR tests.
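The same arithmetic can be written as a short Python sketch, which makes it easy to swap in a different death count or IFR. The function name is my own, and the 0.5% IFR is the assumption chosen above, not a measured value.

```python
def infections_from_deaths(deaths, ifr):
    """Estimate total infections by dividing deaths by an assumed IFR (a fraction)."""
    if not 0 < ifr <= 1:
        raise ValueError("ifr must be a fraction between 0 and 1")
    return deaths / ifr


us_deaths = 250_000
assumed_ifr = 0.005           # 0.5%, the low end of the WHO range (an assumption)
confirmed_cases = 11_500_000  # cumulative positive PCR tests cited above

infections = infections_from_deaths(us_deaths, assumed_ifr)
print(f"{infections:,.0f} estimated infections")               # 50,000,000
print(f"{infections / confirmed_cases:.1f}x the confirmed count")  # 4.3x
```

Dividing by an IFR of 0.005 is the same as multiplying by 200, so this is just the Deaths x 200 rule with the IFR left adjustable.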
Fifty million infections represent a little over 15% of the population of the United States. This approach can also be used on a state-by-state basis. In New York:
34,000 deaths * 200 = 6.8 million infections; 6.8 million infections / 19.5 million population = about 35%
This would indicate that in New York State, about 35% of the population may already have had the infection (and perhaps be immune). North Dakota works out to be about 20%, while Vermont is less than 2%. To see how your state is doing by this measure see my post: How many infections in your state?
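For readers who want to run the numbers for their own state, here is a small Python sketch of the per-state calculation. Only the New York inputs come from this post. The dictionary layout and function name are my own, and you would need to add deaths and population for any other state yourself. The 0.5% IFR is the same assumption as before.

```python
def share_infected(deaths, population, ifr=0.005):
    """Estimated fraction of a population infected, given deaths and an assumed IFR."""
    estimated_infections = deaths / ifr
    return estimated_infections / population


# Only New York's figures are taken from the post; add other states as needed.
states = {
    "New York": {"deaths": 34_000, "population": 19_500_000},
}

for name, data in states.items():
    share = share_infected(data["deaths"], data["population"])
    print(f"{name}: {share:.1%} of the population may have been infected")
```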
The approach of extrapolating total infections based on deaths and an estimate of the Infection Fatality Rate has the advantage of being simple to calculate for entire nations, or even states. However, there are several weaknesses. One is that we know the Infection Fatality Rate varies widely based on age, and this approach does not take that into account. Using the deaths * 200 formula to calculate infections assumes those infections come from some sort of mythical average population. If the infections behind any number of deaths are disproportionately among younger, healthier people, then the infection to death ratio will be higher, that is, more infections per death. If the infections behind a group of deaths are disproportionately among older people, people in congregate care or people with underlying conditions, then the infection to death ratio will be lower. So this approach is likely to be more accurate with larger groups over longer periods of time. A second major weakness is the estimate of the IFR itself. If, for example, improvements in hospital care have reduced deaths, the IFR might be lower, say 0.4%, which would make for a 250:1 infection to death ratio. If, on the other hand, the IFR is on the high end of the WHO estimate (1%), then there are only 100 infections per death.
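To see how much the choice of IFR matters, this Python sketch reruns the US estimate with the three IFR values mentioned above. It is only a sensitivity check on the single IFR assumption. It does not attempt the age adjustment, which would need age-specific death and IFR data not covered in this post. The function name is my own.

```python
def infections_by_ifr(deaths, ifr_percents):
    """Map each assumed IFR (in percent) to infections per death and total infections."""
    results = {}
    for ifr_percent in ifr_percents:
        infections_per_death = 100 / ifr_percent
        results[ifr_percent] = (infections_per_death, deaths * infections_per_death)
    return results


us_deaths = 250_000
scenarios = infections_by_ifr(us_deaths, [0.4, 0.5, 1.0])

for ifr_percent, (per_death, total) in scenarios.items():
    print(f"IFR {ifr_percent}%: {per_death:.0f} infections per death, "
          f"{total:,.0f} estimated infections")
```

With 250,000 deaths, the estimate ranges from 25 million infections at an IFR of 1% to 62.5 million at 0.4%, so the IFR assumption alone moves the answer by a factor of two and a half.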
Only counting people who have tested positive for the coronavirus clearly results in an underestimate of the true number of people who have been infected with this virus. Antibody tests support the finding that positive PCR tests underestimate the number of infections, but their low reliability and the complexity of the human immune response to COVID-19 have rendered them impractical, for now, for accurately estimating the true number of infections in the community. Lastly, extrapolation of infections from deaths and the estimated Infection Fatality Rate provides an easy and consistent method of estimating true infections, though there remain weaknesses in this approach as well.