The Simple Math behind Pool Testing

Jul 5, 2020

Why is pool testing cost efficient and what is the optimal pool size?

As pandemic control efforts enter a new phase with a growing need for testing, Dr. Fauci has suggested that pool testing could be used to increase efficiency. The concept may not be new to many, but mathematically speaking, how much exactly do we expect to save?

The Mathematics behind Pool Testing

Let’s take a look at a mathematical model to understand what pool testing is trying to achieve. Say there are n people in total who need a test (for instance, n ≈ 300,000,000 if almost everybody in the US would like to take a test). For the simplest case, assume everyone needs to be tested only once and the tests are always accurate. This means a total of 300,000,000 tests are required, assuming we have found a way to collect all the samples.

However, this might not be feasible in reality because capacity is limited. Capacity depends on all sorts of resources, from the medical supplies that need to be manufactured, to the number of testing facilities fitted with the necessary equipment, to the availability of qualified personnel who are trained to read the results. Moreover, each test comes with some waiting time.

According to numbers reported by the CDC, the total number of tests administered in the US from early March, when the pandemic started to spread, to July 4, 2020 was around one tenth of the total population. See below:

https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/testing-in-us.html

At this pace, it would take another 9 × 4 months = 36 months, or 3 years, to get everyone else tested. This is where pool testing could help speed things up. Suppose we choose pool size k, which means we combine the samples collected from k different people and test them as one. If the combined test comes back negative, we know that none of the people in the pool has contracted the virus. If it comes back positive, however, we do not yet know which of them are infected. We then need to collect a second round of samples and administer an individual test to each person in the pool to get the final results.
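The two-round procedure described above can be sketched in a few lines of Python. This is only an illustration under the article's simplifying assumption of perfectly accurate tests; the function name `two_round_pool_test` and the nine-person example are made up for this sketch.

```python
from typing import List, Tuple


def two_round_pool_test(infected: List[bool], k: int) -> Tuple[List[bool], int]:
    """Apply the two-round scheme to a list of true infection statuses.

    Returns the per-person results and the number of tests used.
    Assumes perfectly accurate tests, as in the article's model.
    """
    results: List[bool] = []
    tests_used = 0
    for start in range(0, len(infected), k):
        pool = infected[start:start + k]
        tests_used += 1                  # round 1: one test for the whole pool
        if any(pool):                    # pooled sample comes back positive
            tests_used += len(pool)      # round 2: one test per pool member
            results.extend(pool)
        else:                            # pooled sample comes back negative
            results.extend([False] * len(pool))
    return results, tests_used


# Hypothetical example: 9 people, exactly one infected, pools of 3.
statuses = [False, False, False, False, True, False, False, False, False]
results, tests_used = two_round_pool_test(statuses, k=3)
print(tests_used)  # 3 pooled tests + 3 individual retests = 6
```

In this toy case 6 tests are enough for 9 people, because only one of the three pools needs a second round.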

The reason why this can be more efficient than giving everyone an individual test in the first place is that the first round needs only a fraction, 1/k, of the capacity, and there is a certain probability that a given group will not need a second round at all. But how do we know the savings will outweigh the extra cost incurred by the groups that do need a second round?

Now we use the calculations below to show under what circumstances pool testing is likely to be efficient. The number of tests needed for the first round is n/k (k = 2, 3, 4, …). We need to add the number of tests needed for the second round and compare the sum to the initial population size n. Say each person who gets tested has probability p of actually being infected. Then, in a group of k people, the number of infected people X follows the binomial distribution B(k, p). This means the probability that a group needs a second round is the probability that at least one person in the group is infected:

P(X≥1) = 1 - P(X=0) = 1 - (1-p)^k
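As a quick sanity check, the shortcut through the complement can be compared with summing the binomial probabilities of one or more infections directly. The function names below are illustrative, and p = 0.2 with k = 3 are arbitrary example values.

```python
from math import comb


def pool_positive_probability(p: float, k: int) -> float:
    """Probability that a pool of k people contains at least one infection."""
    return 1.0 - (1.0 - p) ** k


def pool_positive_probability_by_sum(p: float, k: int) -> float:
    """Same probability, summing the binomial B(k, p) terms for 1..k infections."""
    return sum(comb(k, j) * p ** j * (1.0 - p) ** (k - j) for j in range(1, k + 1))


p, k = 0.2, 3  # arbitrary example values
print(round(pool_positive_probability(p, k), 4))         # 0.488
print(round(pool_positive_probability_by_sum(p, k), 4))  # 0.488
```

Both routes give the same number, so with these example values almost half of the pools would need a second round.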

The expected number of tests needed for the second round is:

(n/k groups) ⋅ (probability 1-(1-p)^k of needing a second round) ⋅ (k people per group)

which simplifies to n(1-(1-p)^k) since k cancels out. So the total number of tests needed for both rounds is a function of k, given that n and p are more or less fixed values.

Total Tests = n/k + n(1-(1-p)^k)
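The formula translates directly into a small function. The sketch below assumes n is a multiple of k and uses k = 3 and p = 0.2 purely as example values; the name `expected_total_tests` is chosen for this illustration.

```python
def expected_total_tests(n: int, k: int, p: float) -> float:
    """Expected number of tests for two-round pooling with pool size k."""
    first_round = n / k                      # one test per pool
    second_round = n * (1 - (1 - p) ** k)    # expected individual retests
    return first_round + second_round


n = 300_000_000
total = expected_total_tests(n, k=3, p=0.2)  # example values, not real data
print(f"{total:,.0f} expected tests vs {n:,} individual tests")
# 246,400,000 expected tests vs 300,000,000 individual tests
```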

This formula gives us a reason to look for the pool size that minimizes the resources required. One can observe that the first term decreases quickly as k increases, whereas the second term increases with k and approaches n, because (1-p)^k shrinks exponentially. Taking n = 300,000,000 and p = 0.2 (an arbitrarily selected value), the plot below shows the total number of tests needed for different pool sizes k:

https://www.desmos.com/calculator/4io0n9hfln

Looking at the three points near the bottom of the curve, with the above parameters, choosing k = 3 yields the largest saving, close to 18% of capacity, whereas when k is as high as 10, the total number of tests needed is almost the same as without pooling. So pool testing saves effort to a certain degree only if we pick the right pool size. Still, even with the 18% efficiency gain, there are many people waiting in line to be tested. What else can be done to shift this curve?

It turns out the function is even more sensitive to the value of p, which sets the base 1-p of the exponential term. Suppose everybody wears masks, practices good social distancing, etc., and the probability of infection in this universal testing effort drops by half to 10%. The effect on the total number of tests needed is dramatic:

https://www.desmos.com/calculator/dqzfbtafdg

With probability 0.1, we can see that the optimal pool size is now 4, with about 41% capacity saving. This suggests that pool testing will do better if it is mostly used on the asymptomatic population, which collectively has a much smaller probability of infection. Furthermore, to better optimize this model, the target population can be triaged into those who have been strictly social distancing and those who have frequent contact with others even though they do not appear sick.

The numbers used in this article are hypothetical and rest on many assumptions, such as 1) the effort needed to collect second-round samples can be neglected for modeling (when in reality this would be costly and should be considered in the model); 2) medical resources can be freely allocated across the nation (since we theorized with the national population); and 3) the infections of people tested in the same group are independent events (whereas in reality a network effect may be observed, especially when the people tested in a group all work in the same factory shop, etc.). When states decide to use pool testing, they are likely to adapt it to their own local circumstances, where the parameters vary from place to place. The above model helps us understand the potential and the limitations of the pool testing method in a general setting, but it needs refinement.

Disclaimer: The author does not work in the public health domain and has absolutely no experience in medical testing. This article merely tries to explain an idealized version of the pool testing concept from a mathematical perspective for those who might be interested.

Written by Sabrina Liu