For many infectious diseases it is common practice to screen the population through pool testing.
In this process, specimens (e.g., blood, urine, swabs) from different subjects are pooled and tested together.
Dorfman pool testing procedure: Whenever a pooled test detects infection, each specimen from that group is tested individually.
This process may reduce the cost of testing, as individual tests are only carried out in the event an infection is detected in the pool.
This testing protocol also has applications in other fields, such as the detection of defective parts in production lines (e.g., Sobel and Groll (1959)) and the detection of contaminated food (e.g., Price, Olsen, and Hunter (1972) and Adams et al. (2013)).
But why is it that so many labs still implement random pooling?
Could it be that the benefits of implementing ordered pooling, as opposed to random pooling, are not high enough to justify the added costs of collecting more information from patients and sorting them from lowest to highest probability of infection?
This paper derives an upper bound on the maximum benefit that can be achieved by implementing ordered pooling as opposed to random pooling.
Computing the upper bound only requires knowing the prevalence of the disease and the maximum and minimum possible probabilities of infection.
The upper bounds may aid laboratories in making their optimal decision.
From these upper bounds one can see that not only does the pool size \(K\) have a significant impact on the potential benefits of implementing ordered pooling, but so does the batch size \(N\).
Dorfman (1943): foundational work
Aprahamian et al. (2019), Aprahamian et al. (2018), and Saraiva (2023a) present conditions under which ordered pooling is optimal.
Saraiva (2023b) shows that, even if patients behave strategically by misreporting their types (to affect their test results) in a Nash equilibrium, ordered pooling still outperforms random pooling.
Lipnowski and Ravid (2021) show that ordered pooling is optimal in a different setting (without retesting).
Bobkova, Chen, and Eraslan (2023) show that, when using a different testing protocol, negative assortative matching may be optimal.
There is a finite batch \(S\) of subjects to be tested, with \(|S|=N\).
Each subject can either be infected or not infected.
Each subject \(s\in S\) is infected with probability \(q^s\in [0,1]\), where the \((q^s)_{s\in S}\) are drawn i.i.d. from a distribution with support \([a,b]\subseteq [0,1]\).
Let \(q_{i}\) be the \(i\)th order statistic of the realized vector of probabilities of infection, so that \[ q_{1}\leq q_{2}\leq \cdots\leq q_{N}. \]
\(\bar{q}=\sum_iq_i/N\) (realized average probability of infection from the batch)
\(\mu=\mathbb{E}(\bar{q})=\mathbb{E}(q^s)\) (the prevalence of the disease population-wise)
\(S_e=\) sensitivity of an individual test. It gives the probability that an individual test detects infection given that the subject is infected.
\(S_p=\) specificity of an individual test. It gives the probability that an individual test does not detect infection, given that the subject is not infected.
\(S_e>1-S_p\)
All subjects are tested in pools of size \(K<N\).
Number of pools: \(n=N/K\).
If a pooled test detects infection, each member from that group is individually tested (Dorfman Testing)
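As a minimal illustration (not taken from the paper's derivations), the expected number of tests for a single Dorfman pool of size \(K\) is \(1+K\cdot\Pr(\text{pooled test positive})\). The Python sketch below computes this quantity, ignoring for now any dilution effect, so that the pooled test detects with probability \(S_e\) whenever at least one member is infected and yields a false positive with probability \(1-S_p\) otherwise; the function name and the \(S_e,S_p\) values are illustrative.

```python
from math import prod

def expected_tests_per_pool(q_pool, Se=0.95, Sp=0.98):
    """Expected number of tests for one Dorfman pool.

    q_pool: infection probabilities of the K pool members.
    Assumes no dilution effect: the pooled test detects with probability Se
    whenever at least one member is infected, and gives a false positive
    with probability 1 - Sp otherwise.
    """
    p_none_infected = prod(1 - q for q in q_pool)
    p_pool_positive = Se * (1 - p_none_infected) + (1 - Sp) * p_none_infected
    # 1 pooled test, plus K individual retests whenever the pool tests positive
    return 1 + len(q_pool) * p_pool_positive

# Example: a pool of K = 4 low-risk subjects
print(expected_tests_per_pool([0.01, 0.02, 0.03, 0.05]))  # well below 4 individual tests
```

When infection probabilities are low, the expected number of tests per pool stays close to one, which is the source of the cost savings from pooling.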
Dilution function: \(h(I,K)\) is the probability of detecting an infection in a pooled sample of \(K\) subjects, with exactly \(I\) of them infected.
So, \(h(1,1)=S_e\) and \(h(0,1)=1-S_p\).
\(h(I,K)\) is increasing in \(I\)
\(h(K,K)=h(1,1)=S_e\)
\(h(0,K)=h(0,1)=1-S_p\)
Assumption 1: The dilution function is concave: \[ h(I+1,K)+h(I-1,K)\leq 2h(I,K). \]
Assumption 2: The dilution function is not “too concave”: \[ \frac{I+1}{2I}h(I+1,K)+\frac{I-1}{2I}h(I-1,K)\geq h(I,K)\quad\forall I\geq 1. \]
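Both assumptions are easy to verify numerically for a candidate dilution function. The sketch below checks them over \(I=1,\dots,K-1\); the linear dilution function used as the test case is purely illustrative and not taken from the paper.

```python
def check_dilution_assumptions(h, K, tol=1e-12):
    """Check Assumption 1 (concavity) and Assumption 2 ("not too concave")
    for a dilution function h(I, K) over I = 1, ..., K - 1."""
    a1 = all(h(I + 1, K) + h(I - 1, K) <= 2 * h(I, K) + tol
             for I in range(1, K))
    a2 = all((I + 1) / (2 * I) * h(I + 1, K)
             + (I - 1) / (2 * I) * h(I - 1, K) >= h(I, K) - tol
             for I in range(1, K))
    return a1, a2

# Illustrative test case: a linear dilution function with Se = 0.95, Sp = 0.98
Se, Sp = 0.95, 0.98
h_linear = lambda I, K: (1 - Sp) + (Sp + Se - 1) * I / K
print(check_dilution_assumptions(h_linear, K=10))  # (True, True)
```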
Proposition (Saraiva 2023a): If the dilution function \(h(I,K)\) is increasing in \(I\) and satisfies Assumptions 1 and 2, and all pools have the same size \(K\), then ordered pooling minimizes the expected number of tests, the expected number of false positives, and the expected number of false negatives.
In the absence of a dilution effect, the dilution function reduces to \[ h(I,K)=\begin{cases} 1-S_p,&\text{if }I=0,\\ S_e,&\text{otherwise.} \end{cases} \]
\(q(\mu,n,K)\) is the vector of probabilities of infection that maximizes the variance between pools and minimizes the variance within pools, conditional on the average probability of infection in the batch equaling \(\mu\) (i.e., \(\sum_iq_i/N=\mu\)).
For example, if \(K=4\), \(q(\mu,n,K)\) looks like this: \[ (\underbrace{\text{a,a,a,a}}_{\text{1st pool }},\underbrace{\text{a,a,a,a}}_{\text{2nd pool }},\cdots \underbrace{\text{c,c,c,c}}_{\text{ith pool}},\cdots \underbrace{\text{b,b,b,b}}_{\text{(n-1)th pool}},\underbrace{\text{b,b,b,b}}_{\text{nth pool}}), \] where \(c\in [a,b]\).
It is also useful to define \[ \alpha\equiv\frac{b-\mu}{b-a}, \] since the proportion of subjects in \(q(\mu,n,K)\) with probability of infection \(a\) converges to \(\alpha\) as \(n\to\infty\) (and the proportion with probability of infection \(b\) converges to \(1-\alpha\)).
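A minimal sketch of this construction (the helper name and the floor-based choice of the number of \(a\)-pools are our own): fill as many pools as possible with value \(a\), one interior pool with a value \(c\) chosen so that the batch average equals the target, and the remaining pools with value \(b\).

```python
import numpy as np

def extreme_profile(q_bar, n, K, a, b):
    """Sketch of q(q_bar, n, K): pools are internally homogeneous, and pool
    values are pushed to the endpoints a and b, with at most one pool at an
    interior value c so that the overall average equals q_bar."""
    alpha = (b - q_bar) / (b - a)            # limiting share of a-pools
    j = min(int(np.floor(alpha * n)), n - 1)  # number of pools filled with a
    c = n * q_bar - j * a - (n - 1 - j) * b   # interior value, lies in [a, b]
    pools = [a] * j + [c] + [b] * (n - 1 - j)
    return np.repeat(pools, K), alpha

profile, alpha = extreme_profile(q_bar=0.05, n=10, K=4, a=0.01, b=0.20)
print(alpha)        # ~0.789
print(profile[:8])  # first two pools, all at a = 0.01
```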
Let \(UB_o^T(\bar{q},n,K)\) denote the per-subject reduction in the expected number of tests achieved by ordered pooling relative to random pooling (defined analogously to \(UB_o^{FP}\) and \(UB_o^{FN}\) below).
Theorem: \[\begin{align} \lim_{n\to\infty}UB_o^T(\mu,n,K)&=\sum_{I=0}^K h(I,K)\binom{K}{I}\left[\mu^I(1-\mu)^{K-I}\right.\notag\\ &\quad\left.-\alpha a^I(1-a)^{K-I}-(1-\alpha)b^I(1-b)^{K-I}\right],\label{lim_UB} \end{align}\] and
\[\begin{align} UB_o^T(\bar{q},n,K)&\overset{p}{\to}\lim_{n\to\infty}UB_o^T(\mu,n,K)\notag \end{align}\]
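The limit in the theorem can be evaluated directly for any dilution function; the sketch below is a straightforward transcription of the formula (the parameter values and the no-dilution \(h\) are placeholders).

```python
from math import comb

def lim_ub_tests(mu, K, a, b, h):
    """Limiting per-subject upper bound on the reduction in expected tests
    from ordered vs. random pooling, as n -> infinity."""
    alpha = (b - mu) / (b - a)
    def term(q, I):
        return comb(K, I) * q**I * (1 - q)**(K - I)
    return sum(h(I, K) * (term(mu, I) - alpha * term(a, I)
                          - (1 - alpha) * term(b, I))
               for I in range(K + 1))

# Illustrative call with the no-dilution specification and placeholder parameters
Se, Sp = 0.95, 0.98
h_nodil = lambda I, K: Se if I > 0 else 1 - Sp
print(lim_ub_tests(mu=0.01, K=5, a=0.0, b=0.10, h=h_nodil))  # ≈ 0.0075 tests per subject, at most
```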
We obtain similar results for the expected number of false positives.
Proposition: \(FP_o^K(q(\bar{q},n,K))\leq FP_o^K(q_1,\cdots,q_N)\)
Define \[ UB_o^{FP}(\bar{q},n,K)\equiv \frac{FP_r^K(\mu,n,K)-FP_o^K(q(\bar{q},n,K))}{N} \]
Theorem: \[ \begin{align} \lim_{n\to\infty}UB_o^{FP}(\mu,n,K)&=(1-S_p)\sum_{I=0}^{K-1} h(I,K)\binom{K-1}{I}\left[\mu^I(1-\mu)^{K-I}\right.\notag\\ &\quad\left.-\alpha a^I(1-a)^{K-I}-(1-\alpha)b^I(1-b)^{K-I}\right],\label{lim_UB_FP} \end{align} \] and \[ \begin{align} UB_o^{FP}(\bar{q},n,K)&\overset{p}{\to}\lim_{n\to\infty}UB_o^{FP}(\mu,n,K)\notag \end{align} \]
Proposition: \[ \begin{align} UB_o^{FP}(\mu,n,K)&\leq \lim_{n\to\infty}UB_o^{FP}(\mu,n,K)\notag \end{align} \]
We also obtain similar results for the expected number of false negatives.
The main difference is the assumption required to obtain this result: instead of requiring the dilution function to be concave (Assumption 1), we require it not to be “too concave” (Assumption 2): \[ \frac{I+1}{2I}h(I+1,K)+\frac{I-1}{2I}h(I-1,K)\geq h(I,K)\quad\forall I\geq 1. \]
Proposition: \(FN_o^K(q(\bar{q},n,K))\leq FN_o^K(q_1,\cdots,q_N)\)
Define: \[ UB_o^{FN}(\bar{q},n,K)\equiv \frac{FN_r^K(\mu,n,K)-FN_o^K(q(\bar{q},n,K))}{N} \]
Theorem: \[ \begin{align} \lim_{n\to\infty}UB_o^{FN}(\mu,n,K)&=\sum_{I=0}^{K-1} (1-S_eh(I+1,K))\binom{K-1}{I}\left[\mu^{I+1}(1-\mu)^{K-1-I}\right.\notag\\ &\quad\left.-\alpha a^{I+1}(1-a)^{K-1-I}-(1-\alpha)b^{I+1}(1-b)^{K-1-I}\right], \end{align} \] and \[ \begin{align} UB_o^{FN}(\bar{q},n,K)&\overset{p}{\to}\lim_{n\to\infty}UB_o^{FN}(\mu,n,K)\notag \end{align} \]
Proposition: \[ UB_o^{FN}(\mu,n,K)\leq \lim_{n\to\infty}UB_o^{FN}(\mu,n,K). \]
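Both limiting bounds (for false positives and false negatives) can be transcribed in the same way as the bound on tests; the parameter values and the no-dilution \(h\) below are placeholders.

```python
from math import comb

def lim_ub_fp(mu, K, a, b, h, Sp):
    """Limiting per-subject upper bound on the reduction in false positives."""
    alpha = (b - mu) / (b - a)
    term = lambda q, I: comb(K - 1, I) * q**I * (1 - q)**(K - I)
    return (1 - Sp) * sum(h(I, K) * (term(mu, I) - alpha * term(a, I)
                                     - (1 - alpha) * term(b, I))
                          for I in range(K))

def lim_ub_fn(mu, K, a, b, h, Se):
    """Limiting per-subject upper bound on the reduction in false negatives."""
    alpha = (b - mu) / (b - a)
    term = lambda q, I: comb(K - 1, I) * q**(I + 1) * (1 - q)**(K - 1 - I)
    return sum((1 - Se * h(I + 1, K)) * (term(mu, I) - alpha * term(a, I)
                                         - (1 - alpha) * term(b, I))
               for I in range(K))

Se, Sp = 0.95, 0.98
h_nodil = lambda I, K: Se if I > 0 else 1 - Sp
print(lim_ub_fp(mu=0.01, K=5, a=0.0, b=0.10, h=h_nodil, Sp=Sp))
print(lim_ub_fn(mu=0.01, K=5, a=0.0, b=0.10, h=h_nodil, Se=Se))  # ≈ 0: without dilution, pooling order does not affect false negatives
```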
| Gender | Race/Ethnicity | Age Group (years) | Prevalence (%) | Proportion in general population (%) |
|---|---|---|---|---|
| Female | Hispanic | 15-24 | 6.54 | 1.41 |
| Female | Hispanic | Other | 0.65 | 7.01 |
| Female | Black | 15-24 | 19.19 | 1.07 |
| Female | Black | Other | 1.22 | 5.67 |
| Female | Other | 15-24 | 4.38 | 4.29 |
| Female | Other | Other | 0.25 | 31.31 |
| Male | Hispanic | 15-24 | 1.78 | 1.53 |
| Male | Hispanic | Other | 0.36 | 7.16 |
| Male | Black | 15-24 | 7.45 | 1.09 |
| Male | Black | Other | 1.05 | 5.08 |
| Male | Other | 15-24 | 1.20 | 4.51 |
| Male | Other | Other | 0.17 | 29.87 |
| Total | - | - | 0.97 | 100 |
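As a consistency check, the overall prevalence in the last row is the population-weighted average of the twelve group prevalences, which can be reproduced directly from the table:

```python
# (prevalence %, proportion of general population %) for the twelve groups above
groups = [
    (6.54, 1.41), (0.65, 7.01), (19.19, 1.07), (1.22, 5.67), (4.38, 4.29), (0.25, 31.31),  # female
    (1.78, 1.53), (0.36, 7.16), (7.45, 1.09), (1.05, 5.08), (1.20, 4.51), (0.17, 29.87),   # male
]
overall = sum(prev * prop for prev, prop in groups) / sum(prop for _, prop in groups)
print(round(overall, 2))  # 0.97
```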
Reduction in the expected number of tests when \(N=100\) vs. \(N=300\).
Reduction in the expected number of tests per subject when \(N=100\)
Reduction in the expected number of false positives and false negatives when \(N=100\) vs. \(N=300\).
Reduction in the expected number of false positives and false negatives when \(N=100\)
We set the cost of a test at $55 and the cost of a false positive equal to the cost of a new test ($55)
Following Aprahamian et al. (2019), we use the average between the cost of sequelae for men and women, estimated by Owusu-Edusei Jr et al. (2015), to set the cost of a false negative at \(C(FN)=\$2,927\).
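Given these cost parameters, the per-subject bounds can be combined into a dollar-denominated upper bound on the benefit of ordered pooling. The additive aggregation and the names below are our own shorthand rather than the paper's notation, and the plugged-in bound values are placeholders.

```python
C_TEST, C_FP, C_FN = 55.0, 55.0, 2927.0  # dollars

def monetary_upper_bound(ub_tests, ub_fp, ub_fn):
    """Dollar upper bound per subject, combining the three per-subject bounds
    on the reduction in tests, false positives, and false negatives."""
    return C_TEST * ub_tests + C_FP * ub_fp + C_FN * ub_fn

# e.g., plugging in limiting bounds computed as above (placeholder values)
print(monetary_upper_bound(0.0075, 0.0001, 0.0002))
```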
\[ h(I,K)=(1-S_p)+(S_p+S_e-1)\left(\frac{I}{K}\right)^{\delta} \]
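The sketch below implements this parametric dilution function and checks that it matches the boundary conditions \(h(0,K)=1-S_p\) and \(h(K,K)=S_e\); the values of \(S_e\), \(S_p\), and \(\delta\) are placeholders.

```python
def h_power(I, K, Se=0.95, Sp=0.98, delta=1.0):
    """Parametric dilution function: h(I, K) = (1 - Sp) + (Sp + Se - 1) * (I / K) ** delta."""
    return (1 - Sp) + (Sp + Se - 1) * (I / K) ** delta

K = 10
assert abs(h_power(0, K) - (1 - 0.98)) < 1e-12  # h(0, K) = 1 - Sp
assert abs(h_power(K, K) - 0.95) < 1e-12        # h(K, K) = Se
```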
Suppose that we find out that those who exhibit COVID symptoms and are not vaccinated have a 10% probability of infection.
Then we can compute an upper bound to the benefits of collecting information on symptoms in addition to information on vaccination status.
The upper bounds derived in this article may aid laboratories in deciding whether to implement ordered or random pooling.
In the Appendix we derive looser upper bounds that do not even require estimating the dilution effect.
Our results may also aid in the decision of data collection: what is the maximum benefit one can get by collecting more data from patients (e.g., whether they exhibit symptoms or not).