An Upper Bound to the Benefits of Implementing Positive Assortative Matching in Pooled Testing

Gustavo Q. Saraiva
Pontificia Universidad Católica de Chile, School of Management

12/09/2024

Introduction

  • For many infectious diseases it is common practice to screen the population through pool testing.

  • In this process, specimens (e.g., blood, urine, swabs) from different subjects are pooled and tested together.

  • Dorfman pool testing procedure: Whenever a pooled test detects infection, each specimen from that group is tested individually.

Introduction (Continued)

  • This process may reduce the cost of testing, as individual tests are only carried out in the event an infection is detected in the pool.

  • This testing protocol also has applications in other fields, such as the detection of defective parts in production lines (e.g., Sobel and Groll (1959)) and the detection of contaminated food (e.g., Price, Olsen, and Hunter (1972) and Adams et al. (2013)).
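To make the cost argument concrete, the following is a minimal sketch of the expected number of tests per subject under Dorfman testing. It assumes an error-free test and a common probability of infection `p` for all subjects; both are simplifications not made in the rest of these slides, and the function name is illustrative.

```python
def dorfman_tests_per_subject(p: float, K: int) -> float:
    """Expected tests per subject under Dorfman testing.

    Assumes an error-free test and a common infection probability p.
    One pooled test is shared by K subjects, and the K individual
    retests are triggered only if at least one member is infected.
    """
    prob_pool_positive = 1.0 - (1.0 - p) ** K
    return 1.0 / K + prob_pool_positive


if __name__ == "__main__":
    # Illustrative prevalence of 1%; individual testing costs 1 test per subject.
    for K in (2, 5, 10, 20):
        print(K, round(dorfman_tests_per_subject(0.01, K), 4))
```

Values below 1 indicate that pooling is expected to require fewer tests than testing every subject individually.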

Introduction (Continued)

  • Several different criteria can be used to form the pools, e.g.:
    • Random pooling: Subjects are grouped randomly or in the order they arrive
    • Ordered pooling: Subjects with similar probability of infection are pooled together
  • The literature has shown that ordered pooling simultaneously minimizes the expected number of tests, the expected number of false positives, and the expected number of false negatives (Aprahamian, Bish, and Bish (2019), Aprahamian, Bish, and Bish (2018) and Saraiva (2023a)).
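As a small illustration of the difference between the two criteria, the sketch below compares the expected number of Dorfman tests when pools are formed in arrival order and when subjects are first sorted by probability of infection. It assumes an error-free test, and the probabilities in `arrival_order` are hypothetical.

```python
from math import prod


def expected_tests(q, K):
    """Expected Dorfman tests when consecutive blocks of K subjects are pooled.

    Assumes an error-free test: a pool is positive iff some member is infected.
    """
    if len(q) % K != 0:
        raise ValueError("batch size must be a multiple of K")
    total = 0.0
    for start in range(0, len(q), K):
        pool = q[start:start + K]
        prob_positive = 1.0 - prod(1.0 - qi for qi in pool)
        total += 1.0 + K * prob_positive  # one pooled test plus K retests if positive
    return total


# Hypothetical batch of 8 subjects, listed in the order they arrive.
arrival_order = [0.01, 0.20, 0.01, 0.20, 0.20, 0.01, 0.20, 0.01]

tests_arrival = expected_tests(arrival_order, K=4)          # pools formed as subjects arrive
tests_ordered = expected_tests(sorted(arrival_order), K=4)  # ordered pooling

print(round(tests_arrival, 3), round(tests_ordered, 3))
```

In this example the sorted batch yields the smaller expected number of tests, in line with the result cited above.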

Introduction (Continued)

  • But why is it that so many labs still implement random pooling?

  • Could it be that the benefits of implementing ordered pooling as opposed to random pooling are not high enough to justify the added costs of collecting more information from patients and sorting them from lowest to highest probability of infection?

My contribution

  • This paper derives an upper bound to the maximum benefit that can be achieved by implementing ordered pooling as opposed to random pooling

  • Computing the upper bound only requires knowing the prevalence of the disease and the maximum and minimum possible probabilities of infection

  • The upper bounds may help laboratories decide whether to implement ordered pooling

  • These upper bounds show that not only the pool size \(K\) but also the batch size \(N\) has a significant impact on the potential benefits of implementing ordered pooling

Environment

  • There is a finite batch \(S\) of subjects to be tested, with \(|S|=N\).

  • Each subject can either be infected or not infected.

  • Each subject \(s\in S\) is infected with probability \(q^s\in [0,1]\), where the \((q^s)_{s\in S}\) are i.i.d. draws from a distribution with support \([a,b]\subseteq [0,1]\).

  • Let \(q_{i}\) be the \(i\)th order statistic of the realized vector of probabilities of infection, so that \[ q_{1}\leq q_{2}\leq \cdots\leq q_{N}. \]

Notation

  • \(\bar{q}=\sum_iq_i/N\) (realized average probability of infection from the batch)

  • \(\mu=\mathbb{E}(\bar{q})=\mathbb{E}(q^s)\) (the population-wide prevalence of the disease)

  • \(S_e=\) sensitivity of an individual test. It gives the probability that an individual test detects infection given that the subject is infected.

  • \(S_p=\) specificity of an individual test. It gives the probability that an individual test does not detect infection, given that the subject is not infected.

  • \(S_e>1-S_p\) (an infected subject is more likely to test positive than an uninfected one)

Testing Mechanism

  • All subjects are tested in pools of size \(K<N\).

  • Number of pools: \(n=N/K\).

  • If a pooled test detects infection, each member from that group is individually tested (Dorfman Testing)

  • Dilution function: \(h(I,K)\) is the probability of detecting an infection in a pooled sample of \(K\) subjects, with exactly \(I\) of them infected.

  • So, \(h(1,1)=S_e\) and \(h(0,1)=1-S_p\).

Assumptions regarding the dilution function:

  • \(h(I,K)\) is increasing in \(I\)

  • \(h(K,K)=h(1,1)=S_e\)

  • \(h(0,K)=h(0,1)=1-S_p\)

Optimality of ordered pooling

  • Assumption 1: The dilution function is concave: \[ h(I+1,K)+h(I-1,K)\leq 2h(I,K) \]

  • Assumption 2: The dilution function is not “too concave”: \[ \frac{I+1}{2I}h(I+1,K)+\frac{I-1}{2I}h(I-1,K)\geq h(I,K)\quad\forall I\geq 1. \]

  • Proposition (Saraiva (2023a)): If the dilution function \(h(I,K)\) is increasing in \(I\) and satisfies Assumptions 1 and 2, and all pools have the same size \(K\), ordered pooling minimizes the expected number of tests, the expected number of false positives and the expected number of false negatives.

Examples

  • The following dilution functions satisfy all of these properties: \[ h(I,K)=(1-S_p)+(S_p+S_e-1)\left(\frac{I}{K}\right)^{\delta},\quad \text{with }\delta\in[0,1]. \]

\[ h(I,K)=\left\{ \begin{array}{c c} 1-S_p,&\text{ if }I=0\\ S_e,&\text{ else} \end{array} \right. \]
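The properties above can be checked numerically. The sketch below implements the power-form dilution function and tests monotonicity, Assumption 1 and Assumption 2 for a given pool size. The default values of `Se`, `Sp` and `delta` are illustrative, and the case \(I=0\) is handled explicitly so that \(\delta=0\) reproduces the second (no-dilution) example.

```python
def h_power(I, K, Se=0.95, Sp=0.99, delta=0.5):
    """Power-form dilution function; the default parameter values are illustrative."""
    if I == 0:
        return 1.0 - Sp  # no infected specimen in the pool
    return (1.0 - Sp) + (Sp + Se - 1.0) * (I / K) ** delta


def check_dilution(h, K, tol=1e-12):
    """Report which of the three properties hold for h(., K)."""
    increasing = all(h(I + 1, K) >= h(I, K) - tol for I in range(K))
    # Assumption 1: h(I+1,K) + h(I-1,K) <= 2 h(I,K)
    concave = all(
        h(I + 1, K) + h(I - 1, K) <= 2 * h(I, K) + tol for I in range(1, K)
    )
    # Assumption 2: (I+1)/(2I) h(I+1,K) + (I-1)/(2I) h(I-1,K) >= h(I,K)
    not_too_concave = all(
        (I + 1) / (2 * I) * h(I + 1, K) + (I - 1) / (2 * I) * h(I - 1, K)
        >= h(I, K) - tol
        for I in range(1, K)
    )
    return {
        "increasing": increasing,
        "concave": concave,
        "not_too_concave": not_too_concave,
    }


if __name__ == "__main__":
    for delta in (0.0, 0.3, 1.0):
        print(delta, check_dilution(lambda I, K, d=delta: h_power(I, K, delta=d), K=8))
```

A convex function such as \((I/K)^2\) fails the concavity check, which is a quick way to confirm that the test is not vacuous.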

Additional Notation

  • \(q(\mu,n,K)\) is the vector of probabilities of infection that maximizes the variance between pools and minimizes the variance within pools, subject to \(\sum_iq_i/N=\mu\).

  • For example, if \(K=4\), \(q(\mu,n,K)\) looks like this: \[ (\underbrace{\text{a,a,a,a}}_{\text{1st pool }},\underbrace{\text{a,a,a,a}}_{\text{2nd pool }},\cdots \underbrace{\text{c,c,c,c}}_{\text{ith pool}},\cdots \underbrace{\text{b,b,b,b}}_{\text{(n-1)th pool}},\underbrace{\text{b,b,b,b}}_{\text{nth pool}}), \] where \(c\in [a,b]\).

  • It is also useful to define \[ \alpha\equiv\frac{b-\mu}{b-a}, \] since the proportion of subjects in \(q(\mu,n,K)\) with probability of infection \(a\) converges to \(\alpha\) as \(n\to\infty\) (and the proportion of those with probability of infection \(b\) converges to \(1-\alpha\)).
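One way to build a vector with the shape shown above is sketched below. The construction is a reading of the example rather than a definition taken from the paper: it fills as many whole pools as possible with \(a\), places the remaining pools at \(b\), and picks the single intermediate value \(c\) so that the average equals \(\mu\). The function name is hypothetical.

```python
def extreme_vector(mu, n, K, a, b):
    """Sorted vector of n*K probabilities with mean mu and homogeneous pools.

    Assumed construction: m pools at a, one pool at c in [a, b],
    and the remaining n - 1 - m pools at b.
    """
    if not a <= mu <= b:
        raise ValueError("mu must lie in [a, b]")
    if a == b:
        return [a] * (n * K)
    alpha = (b - mu) / (b - a)
    m = min(int(alpha * n), n - 1)  # number of pools entirely at a
    f = alpha * n - m               # leftover weight on a, between 0 and 1
    c = f * a + (1.0 - f) * b       # value of the single intermediate pool
    pool_values = [a] * m + [c] + [b] * (n - 1 - m)
    return [value for value in pool_values for _ in range(K)]


if __name__ == "__main__":
    # Illustrative inputs: support [0.01, 0.20], prevalence 0.05, 10 pools of 4.
    q = extreme_vector(mu=0.05, n=10, K=4, a=0.01, b=0.20)
    print(q[::4])          # one value per pool
    print(sum(q) / len(q)) # mean of the vector
```

With these inputs seven pools sit at \(a\), two at \(b\), and one at an intermediate value close to 0.03, so the share of subjects at \(a\) is already near \(\alpha\approx 0.79\).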

Expected Number of tests

  • Let \[ T_o^K(q_1,\cdots,q_N) \] be the expected number of tests when implementing ordered pooling and the order statistics are given by \((q_1,q_2,\cdots,q_N)\).

  • Proposition 1: \(T_o^K(q(\bar{q},n,K))\leq T_o^K(q_1,\cdots,q_N)\), where \(\bar{q}=\sum_iq_i/N\).

  • Therefore, if \[T_r^K(\mu,n,K)\equiv\frac{N}{K}+\frac{N}{K}K\left[\sum_{I=0}^Kh(I,K)\binom{K}{I}\mu^{I}(1-\mu)^{K-I}\right]\] is the expected number of tests when implementing random pooling, the following would be a good candidate for an upper bound to the reduction in the expected number of tests when implementing ordered pooling: \[ UB_o^T(\bar{q},n,K)\equiv \frac{T_r^K(\mu,n,K)-T_o^K(q(\bar{q},n,K))}{N}. \]

Upper bound to reduction in Exp. Tests

  • Theorem 1:

\[\begin{align} \lim_{n\to\infty}UB_o^T(\mu,n,K)&=\sum_{I=0}^K h(I,K)\binom{K}{I}\left[\mu^I(1-\mu)^{K-I}\right.\notag\\ &\quad\left.-\alpha a^I(1-a)^{K-I}-(1-\alpha)b^I(1-b)^{K-I}\right],\label{lim_UB} \end{align}\] and

\[\begin{align} UB_o^T(\bar{q},n,K)&\overset{p}{\to}\lim_{n\to\infty}UB_o^T(\mu,n,K)\notag \end{align}\]

Is \(\lim_{n}UB_o^T(\mu,n,K)\) a reliable upper bound when \(n\) is small?

  • Proposition 2: For all \(n\in \mathbb{N}\) we have that \[ \begin{align} UB_o^T(\mu,n,K)&\leq \lim_{n\to\infty}UB_o^T(\mu,n,K)\notag \end{align} \]
  • So the answer is yes: the limit is a valid upper bound for every \(n\). For a small batch size \(N\) (and therefore a small number of pools \(n=N/K\)), the maximum savings from ordered pooling are smaller than the limit.
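Theorem 1 and Proposition 2 can be illustrated numerically. The sketch below computes the finite-\(n\) bound and its limit for the no-dilution example of \(h\), and compares them over a range of \(n\). The finite-\(n\) vector is built under an assumed construction (whole pools at \(a\), one intermediate pool, the rest at \(b\)), and all parameter values are hypothetical.

```python
from math import comb

SE, SP = 0.95, 0.99  # hypothetical sensitivity and specificity


def h(I, K):
    """No-dilution example of the dilution function."""
    return 1.0 - SP if I == 0 else SE


def g(q, K):
    """P(pooled test positive) for a homogeneous pool with infection probability q."""
    return sum(h(I, K) * comb(K, I) * q**I * (1 - q) ** (K - I) for I in range(K + 1))


def ub_finite(mu, n, K, a, b):
    """UB_o^T(mu, n, K), with q(mu, n, K) built as m pools at a, one at c, rest at b."""
    alpha = (b - mu) / (b - a)
    m = min(int(alpha * n), n - 1)
    f = alpha * n - m
    c = f * a + (1.0 - f) * b
    ordered = (m * g(a, K) + g(c, K) + (n - 1 - m) * g(b, K)) / n
    return g(mu, K) - ordered


def ub_limit(mu, K, a, b):
    """Right-hand side of Theorem 1."""
    alpha = (b - mu) / (b - a)
    return g(mu, K) - alpha * g(a, K) - (1.0 - alpha) * g(b, K)


if __name__ == "__main__":
    mu, K, a, b = 0.05, 4, 0.01, 0.20  # illustrative values
    limit = ub_limit(mu, K, a, b)
    for n in (1, 2, 5, 25, 100, 1000):
        print(n, round(ub_finite(mu, n, K, a, b), 6), "<=", round(limit, 6))
```

With a single pool (\(n=1\)) the finite bound is zero, since ordered and random pooling coincide, and it approaches the limit as \(n\) grows.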

Upper Bound as \(n\to\infty\)


Expected number of false positives

  • We obtain similar results for the expected number of false positives.

  • Proposition: \(FP_o^K(q(\bar{q},n,K))\leq FP_o^K(q_1,\cdots,q_N)\)

  • Define \[ UB_o^{FP}(\bar{q},n,K)\equiv \frac{FP_r^K(\mu,n,K)-FP_o^K(q(\bar{q},n,K))}{N} \]

  • Theorem: \[ \begin{align} \lim_{n\to\infty}UB_o^{FP}(\mu,n,K)&=(1-S_p)\sum_{I=0}^{K-1} h(I,K)\binom{K-1}{I}\left[\mu^I(1-\mu)^{K-I}\right.\notag\\ &\quad\left.-\alpha a^I(1-a)^{K-I}-(1-\alpha)b^I(1-b)^{K-I}\right],\label{lim_UB_FP} \end{align} \] and \[ \begin{align} UB_o^{FP}(\bar{q},n,K)&\overset{p}{\to}\lim_{n\to\infty}UB_o^{FP}(\mu,n,K)\notag \end{align} \]

  • Proposition: \[ \begin{align} UB_o^{FP}(\mu,n,K)&\leq \lim_{n\to\infty}UB_o^{FP}(\mu,n,K)\notag \end{align} \]
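The limit for false positives can be evaluated in the same way as the one for tests. The sketch below codes the right-hand side of the theorem term by term; the function name, the no-dilution `h` used in the example, and all numeric inputs are illustrative assumptions.

```python
from math import comb


def limit_ub_fp(mu, a, b, K, h, Sp):
    """Limit of the upper bound on the reduction in false positives per subject."""
    alpha = (b - mu) / (b - a)

    def term(q):
        # Sum over I infected among the other K-1 members; the extra factor
        # (1 - q) is the probability that the subject itself is not infected.
        return sum(
            h(I, K) * comb(K - 1, I) * q**I * (1 - q) ** (K - I) for I in range(K)
        )

    return (1.0 - Sp) * (term(mu) - alpha * term(a) - (1.0 - alpha) * term(b))


if __name__ == "__main__":
    Se, Sp = 0.95, 0.99  # hypothetical test accuracy

    def h_no_dilution(I, K):
        return 1.0 - Sp if I == 0 else Se

    print(limit_ub_fp(mu=0.05, a=0.01, b=0.20, K=4, h=h_no_dilution, Sp=Sp))
```

The bound collapses to zero when \(\mu=a\) (so \(\alpha=1\)), because every subject then has the same probability of infection.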

Expected number of false negatives

  • We also obtain similar results for the expected number of false negatives.

  • The main difference is the assumption required to obtain this result: instead of requiring the dilution function to be concave, we require it not to be “too concave”: \[ \frac{I+1}{2I}h(I+1,K)+\frac{I-1}{2I}h(I-1,K)\geq h(I,K)\quad\forall I\geq 1. \]

Expected number of false negatives (Continued)

  • Proposition: \(FN_o^K(q(\bar{q},n,K))\leq FN_o^K(q_1,\cdots,q_N)\)

  • Define: \[ UB_o^{FN}(\bar{q},n,K)\equiv \frac{FN_r^K(\mu,n,K)-FN_o^K(q(\bar{q},n,K))}{N} \]

  • Theorem: \[ \begin{align} \lim_{n\to\infty}UB_o^{FN}(\mu,n,K)&=\sum_{I=0}^{K-1} (1-S_eh(I+1,K))\binom{K-1}{I}\left[\mu^{I+1}(1-\mu)^{K-1-I}\right.\notag\\ &\quad\left.-\alpha a^{I+1}(1-a)^{K-1-I}-(1-\alpha)b^{I+1}(1-b)^{K-1-I}\right], \end{align} \] and \[ \begin{align} UB_o^{FN}(\bar{q},n,K)&\overset{p}{\to}\lim_{n\to\infty}UB_o^{FN}(\mu,n,K)\notag \end{align} \]

  • Proposition: \[ UB_o^{FN}(\mu,n,K)\leq \lim_{n\to\infty}UB_o^{FN}(\mu,n,K). \]
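The false-negative limit can be coded term by term as well. The sketch below uses a linear dilution function (the power form with \(\delta=1\)) for the example; the function names and all numeric inputs are illustrative assumptions.

```python
from math import comb


def limit_ub_fn(mu, a, b, K, h, Se):
    """Limit of the upper bound on the reduction in false negatives per subject."""
    alpha = (b - mu) / (b - a)

    def term(q):
        # The subject is infected (one factor q) and I of the other K-1 members
        # are infected; 1 - Se*h(I+1, K) is the probability of a missed infection.
        return sum(
            (1.0 - Se * h(I + 1, K)) * comb(K - 1, I) * q ** (I + 1) * (1 - q) ** (K - 1 - I)
            for I in range(K)
        )

    return term(mu) - alpha * term(a) - (1.0 - alpha) * term(b)


if __name__ == "__main__":
    Se, Sp = 0.95, 0.99  # hypothetical test accuracy

    def h_linear(I, K):
        # Power-form dilution function with delta = 1.
        return (1.0 - Sp) + (Sp + Se - 1.0) * (I / K)

    print(limit_ub_fn(mu=0.05, a=0.01, b=0.20, K=4, h=h_linear, Se=Se))
```

Under the no-dilution example, \(h(I+1,K)=S_e\) for every \(I\geq 0\), so each term is linear in \(q\) and the expression evaluates to zero: in this formula the false-negative gain comes entirely from dilution.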

Chlamydia: Probabilities of infection (CDC)

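| Gender | Race/Ethnicity | Age group (years) | Prevalence (%) | Proportion in general population (%) |
|---|---|---|---|---|
| Female | Hispanic | 15-24 | 6.54 | 1.41 |
| Female | Hispanic | Other | 0.65 | 7.01 |
| Female | Black | 15-24 | 19.19 | 1.07 |
| Female | Black | Other | 1.22 | 5.67 |
| Female | Other | 15-24 | 4.38 | 4.29 |
| Female | Other | Other | 0.25 | 31.31 |
| Male | Hispanic | 15-24 | 1.78 | 1.53 |
| Male | Hispanic | Other | 0.36 | 7.16 |
| Male | Black | 15-24 | 7.45 | 1.09 |
| Male | Black | Other | 1.05 | 5.08 |
| Male | Other | 15-24 | 1.20 | 4.51 |
| Male | Other | Other | 0.17 | 29.87 |
| Total | - | - | 0.97 | 100 |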
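The inputs needed for the upper bounds can be read off this table. The sketch below recomputes the overall prevalence as the population-weighted average of the group prevalences and derives \(a\), \(b\) and \(\alpha\). Treating the smallest and largest group prevalences as the support bounds \(a\) and \(b\) is an assumption made here for illustration.

```python
# (gender, race/ethnicity, age group, prevalence %, share of population %)
rows = [
    ("Female", "Hispanic", "15-24", 6.54, 1.41),
    ("Female", "Hispanic", "Other", 0.65, 7.01),
    ("Female", "Black", "15-24", 19.19, 1.07),
    ("Female", "Black", "Other", 1.22, 5.67),
    ("Female", "Other", "15-24", 4.38, 4.29),
    ("Female", "Other", "Other", 0.25, 31.31),
    ("Male", "Hispanic", "15-24", 1.78, 1.53),
    ("Male", "Hispanic", "Other", 0.36, 7.16),
    ("Male", "Black", "15-24", 7.45, 1.09),
    ("Male", "Black", "Other", 1.05, 5.08),
    ("Male", "Other", "15-24", 1.20, 4.51),
    ("Male", "Other", "Other", 0.17, 29.87),
]

total_share = sum(share for *_, share in rows)
mu_pct = sum(prev * share for *_, prev, share in rows) / total_share  # prevalence in %
a_pct = min(prev for *_, prev, _ in rows)  # assumed lower bound of the support
b_pct = max(prev for *_, prev, _ in rows)  # assumed upper bound of the support
alpha = (b_pct - mu_pct) / (b_pct - a_pct)

print(round(mu_pct, 2), a_pct, b_pct, round(alpha, 3))
```

The weighted average reproduces the 0.97% reported in the Total row.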

Histogram of probabilities of infection for Chlamydia


Upper bound increases with \(N\)


Reduction in the expected number of tests when \(N=100\) vs. \(N=300\).


Reduction in the expected number of tests per subject when \(N=100\)

Reduction in FP and FN per subject

Reduction in the expected number of false positives and false negatives when \(N=100\) vs. \(N=300\).

Reduction in FP and FN

Reduction in the expected number of false positives and false negatives when \(N=100\)

Reduction in total costs

  • We set the cost of a test at $55 and the cost of a false positive equal to the cost of a new test ($55)

  • Following Aprahamian et al. (2019), we use the average between the cost of sequelae for men and women, estimated by Owusu-Edusei Jr et al. (2015), to set the cost of a false negative at \(C(FN)=\$2,927\).
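A minimal sketch of how the three reductions can be combined into a single dollar figure is given below. It assumes that total cost is a linear combination of the expected numbers of tests, false positives and false negatives, weighted by the unit costs stated above; the function name and the example reductions are hypothetical.

```python
COST_TEST = 55.0              # USD per test (from the text)
COST_FALSE_POSITIVE = 55.0    # USD, equal to the cost of a new test (from the text)
COST_FALSE_NEGATIVE = 2927.0  # USD, cost of sequelae (from the text)


def cost_saving_per_subject(d_tests, d_fp, d_fn):
    """Dollar saving per subject from per-subject reductions in tests, FP and FN.

    Assumes costs add up linearly across the three components.
    """
    return (
        COST_TEST * d_tests
        + COST_FALSE_POSITIVE * d_fp
        + COST_FALSE_NEGATIVE * d_fn
    )


if __name__ == "__main__":
    # Hypothetical per-subject reductions, for illustration only.
    print(cost_saving_per_subject(d_tests=0.01, d_fp=0.001, d_fn=0.0001))
```

Because a false negative is far more costly than a test, even a small reduction in false negatives can dominate the total.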


Histogram of probabilities of infection for COVID-19 in Chile, during the peak of the pandemic


RT-PCR dilution function


  • I set \(S_p=0.99\) and used MLE to recover \(S_e\) and \(\delta\) using data from Yelin et al. (2020):

\[ h(I,K)=(1-S_p)+(S_p+S_e-1)\left(\frac{I}{K}\right)^{\delta} \]
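The estimation step can be sketched as follows. This is not the author's estimation code: it assumes a binomial likelihood for the number of positive pooled tests at each \((I,K)\) and maximizes it by grid search. The data below are synthetic, generated from the model itself with hypothetical parameters, and do not come from Yelin et al. (2020).

```python
from math import log

SP = 0.99  # specificity fixed at 0.99, as in the text


def h(I, K, Se, delta):
    """Power-form dilution function with S_p fixed at SP."""
    if I == 0:
        return 1.0 - SP
    return (1.0 - SP) + (SP + Se - 1.0) * (I / K) ** delta


def log_likelihood(data, Se, delta):
    """Binomial log-likelihood (assumed); data rows are (I, K, positives, trials)."""
    ll = 0.0
    for I, K, positives, trials in data:
        p = h(I, K, Se, delta)
        ll += positives * log(p) + (trials - positives) * log(1.0 - p)
    return ll


def grid_mle(data):
    """Maximize the log-likelihood over a grid of (Se, delta) values."""
    best = None
    for i in range(50):           # Se in {0.50, 0.51, ..., 0.99}
        Se = 0.50 + 0.01 * i
        for j in range(101):      # delta in {0.00, 0.01, ..., 1.00}
            delta = 0.01 * j
            ll = log_likelihood(data, Se, delta)
            if best is None or ll > best[0]:
                best = (ll, Se, delta)
    return best[1], best[2]


# SYNTHETIC data: expected counts out of 1000 pooled tests with one infected
# specimen, generated from hypothetical "true" parameters.
TRUE_SE, TRUE_DELTA = 0.90, 0.20
synthetic = [
    (1, K, round(1000 * h(1, K, TRUE_SE, TRUE_DELTA)), 1000) for K in (1, 2, 4, 8, 16)
]

se_hat, delta_hat = grid_mle(synthetic)
print(se_hat, delta_hat)
```

A grid search is used only to keep the sketch dependency-free; a numerical optimizer would be the more usual choice for real data.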

COVID-19 Savings per subject

  • The following plots were built assuming \(N=100\)

  • Assuming that the PCR test costs \(31.25\) USD, the cost savings per subject for a pool of size \(K\leq 5\) were less than 2 cents.

Savings per subject in the expected number of tests for each infection week

What if we had better data?

  • Suppose that we find out that those who exhibit COVID symptoms and are not vaccinated have a 10% probability of infection.

  • Then we can compute an upper bound to the benefits of collecting information on symptoms in addition to information on vaccination status.


Discussion

  • The upper bounds derived in this article may aid laboratories in their decision to implement ordered vs. random pooling

  • In the Appendix we derive looser upper bounds that do not even require estimating the dilution effect.

  • Our results may also inform data-collection decisions: they bound the maximum benefit one can get by collecting more data from patients (e.g., whether they exhibit symptoms or not).

References

Adams, Derek R., Wendy R. Stensland, Chong H. Wang, Annette M. O’Connor, Darrell W. Trampel, Karen M. Harmon, Erin L. Strait, and Timothy S. Frana. 2013. “Detection of Salmonella Enteritidis in Pooled Poultry Environmental Samples Using a Serotype-Specific Real-Time–Polymerase Chain Reaction Assay.” Avian Diseases 57(1):22–28.
Aprahamian, Hrayer, Douglas R. Bish, and Ebru K. Bish. 2019. “Optimal Risk-Based Group Testing.” Management Science 65(9):4365–84.
Aprahamian, Hrayer, Ebru K. Bish, and Douglas R. Bish. 2018. “Adaptive Risk-Based Pooling in Public Health Screening.” IISE Transactions 50(9):753–66.
Bobkova, Nina, Ying Chen, and Hülya Eraslan. 2023. “Optimal Group Testing with Heterogeneous Risks.” Economic Theory. doi: 10.1007/s00199-023-01502-3.
Dorfman, Robert. 1943. “The Detection of Defective Members of Large Populations.” The Annals of Mathematical Statistics 14:436–40.
Lipnowski, Elliot, and Doron Ravid. 2021. “Pooled Testing for Quarantine Decisions.” Journal of Economic Theory 198:105372. doi: 10.1016/j.jet.2021.105372.
Owusu-Edusei Jr, Kwame, Harrell W. Chesson, Thomas L. Gift, Robert C. Brunham, and Gail Bolan. 2015. “Cost-Effectiveness of Chlamydia Vaccination Programs for Young Women.” Emerging Infectious Diseases 21(6):960–68.
Price, W. R., R. A. Olsen, and J. E. Hunter. 1972. “Salmonella Testing of Pooled Pre-Enrichment Broth Cultures for Screening Multiple Food Samples.” Applied Microbiology 23(4):679–82. doi: 10.1128/am.23.4.679-682.1972.
Saraiva, Gustavo Quinderé. 2023a. “Pool Testing with Dilution Effects and Heterogeneous Priors.” Health Care Management Science 1–22. doi: 10.1007/s10729-023-09650-7.
Saraiva, Gustavo Quinderé. 2023b. “Strategic Incentives When Implementing Dorfman Testing with Assortative Matching.” Economics Letters 232:111314. doi: 10.1016/j.econlet.2023.111314.
Sobel, M., and P. A. Groll. 1959. “Group Testing to Eliminate Efficiently All Defectives in a Binomial Sample.” The Bell System Technical Journal 38(5):1179–1252. doi: 10.1002/j.1538-7305.1959.tb03914.x.
Yelin, Idan, Noga Aharony, Einat Shaer Tamar, Amir Argoetti, Esther Messer, Dina Berenbaum, Einat Shafran, Areen Kuzli, Nagham Gandali, Omer Shkedi, Tamar Hashimshony, Yael Mandel-Gutfreund, Michael Halberthal, Yuval Geffen, Moran Szwarcwort-Cohen, and Roy Kishony. 2020. “Evaluation of COVID-19 RT-qPCR Test in Multi sample Pools.” Clinical Infectious Diseases 71(16):2073–78. doi: 10.1093/cid/ciaa531.