Eighteen studies with 7,180 participants (927 with major depressive disorder) were included. PHQ-9 cut-off scores ranged from 7 to 15. Four studies did not apply the reference standard test to all participants and were at high risk of partial verification bias. Heterogeneity was high (I²=82.4%).
Pooled specificity ranged from 0.73 (95% CI 0.63 to 0.82) at a cut-off of 7 to 0.96 (95% CI 0.94 to 0.97) at a cut-off of 15. Pooled sensitivity values varied between cut-off scores with no consistent pattern. For the widely recommended cut-off score of 10 (evaluated in 16 studies), pooled sensitivity was 0.85 (95% CI 0.75 to 0.91) and specificity 0.89 (95% CI 0.83 to 0.92). There were no substantial differences in pooled sensitivity and specificity for cut-off scores between 8 and 11. A cut-off score of 11 had the best trade-off between sensitivity and specificity.
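To show what the pooled estimates at a cut-off score of 10 imply in practice, the following minimal sketch derives Youden's index, likelihood ratios and predictive values from the reported sensitivity (0.85) and specificity (0.89). These derived figures were not reported in the review. The function and variable names are illustrative, and the prevalence is an assumption: it is the crude proportion of included participants with major depressive disorder (927 of 7,180), which may not reflect prevalence in any single study or clinical setting.

```python
from dataclasses import dataclass


@dataclass
class DerivedAccuracy:
    youden_j: float     # sensitivity + specificity - 1
    lr_positive: float  # sensitivity / (1 - specificity)
    lr_negative: float  # (1 - sensitivity) / specificity
    ppv: float          # positive predictive value at the given prevalence
    npv: float          # negative predictive value at the given prevalence


def derive_accuracy(sensitivity: float, specificity: float,
                    prevalence: float) -> DerivedAccuracy:
    """Derive standard summary measures from sensitivity and specificity."""
    if not all(0.0 < v < 1.0 for v in (sensitivity, specificity, prevalence)):
        raise ValueError("all inputs must lie strictly between 0 and 1")

    # Expected proportions of the whole population in each 2x2 cell.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    return DerivedAccuracy(
        youden_j=sensitivity + specificity - 1.0,
        lr_positive=sensitivity / (1.0 - specificity),
        lr_negative=(1.0 - sensitivity) / specificity,
        ppv=true_pos / (true_pos + false_pos),
        npv=true_neg / (true_neg + false_neg),
    )


# Pooled estimates reported in the review for a cut-off score of 10.
SENSITIVITY_CUTOFF_10 = 0.85
SPECIFICITY_CUTOFF_10 = 0.89

# ASSUMPTION: crude prevalence across all included participants (927 of 7,180).
# The review does not report a pooled prevalence; this is for illustration only.
ASSUMED_PREVALENCE = 927 / 7180

result = derive_accuracy(SENSITIVITY_CUTOFF_10, SPECIFICITY_CUTOFF_10,
                         ASSUMED_PREVALENCE)

print(f"Youden's J: {result.youden_j:.2f}")
print(f"LR+: {result.lr_positive:.2f}, LR-: {result.lr_negative:.2f}")
print(f"PPV: {result.ppv:.2f}, NPV: {result.npv:.2f}")
```

Because predictive values depend strongly on prevalence, rerunning the sketch with a setting-specific prevalence would give a more relevant picture than the crude figure assumed here. The point estimates also ignore the confidence intervals and the high heterogeneity reported above.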
In the meta-regression, only blinded application of the reference standard was a significant predictor of diagnostic performance. Funnel plots were not provided, but the authors stated that publication bias could not be ruled out.