Twenty-one published studies and 19 unpublished studies were included; the total number of participants was 1,953. These studies provided 51 treatment versus control comparisons.
The overall pooled analysis (mean d=0.67) included 173 effect sizes drawn from 40 studies across all relevant measures and comparisons. This medium effect indicates that the average child receiving CBT would score higher than roughly three-quarters of the untreated group (equivalently, the average untreated child would fall in the lower quartile of the treated group). However, there was evidence of statistical heterogeneity (Q=250.42, d.f.=172, P<0.0001). Effect sizes varied from 0.63 to 0.73 by domain of measurement, from -0.07 to 0.69 by source of information, and from 0.36 to 0.79 by treatment type; skills training and multimodal interventions were the most effective treatment types. Feedback, modelling and homework assignments were each positively and statistically significantly related to the overall effect size (Spearman rank-order correlations: rho=0.55, P<0.001; rho=0.46, P<0.001; rho=0.31, P<0.05, respectively).
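The percentile interpretation of the pooled d can be checked directly from the standard normal distribution, assuming normally distributed outcomes in both groups. A minimal sketch using only the Python standard library:

```python
from math import erf, sqrt

def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Pooled effect size reported in the review
d = 0.67

# Proportion of the untreated group scoring below the average treated child
percentile = normal_cdf(d)
print(f"Average treated child scores above {percentile:.0%} of controls")
# With d = 0.67 this is about 75%, i.e. roughly the upper quartile.
```

The same function applied to -d gives the mirror-image reading: the average untreated child sits near the 25th percentile of the treated group.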
Inter-observer agreement ranged from 0.51 to 0.87 across the coding of treatment type, problem severity and treatment integrity.
Using a 'small effect' threshold of 0.20 in the fail-safe N calculation, the authors estimated that an additional 117 null-result studies would be needed to reduce the overall d-value to 0.20.
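The figure of 117 is consistent with Orwin's fail-safe N formula. The sketch below assumes the calculation was run over roughly 50 treatment-versus-control comparisons (the review reports 51; the exact count used by the authors is not stated here) and that the hypothetical missing studies have a mean effect of zero:

```python
def orwin_failsafe_n(k: int, d_pooled: float, d_criterion: float,
                     d_missing: float = 0.0) -> float:
    """Orwin's fail-safe N: the number of additional studies with mean
    effect d_missing needed to pull the pooled effect down to d_criterion."""
    return k * (d_pooled - d_criterion) / (d_criterion - d_missing)

# k = 50 is an assumption for illustration; with d = 0.67 and a 0.20
# criterion, the formula lands very close to the 117 studies reported.
print(orwin_failsafe_n(50, 0.67, 0.20))
```

With these inputs the formula gives 117.5, which rounds down to the reported value; small changes in k shift the result only slightly.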
There was no significant difference in overall effect size between published and unpublished studies (d=0.64 and d=0.70, respectively; P=0.59).
Randomisation, rater blinding and treatment integrity were found not to be significantly related to the magnitude of the effect size.