Eighteen studies were included (N = 674 patients).
Methodological quality scores ranged from 11% to 63% of the maximum possible score. Agreement on methodological scoring was kappa = 0.8. In all 18 studies, one or more reliable instruments for measuring exercise capacity were lacking.
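The section reports kappa = 0.8 for agreement on methodological scoring but does not say how many raters scored the studies or which kappa variant was used. As an illustration only, the sketch below assumes two raters who each give a categorical score to the same set of quality criteria, and it computes unweighted Cohen's kappa. The function name and the example ratings are hypothetical and are not data from the included studies.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    Assumption: each item gets one categorical score per rater
    (e.g. 1 = quality criterion met, 0 = not met).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items the two raters scored identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if p_chance == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical ratings of ten quality criteria (illustrative values only).
rater_1 = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
rater_2 = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
print(round(cohens_kappa(rater_1, rater_2), 3))  # 0.583
```

In this made-up example the raters agree on 8 of 10 criteria (observed agreement 0.8), but because each marks 6 of 10 criteria as met, chance agreement is already 0.52, so kappa comes out well below the raw agreement rate.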
Overall effect size:
Maximal exercise capacity (12 studies): 0.4 (95%CI: 0.2, 0.6; P < 0.0001), heterogeneity not significant (NS).
Endurance time (7 studies): 1.2 (95%CI: 0.9, 1.5; P < 0.0001), heterogeneity P < 0.0001.
Endurance time using random-effects model (7 studies): 1.0 (95%CI: 0.04, 2.1; P = 0.02).
Walking distance (15 studies): 0.5 (95%CI: 0.3, 0.7; P < 0.0001), heterogeneity NS.
HRQL (health-related quality of life):
Dyspnoea (5 studies): 0.7 (95%CI: 0.4, 1.0; P < 0.0001), heterogeneity NS.
Total score (7 studies): 0.6 (95%CI: 0.4, 0.8; P < 0.0001), heterogeneity NS.
Fatigue (4 studies): 0.6 (95%CI: 0.3, 0.9; P = 0.0001), heterogeneity NS.
Emotion (4 studies): 0.5 (95%CI: 0.2, 0.7; P = 0.001), heterogeneity NS.
Mastery (4 studies): 0.6 (95%CI: 0.3, 0.9; P < 0.0001), heterogeneity NS.
Total score (11 studies): 0.6 (95%CI: 0.5, 0.7; P < 0.0001), heterogeneity NS.
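The results above give fixed-effect summary effect sizes with a heterogeneity test for each outcome, plus a random-effects re-analysis for endurance time, but the section does not describe the computations. The sketch below assumes inverse-variance weighting, Cochran's Q as the heterogeneity statistic, and the DerSimonian-Laird estimator for the between-study variance (tau²); these are common choices swapped in as assumptions, not necessarily the review's method. Inputs are per-study effect sizes and their sampling variances, and all function names and example values are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus a finite correction series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def summarize(estimate, se):
    """Point estimate with 95% CI and two-sided z-test p-value."""
    return {
        "estimate": estimate,
        "se": se,
        "ci": (estimate - Z_95 * se, estimate + Z_95 * se),
        "p": math.erfc(abs(estimate / se) / math.sqrt(2.0)),
    }


def meta_analyse(effects, variances):
    """Inverse-variance fixed-effect and DerSimonian-Laird random-effects pooling."""
    k = len(effects)
    if k < 2 or len(variances) != k:
        raise ValueError("need at least two studies, each with one sampling variance")
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # DerSimonian-Laird moment estimate of tau^2, truncated at zero.
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau^2 to every study's sampling variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    sw_re = sum(w_re)
    random_effect = sum(wi * yi for wi, yi in zip(w_re, effects)) / sw_re
    return {
        "fixed": summarize(fixed, math.sqrt(1.0 / sw)),
        "random": summarize(random_effect, math.sqrt(1.0 / sw_re)),
        "q": q,
        "df": df,
        "q_p": chi2_sf(q, df),
        "tau2": tau2,
    }


# Hypothetical, strongly heterogeneous inputs (not the studies in this review).
result = meta_analyse([0.0, 1.0, 2.0], [0.04, 0.04, 0.04])
print(result["q"], result["df"], result["tau2"])  # Q = 50 on 2 df, tau^2 = 0.96
print(result["fixed"]["ci"], result["random"]["ci"])
```

When Q is close to its degrees of freedom, tau² is zero and the two models coincide. When Q far exceeds its degrees of freedom, tau² inflates every study's variance and the random-effects confidence interval widens. That matches the pattern reported for endurance time, where the interval grows from 0.9 to 1.5 under the fixed-effect summary to 0.04 to 2.1 under the random-effects model.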
Subgroup analysis (FEV < 50th centile vs FEV > 50th centile): Q-within P < 0.0001; Q-between NS. For all other subgroups, Q-between was also not significant.
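The subgroup result is expressed as Q-within and Q-between statistics. One common fixed-effect way to obtain them, assumed here because the section does not describe the computation, is to split the total Cochran's Q for all studies into the sum of each subgroup's own Q (Q-within, with df = number of studies minus number of subgroups) and the remainder (Q-between, with df = number of subgroups minus 1). Read this way, a significant Q-within alongside a non-significant Q-between suggests the subgrouping did not account for the heterogeneity. Function names, subgroup labels and data below are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def fixed_effect(effects, variances):
    """Inverse-variance fixed-effect estimate, total weight and Cochran's Q."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    estimate = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - estimate) ** 2 for wi, yi in zip(w, effects))
    return estimate, sw, q


def partition_q(groups):
    """Split the total fixed-effect Q into within- and between-subgroup parts.

    groups: mapping of subgroup label -> (effects, variances).
    """
    all_effects = [y for effects, _ in groups.values() for y in effects]
    all_variances = [v for _, variances in groups.values() for v in variances]
    _, _, q_total = fixed_effect(all_effects, all_variances)
    # Q-within: heterogeneity remaining inside each subgroup, summed.
    q_within = sum(fixed_effect(e, v)[2] for e, v in groups.values())
    # Q-between: heterogeneity explained by differences between subgroup means.
    q_between = q_total - q_within
    df_within = len(all_effects) - len(groups)
    df_between = len(groups) - 1
    return {
        "q_within": q_within, "df_within": df_within,
        "p_within": chi2_sf(q_within, df_within),
        "q_between": q_between, "df_between": df_between,
        "p_between": chi2_sf(q_between, df_between),
    }


# Hypothetical studies split at a baseline-severity cut-off (illustrative values only).
result = partition_q({
    "lower": ([0.0, 0.2], [0.04, 0.04]),
    "higher": ([0.4, 0.6], [0.04, 0.04]),
})
print(result["q_between"], result["p_between"])  # roughly 4.0 on 1 df
print(result["q_within"], result["p_within"])    # roughly 1.0 on 2 df
```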
Effect sizes weighted by methodological quality: summary effect sizes and 95%CIs for maximal exercise capacity and walking distance were identical to the unweighted results.
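The section does not state how the quality weighting factor entered the analysis. One common scheme, assumed here purely for illustration, multiplies each study's inverse-variance weight by its quality score expressed as a proportion of the maximum possible score. With equal quality scores this reduces exactly to ordinary inverse-variance pooling; with unequal scores the estimate moves towards the higher-quality studies. The function name and all inputs are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def quality_weighted_pool(effects, variances, quality):
    """Pool effect sizes with weights = quality score / sampling variance.

    Assumption: quality scores are proportions (0 to 1) of the maximum
    methodological score; this weighting scheme is illustrative only.
    """
    if not effects or not (len(effects) == len(variances) == len(quality)):
        raise ValueError("effects, variances and quality must be non-empty and equal length")
    w = [qi / vi for qi, vi in zip(quality, variances)]
    sw = sum(w)
    estimate = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Variance of a weighted mean with fixed weights: sum(w^2 * v) / (sum w)^2.
    var = sum(wi * wi * vi for wi, vi in zip(w, variances)) / sw ** 2
    se = math.sqrt(var)
    return estimate, (estimate - Z_95 * se, estimate + Z_95 * se), se


# Equal (hypothetical) quality scores reproduce the plain inverse-variance result.
print(quality_weighted_pool([0.2, 0.4, 0.6], [0.04] * 3, [0.5] * 3))
# Unequal scores pull the estimate towards the higher-quality study.
print(quality_weighted_pool([0.0, 1.0], [0.04, 0.04], [0.25, 0.75]))
```

Under this scheme, quality weighting changes the summary only when quality scores vary across studies and are associated with the size of the effects.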
Long-term effect size of rehabilitation:
Maximal exercise capacity (6 studies): 0.3 (95%CI: 0.1, 0.5; P < 0.003), heterogeneity NS.
Walking distance (5 studies): 0.4 (95%CI: 0.1, 0.7; P < 0.005), heterogeneity NS.