Seven trials (four quasi-randomised and two randomised controlled trials) and one prospective case-control study (n=383 for the seven studies) were included in the review. Sample sizes ranged from 15 to 150 patients. The mean study quality score was 5.25 (range 4 to 8); inter-rater agreement and intra-rater correlation were good.
Fusion failure: The risk of fusion failure was lower in patients receiving BMP (14.5 per cent) than in controls (39 per cent). Relative risk was 0.42 (95% CI: 0.28 to 0.61, p<0.00001; eight studies). The number-needed-to-treat analysis indicated that, for every four posterolateral fusion procedures performed with autologous bone graft rather than BMP, one additional case of fusion failure would occur.
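The number needed to treat can be checked from the two failure rates quoted above. The sketch below assumes the figure was derived from the crude rates (39 per cent and 14.5 per cent); the review may have used pooled estimates instead, and the function and variable names are illustrative only.

```python
def number_needed_to_treat(control_risk: float, treated_risk: float) -> float:
    """Return 1 / absolute risk reduction (ARR)."""
    arr = control_risk - treated_risk
    if arr <= 0:
        raise ValueError("no risk reduction; NNT is undefined")
    return 1.0 / arr


# Crude fusion failure rates quoted in the review.
control_failure = 0.39   # controls (autologous bone graft)
bmp_failure = 0.145      # patients receiving BMP

arr = control_failure - bmp_failure
nnt = number_needed_to_treat(control_failure, bmp_failure)
print(f"ARR = {arr:.3f}, NNT = {nnt:.2f}")  # ARR = 0.245, NNT = 4.08
```

An absolute risk reduction of about 24.5 percentage points gives an NNT of roughly four, which agrees with the figure reported.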
Time to fusion: There was a significant reduction in risk of fusion failure with BMP compared with controls at all time points. Overall relative risk was 0.45 (95% CI: 0.35 to 0.58, p<0.00001; eight studies). There was evidence of statistical heterogeneity at six and 12 months.
Re-operation, clinical failure and hospital stay: No statistically significant differences were reported for re-operation (five studies) or clinical failure (two studies) between treatment and control groups. Hospital stay was significantly shorter in patients receiving BMP compared to controls (weighted mean difference was -1.03, 95% CI: -1.45 to -0.61, p<0.00001; two studies).
Operative time: Three of four studies reported shorter mean operative times with BMP than with controls, but the pooled difference was not statistically significant and there was evidence of significant heterogeneity (I²=90.5%).
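For readers unfamiliar with the I² statistic, the sketch below shows how it relates to Cochran's Q using the standard definition I² = max(0, (Q - df) / Q). The review does not report Q, so the value here is back-calculated under the assumption that all four studies were pooled (df = 3); it is an illustration, not a figure from the review.

```python
def i_squared(q: float, df: int) -> float:
    """Higgins I-squared, as a percentage, from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def q_from_i_squared(i2_percent: float, df: int) -> float:
    """Invert the definition: the Q that would give this I-squared (valid for 0 <= I2 < 100)."""
    return df / (1.0 - i2_percent / 100.0)


df = 3  # assumption: four pooled studies, so df = 4 - 1
q_implied = q_from_i_squared(90.5, df)
print(f"Implied Q = {q_implied:.2f} on {df} df")
print(f"I-squared = {i_squared(q_implied, df):.1f}%")
```

Under that assumption, Q would be far larger than its degrees of freedom, meaning most of the observed variation in operative time reflects differences between studies rather than chance.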
Subgroup analyses: These showed a significantly greater reduction in risk of fusion failure with BMP-2 than with OP-1 (p=0.003). In studies using instrumented fusion, the risk of fusion failure was significantly lower in patients receiving BMP than in controls (relative risk was 0.33, 95% CI: 0.21 to 0.52, p<0.00001; six studies). There was some degree of statistical heterogeneity for instrumented fusion studies. Sensitivity analyses did not significantly alter the results.
There was no evidence of statistical heterogeneity for fusion failure, re-operation, clinical failure or hospital stay. The funnel plots showed no evidence of publication bias.