Fifty-two studies (16,730 patients) were included.
Four quality items were found to be associated with the diagnostic odds ratio (DOR): an adequate description of the study population, an adequate description of the test, a prospective design and the absence of a case-control design. The 12 studies that did not fulfill these four criteria were considered to be of low quality.
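As a reminder of what the DOR summarises, the following minimal sketch computes it from a single 2x2 table and checks that it equals the ratio of the positive to the negative likelihood ratio (LR). The counts are hypothetical and are not taken from any included study; the function names are illustrative only.

```python
def diagnostic_odds_ratio(tp: int, fp: int, fn: int, tn: int) -> float:
    """DOR = (TP * TN) / (FP * FN) for one 2x2 diagnostic table."""
    if min(tp, fp, fn, tn) <= 0:
        # A zero cell makes the DOR undefined or infinite; reviews often
        # apply a continuity correction, which this sketch does not do.
        raise ValueError("all four cells must be positive")
    return (tp * tn) / (fp * fn)


def likelihood_ratios(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (positive LR, negative LR) from the same 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Hypothetical counts for illustration only.
tp, fn, fp, tn = 90, 10, 20, 80
dor = diagnostic_odds_ratio(tp, fp, fn, tn)
lr_pos, lr_neg = likelihood_ratios(tp, fp, fn, tn)
print(f"DOR = {dor:.2f}, LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
```

A higher DOR indicates better discrimination, and a DOR of 1 indicates a test that does not discriminate at all.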
Initial analysis: investigation of heterogeneity (55 studies).
Low-quality studies overestimated the DOR by a factor of 3.7. The accuracy of the test was much greater for the identification of HF than for the identification of LVD (relative DOR 6.4). Clinical setting, prevalence of cardiac dysfunction and type of BNP assay were not associated with the DOR. Low-quality studies were excluded from further analysis, which was stratified by target condition.
Heart failure (11 studies after the exclusion of 5 poor-quality studies).
There was no indication of a threshold effect (p=0.55) or publication bias (p=0.18). There was evidence of significant heterogeneity in the DOR (p=0.001); this disappeared when 2 outlying studies with very high DORs were excluded. After the exclusion of these outliers, BNP levels were very accurate for the diagnosis of HF (DOR 28.94; area under the curve 0.93). The negative LR showed very little heterogeneity (p=0.09). The pooled negative LR was 0.11 (95% CI: 0.08, 0.16). There was greater heterogeneity in the positive LRs, so these were not pooled.
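To illustrate what a pooled negative LR of 0.11 could mean in practice, the sketch below converts a pre-test probability of HF into a post-test probability after a negative BNP result, using the pooled estimate and its 95% CI limits. The pre-test probability of 30% is an assumption chosen for illustration and does not come from the review.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pre-test probability via odds."""
    if not 0.0 < pre_test_prob < 1.0:
        raise ValueError("pre_test_prob must be strictly between 0 and 1")
    pre_test_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Pooled negative LR and 95% CI limits reported above.
POOLED_NEGATIVE_LR = 0.11
CI_LOWER, CI_UPPER = 0.08, 0.16

# Hypothetical pre-test probability of HF (an assumption, not a review result).
pre_test = 0.30

for label, lr in [("pooled", POOLED_NEGATIVE_LR),
                  ("CI lower", CI_LOWER),
                  ("CI upper", CI_UPPER)]:
    prob = post_test_probability(pre_test, lr)
    print(f"{label:9s} LR- = {lr:.2f} -> post-test probability = {prob:.1%}")
```

Under this assumed pre-test probability, a negative result lowers the probability of HF to roughly 3 to 6%, depending on which end of the confidence interval is used.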
Systolic and/or diastolic function (7 studies after the exclusion of 3 poor-quality studies).
There was significant heterogeneity between the studies (p<0.0001). The sensitivity ranged from 28 to 92% and the specificity from 44 to 97%. The funnel plot suggested the possibility of publication bias (p=0.06).
Systolic dysfunction (25 studies after the exclusion of 4 poor-quality studies).
There was significant heterogeneity (p<0.0001) and the funnel plot suggested the presence of publication bias (p=0.0005). Despite the heterogeneity, the accuracy appeared to be poorer than in the studies of HF.