The authors addressed a clear research question. An appropriate reference standard was required for a study to be included. Several appropriate databases were searched, but EMBASE was not; neither search dates nor language restrictions were reported, and no attempt was made to locate unpublished data, so publication and language bias may have been introduced. The authors investigated publication bias and stated that there was no evidence of it, although some of the statistical tests showed significant results for some outcomes at some stages of disease. A relatively large number of studies were included, but the overall number of patients was relatively small, and there was no way of ascertaining the number of patients in any individual study and, therefore, of assessing the reliability of the individual study results.
Data extraction was conducted in duplicate, but it was unclear whether similar methods to reduce error and bias were used during study selection. The authors stated that criteria for RCTs were not appropriate for the study designs included, specified only two criteria on which study quality was assessed, and indicated how patients were enrolled in the studies. The QUADAS criteria, a validated quality assessment tool for diagnostic accuracy studies, were not applied. The authors stated that there was no statistical heterogeneity between studies, but given the lack of study and population details it was not possible to assess clinical heterogeneity, so the appropriateness of pooling the studies could not be judged.
Pooled sensitivities were similar for disease stages T3 and T4, and higher than those for T1 and T2 (indicating fewer false negative results in later stage disease). Pooled specificities were similar across all disease stages. The pooled positive likelihood ratio was greatest for T1 (indicating fewer false positive results with early stage disease). However, the confidence intervals for each outcome overlapped across all stages, so the implication that EUS was better at diagnosing certain stages was unreliable. Furthermore, the implication that EUS was better at detecting the smallest/least invasive (early stage) and largest/most invasive (late stage) cancers may have reflected the reliability of the data rather than the accuracy of EUS in clinical practice. The conclusion that FNA improved accuracy was based on the results of only four studies of unknown size and quality.
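The relationships among sensitivity, specificity and the positive likelihood ratio that underpin the interpretation above can be sketched with a short calculation. The 2x2 counts below are invented purely for illustration; they are not data from the review.

```python
# Illustrative only: hypothetical 2x2 counts, not figures from the review.
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity and positive likelihood ratio
    from true/false positive and negative counts."""
    sensitivity = tp / (tp + fn)   # fewer false negatives -> higher sensitivity
    specificity = tn / (tn + fp)   # fewer false positives -> higher specificity
    # LR+ = sensitivity / (1 - specificity): a higher value means a positive
    # test result is more likely to come from a true case than a false positive.
    lr_positive = sensitivity / (1 - specificity)
    return sensitivity, specificity, lr_positive

# Hypothetical early-stage (T1) vs late-stage (T3) counts:
sens_t1, spec_t1, lr_t1 = diagnostic_metrics(tp=40, fp=2, fn=10, tn=48)
sens_t3, spec_t3, lr_t3 = diagnostic_metrics(tp=48, fp=5, fn=2, tn=45)
```

With these made-up counts, the late-stage data show higher sensitivity (fewer false negatives) while the early-stage data show a higher LR+ (fewer false positives), mirroring the pattern the review reported.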
The reliability of the review's conclusions was limited by poor reporting of the review process and of the characteristics of the included studies, and by the lack of an appropriate quality assessment.