Is this patient clinically depressed?


Williams J W, Hitchcock Noel P, Cordes J A, Ramirez G, Pignone M


Authors' objectives
To determine the accuracy of depression questionnaires and clinical examination in the diagnosis of depression in primary care.

Searching
MEDLINE and a specialised registry of depression trials were searched for English-language publications (1970 to July 2000); an incomplete list of search terms was provided. The bibliographies of retrieved articles were screened for additional references. Unpublished material was not sought.

Study selection
Study designs of evaluations included in the review
No a priori inclusion criteria relating to study design were reported.

Specific interventions included in the review
Studies evaluating the diagnostic accuracy of depression questionnaires in a primary care setting were eligible for inclusion. The questionnaires were required to contain a depression-specific component, be simply scored, require no more than average literacy skills to complete, and be readily usable in clinical situations. In addition, each instrument had to have been evaluated in at least one study with a minimum of 100 participants. Eleven questionnaires were evaluated for their diagnostic ability: 6 were depression specific, 1 assessed depression and anxiety, and 4 were multi-component instruments.

Reference standard test against which the new test was compared
Diagnosis had to be confirmed by a criterion-based diagnosis (using DSM-IV or ICD-9 specifications) established through a standard clinical interview carried out by a trained interviewer. The clinical interviews could be semi-structured or non-structured.

Participants included in the review
No inclusion criteria relating to the participants were specified. Participants from a variety of primary care settings (community, veteran, university-affiliated clinics and private practice) were included in the review.
Eight of the primary studies specifically examined older participants: 4 studies had a minimum age of 60 years, 3 a minimum age of 65 years, and 1 a minimum age of 75 years.

Outcomes assessed in the review
No inclusion criteria were specified. However, the following measures of diagnostic accuracy were calculated: the likelihood ratio (LR) for a positive test and the LR for a negative test (weighted by study precision and corrected for 2-stage assessment techniques where indicated). In addition, simple agreement scores and kappa statistics were presented for the reproducibility of the reference standard.

How were decisions on the relevance of primary studies made?
The authors did not state how the papers were selected for the review, or how many reviewers performed the selection.

Assessment of study quality
The criteria used to assess validity were sample size, consecutive or random selection of participants, blinding, incorporation bias, and whether the proportion of persons receiving the standard assessment was greater or less than 50% of those approached for a criterion assessment. The results of the validity assessment were reported in tabular form. Two reviewers independently assessed the primary studies. The authors did not state how final decisions were made.

Data extraction
Two reviewers independently extracted data from the selected studies. The authors did not state how final decisions were made. Information was extracted on the inclusion criteria, study quality, test characteristics and cut-off values. Where additional information was required, the corresponding author of the primary study was contacted.

Methods of synthesis
How were the studies combined?
LRs were calculated from the 2x2 table of each included study. Pooled LRs (mean weighted by precision) were then calculated for each instrument.
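The LR arithmetic described above can be sketched as follows. This is a minimal illustration, not the review's own code: it assumes that "weighted by precision" means inverse-variance weighting of log LRs (the abstract does not give the exact scheme), it omits the correction for 2-stage designs, the counts are invented, and all function names are hypothetical. It also shows how an LR turns a pre-test probability into a post-test probability.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """One study's screening result cross-tabulated against the interview diagnosis."""
    tp: int  # screen positive, depressed at interview
    fp: int  # screen positive, not depressed
    fn: int  # screen negative, depressed
    tn: int  # screen negative, not depressed


def likelihood_ratios(t: TwoByTwo) -> tuple[float, float]:
    """Return (LR+, LR-) computed from sensitivity and specificity."""
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    return sens / (1 - spec), (1 - sens) / spec


def log_lr_variances(t: TwoByTwo) -> tuple[float, float]:
    """Delta-method variances of ln(LR+) and ln(LR-); every cell must be non-zero."""
    diseased = t.tp + t.fn
    healthy = t.fp + t.tn
    var_pos = 1 / t.tp - 1 / diseased + 1 / t.fp - 1 / healthy
    var_neg = 1 / t.fn - 1 / diseased + 1 / t.tn - 1 / healthy
    return var_pos, var_neg


def pooled_lr(lrs: list[float], variances: list[float]) -> float:
    """Assumed scheme: inverse-variance weighted mean of ln(LR), back-transformed."""
    weights = [1 / v for v in variances]
    mean_log = sum(w * math.log(lr) for w, lr in zip(weights, lrs)) / sum(weights)
    return math.exp(mean_log)


def post_test_probability(pre_test: float, lr: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds * LR."""
    post_odds = pre_test / (1 - pre_test) * lr
    return post_odds / (1 + post_odds)


if __name__ == "__main__":
    # Invented counts for two hypothetical studies of the same questionnaire.
    studies = [TwoByTwo(tp=40, fp=90, fn=10, tn=360),
               TwoByTwo(tp=25, fp=60, fn=5, tn=210)]
    pos_lrs, neg_lrs, pos_vars, neg_vars = [], [], [], []
    for study in studies:
        lr_pos, lr_neg = likelihood_ratios(study)
        var_pos, var_neg = log_lr_variances(study)
        pos_lrs.append(lr_pos)
        neg_lrs.append(lr_neg)
        pos_vars.append(var_pos)
        neg_vars.append(var_neg)
    print(f"pooled LR+ = {pooled_lr(pos_lrs, pos_vars):.2f}")
    print(f"pooled LR- = {pooled_lr(neg_lrs, neg_vars):.2f}")
    # Median LRs reported for major depressive disorder, with an assumed 10% pre-test probability.
    print(f"after a positive screen: {post_test_probability(0.10, 3.3):.1%}")
    print(f"after a negative screen: {post_test_probability(0.10, 0.19):.1%}")
```

With the illustrative 10% pre-test probability (not a figure from the review), the last two lines print about 26.8% and 2.1%, which shows why a low negative LR is useful for ruling depression out.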
The results of individual studies, coded by instrument, were plotted in receiver operating characteristic (ROC) space and an ROC curve was presented. A summary effectiveness score, calculated according to the method described by Hasselblad and Hedges (see Other publications of related interest), was used to evaluate overall performance. Studies were grouped by instrument and by diagnostic category (major depressive disorder; major depressive disorder or dysthymia).

How were differences between studies investigated?
Heterogeneity of the effectiveness scores (a measure of overall accuracy) was assessed within and between depression instruments.

Results of the review
Twenty-eight studies were included: 21 studies examined major depressive disorder (n=9,293) and 7 studies examined major depressive disorder or dysthymia (n=2,609).

Major depressive disorder. The median LR for a positive test was 3.3 (range: 2.3 to 12.2), meaning that a positive depression screen is about 3.3 times as likely to be seen in someone with major depressive disorder as in someone without. The median LR for a negative test was 0.19 (range: 0.14 to 0.35), meaning that a negative depression screen is about one-fifth as likely to be seen in someone with major depressive disorder as in someone without.

Major depressive disorder or dysthymia. The median LR across all instruments for a positive test was 3.9 (range: 2.27 to 5.19). The median LR for a negative test was 0.3 (range: 0.05 to 0.53).

Heterogeneity. Statistically significant heterogeneity in effectiveness scores across studies was found for several questionnaires (BDI, CES-D, HSCL and SDS), suggesting that these instruments performed variably across the individual studies. Performance did not differ significantly between instruments.

Reproducibility of the reference standard.
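Kappa, the inter-rater statistic used for these reproducibility figures, measures how far two raters' agreement exceeds the agreement expected by chance alone. The sketch below computes Cohen's kappa for two raters making a yes/no diagnosis of depression. It is illustrative only: the function name and the counts are hypothetical, and the abstract does not say which kappa variant each primary study used.

```python
def cohens_kappa(both_yes: int, a_only: int, b_only: int, both_no: int) -> float:
    """Cohen's kappa for two raters giving a binary diagnosis.

    a_only: rater A diagnoses depression, rater B does not.
    b_only: rater B diagnoses depression, rater A does not.
    Undefined (division by zero) if chance agreement is 1, e.g. both raters
    always give the same single answer.
    """
    n = both_yes + a_only + b_only + both_no
    observed = (both_yes + both_no) / n
    # Chance agreement from each rater's marginal rate of "depressed" calls.
    a_yes = (both_yes + a_only) / n
    b_yes = (both_yes + b_only) / n
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Invented example: 100 patients interviewed independently by two clinicians.
    print(f"kappa = {cohens_kappa(20, 5, 5, 70):.2f}")
```

With these invented counts raw agreement is 90%, but kappa is only about 0.73, because two raters who each call 25% of patients depressed would agree 62.5% of the time by chance.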
Semi-structured clinical interviews (n=7): inter-rater reliability, measured by the kappa statistic, ranged from 0.64 to 0.93, representing good to excellent agreement.
Non-structured clinical interviews (n=7): inter-rater reliability, measured by the kappa statistic, ranged from 0.55 to 0.74, representing fair to good agreement.

Authors' conclusions
Several questionnaires with acceptable performance characteristics are available to help clinicians identify and diagnose major depression. The authors also stated that the reproducibility of the reference standard was high: reliable diagnostic confirmation was shown by mental health professionals using clinical interviews, and by primary care clinicians using a semi-structured interview.

CRD commentary
The research question was supported by appropriate inclusion and validity criteria. The search was limited to published, English-language articles from two bibliographic databases, so important information might have been missed. The authors did not report the methods used to assess retrieved studies for inclusion in the review, so the potential for reviewer bias at this stage cannot be assessed. However, methods were used to protect against bias in data extraction and in the assessment of methodological quality. Fifty-four per cent (9) of the primary studies were considered to be of reasonably high quality, although the validity assessment suggested that many of the primary studies had important methodological limitations. The authors' choice of a quantitative synthesis may not have been appropriate given the variation in outcome (effectiveness measure), population and setting among the studies of a given instrument. A clearer description of how 'effectiveness' was calculated and how heterogeneity was handled would also have been useful.
Details of the individual studies were reported, but information on patient characteristics was limited, which makes the findings harder to interpret. The reliability of the reference standard was assessed primarily with health professionals working in specialist mental health settings, which limits its generalisability to primary care. The authors acknowledged several additional methodological limitations that further reduce confidence in the results on the reliability of the reference standard. Confidence in the review's conclusions is limited by these considerations.

Implications of the review for practice and research
Practice: The authors suggested that instruments should be selected on the basis of brevity, response format and whether the clinician also wishes to screen for other psychiatric disorders, citing the Patient Health Questionnaire as a good example. They suggested the Single Question for clinicians who wish to screen only for depression.
Research: The authors did not state any implications for further research.

Bibliographic details
Williams J W, Hitchcock Noel P, Cordes J A, Ramirez G, Pignone M. Is this patient clinically depressed? JAMA 2002; 287(9): 1160-1170

PubMedID
11879114

Original Paper URL
http://jama.ama-assn.org/

Other publications of related interest
Hasselblad V, Hedges LV. Meta-analysis of screening and diagnostic tests. Psychol Bull 1995;117:167-78.

These additional published commentaries may also be of interest.
O'Malley PG. Review: questionnaires for detecting clinical depression in primary care have similar diagnostic accuracy. Evid Based Med 2002;7:159.
Finding depression in primary care. Bandolier 2002;99:6-7.
Indexing Status
Subject indexing assigned by NLM

MeSH
Depressive Disorder /classification /diagnosis /physiopathology; Fatigue; Headache; Humans; Psychiatric Status Rating Scales; Reproducibility of Results; Stress, Psychological; Surveys and Questionnaires

AccessionNumber
12002008168

Date bibliographic record published
30/04/2005

Date abstract record published
30/04/2005

Record Status
This is a critical abstract of a systematic review that meets the criteria for inclusion on DARE. Each critical abstract contains a brief summary of the review methods, results and conclusions, followed by a detailed critical assessment of the reliability of the review and the conclusions drawn.

Database of Abstracts of Reviews of Effects (DARE) Produced by the Centre for Reviews and Dissemination Copyright © 2026 University of York
