Journal of Affective Disorders 114 (2009) 163–173

www.elsevier.com/locate/jad

# Research report

# The PHQ-8 as a measure of current depression in the general population

Kurt Kroenke $^{a,\ast}$, Tara W. Strine $^{b}$, Robert L. Spitzer $^{c}$, Janet B.W. Williams $^{c}$, Joyce T. Berry $^{d}$, Ali H. Mokdad $^{b}$

$^{a}$ Department of Medicine, Indiana University School of Medicine and Regenstrief Institute, Indianapolis, IN, United States

$^{b}$ Centers for Disease Control and Prevention, Atlanta, GA, United States

$^{c}$ Department of Psychiatry, Columbia University, and New York State Psychiatric Institute, New York, NY, United States

$^{d}$ Substance Abuse and Mental Health Services Administration, DC, United States

Received 24 October 2007; received in revised form 29 June 2008; accepted 30 June 2008

Available online 27 August 2008

# Abstract

Background: The eight-item Patient Health Questionnaire depression scale (PHQ-8) is established as a valid diagnostic and severity measure for depressive disorders in large clinical studies. Our objectives were to assess the PHQ-8 as a depression measure in a large, epidemiological population-based study, and to determine the comparability of depression as defined by the PHQ-8 diagnostic algorithm vs. a PHQ-8 cutpoint $\geq 10$.

Methods: Random-digit-dialed telephone survey of 198,678 participants in the 2006 Behavioral Risk Factor Surveillance Survey (BRFSS), a population-based survey in the United States. Current depression as defined by either the DSM-IV based diagnostic algorithm (i.e., major depressive or other depressive disorder) of the PHQ-8 or a PHQ-8 score $\geq 10$; respondent sociodemographic characteristics; number of days of impairment in the past 30 days in multiple domains of health-related quality of life (HRQoL).

Results: The prevalence of current depression was similar whether defined by the diagnostic algorithm or a PHQ-8 score $\geq 10$ (9.1% vs. 8.6%).
Depressed patients had substantially more days of impairment across multiple domains of HRQoL, and the impairment was nearly identical in depressed groups defined by either method. Of the 17,040 respondents with a PHQ-8 score $\geq 10$, major depressive disorder was present in 49.7%, other depressive disorder in 23.9%, depressed mood or anhedonia in another 22.8%, and no evidence of depressive disorder or depressive symptoms in only 3.5%.

Limitations: The PHQ-8 diagnostic algorithm rather than an independent structured psychiatric interview was used as the criterion standard.

Conclusions: The PHQ-8 is a useful depression measure for population-based studies, and either its diagnostic algorithm or a cutpoint $\geq 10$ can be used for defining current depression.

© 2008 Elsevier B.V. All rights reserved.

Keywords: Depression; Psychometrics; Prevalence; Epidemiology; Quality of life; Patient Health Questionnaire

doi:10.1016/j.jad.2008.06.026

# Introduction

Depression is not only the most common mental disorder in general practice as well as mental health settings, but also is a major public health problem. The World Health Organization now recognizes depression as one of the most burdensome diseases in the world (World Health Organization, 2002). It is also among the leading causes of decreased work productivity (Stewart et al. 2003). The prevalence and impact of depression in the United States has been assessed in important population-based studies, with modern methods first used in the Epidemiological Catchment Area study in the early 1980s (Robins and Regier, 1991) and proceeding to the National Comorbidity Survey in 1990–1992 (Kessler et al. 1994) and its replication (NCS-R) a decade later (Kessler et al. 2003).
Utilizing structured psychiatric interviews, these landmark epidemiological studies have provided invaluable information on the community prevalence of depression and other mental disorders. However, there are a number of periodic population-based surveys conducted by federal or state agencies that provide an opportunity for more regular surveillance, although these surveys do not focus exclusively on depression or psychiatric conditions. Because mental health may be only one of a number of health indicators assessed, brief measures may be essential to reduce respondent burden. One increasingly popular measure for assessing depression is the Patient Health Questionnaire nine-item depression scale (PHQ-9). Since its original validation study in 2001 (Kroenke et al. 2001), the PHQ-9 already has been used in several hundred published studies and translated into more than 30 languages. It consists of the nine criteria for depression from the Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV). The PHQ-9 is half the length of many depression measures, comparable or superior in operating characteristics, and valid as both a diagnostic and severity measure (Lowe et al. 2004a, Williams et al. 2002a, Williams et al. 2002b). It has been used in clinical (Diez-Quevedo et al. 2001, Kroenke and Spitzer, 2002) and population-based settings (Martin et al. 2006) and is valid in self-administered (Diez-Quevedo et al. 2001, Kroenke et al. 2001) and telephone-administered modes (Pinto-Meza et al. 2005). Additionally, the PHQ-9 is effective for detecting depressive symptoms in various racial/ethnic groups (Huang et al. 2006a, Huang et al. 2006b) and older populations (Klapow et al. 2002), as well as in patients with neurological disorders (Bombardier et al. 2006, Bombardier et al. 2004, Callahan et al. 2006, Fann et al. 2005, Williams et al. 2004, Williams et al. 2005), cardiovascular disease (Holzapfel et al. 2007, Ruo et al. 2003), HIV/AIDS (Justice et al. 
2004), diabetes (Glasgow et al. 2004, Katon et al. 2004), chronic kidney disease (Drayer et al. 2006), cancer (Dwight-Johnson et al. 2005), rheumatological disorders (Lowe et al. 2004c, Rosemann et al. 2007), gastrointestinal disease (Persoons et al. 2001), dermatological disorders (Picardi et al. 2004), and other conditions (Lowe et al. 2004b, Maizels et al. 2006, Persoons et al. 2003, Scholle et al. 2003, Spitzer et al. 2000, Tietjen et al. 2007, Turner and Dworkin, 2004, Turvey et al. 2007).

In order to assess the current prevalence and impact of depression in the United States, an eight-item version of the Patient Health Questionnaire depression scale (PHQ-8) recently was made available for use by state health departments in the 2006 Behavioral Risk Factor Surveillance Survey (BRFSS). The PHQ-8 is comparable to the PHQ-9 in terms of diagnosing depressive disorders when using a DSM-IV based diagnostic algorithm (Corson et al. 2004, Kroenke and Spitzer, 2002). However, there is evidence that a PHQ-8 score ≥10 represents clinically significant depression (Kroenke et al. 2001) and is more convenient to use than a diagnostic algorithm. In this paper, we compare the standard diagnostic algorithm and the PHQ-8 cutpoint of 10 in terms of depression prevalence, respondent sociodemographic characteristics, PHQ-8 operating characteristics, and construct validity as assessed by multiple domains of health-related quality of life. Assessment of the PHQ-8 in this large, epidemiological study may provide further evidence of its utility as a depression measure in population-based research.

## Methods

### Behavioral Risk Factor Surveillance Survey (BRFSS)

The BRFSS is a surveillance system operated by state health departments in collaboration with CDC.
It aims to collect uniform, state-specific data on preventive health practices and risk behaviors that are linked to chronic diseases, injuries, and preventable infectious diseases in the adult population (Centers for Disease Control and Prevention, 2005, Mokdad et al. 2003). Trained interviewers collect data from a standardized questionnaire using an independent probability sample of households with telephones in the non-institutionalized U.S. adult population. Data from all states and areas were pooled to produce national estimates. The BRFSS questionnaire consists of three parts: 1) core questions asked in all 50 states, the District of Columbia (D.C.), Puerto Rico (PR), and the U.S. Virgin Islands (USVI); 2) supplemental modules, which are series of questions on specific topics (e.g., adult asthma history, intimate partner violence, mental health); and 3) state-added questions. In 2006, trained interviewers administered questions about depression severity and lifetime diagnosis of anxiety and depression (Anxiety and Depression Module) in 38 states as well as D.C., PR, and USVI. Additional BRFSS methodology is described elsewhere (Holtzman, 2004). All BRFSS questionnaires, data, and reports are available at http://www.cdc.gov/brfss.

### Patient Health Questionnaire eight-item depression scale (PHQ-8)

To assess the prevalence of depression and its severity in the general U.S. population, the standardized and validated PHQ-8 (see Appendix A) was used (Kroenke and Spitzer, 2002). The PHQ-8 consists of eight of the nine criteria on which the DSM-IV diagnosis of depressive disorders is based (American Psychiatric Association, 1994). The ninth question in the DSM-IV assesses suicidal or self-injurious thoughts. It was omitted because interviewers are not able to provide adequate intervention by telephone.
Research indicates that the deletion of this question has only a minor effect on scoring because thoughts of self-harm are fairly uncommon in the general population, and the ninth item is by far the least frequently endorsed item on the PHQ-9 (Huang et al. 2006a, Kroenke and Spitzer, 2002, Lee et al. 2007, Rief et al. 2004). Indeed, the two original validation studies of the PHQ totaling 6000 patients established that identical scoring thresholds for depression severity could be used for the PHQ-9 and PHQ-8 (Kroenke and Spitzer, 2002).

The PHQ-8 response set was standardized to make it similar to other BRFSS questions by asking the number of days in the past 2 weeks the respondent had experienced a particular depressive symptom. The modified response set was converted back to the original response set: 0 to 1 day = “not at all,” 2 to 6 days = “several days,” 7 to 11 days = “more than half the days,” and 12 to 14 days = “nearly every day,” with points (0 to 3) assigned to each category, respectively. The scores for each item are summed to produce a total score between 0 and 24 points. A total score of 0 to 4 represents no significant depressive symptoms. A total score of 5 to 9 represents mild depressive symptoms; 10 to 14, moderate; 15 to 19, moderately severe; and 20 to 24, severe (Kroenke et al. 2001).

Current depression was defined in two ways: 1) a PHQ-8 algorithm diagnosis of major depression (this requires either the first or second item (depressed mood or anhedonia) to be present “more than half the days” and at least 5 of the 8 symptoms to be present “more than half the days”) or other depression (2 to 4 symptoms, including depressed mood or anhedonia, are required to be present “more than half the days”); 2) a PHQ-8 score of ≥10, which has an 88% sensitivity and 88% specificity for major depression (Kroenke and Spitzer, 2002) and, regardless of diagnostic status, typically represents clinically significant depression (Corson et al. 2004, Kroenke et al. 2001).
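The scoring rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BRFSS processing code: all function names are hypothetical, the eight items are assumed to be ordered so that the first two are depressed mood and anhedonia, and a symptom is assumed to count as present “more than half the days” whenever its item score is 2 or 3.

```python
from typing import List, Sequence


def days_to_item_score(days: int) -> int:
    """Convert days with a symptom in the past 2 weeks (0-14) to a 0-3 item score."""
    if not 0 <= days <= 14:
        raise ValueError("days must be between 0 and 14")
    if days <= 1:
        return 0  # "not at all"
    if days <= 6:
        return 1  # "several days"
    if days <= 11:
        return 2  # "more than half the days"
    return 3      # "nearly every day"


def item_scores(days_per_item: Sequence[int]) -> List[int]:
    """Score all eight items; the first two are assumed to be mood and anhedonia."""
    if len(days_per_item) != 8:
        raise ValueError("the PHQ-8 has exactly eight items")
    return [days_to_item_score(d) for d in days_per_item]


def severity(total: int) -> str:
    """Map a total score (0-24) to the severity intervals given in the text."""
    if not 0 <= total <= 24:
        raise ValueError("total must be between 0 and 24")
    if total <= 4:
        return "none"
    if total <= 9:
        return "mild"
    if total <= 14:
        return "moderate"
    if total <= 19:
        return "moderately severe"
    return "severe"


def algorithm_diagnosis(scores: Sequence[int]) -> str:
    """Apply the diagnostic algorithm; 'present' is assumed to mean item score >= 2."""
    cardinal_present = scores[0] >= 2 or scores[1] >= 2
    n_present = sum(1 for s in scores if s >= 2)
    if cardinal_present and n_present >= 5:
        return "major"
    if cardinal_present and 2 <= n_present <= 4:
        return "other"
    return "none"


# Hypothetical respondent: five symptoms on 7 of 14 days, three symptoms absent.
example_scores = item_scores([7, 7, 7, 7, 7, 0, 0, 0])
example_total = sum(example_scores)
print(example_total, severity(example_total), algorithm_diagnosis(example_scores))
```

Under these assumptions, a respondent with exactly five symptoms at the “more than half the days” level reaches a total of 10, which is the lowest total compatible with an algorithm diagnosis of major depression. An algorithm diagnosis of other depression, by contrast, can occur with a total below 10.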
### Health-related quality of life and other items

Three health-related quality of life (HRQoL) questions with demonstrated validity and reliability for population health surveillance were examined (Andresen et al. 2003, Mielenz et al. 2006, Moriarty et al. 2003). The three questions involved respondents' self-assessment of their health over the previous 30 days:

1) Physical health: “How many days was your physical health, which includes physical illness or injury, not good?”
2) Mental health: “How many days was your mental health, which includes stress, depression, and problems with emotions, not good?”
3) Activity limitations: “How many days did poor physical or mental health keep you from doing your usual activities, such as self-care, work, or recreation?”

Additionally, a “Healthy Days Symptoms Module” was used in Delaware, Hawaii, and Rhode Island. Questions in this module also referred to the previous 30 days:

1) Depressive symptoms: “How many days did you feel sad, blue, or depressed?”
2) Anxiety symptoms: “How many days did you feel worried, tense, or anxious?”
3) Sleep problems: “How many days have you felt you did not get enough rest or sleep?”
4) Pain limitations: “How many days did pain make it difficult to do your usual activities?”
5) Vitality: “How many days have you felt very healthy and full of energy?”

We calculated fatigue by subtracting the number of days of vitality from 30.

Sociodemographic information was obtained for each respondent.
Employment status was assessed by the question: “Are you currently: employed for wages, self-employed, out of work for more than 1 year, out of work for less than 1 year, a homemaker, a student, retired, or unable to work?” Additionally, two questions were asked about lifetime diagnosis: “Has a doctor or other health care provider ever told you that you have an anxiety disorder (including acute stress disorder, anxiety, generalized anxiety disorder, obsessive-compulsive disorder, panic attacks, panic disorder, phobia, posttraumatic stress disorder, or social anxiety disorder)?” and “Has a doctor or other health care provider ever told you that you have a depressive disorder (including depression, major depression, dysthymia, or minor depression)?”

There were 198,678 respondents from the 38 states, D.C., PR, and USVI who completed all PHQ-8 questions. Of these, nearly all (198,574, or 99.95%) completed at least one of the first three HRQoL items (196,673 for mental health, 196,141 for physical health, and 197,543 for activity limitations). Among the 13,622 respondents in Delaware, Hawaii, and Rhode Island, 13,619 (99.98%) answered at least one of the 5 HRQoL questions (depression 13,514, anxiety 13,487, sleep 13,536, fatigue 13,381, and pain 13,534). The median cooperation rate of BRFSS (i.e., the percentage of eligible respondents who completed the survey) was 74.5%.

### Analysis

Depression was classified as either major depression or other depression (using the PHQ-8 diagnostic algorithm as described) or a PHQ-8 score ≥10. Sociodemographic characteristics of depressed and nondepressed respondents were compared. The frequency distribution of major depression and other depression by standard PHQ-8 severity intervals (0–4, 5–9, 10–14, 15–19, and 20–24) as well as by the commonly used cutpoint of ≥10 was described (Kroenke and Spitzer, 2002).
Operating characteristics (sensitivity, specificity, likelihood ratios) for PHQ-8 intervals and cutpoint were calculated, using the PHQ-8 diagnostic algorithm as the criterion standard (Kroenke and Spitzer, 2002, Kroenke et al. 2001). The mean number of impairment days in the past 30 days for HRQoL domains was determined for depressed and nondepressed groups. Because of the large sample size, statistical testing was not emphasized.

Weighting in BRFSS is designed to make the total number of cases equal to the number of people in the state who are age 18 and older. In the BRFSS, such poststratification serves as an adjustment for noncoverage and nonresponse and forces the total number of cases to equal population estimates for each geographic region, which for the BRFSS is usually a state. Sample characteristics (Table 1) and PHQ-8 operating characteristics (Table 2) use unweighted BRFSS data, while construct validity analyses (Table 3 and Figs. 1 and 2) use weighted data.

## Results

### Respondent characteristics

Data were analyzed from 198,678 respondents to the 2006 BRFSS survey. Overall, the sample was 61.6% women, 78% non-Hispanic white, 58.3% currently employed, 61.2% college educated, and 56.9% currently married. A lifetime diagnosis of a depressive or anxiety disorder was reported by 18.0% and 12.3%, respectively. Table 1 compares the characteristics of depressed vs. nondepressed respondents, with depression defined either by the PHQ-8 diagnostic algorithm (major depressive or other depressive disorder) or by a PHQ-8 cutpoint ≥10. Two findings should be emphasized. First, depressed respondents were more likely to be women, nonwhite, less educated, unemployed or unable to work, unmarried, and younger than 55 years. Not surprisingly, depressed respondents also were much more likely to report lifetime diagnoses of both depression and anxiety.
Second, characteristics of the depressed groups were quite similar between the two methods of defining depression, as were characteristics of the nondepressed groups. Compared to the diagnostic algorithm, the cutpoint method produced slightly lower estimates of depression in men and in the two oldest age groups and modestly higher estimates in those with a self-reported lifetime diagnosis of depression or anxiety.

### PHQ-8 distribution and operating characteristics

Table 2 shows the relationship between PHQ-8 severity scores and depression diagnostic status. There were 8476 respondents with major depression in the BRFSS sample using the PHQ-8 diagnostic algorithm, resulting in a prevalence of 4.3%. There were 18,053 respondents with any depression using the diagnostic algorithm and 17,040 with any depression using a PHQ-8 cutpoint of ≥10, yielding relatively similar prevalences of 9.1% and 8.6%, respectively. No respondents with scores less than 10 had major depression, because this diagnosis requires at least 5 symptoms to be present more than half the days (resulting in a score of 2 for each symptom).

The sensitivity of a PHQ-8 score ≥10 is the proportion of respondents with a depressive disorder who have a score of 10 or greater, and the specificity is the proportion of respondents without a depressive disorder who have a score less than 10. The sensitivity and specificity of a PHQ-8 score ≥10 for major depressive disorder (vs. other + none) were 100% (8476/8476) and 95% (181,638/190,202), respectively; for any depressive disorder, the sensitivity and specificity were 70% (12,556/18,053) and 98% (176,141/180,625).

We also calculated the likelihood ratios associated with the PHQ-8 score ranges or thresholds shown in Table 2. The likelihood ratio is defined as the probability of a given score range or threshold in individuals with a depressive disorder divided by the probability of the same score range or threshold in individuals without one. For example, 9968 people had PHQ-8 scores of 10–14.
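As a check on the operating characteristics reported in this section, the sensitivity and specificity of the ≥10 cutpoint can be recomputed directly from the stated counts. The following Python sketch is illustrative only: the variable and function names are hypothetical, and the positive likelihood ratio for the ≥10 threshold is derived here as sensitivity divided by (1 − specificity), which is one conventional reading of the definition given above rather than a value quoted from the paper.

```python
def proportion(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, guarding against an empty group."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


# Counts taken from the text; criterion standard = PHQ-8 diagnostic algorithm.
# Major depressive disorder vs. everyone else (other + none).
major_total = 8_476
major_score_ge10 = 8_476
not_major_total = 190_202
not_major_score_lt10 = 181_638

# Any depressive disorder (major or other) vs. no depressive disorder.
any_dep_total = 18_053
any_dep_score_ge10 = 12_556
no_dep_total = 180_625
no_dep_score_lt10 = 176_141

sens_major = proportion(major_score_ge10, major_total)
spec_major = proportion(not_major_score_lt10, not_major_total)
sens_any = proportion(any_dep_score_ge10, any_dep_total)
spec_any = proportion(no_dep_score_lt10, no_dep_total)

# Positive likelihood ratio of a score >= 10 for any depressive disorder
# (derived quantity, assumed formula: sensitivity / (1 - specificity)).
lr_pos_any = sens_any / (1.0 - spec_any)

print(f"major: sensitivity {sens_major:.1%}, specificity {spec_major:.1%}")
print(f"any:   sensitivity {sens_any:.1%}, specificity {spec_any:.1%}")
print(f"any:   positive likelihood ratio {lr_pos_any:.1f}")
```

Rounded to whole percentages, these proportions reproduce the 100% and 95% reported for major depressive disorder and the 70% and 98% reported for any depressive disorder.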
The likelihood ratio associated

Table 1. Characteristics of respondents by depression status (percent in various groups)

| Characteristic | No depressive disorder (n=180,625) | PHQ-8 <10 (n=181,638) | Depressive disorder* (n=18,053) | PHQ-8 ≥10 (n=17,040) |
| --- | --- | --- | --- | --- |
| Sex | | | | |
| Women | 61.0 | 60.7 | 67.4 | 71.3 |
| Men | 39.0 | 39.3 | 32.6 | 28.7 |
| Age | | | | |
| 18–24 | 4.5 | 4.5 | 5.9 | 6.2 |
| 25–34 | 12.4 | 12.4 | 12.3 | 13.1 |
| 35–44 | 17.6 | 17.5 | 18.3 | 19.6 |
| 45–54 | 21.1 | 21.0 | 24.7 | 26.4 |
| 55–64 | 19.9 | 19.8 | 20.2 | 20.5 |
| 65–74 | 14.0 | 14.2 | 10.6 | 8.6 |
| 75 or greater | 10.5 | 10.7 | 8.0 | 5.6 |
| Race/ethnicity | | | | |
| White | 78.8 | 78.6 | 69.6 | 70.9 |
| Black | 7.7 | 7.8 | 11.4 | 10.7 |
| Hispanic | 7.6 | 7.7 | 11.1 | 10.2 |
| Other | 5.9 | 5.9 | 7.9 | 8.1 |
| Education | | | | |