Comments on the PHQ-9 for Minnesota Department of Health from Hamm Clinic

Note:  The Patient Health Questionnaire-9 (PHQ-9) is a screening measure for depression that was developed with funding from Pfizer and is based on the DSM-IV diagnostic criteria for major depression.  All physician clinics, including mental health clinics, are required by the Minnesota Statewide Quality Reporting and Measurement System to use this screening measure to assess patient outcomes, specifically depression remission at six months. The PHQ-9 score is also being used as part of the risk adjustment determination.

The Cost of Measurement
Hamm Clinic is a small community mental health clinic with 15 staff clinicians (MD, LP, LICSW), a $2.6 million annual budget, 9,000+ annual visits, and 900+ active clients.  We have tracked the cost of our efforts to prepare and submit PHQ‑9 depression data for measurement and reporting purposes since 2011, the year reporting started. Hamm calculates that it has spent about $11,000 on database programming and about 100 staff hours, valued at about $3,000, for a total of about $14,000 on PHQ‑9 reporting since 2011.

For a small, zero-margin, nonprofit clinic like Hamm Clinic, $14,000 represents a large expense.  As of the Minnesota Community Measurement’s (“MNCM”) 2015 reporting year, our total number of valid, non-excluded records in the MNCM database is around 650, meaning we spent roughly $21 to develop and prepare each record over that four‑year period. Hamm wants to be compliant with state-mandated reporting, but more importantly, Hamm wants to be a good citizen in the health care community. However, the cost of providing this data is prohibitive.
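The per-record cost follows directly from the figures above, and a short sketch makes it easy to recompute as the record count changes. The variable names are illustrative only; the dollar amounts and the approximate record count ("around 650") are the ones stated in the text, so the result should be read as an estimate.

```python
# Figures as stated in the text; the record count is approximate.
programming_cost = 11_000  # database programming, USD
staff_cost = 3_000         # about 100 staff hours, USD
staff_hours = 100
valid_records = 650        # valid, non-excluded MNCM records

total_cost = programming_cost + staff_cost
cost_per_record = total_cost / valid_records
implied_hourly_rate = staff_cost / staff_hours

print(f"Total cost: ${total_cost:,}")
print(f"Cost per record: ${cost_per_record:.2f}")
print(f"Implied staff rate: ${implied_hourly_rate:.2f}/hour")
```

With exactly 650 records this comes to a little under $22 per record.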

PHQ‑9 Is Not an Outcome Measure and Is Not Intended for Mental Health Settings
The PHQ‑9 was developed to screen for major depression in a medical population.  Since the original research there are many studies, both domestically and internationally, establishing the usefulness of the PHQ‑9 as a screening tool for depression in a number of medical populations and nationalities. There are also studies establishing the usefulness of the PHQ‑9 in medical populations in terms of monitoring treatment progress.

However, in an adequate (but not exhaustive) search, we were not able to locate any studies of the PHQ‑9 in a general mental health or community mental health population, and only one study of the use of the PHQ‑9 in a psychiatric clinic (see below).  We could find no research studies that established that the PHQ‑9 is a reliable and valid tool for measuring progress in a general mental health clinic population. This is not surprising, since patients within community mental health settings would be assessed with more thorough tools that were specifically developed for a mental health population and that address symptom acuity across a variety of mental health diagnoses or co-occurring diagnoses. Thus, the PHQ‑9 is an instrument that is recognized nationally for a different purpose than the one currently required by MDH. Based on the current literature, it does seem appropriate to use the PHQ‑9 in primary care settings, and perhaps some other medical settings, to screen for major depression in those medical populations. There is also some support in the literature that the PHQ‑9 can be useful in assessing treatment progress in medical populations. However, there does not appear to be any literature establishing that the instrument adds diagnostic utility in a general mental health population, or that it is an appropriate outcome measure among an already identified mental health population, where other mental health symptoms often covary with depression symptoms.

The extent to which the results of the quality measure, as visible on the MNCM website, are likely to demonstrate a wide degree of variation across providers:
At present, there is no reason to assume that the PHQ‑9 would have the same psychometric properties or level of clinical utility when administered in a mental health clinic population, as opposed to administration in a general medical population. These points were made very compellingly in a rare study of the PHQ‑9 in a psychiatry specialty clinic, published in BMC Psychiatry (2012, 12:73).[1]  That publication found that despite "high sensitivity and high negative predictive value [of the PHQ‑9]… Its specificity and positive predictive value were low." The research concluded "PHQ‑9 is useful for screening, but not for diagnosis of ‘current major depressive episode’ in a psychiatry specialty clinic." If the PHQ‑9 is of questionable value in diagnosing major depression in a psychiatry setting, there are reasons to be very concerned about the value of using this instrument to track progress for psychiatry patients over time in a mental health clinic.
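For readers less familiar with the four statistics quoted above, the following minimal sketch shows how they are computed from a two-by-two table of screening results against a reference diagnosis. The counts are hypothetical, chosen only to reproduce the general pattern described (high sensitivity and negative predictive value, low specificity and positive predictive value); they are not taken from the cited study, and the function name is our own.

```python
def screening_metrics(tp, fp, fn, tn):
    """Return standard screening statistics from two-by-two counts.

    tp: screen positive, condition present
    fp: screen positive, condition absent
    fn: screen negative, condition present
    tn: screen negative, condition absent
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases the screen catches
        "specificity": tn / (tn + fp),  # share of non-cases the screen clears
        "ppv": tp / (tp + fp),          # chance a positive screen is a true case
        "npv": tn / (tn + fn),          # chance a negative screen is a true non-case
    }


# HYPOTHETICAL counts for illustration only; NOT data from the cited study.
metrics = screening_metrics(tp=45, fp=60, fn=5, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In this made-up example the screen misses few true cases, but more than half of the positive screens are false positives, which is the kind of pattern that makes a tool acceptable for screening yet weak for diagnosis.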

Here at Hamm, we also suspect that the PHQ‑9 questions might be interpreted differently by patients in medical versus psychiatric or community mental health settings. For example, the PHQ‑9 items might be seen as more of a medical inquiry when given in a medical clinic, and as less loaded toward inquiring about mental health issues. This perception on the part of patients could, in turn, influence the way that clients respond to the questions. Additionally, the base rates of undetected and untreated depression would be expected to be much higher in a general medical population than in a mental health clinic population that specializes in identifying such conditions. In a general medical population, the majority of the patients would be expected to have a predominance of medical presenting problems, with these problems taking up the bulk of clinic visit time. Within this context, it would make sense to include a brief screening instrument, such as the PHQ‑9, to make sure that a common condition such as major depression (not identified by the patient as the target reason for the clinic visit) did not go undetected. For this reason, a screening instrument such as the PHQ‑9 would likely be differentially more useful in a population in which there is a proportionately higher level of undetected depression (that is, a medical practice) than in a population where there is likely to be a much lower level of undetected depression (that is, a mental health practice). This is particularly the case as depression symptoms would typically be screened on intake for all patients in a general mental health diagnostic assessment.

The extent to which the quality measure is valid and reliable.
The existing PHQ‑9 literature focuses on the reliability and validity in medical populations. We know that the reliability and validity of measures vary across populations and circumstances. We have no reason to assume that the reliability and validity of the PHQ‑9 would be comparable in medical settings and in community mental health populations, absent supporting research.

For example:
1. The PHQ‑9 score can be elevated for reasons other than the target concern of major depression. This measure contains items reflecting symptoms which are not specific to major depression. For this reason, clients with other diagnoses can be measured as having depression when their score is an artifact of symptoms from other medical or mental health conditions. For example, a patient with ADHD might load on the following items:

1.c. Trouble falling asleep, nearly every day - 3
1.g. Trouble concentrating on things, nearly every day - 3
1.h. Being so fidgety or restless that you have been moving around more than usual, several days - 1

With a total score of 7, the patient would fall in the mild depression range even though the symptoms stem exclusively from ADHD.

2. The PHQ‑9 is a face-valid instrument. In measurement terms, this means that on its face, it is easy for the subject responding to the instrument to know what is meant to be measured and to distort results. This allows subjects to either deliberately overstate their symptoms or deliberately minimize their symptoms, if they have a motivation to do so. The measure does not have any validity scales that would check for tendencies to distort in an overly pathological direction or to minimize symptoms. Such validity scales are a standard aspect of more sophisticated psychological tests, such as the MMPI-2 or MMPI-2-RF. These considerations are very important in a mental health setting, where patients are likely to be much more sophisticated about mental health issues than patients seen in general medical settings. For example, patients in community mental health centers are more likely to utilize psychiatric disability, and may hesitate to endorse a reduced range of symptoms, indicating improvement, if they believe that this will jeopardize their disability status.

3. The PHQ‑9 was developed as a simple screen for major depression, so that major depression was not overlooked when patients were seen by their primary care physicians. The PHQ‑9 has not been shown to add incremental diagnostic validity over existing procedures within a mental health setting. Additionally, the PHQ‑9 has not been researched in terms of reliability or validity as a treatment progress measure in community mental health treatment programs in which multiple co-occurring mental health diagnoses are common. These co-occurring conditions, which often have symptom overlap with the PHQ‑9 items, may not respond to appropriate depression treatment and would influence the ultimate outcome of the PHQ‑9 scores.
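The scoring arithmetic behind the ADHD example in point 1 can be sketched as follows. This is a minimal illustration, not an official scoring tool: the function names and the item lettering (a through i) are our own assumptions, while the 0 to 3 response values and the severity cut-points (5, 10, 15, 20) are the commonly published ones for the PHQ‑9.

```python
# Response values: 0 = not at all, 1 = several days,
# 2 = more than half the days, 3 = nearly every day.
SEVERITY_BANDS = [
    (4, "minimal"),
    (9, "mild"),
    (14, "moderate"),
    (19, "moderately severe"),
    (27, "severe"),
]


def phq9_total(item_scores):
    """Sum the nine item scores after basic validation."""
    if len(item_scores) != 9:
        raise ValueError("the PHQ-9 has exactly nine items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(item_scores)


def severity_band(total):
    """Map a total score (0-27) to its severity label."""
    if not 0 <= total <= 27:
        raise ValueError("total must be between 0 and 27")
    for upper_bound, label in SEVERITY_BANDS:
        if total <= upper_bound:
            return label


# Hypothetical ADHD patient from point 1: only items c, g, and h endorsed.
endorsed = {"c": 3, "g": 3, "h": 1}
scores = [endorsed.get(letter, 0) for letter in "abcdefghi"]

total = phq9_total(scores)
band = severity_band(total)
print(total, band)
```

Nothing in the sum records which condition produced the endorsed symptoms, which is the concern raised in point 1.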

In addition, we have no reason to assume that the "response set" of general medical patients is the same as the "response set" of a community mental health population when responding to a PHQ‑9 measure that can be perceived as not simply assessing treatment progress, but also assessing eligibility for continued short-term disability, permanent disability, or disability accommodations at employment settings.

An additional problem with the community measures model is comparing depression outcomes obtained in primary care settings with depression outcomes obtained in community mental health settings, where the patient can be expected to present with far more complexity in terms of co-occurring diagnoses, and perhaps depression severity. These co-occurring diagnoses may significantly impact symptom improvement apart from appropriate treatment of depression.

The PHQ‑9 is a valid tool to screen for depression in a primary care setting, and Hamm Clinic agrees that it is appropriate and helpful for those settings.  However, the PHQ‑9 is not appropriate in a mental health setting such as Hamm Clinic (or other community mental health settings) for several reasons:

  • The possibility of undetected depression in a mental health setting is remote, negating the need for a broad, simple screening tool such as the PHQ‑9.
  • The questions of a PHQ‑9 can be interpreted in very different ways when administered in a mental health setting as opposed to a primary care setting.
  • The PHQ‑9 is not adequately researched or validated as an outcome measure in a general mental health population, which is how this screening tool is currently being used in Minnesota. Current use of this instrument to validate quality of care in psychiatry settings or other mental health specialty settings is unwarranted, as there is no empirical support for such a use.
  • Results vary significantly across organizations because mental health and primary care agencies are mixed indiscriminately, as are data capture and querying methods.
  • There are other, more sophisticated mental health outcome measures that surpass the PHQ‑9 as an outcome measure for depression and mental health, such as the OQ 45.2. 

Hamm Clinic’s experience with collecting and compiling the data for the state-mandated PHQ‑9 depression measure has shown that doing so is prohibitively expensive.  Hamm Clinic would welcome the opportunity to participate in a discussion of adopting a different measure for mental health care settings.

James G. Dungan-Seaver, M.A., has served Hamm Clinic since 2007 when he became IT Manager.  He managed the implementation of Hamm Clinic’s data systems that currently capture, store and report all outcome and research data.  In 2015, Mr. Dungan-Seaver became the Director of Operations.

Nancy L. Hammond, Ph.D., LP, earned her Ph.D. in clinical psychology from the University of Minnesota in 1981, and has been in continuous clinical practice since that time, including both private practice and community mental health settings. She is currently the assessment psychologist at Hamm Clinic, where she also provides supervision to doctoral interns in the APA training program and coordinates the Hamm Clinic research program.

David J. Roseborough, Ph.D., is an Associate Professor of Social Work at the University of St. Thomas in St. Paul, Minnesota. He is a certified cognitive therapist and diplomate in the Academy of Cognitive Therapy with 18 years of clinical practice experience.  His research focuses on psychotherapy outcomes in "real life" clinical practice settings, using the Outcome Questionnaire (OQ 45.2).




Richard Sethre - Tuesday, October 04, 2016

Thanks to the Hamm team for a very thorough and thoughtful assessment of the (limited) benefits of the PHQ-9 and the very real limitations of the tool for mental health settings. In addition, the PHQ-9 is really an initial screening tool and requires a follow-up protocol in medical settings - which may or may not be done. An even higher level of concern is, I think, warranted about the PHQ-2. Unfortunately, mental health professionals have let medical researchers take the lead in developing outcome measures that get adopted by policy makers and imposed on mental health providers. Assessment and outcome measures that have been developed by mental health professionals are usually a better match - and are available. Of course, it would be helpful if DHS and MDH would be more flexible about accepting a range of assessment and outcome measurement tools.

