The Price We Pay


by Marty Makary


  We took our results to the Mohs surgery leaders and they said the analysis confirmed their suspicions. They even recognized the names of some of the doctors who were in the top 2% of the outliers. They had heard stories about them or seen some of their patients for follow-up care. The experts in the field said that any high-volume surgeon who averaged more than 2.2 stages per operation was beyond the threshold of what they would consider appropriate. We had consensus.

  Next, we sent letters to about half of the surgeons we analyzed. We didn’t reach out to all of them right away because we wanted to study whether our outreach had any effect on their performance. We needed one group with whom we intervened and another with whom we did not (a control group). The letters came from the American College of Mohs Surgery (ACMS) and my team at Johns Hopkins. They included a one-page report that showed each surgeon how he or she compared to the rest of the Mohs surgeons in the country. A graphic designer helped my team generate physician-specific reports showing where each doctor stood on the bell curve. Doctors who averaged 3 or 4 stages or more per case were way out on the tail end of the chart.
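  For a concrete sense of what the per-surgeon analysis behind those reports might look like, here is a minimal sketch assuming a simplified claims layout. The column names and toy data are hypothetical; only the 2.2-stage consensus threshold comes from the project described above. This is not the actual Johns Hopkins code.

# Illustrative sketch only: the column names, toy data, and layout are
# hypothetical, not the actual Medicare extract or the study's analysis code.
import pandas as pd

# One row per Mohs operation: the operating surgeon and the number of
# stages billed for that case.
cases = pd.DataFrame({
    "surgeon_id": ["A", "A", "B", "B", "B", "C", "C", "C"],
    "stages":     [1,   2,   2,   3,   4,   1,   2,   1],
})

# Average stages per case (and case volume) for each surgeon.
per_surgeon = cases.groupby("surgeon_id")["stages"].agg(
    case_count="count", avg_stages="mean"
)

# Threshold from the expert consensus described above: a high-volume surgeon
# averaging more than 2.2 stages per case is flagged as an outlier.
OUTLIER_THRESHOLD = 2.2
per_surgeon["outlier"] = per_surgeon["avg_stages"] > OUTLIER_THRESHOLD

print(per_surgeon)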

  We didn’t chastise anyone who fell into the outlier category. We simply said, “This is where you stand relative to the rest of the Mohs surgeons in the country.” We also indicated in the report that the association embraced a range of normal variation in the average number of stages per operation. According to the Mohs experts who designed the reports with us, it would be obvious to the outliers that they were well outside the range of what seemed appropriate.

  The letter, signed by the top leaders in the field, also offered educational resources and invited feedback on the project.

  We sent out about a thousand reports and then I held my breath. Would the notification make any difference?

  Surprising Response

  In the days after we notified the Mohs surgeons, I kept expecting the phone to ring with complaints. But the gripes didn’t come—neither to me nor to the Mohs surgery leaders who had cosigned the report’s cover letter. Then the emails began to roll in:

  Thank you for the recent report. I had no idea where I stood relative to my peers nationally and now I know. I’m above average but will take a careful look to see how I can improve.

  I love showing this metric to my patients.

  I just wanted to give you a quick word of feedback on the Individual Surgeon Data Report I just received: I absolutely love it! I have wanted to know for some time where I stand relative to my peers regarding my average number of stages versus my peers and to my chagrin it just arrived in the mail! It gives us a nice benchmark to how we are doing.

  Thanks for sharing this data. I’ll work on my technique. Will this information be used for anything? Will it be made public? Please let me know.

  I’d like to learn more about the retraining offered by ACMS.

  When will the next report be delivered?

  Thank you for the report. Very important.

  I had heard of the reports coming out and was glad to see I’m not an outlier.

  The surgeons appreciated seeing their data! Sure, the emails were anecdotal, but none of the responses challenged the metric. My team followed up with a survey that found that 80% of all surgeons in the association believed that sharing performance data like this was important. In my opinion, the positive response to the “Dear Doctor” letters was because this program was 100% homegrown, based on the wisdom of practicing doctors who understood the proper use and misuse of their craft.

  But the million-dollar question was whether our intervention would work. Would it spur outliers to change their practice patterns? Our goal wasn’t merely to inform the surgeons; we wanted change. We gave it a year and checked the national Medicare data again for the doctors we notified. The results were striking. We found that 83% of notified outliers had changed their ways for the better. Moreover, the reduction in blocks per case appeared to be sustained.3

  The long-term follow-up data revealed an additional interesting trend. In the months after we sent the letters, even the outlier surgeons in the control group, whom we did not contact, began changing their behavior. They had not even seen their data but began to reduce their average number of stages per case, albeit to a lesser extent. Our intervention appeared to have had a crossover effect. And I can see why. It created a lot of buzz when we sent out the reports, and word travels fast among doctors within a specialty. The Mohs leaders also wrote commentaries about the initiative and gave talks about the importance of the program. I heard that some surgeons who fared well in our analysis were broadcasting the fact to their friends and peers. Hey, nothing wrong with that. The program had sent a message to outliers: your national leaders are monitoring the macro trends in your practice data.
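  The year-later comparison can be sketched in the same spirit. The values below are invented for illustration; the point is simply the shape of the analysis: change in average stages per case among notified outliers versus outliers in the un-notified control group.

# Illustrative sketch only: the values are made up. The actual study compared
# national Medicare data from before and after the letters went out.
import pandas as pd

outliers = pd.DataFrame({
    "surgeon_id":      ["A", "B", "C", "D"],
    "group":           ["notified", "notified", "control", "control"],
    "avg_stages_pre":  [2.9, 3.4, 3.1, 2.8],
    "avg_stages_post": [2.3, 2.6, 2.9, 2.7],
})

# Per-surgeon change; a negative number means fewer stages per case.
outliers["change"] = outliers["avg_stages_post"] - outliers["avg_stages_pre"]

# A larger drop among notified surgeons, with a smaller spillover drop in the
# control group, is the pattern the chapter describes.
print(outliers.groupby("group")["change"].mean())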

  Albertini liked what he was seeing. The initiative had created a culture of accountability, and he was hearing stories confirming the improvements were real. “Moreover, no one got humiliated or penalized,” Albertini said. “It’s a confidential peer-to-peer way to address our outliers in a civil way.”

  The entire program cost $150,000 that first year, but it resulted in $11.1 million in direct savings to Medicare—that is, to U.S. taxpayers. At the time I submitted this book to the publisher, the savings had escalated to $18 million for the 18 months after the intervention. Not only were the findings well-received by the medical community, but the publication of the results was accompanied by a very supportive editorial entitled “Physicians Respond to Accurate, Actionable Data on Their Performance,” written by the American Medical Association board chair Dr. Jack Resneck and University of Iowa Mohs surgeon Dr. Marta Van Beek. The national conversation that followed affirmed the notion that when doctors are involved in quality improvement from the get-go, the results can be incredible. What else in health care yields a 7,430% return on investment?
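  As a rough check of the arithmetic, using only the first-year figures cited above:

  $11,100,000 saved ÷ $150,000 spent ≈ 74

  In other words, every program dollar returned on the order of seventy-four dollars to Medicare, which is the several-thousand-percent return described.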

  But why do some performance improvement programs work so well while others struggle? I attribute part of the success to civility. By including practicing doctors early and by using a peer-to-peer method of sharing data in a way that is nonpunitive and confidential, we were highly effective. Moreover, the project focused on what doctors believe to be important. I have visited hundreds of U.S. hospitals and a consistent message I hear from their quality improvement leaders is “We collect all this data, now what do we do with it?” They’re burdened with tracking all sorts of things, some of which don’t matter. Ultimately, out of a sense of helplessness, the leaders dump this data on the doctors, who in turn explain it away with the claim “My patients are sicker.”

  The “My patients are sicker” argument has been a major barrier to improving health care. But it’s code for something else. This is doctors saying “You don’t understand me or what I do.” It’s what happens when quality improvement programs are forced on doctors without their consensus. To be effective, a method of measuring care must be developed and endorsed by the doctors and clinicians who work in that specialty. The input needs to come from a range of physicians who serve diverse patient populations. As we expanded Improving Wisely, I required that all the doctors on the expert panels spend at least 70% of their time in patient care. And I insisted on rural and community hospital representation to balance the doctors representing big academic hospitals.

  Many other industries have their practice patterns measured. In 2009, the energy software company Positive Energy (now Opower) set out to help utilities reduce power use in neighborhoods. Its data showed that some households used far more electricity than their neighbors. After all, there are no standardized protocols on turning lights on or off when one vacates a room. Just ask anyone who’s argued with a spouse about this issue.

  The company decided to mail each household a regular feedback report that compared their electricity and natural gas usage to that of similarly sized households in their neighborhood. Playing on the benchmarking theme, the data feedback intervention resulted in an overall reduction in household energy use. When people saw they were outliers, they modified their habits so their usage fell more into line with that of their peers. In a year, this simple intervention reduced the total carbon emissions of the participating houses by the equivalent of 14.3 million gallons of gasoline, saving consumers more than $20 million.4 Lots of utility companies now take this approach—and it works.

  Metrics Matter

  I’ve examined hundreds of quality metrics over the years and developed my own. I’ve come to believe many of them need context to be meaningful. A metric must zero in on what the care in question means for a patient’s quality of life and potential disability. The criteria should focus on significant harm or waste by extreme outliers rather than small variations in practice. The metric also must be reliably measurable and designed so it can’t be tainted by bias or gaming. And finally, a sound metric should be highly actionable for the physician. Metrics such as mortality, while easy to collect, are hard to make actionable. We need more measures that provide direct insight into what individual physicians can do right now to modify the way they practice.

  After the skin cancer project with the American College of Mohs Surgery, I asked its leadership if anyone had ever proposed the measure they used for the project to Medicare or the broader health care community. Our data suggested that reining in these unwarranted practice variations has dramatic implications for lowering health care costs. “No,” they said. No one in the broader medical policy world ever asked for their input, they explained. Yet again, I saw the disconnect between those making the rules in health care and those practicing it.

  To try to address this gap, I set up a meeting with midlevel Medicare leaders. They liked the idea of setting boundaries of acceptable variation. They referred me to their website, which lays out their standard process for proposing a new quality measure for Medicare to consider using. I noticed that one of the requirements was that any proposed new measure be supported with multiple published articles proving that the measure was evidence-based. That’s a nice idea, but a narrow way to look at quality improvement. No one would ever do a trial comparing the outcomes of surgeons who take two versus three blocks per case. For one thing, a trial subjecting people to three blocks per case would be unethical. I gave up on the website.

  Medicare requires a quality measure to be based on published evidence. But in the Robert Wood Johnson Foundation project on Mohs surgery, we had a different way of looking at things. We used the wisdom of busy practicing doctors to create a specialty-specific way to measure quality.

  The following year, I was invited to meet with the new batch of Medicare leaders at the highest level, Seema Verma, Paul Mango, Kim Brandt, and Adam Boehler. I explained to them that these pattern measures were very telling and that using them had broad implications for cutting waste in Medicare. Waste like the unnecessary vascular procedures Medicare was paying for, such as the ones being performed on church members down the street. The Medicare leaders I met with quickly “got it” and made further work on pattern measurement a priority. Within months, Medicare made plans to send out “Dear Doctor” letters to the country’s most extreme outliers.

  The peer comparison program conducted by the American College of Mohs Surgery was extremely well received. The following year, the association’s leadership decided to expand the program to tackle overuse of skin flaps (a technique to move skin) and the overuse of Mohs surgery in areas of the body where it was rarely indicated, such as lesions on the trunk and legs.

  The Improving Wisely project looking at skin cancer surgery rejected the conventional way that we measure performance. Instead it measured a physician’s patterns to identify practices experts deemed dangerous or otherwise indefensible. We published our “appropriateness measure” for skin cancer surgery at the surgeon level, and then it was time to press forward. We would soon scale the model. Improving Wisely5 was about to get a lot bigger.

  CHAPTER 8

  Scaling Improvement

  The hospital cafeteria at Johns Hopkins is the Grand Central Station of new ideas in medicine. Grabbing yogurt and fruit one morning, I bumped into Dr. Ali Bydon, a very smart and jovial colleague specializing in spine surgery. I always enjoyed my conversations with Ali because he has such a pragmatic approach to medicine. We caught up quickly and I filled him in on the Improving Wisely project I was undertaking with the skin cancer surgeons. I treasure these interdisciplinary conversations with colleagues because I learn so much from them. When I asked Dr. Bydon if there was something in spine surgery that was overdone in a way that was measurable, he perked up. Some surgeons will inappropriately perform surgery for back pain in patients who have never tried physical therapy (PT), he said. In most cases, physical therapy manages the pain better than surgery. It should always be tried first. If PT fails, then surgery may be necessary.

  His idea sounded as though it had potential for the Improving Wisely project. I asked him which specific operations he was referring to, to see if I could pull the cases from the Medicare data. He rattled off a series of operations by name: lumbar laminectomy, discectomy, hardware insertion, and other elective (non-urgent) procedures. He also listed specific situations to exclude from the analysis because they could warrant emergency back surgery. For example, he told us not to include any operation that involved trauma, possible neurological injury, a spinal tumor, or paralysis.

  Once I gathered the medical codes for each operation and clinical diagnosis, I swung by Ali Bydon’s office and asked him the big question: “If we measured the percentage of elective back surgery operations that a surgeon does in which the patient had at least one physical therapy appointment in the preceding year, would that be a reasonable measure of appropriateness?”

  “That would tell you a lot about the surgeon,” said Bydon. “If there’s no PT, that would probably tell you which surgeons are doing operations they shouldn’t be doing.”

  I took the idea back to my research team and we graphed out every back surgeon in the United States by the proportion of elective back operations prior to which the patient had had at least one physical therapy visit in the preceding year. Sure enough, most surgeons were in the same range, but a small group of surgeons were doing a large proportion of their operations in patients who had never tried physical therapy even once before surgery.

  Later, I went back to his office, this time bringing along a med student who was observing with me that day. I shared the findings with Dr. Bydon and he pointed out the subgroup of outlier surgeons on the graph. “Those are the surgeons doing unnecessary back surgery,” he said. After a closer look, he remarked that most doctors do the right thing, but a segment of outlier surgeons had a practice pattern that was indefensible. “This is amazing data.”

  The med student asked, “Why can’t you just require physical therapy before all back surgery?” Bydon pointed out there are rare exceptions when physical therapy does not make sense, or an occasional miscoded patient in the data. For those reasons, we wanted to create an acceptable range for a surgeon. For this measure, Bydon and other experts determined that at least half of a surgeon’s elective back surgery patients should have physical therapy first, regardless of circumstances. That would be a starting point to promote best practices.
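  As a minimal sketch of how such an appropriateness measure might be computed, assume a simplified record layout; the field names and toy data below are hypothetical, and only the “at least one physical therapy visit in the prior year” definition and the 50 percent floor come from the discussion above.

# Illustrative sketch only: the field names and toy data are hypothetical,
# not the actual Medicare claims layout.
import pandas as pd

# One row per elective back operation: the surgeon and whether the patient
# had at least one physical therapy visit in the year before surgery.
operations = pd.DataFrame({
    "surgeon_id":     ["X", "X", "X", "Y", "Y", "Y", "Z", "Z"],
    "pt_in_prior_yr": [True, True, False, False, False, False, True, True],
})

# Proportion of each surgeon's elective cases preceded by physical therapy.
pt_rate = operations.groupby("surgeon_id")["pt_in_prior_yr"].mean()

# Starting threshold the experts settled on: at least half of a surgeon's
# elective back-surgery patients should have tried PT first.
MIN_PT_RATE = 0.50
report = pd.DataFrame({
    "pt_rate": pt_rate,
    "below_threshold": pt_rate < MIN_PT_RATE,
})
print(report)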

  Surgeons who had none of their patients try physical therapy before back surgery were obviously outliers. Looking at their names in the national data made Bydon furious. Becoming a spine surgeon requires years of specialized training. It’s a technical tour de force, an amazing specialty that can cure disability and other ailments. The outliers were playing a lucrative trick on patients and it was no secret in the spine surgery community. In the past, they had just not been identified.

  Looking at the analysis again, Bydon said, “These doctors are giving spine surgeons a bad name.”

  In medicine, a recommended treatment guideline is rarely absolute. It could be entirely appropriate to modify procedures in an individual patient’s situation. On the other hand, a recommended treatment could be modified because the doctor is profiteering or responding to our consumerist patient culture. Or perhaps the doctor is unaware of the best practice. Or it could be a mix of these reasons. In any case, it’s difficult to determine whether deviating from the standards is okay or not. The approach of measuring patterns appealed to Bydon and other doctors because it factored in rare cases by creating an acceptable range. Doctors wouldn’t be treated like miscreants for making an exception for a patient who needed it.

  Bydon asked me if anyone had ever cut the data this way. I explained that we had only recently gotten access to the information. Researchers, including me, had lobbied the government to give us access to the national Medicare data. We argued that since it’s funded by taxpayers it should be accessible to the public. Medicare responded by providing a limited group of us access to the Medicare servers under a user agreement allowing us to look at a physician’s unique national identification number. Even though it took a year to get access to the Medicare servers, we were now able to study pattern data. That’s how my team was able to generate physician-specific reports for skin cancer surgeons. Prior to this unprecedented access, researchers like me were only given data that was three years old.

  Measuring patterns also seems novel because it’s hammered into every doctor—from medical school to residency and throughout our careers—that we cannot believe anything unless a randomized controlled trial has proved it. Of course, no one had ever done a trial in which patients were randomized to have elective back surgery without first trying physical therapy. Randomized trials done just to prove such a point would be unethical. We’ve all seen with our own eyes patients who avoided surgery because a physical therapist did a great job treating them. Even if someone did such a trial, it wouldn’t tell you how often patients should have surgery without first doing physical therapy. The idea is ludicrous. Where I’ve challenged academic elites is on the point that the randomized controlled trial was designed to test medications against a placebo, not to evaluate every clinical practice. Thankfully, others have spoken up as well. An entire issue of the journal Social Science and Medicine was recently devoted to the subject, with many articles pointing to the shortcomings of randomized trials.1 Here’s one way to think of it: randomized controlled trials are not the way one should evaluate whether a parachute is effective in saving the lives of skydivers.

 
