by Marty Makary
Now imagine that the same rude flight attendant’s supervisor tells her that her individual customer satisfaction score ranks at the very bottom, in the first percentile of the airline’s 10,000 flight attendants. Then the supervisor informs her that her scores will be reviewed again every six months. That would change her behavior. Cindy might even start offering passengers an extra pack of pretzels.
That flight home reminded me of what should be a central tenet of measuring quality across any industry: data must produce meaningful results so people will be prompted to take action.
I often visit hospitals and hear “We collect all this data, now what do we do with it?” Administrators deliver organizational-level data to doctors and nurses and expect it to somehow transform behavior. That doesn’t work. However, showing how individual doctors are performing can be formative and typically leads to rapid improvement. I’ve seen it happen many times.
Competitive by Nature
When I traveled to Providence St. John’s Hospital in Santa Monica, California, I witnessed the dramatic effects of switching from group to individual data. In January 2017, the hospital’s C-section rate for first-time deliveries was the highest among 12 hospitals in the region. Dr. Jon Matsunaga, chair of the OB/GYN department, didn’t like how his hospital’s statistics compared. Knowing that their patients were no different from the others’, Matsunaga decided to get a handle on the data. The hospital began using peer benchmarking: they compared doctors to one another.
He showed each doctor in his department his or her C-section rate, and something magical happened. Immediately, the C-section rate plummeted, falling by half within a matter of months. Providence St. John’s now had the lowest C-section rate of the dozen hospitals in the region. Dr. Matsunaga’s leadership proved a critical prerequisite, but it was the power of peer benchmarking, with usable data, that resulted in thousands of healthier moms and babies in Southern California.
I asked Dr. Matsunaga how the obstetricians with the highest C-section rates changed their ways so quickly. He recognized that the doctors had allowed competing priorities to override what was best for the patient. Once he started sharing data among them, they became much less likely to nudge patients toward a C-section.
In the larger scheme of things, Dr. Matsunaga’s intervention truly improved health outcomes, far beyond reducing complications per operation. His work redesigning the way doctors are managed had a greater impact on the community’s health than any new medication or new technology in the field that year. Data, when used properly, can be incredibly powerful.
In a different case, a New York hospital asked me to analyze C-section rates for each doctor in a group that shared the same call schedule. Somehow one had an extremely high C-section rate compared to the others. I produced the doctor-specific data so each doctor could see how he or she compared to the rest of the group.
Based on the performance of the group, I showed an upper boundary of what would be considered acceptable. That way each doctor wouldn’t feel pressure to be right at the middle. The focus was on the outliers whose rate went beyond the threshold—far beyond in the one case. The hospital leadership presented the reports to the doctors.
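For readers who want to see the mechanics, here is a minimal sketch, in Python, of that kind of report. Everything in it is a placeholder: the delivery records are invented, and the boundary rule (the group median plus a fixed margin) is an arbitrary stand-in, not the rule we actually used.

# Illustrative sketch only: peer benchmarking against a group-derived upper boundary.
# The delivery records and the threshold rule below are hypothetical placeholders.
from collections import defaultdict
from statistics import median

# Each record: (doctor_id, was_c_section)
deliveries = [
    ("dr_a", True), ("dr_a", False), ("dr_a", False), ("dr_a", False),
    ("dr_b", True), ("dr_b", True), ("dr_b", False), ("dr_b", False),
    ("dr_c", True), ("dr_c", True), ("dr_c", True), ("dr_c", False),
]

counts = defaultdict(lambda: [0, 0])  # doctor -> [c_sections, total deliveries]
for doctor, was_c_section in deliveries:
    counts[doctor][0] += int(was_c_section)
    counts[doctor][1] += 1

rates = {doctor: c_sections / total for doctor, (c_sections, total) in counts.items()}

# An upper boundary derived from the group, so no one feels pressure to sit at the mean;
# "median + 15 percentage points" is an arbitrary example, not a clinical standard.
upper_boundary = median(rates.values()) + 0.15

for doctor, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    status = "beyond the boundary" if rate > upper_boundary else "within range"
    print(f"{doctor}: {rate:.0%} C-section rate ({status})")

The design mirrors the report itself: no one is pushed toward the exact average; only the rates that cross an agreed-upon boundary get flagged.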
A few months later I sat in a conference room with a handful of this hospital’s leaders. I asked them how the data was being received. “Most all of our obstetricians were pleased to see their data was in the reasonable range,” one said.
“But,” he continued, “our outlier doctor, the guy with the 60% C-section rate, he gave us an earful. He argued that his patients are sicker. He claims he is an expert on high-risk pregnancies, and because he is the best in his field, he gets the hardest cases within a poorer population.”
I’ve heard those claims before. We agreed to do our homework. I returned to Hopkins to talk with my obstetrics colleagues. They didn’t buy the doctor’s excuses. They treat a lot of high-risk patients from inner-city Baltimore and still have a C-section rate under 30%.
I asked the hospital leaders to ask the doctor what he considers to be a high C-section rate, given the complexity of his patients. 50%? 80%?
When I took a closer look at this doctor’s data, I saw his patients were about the same as those of his peers. That was to be expected. It wouldn’t make sense for him to get all the complex cases. The deliveries were random, based on whichever doctor was on call at the time.
Then I noticed something peculiar in the data. This doctor’s C-section rate wasn’t especially high on most days of the week. But on Fridays, it shot up to 80%. Were all the high-risk women coming in exclusively on Fridays? Maybe all these C-sections had more to do with this doctor’s desire to enjoy his weekend.
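Here, purely as an illustration, is a short Python sketch of that day-of-the-week breakdown. The dates and outcomes are made up; the point is only how simple the check is once you have delivery-level data for a single doctor.

# Illustrative sketch only: one doctor's C-section rate broken down by day of week.
# The records below are invented for the example.
from collections import defaultdict
from datetime import date

# Each record: (delivery_date, was_c_section) for a single doctor
records = [
    (date(2017, 3, 3), True),    # a Friday
    (date(2017, 3, 3), True),    # a Friday
    (date(2017, 3, 6), False),   # a Monday
    (date(2017, 3, 7), False),   # a Tuesday
    (date(2017, 3, 8), False),   # a Wednesday
    (date(2017, 3, 10), True),   # a Friday
]

by_weekday = defaultdict(lambda: [0, 0])  # weekday name -> [c_sections, total deliveries]
for delivery_date, was_c_section in records:
    weekday = delivery_date.strftime("%A")
    by_weekday[weekday][0] += int(was_c_section)
    by_weekday[weekday][1] += 1

for weekday, (c_sections, total) in by_weekday.items():
    print(f"{weekday}: {c_sections / total:.0%} C-section rate ({total} deliveries)")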
I showed the hospital leaders the Friday phenomenon. They smiled and said, “Thank you very much.” When they went back and showed the doctor his C-section rate broken down by day of the week, he quickly changed his tune. He no longer claimed his patients were more complex. He responded, “Okay, I’ll see what I can do.”
Seeing my own performance data has also helped me improve. At Hopkins, Dr. Caitlin Hicks, the surgeon on my team, teamed up with one of our anesthesiologists, Dr. Steve Frank, to tackle the problem of unnecessary blood transfusions. The problem was that doctors were sometimes ordering blood on patients who did not meet the laboratory criteria. The criteria were well established and based on many studies, including one in the New England Journal of Medicine from 20 years ago, but getting doctors to follow the evidence was a daily struggle at our hospital.

So Drs. Hicks and Frank decided to be creative and harness the competitive nature of a doctor’s personality. They began sending us regular reports on how our personal blood transfusion rates compared to those of others in our department. The result was dramatic. They saw an immediate reduction in unnecessary transfusions.

One of the quarterly reports showed that my rate was higher than those of four of my partners who do pancreas surgery. I didn’t like being an outlier without any justifiable explanation. It was the kick I needed to get back in step. The next time an anesthesiologist started hanging a bag of blood during surgery, I stopped him and asked whether the patient’s blood level was low enough to meet the criteria in the national guidelines. He said “no,” and that he had just assumed I might want the transfusion. I nixed it, sparing the patient unnecessary risks of a transfusion. The power of peer benchmarking became real to me.
Hard Work Ahead
After I got back from Florida, I talked to my research team about Dr. Dinner. We discussed ways to measure patterns of individual doctors’ performance and evaluate the appropriateness of care. We knew our methods would have to be clinically smart so they’d be fair to physicians, tailored to their specialty and unique types of patients. We also had to make sure we collaborated with a diverse group of physicians who practice in the specialty being measured. Community physicians, not just academic physicians, had to be included. They have a unique vantage point, and after all, they deliver most of the medical care in the United States.
Measuring appropriateness would become a focus of our research team. We studied patterns of performance. They showed the style of doctors’ practices, their threshold to intervene, and the degree of risk they take. Patterns are what we talk about in our surgical locker rooms and lounges. They are how we describe doctors who are following best practices and doctors who need help. They are what doctors like me use to find our own highest standard of medical care.
There’s one strong reason that practice patterns have not been used to measure clinical appropriateness and waste in health care: this kind of study is hard work. Constructing measures that are smart and don’t unfairly label great doctors as bad ones requires an intense understanding of the specialty, its treatment algorithms, and the medical codes used, as well as an appreciation for doctors who may treat complex populations.
It also requires a good deal of time to establish consensus around thresholds of what constitutes an outlier, and a good deal of boldness to challenge outdated practice styles. It would take digging into the clinical nuances and areas of waste in countless clinical scenarios. Thousands of practicing doctors would need to contribute. Daunted by the project’s scope, I considered retreating and instead simply writing philosophical pieces in the medical journals. But executing these objectives could make medicine more precise and less costly.
Soon after I presented the goal of measuring patterns to my research team, I spoke at the national health insurance conference put on by America’s Health Insurance Plans (AHIP), where I outlined how using pattern data could allow for the measurement of the appropriateness of medical care. I concluded by suggesting we measure patterns of overuse. Susan Dentzer from the Robert Wood Johnson Foundation, the largest philanthropy in health care, was in the audience. She approached me after my speech. “Marty,” she said, “I get it. Let’s talk.”
Susan previously worked as the editor in chief of Health Affairs, the nation’s leading health policy journal. She knew the issues. Thanks to Susan, Anne Weiss, and Emmy Ganos at the RWJ Foundation, I soon had a large grant to create a new generation of quality measures. We would finally be able to measure appropriateness of care.
The RWJ Foundation had already funded the Choosing Wisely program, a national collaborative that challenged every medical specialty to list five tests or treatments that are usually unnecessary. For example, one of the consensus recommendations is not to use a DEXA bone scan in women under 65 to screen for osteoporosis. Another Choosing Wisely recommendation is to avoid a CT scan or MRI of the head in a child with a simple febrile seizure. Choosing Wisely has done an impressive job of raising awareness among both doctors and patients about the problem of overtreatment. The consensus recommendations can be found at ChoosingWisely.org.
Remarkably, the Choosing Wisely project created a new conversation in medicine about overuse. Eighty medical specialty associations participated and changed the culture of medicine. The next step would be to go beyond awareness and move to meaningful quality metrics. My charge from the foundation was to pick an area of medicine in which data transparency could be used to reduce unnecessary procedures and lower health care costs.
We called this new program Improving Wisely.
CHAPTER 7
Dear Doctor
The usual odorless, tasteless cubed cantaloupe made its standard appearance in the back of the Marriott Marquis meeting room, but this was no typical meeting. I was in Washington, D.C., with the top leadership of an association of skin cancer surgeons called the American College of Mohs Surgery. If you’re not one of the millions of patients a year who get skin cancer, you may not have heard of Mohs surgery. But the technique—developed in the 1930s by Dr. Frederic E. Mohs1—is a big reason skin cancer is much more manageable today.
I had heard that the Mohs surgeons were interested in addressing overtreatment. And months earlier, I’d connected with the association’s president, Dr. John Albertini, by phone. I had told him how my Hopkins team and I wanted to work with specialty associations to identify ways to measure the appropriateness of medical care. Right away, he got it. He even one-upped me, telling me about something else he saw doctors doing that troubled him.
To enable me to understand the problem, Albertini had to explain the technicalities of Mohs surgery. In this specialty, the doctor’s role is unique because the doctor acts as surgeon, pathologist, and reconstructive specialist all during the same procedure. The goal is to excise all the skin cancer while minimizing the amount of healthy flesh that gets removed. A Mohs surgeon starts by cutting out the cancer in a block of tissue and examining it under the microscope. It might be a sliver of flesh or an inch-square cube. If the tissue block has cancer cells on the edge, that means the surgeon didn’t remove all the cancer. It’s what we surgeons call a “positive margin.” The surgeon goes back to the patient and removes an additional sliver of tissue at that location. Each tissue block removed is referred to as a “stage” of the surgery. The Mohs surgery breakthrough is removing all the visible cancer while preserving as much normal skin as possible. In the old days, skin cancer surgery disfigured patients because doctors took so much flesh at once.
Here’s where things get interesting. Mohs surgery typically takes one or two precise stages. On rare occasions, a third stage may be necessary. Surgeons get paid well for Mohs procedures. And it turns out that they get paid per stage. Cut a little extra here and there and you get a bigger paycheck, whether the extra cuts are necessary or not. As a surgeon, I was familiar with these types of financial “carrots” lying around the operating room.
Albertini explained that over the previous several years, the association’s leadership had heard multiple reports that some doctors appeared to be doing the operation in too many stages. It may be that those doctors need further training. Or they could be motivated by money. Albertini proposed a pattern we could examine for the Improving Wisely project: we could look at the average number of stages each surgeon used during the procedure and see who was making the most cuts. “Most surgeons fall in a certain range,” Albertini told me. “But some are going to be way out there, adding time and expense to the procedure and unnecessary surgery for patients.”
Albertini said that the project’s success would depend on the buy-in of his colleagues: the leaders of the Mohs surgery association. We made a plan for me to join the association’s executive leadership team when they gathered at the American Dermatology Association conference in Washington, D.C.
A Crucial Buy-in
I was nervous walking into the hotel conference room, so I took a pass on the cantaloupe cubes. This was my first time pitching to a surgical society the idea of analyzing the practice patterns of individual doctors. I hoped they would agree that lowering health care costs can start with eliminating medical care that doesn’t need to be done in the first place.
This meeting could be critical. We as a country spend more than $15 billion a year reporting quality metrics to the government and to one another.2 Doctors tire of flavor-of-the-month quality improvement campaigns—especially those imposed on us without our input. I knew doctors on the front lines had to define which practice patterns were appropriate and which were not. I needed to see consensus from them. I needed them to tell me the best way to proceed.
Albertini welcomed me, introducing me to the titans of the field. Sitting around the conference table were Dr. Tom Stasko from Oklahoma, Dr. Allison Vidimos from Cleveland Clinic, Dr. Richard Bennett from UCLA, Dr. Victor Marks from Geisinger, Dr. Barry Leshin from Winston-Salem, and Dr. Brett Coldiron from Cincinnati. I had read about Dr. Coldiron: he alone had performed more than 50,000 of these state-of-the-art Mohs operations.
I dived into my presentation. I told them about the Improving Wisely model. I explained that the first step would be to identify something that’s overdone in their field, then devise a smart way to measure how often a doctor does it, and then see if there is agreement about how much is too much. Finally, we would reach out to the doctors whose practice patterns fell outside the boundary of what they considered appropriate. That feedback would show those doctors how they compared to their peers and give us the opportunity to help them improve. My research team had taken Albertini’s idea and run with it. I provided preliminary numbers showing that most surgeons averaged one or two stages during Mohs operations. But some averaged three or four.
As I talked, the surgeons were murmuring and nodding. The vibes were good. Then they started jumping in with comments supporting what I was saying.
“Makes sense,” said one of the board members. “We need to do something about surgeons out there who are operating with no accountability.”
“There are practice patterns that clearly cross a threshold,” another surgeon added.
“This is what a professional association is supposed to be doing,” chimed in one of the other Mohs leaders.
They got it! They felt pride in their profession and a duty to act. The surgeons there were concerned that a small number of doctors in their field might be sucking a lot of money from the system. They agreed that doctors who were out of line would not like being identified as outliers in their field. They hypothesized that these doctors’ competitive nature would kick in and they would probably reduce their overuse on their own. From my own observations in medicine, I couldn’t have agreed more. People respond well to competition.
The Mohs surgery leaders liked Albertini’s proposal, and I did, too. Our intention wasn’t to penalize a doctor, or even to require preauthorization, for removing a cancer in three stages or more; in some cases, it might be necessary. But a pattern of doing three blocks in a large number of patients was something these experts said seemed inappropriate. Cutting through the tumor rather than around it was a lucrative temptation that doctors faced frequently. Skin cancer is the most common cancer in the world, and the Mohs technique is used to treat the basal cell and squamous cell subtypes and is increasingly being applied to melanoma as well.
The Improving Wisely approach sure beat the old way of measuring infection rates and readmissions to the hospital. Both are exceedingly rare with Mohs surgery, which is done as an outpatient procedure. The board enthusiastically accepted the offer to partner with me and my team at Johns Hopkins. We were ready for takeoff.
Identifying Outliers
I got to work with my research team. We obtained data from the federal government for every patient in Medicare, the government’s insurance program for the disabled and for patients age 65 and older. The data included each doctor’s identification number, and it showed the number of stages billed for each operation. We used the data to graph every surgeon in the United States by the average number of tissue blocks they removed during the skin cancer operations they performed. Sure enough, as the Mohs surgery leaders had predicted, most doctors were within a range of normal practice variation. The typical surgeon averaged between 1.2 and 2 blocks, or stages, per patient over the course of a year. But there were also some outliers who averaged 4 or more stages per patient.
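As a rough sketch of that calculation (not the study’s actual code), the following Python snippet averages the billed stages per surgeon and flags anyone above a cutoff. The claims are invented and the 2.5-stage cutoff is a placeholder; in the real project, the boundary came out of consensus with the specialty’s leaders.

# Illustrative sketch only: average billed stages per surgeon, with a placeholder cutoff.
from collections import defaultdict
from statistics import mean

# Each claim: (surgeon_id, stages_billed_for_one_Mohs_operation) -- invented data
claims = [
    ("surgeon_1", 1), ("surgeon_1", 2), ("surgeon_1", 1),
    ("surgeon_2", 2), ("surgeon_2", 1), ("surgeon_2", 2),
    ("surgeon_3", 4), ("surgeon_3", 3), ("surgeon_3", 4),
]

stages_by_surgeon = defaultdict(list)
for surgeon, stages in claims:
    stages_by_surgeon[surgeon].append(stages)

OUTLIER_CUTOFF = 2.5  # placeholder; the real threshold reflected specialty consensus

for surgeon, stages in stages_by_surgeon.items():
    average = mean(stages)
    label = "outlier" if average >= OUTLIER_CUTOFF else "within typical range"
    print(f"{surgeon}: {average:.1f} average stages per case ({label})")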