When a patient comes into the hospital with severe abdominal pain, doctors use clinical reasoning to make a diagnosis. They develop hypotheses based on who the patient is (a 20-year-old man or a 75-year-old woman) and then test these hypotheses. The tests begin with history questions—Where exactly is the pain? Have you had this pain before? Physical examination maneuvers come next: inspection, auscultation, palpation, and percussion—the four pillars of examination. And then laboratory, radiological, or even more invasive tests are done. At each stage, the doctor considers the likelihood of a diagnosis and then revises that likelihood based on the test results. When diagnoses are easy, it is because the tests (history, physical, laboratory, and so on) are highly predictive. If the overweight, 40-year-old woman with pain in the right upper quadrant of her abdomen tells you, “I always get this pain after I eat a fatty meal,” you know she has gallstones.
Much of this process is, in fact, evidence-based. There are data that describe how likely women around this age are to have gallstones. We call this the “pretest probability.” There are also data about the test characteristics that describe the accuracy of these diagnostic studies. Knowing these characteristics—the sensitivity (how likely a test is to be positive in people with disease) and the specificity (how likely a test is to be negative in people without disease)—allows doctors to calculate, usually more qualitatively than quantitatively, a “post-test probability,” the likelihood of a diagnosis.
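To make the arithmetic behind a post-test probability concrete, here is a minimal sketch in Python. It applies Bayes' theorem to a single test with a positive or negative result. The function name and the numbers in the example (a 20 percent pretest probability, 90 percent sensitivity, 80 percent specificity) are hypothetical values chosen for illustration; they are not drawn from any study of gallstones or of any real test.

```python
def post_test_probability(pretest, sensitivity, specificity, positive=True):
    """Return the probability of disease after one binary test result.

    All inputs are probabilities between 0 and 1. This is a plain
    application of Bayes' theorem, not a validated clinical tool.
    """
    for value in (pretest, sensitivity, specificity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")

    if positive:
        # Diseased patients who test positive vs. healthy patients who do.
        true_pos = pretest * sensitivity
        false_pos = (1 - pretest) * (1 - specificity)
        return true_pos / (true_pos + false_pos)

    # Diseased patients who test negative vs. healthy patients who do.
    false_neg = pretest * (1 - sensitivity)
    true_neg = (1 - pretest) * specificity
    return false_neg / (false_neg + true_neg)


# Hypothetical numbers, for illustration only.
after_positive = post_test_probability(0.20, 0.90, 0.80, positive=True)
after_negative = post_test_probability(0.20, 0.90, 0.80, positive=False)
print(round(after_positive, 2))  # 0.53
print(round(after_negative, 2))  # 0.03
```

With these made-up inputs, a positive result raises the likelihood of disease from 20 percent to roughly 53 percent, and a negative result lowers it to roughly 3 percent. In practice, as noted above, doctors usually make this revision qualitatively rather than by calculation.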
You are probably already beginning to realize why the ability to be evidence-based in the diagnostic process is challenging. There may be reasons that the pretest probability that you look up in a book does not apply to your patient. Then there are the test characteristics. Are the test characteristics of your physical examination of the patient’s abdomen the same as those reported in a journal article? Probably not. The patient and the doctor are different.
Then there are the tests we do that are more exploratory than diagnostic. If our patient is not a textbook case (such as our overweight, 40-year-old woman with right-upper-quadrant abdominal pain) but a more enigmatic one (a 29-year-old man with nonspecific belly pain), how should the case be approached? After a medical history and physical examination, most doctors start with a battery of blood work: blood counts, blood chemistries, liver-function tests. This strategy is not evidence-based. No one has tested in a randomized fashion whether obtaining blood counts or chemistries improves the outcomes of patients with abdominal pain. However, we do it because it is often a helpful part of the process. It gives the doctor a very general idea of a person’s health. It might alter the pretest probability of a disease the doctor considered highly improbable. It might provide practical information—knowing the kidney function will be useful in deciding on the next test.
Another issue with trying to study diagnostic tests is that the end points are not as clear as those that exist in studies of therapy. A new treatment for back pain should be evaluated to see whether it helps back pain, but what should the end point be for a new diagnostic test for back pain? The most dogmatic advocate of evidence-based medicine would say that the test is beneficial only if it improves back pain. But common sense tells us that we use diagnostic tests for many reasons. A test might be done to choose later tests, to reassure patients, to offer prognostic information, to evaluate the effectiveness of therapy. These are very difficult end points to study.
All that said, diagnostic medicine should not be an evidence-free zone. There are tests being done that seem to have no role in a well-considered diagnostic process. When such tests are performed routinely, you do not know whether it is because the doctor is practicing defensive medicine, not thinking carefully, or just trying to pad his paycheck. Ultrasound images of the carotid arteries (two of the four main arteries supplying the brain) are routinely done in many hospitals as part of the evaluation of patients who have fainted. There is no reason to routinely do this test, and randomized controlled trials could prove that this test is unnecessary in the vast majority of patients. Trials should be done for diagnostic strategies such as this, tests that are routinely employed but whose efficacy is unproved and doubtful. Examination of these practices could have an enormous effect on quality of care and health-care expenditures. When a middle-aged person comes to the emergency room with chest pain that has resolved, hospitals around America routinely follow the same protocol to detect a heart attack: perform an EKG, a chest X-ray, and blood work. If those tests are negative, most hospitals will pursue some sort of stress test. We should study whether this last test is necessary. Does doing a stress test decrease the rate of missed heart attacks, or does it just run up costs and harm patients by leading to unnecessary, invasive procedures?
There have been some great successes over the past couple of decades in proving the best way to evaluate some very common complaints. Studies have tested how to approach patients with complaints as diverse as low-back pain, ankle sprains, and disabling headaches. The studies have yielded “clinical decision rules” that define which patients require evaluation (and what evaluation is necessary) and which patients can be safely observed. The impact of these studies has been enormously beneficial to patients and the health-care system. We can foresee other studies that would allow us to say things like: for any patient over age 75 who comes in with unintentional weight loss, four tests have been shown to improve survival, six tests contribute to the diagnostic workup only 1 percent of the time, and three other tests add nothing other than increased costs.
It is said that diagnosis is more an art than a science. A patient’s symptoms can be idiosyncratic—there may be a finite number of diseases, but there are an infinite number of ways that patients manifest these diseases. Doctors reason through cases in diverse ways. Recognizing that this is the case, and understanding that the goals of diagnostic testing are varied, we understand that clinical diagnosis cannot be completely ruled by the outcomes of clinical trials. That said, not all diagnostic practices are acceptable, and patient care will improve if common diagnostic evaluations with clear end points are standardized. Diagnosis is the part of the job that can still lead us into heated arguments with fellow doctors. We recognize that there needs to be room to accommodate this diversity of thinking.
OPTIMIZING MEDICAL CARE
Francis Peabody is one of the most quoted physicians in American medicine. Probably his most repeated line is, “The secret in caring for the patient is to care for the patient.” In order to provide good medical care, you have to have the patient’s best interests at heart. By and large, doctors have their hearts in the right place. We all want to do what is right for the people who walk through our doors. Yet, we tend to believe that our biological understanding of disease translates into choosing therapies that work. But as we have seen repeatedly in this book, not biological understanding, not common sense, not observational studies, and not even small, single-center randomized controlled trials are sufficient to conclude that a medical practice works. Although our hearts are in the right place, our heads do not always get it right.
We must also confront the uncomfortable fact that in medicine today, financial incentives bias us. Doctors come to believe—maybe they fool themselves into believing—that a procedure or test that is logical and happens to be well-reimbursed also benefits their patients. We want to believe the practice works, and money can corrupt our thinking.
We must also acknowledge that a bias toward adopting practices prematurely is shared by nearly all the players in health care: researchers and innovators, the drug and device industry, and practicing doctors. Professionally or financially, they all profit from new practices. As a result, regulatory agencies are under pressure to approve more products more quickly. We cannot begin to count the number of articles we have read criticizing the FDA for its excessive caution. As we have tried to convince you, there are good reasons for regulators to be cautious and set a high standard for approval.
Finally, we must recognize that the amount of funding available to study the effectiveness of medical practices is much smaller than the amount of funding available to pay for the untested medical practices. The entire budget of the National Institutes of Health is around $30 billion. Not a small sum, but nothing compared with the $550 billion budget of the Centers for Medicare and Medicaid Services. Our commitment to studying what we do is not nearly where it should be.
For these reasons, our medical system is too tolerant of unproven practices. Doctors are too comfortable recommending a practice without real knowledge of whether it is helping or hurting patients. People are too willing to accept practices that seem like they should help. When a medical reversal does occur, most physicians consider it an exception to the rule. However, as we demonstrated in chapter 7, it may be that nearly half of physicians’ treatments are untested practices and simply do not work. Medical reversals are everywhere.
We need a culture change in medicine. We need to recommit to evidence-based medicine and realize that it is the only rational way to provide care. In this book we have provided a few suggestions for ways we can improve. We do not advocate that these recommendations be immediately implemented but that they be carefully considered, alongside recommendations proposed by other thoughtful analysts, and tested in prospective trials. As we move forward, we must recognize that drastic and dramatic change can often be harmful. We acknowledge that there will be areas of medicine in which, for now, we must tolerate the status quo. As we go through the house of medicine and clean up each room, we have to prioritize. This chapter is our best guess regarding which rooms should be cleaned last.
Now is the time to begin cleaning.
ACKNOWLEDGMENTS
VKP AND ASC
Two authors were nowhere near enough to bring this book to completion. Many smart, talented, and hardworking people have collaborated with us in our work on medical reversal. These collaborators included Andrae Vandross, Jason Rho, Victor Gall, Joel Jorgenson, Senthil Selvaraj, Nancy Ho, Caitlin Toomey, Jacob Chacko, Steven Quinn, Durga Borkar, and Michael Cheung. We are especially indebted to Rita Redberg, who, in her position as editor of JAMA Internal Medicine, supported us from our earliest investigations and has been a tireless advocate of evidence-based practice. John P. A. Ioannidis, a pioneer in the study of medical research, has been a generous collaborator.
Alex Lickerman, Eric Oliver, and Thea Goodman helped guide us when we first began work on this book. Stephany Evans, at FinePrint Literary Management, recognized the potential of this project and worked tirelessly on our behalf. At Johns Hopkins University Press, thank you to the reviewers who so carefully read and commented on the manuscript. Jackie Wehmueller encouraged and supported us in this project and helped us arrive at a title.
VKP
I am grateful to the people who taught me to think critically about the world around us, and those who showed me how beautiful medicine can be, particularly Marcia Aldrich, Sanjeeve Balasubramaniam, Jean Burns, Fred Gifford, Robert Hirschtick, David Horn, Barnett Kramer, Peter Mayock, H. G. Munshi, David Neely, James Nelson, John Sherrick, Scott Stern, Scot Yoder, and many others whom I am forgetting. I also thank my friends, who constantly challenged my thinking: Andrae Vandross, David Straus, William Thistlethwaite, Tim Howes, and Nathan Lord. My deep gratitude to Dr. Antonio Tito Fojo, senior investigator and program director at the National Cancer Institute. Tito Fojo taught me to be a better researcher, a better critic, and a better oncologist. He invested so much in me, and has bailed me out of jail more times than I deserved. Adam Cifu has been the perfect partner in crime over these years, all of the above, and a dear friend. Finally, the people who have been with me through everything: my parents, Padma and Ram, who deserve credit for everything good within me and still teach me what it means to be wise and compassionate; my brother Karthik, whose kindness and cleverness amaze me; and my wife, Nancy, who brings out the best in me.
ASC
I am indebted to the colleagues and mentors who inspired and encouraged both my love of medicine and my interest in the evidence on which we base our practice. During my training, Carol Bates, Booker Bush, and physicians in the Division of General Internal Medicine at the Beth Israel Hospital in Boston taught me what it means to be a specialist in general medicine and how to critically analyze clinical research. Their wisdom and enthusiasm have been with me ever since. At the University of Chicago, Halina Brukner has guided me through the world of academic medicine from the start. Diane Altkorn and Scott Stern have been both colleagues and mentors for the past 18 years. For me, our work together laid the foundation of this book. The Pritzker students have always been refreshingly challenging in their intellectual curiosity and energy. Final thanks to Anne Cifu, who provided me a few of the writers’ genes from the Craig family; Vinayak Prasad, who is the perfect collaborator—generous, energetic, creative, and thick-skinned; and Sarah Stein, who has been everything—from most ardent supporter to most critical (in every sense) editor.
APPENDIX
THIS APPENDIX SUMMARIZES studies appearing in the New England Journal of Medicine between 2001 and 2010 that contradicted accepted practices. These are not occasions when a newer, better therapy was announced, and they are not negative studies of potential innovations; they are reversals—each study provided evidence that overturned a practice that was already in use, suggesting that what had come before that practice was better.
Most of the studies are unambiguous examples of reversal. Studies 31, 37, and 38 overturned the practice of prescribing estrogen-replacement therapy. Study 85 is the COURAGE trial that argued against placing stents in people with stable coronary-artery disease. Studies 124 and 125 proved that vertebroplasty was ineffective. Some are examples of less complete reversals, where it is proved that an existing practice or therapy is much less effective than believed. Numbers 40 and 136 are in this category. Other studies looked at two therapies that were considered equivalent and established that one was superior (studies 52 and 133).
Among the studies listed below are novel types of reversal that we recognized. Sometimes an effective therapy has been withheld because of unsupported concerns about its safety. Studies 1, 2, and 68 are examples of research proving that such concerns were spurious and allowing an effective therapy to be used once again. A few of the reversals can be attributed to the evolution of medicine. In study 48, for example, a therapy that had been proved effective is later shown to be ineffective. This reversal probably occurred not because the initial data were flawed but because, in the intervening years, new therapies were developed that overwhelmed the small benefit of the initial intervention. Some of the studies overturn new practices (for example, study 60), while others overturn practices that had been used for a half century (studies 51, 141, and 142). On the whole, the list suggests that a great many medical practices—practices sometimes supported by professional guidelines and paid for by major insurance providers—are later shown not to work.
For those who are interested, greater detail about each study is provided in the supplemental material of our original article, “The Frequency of Medical Reversal” (see the list of references for the chapter 7 section). The supplemental material is available at www.mayoclinicproceedings.org/cms/attachment/2007391767/2029532464/mmc2.pdf.
| STUDY | DATE OF PUBLICATION | SUMMARY |
| --- | --- | --- |
| 1 Vaccinations and the risk of relapse in multiple sclerosis | 2/1/01 | Long-standing concerns about vaccinations preceding the onset or relapse of multiple sclerosis led to clinicians’ reluctance to give vaccinations to these patients. This study proved no increased risk of relapse in the two-month period immediately following tetanus, hepatitis B, or influenza vaccination. |
| 2 Hepatitis B vaccination and the risk of multiple sclerosis | 2/1/01 | On a topic related to the previous study, this study disproved the relationship between vaccination and the development of multiple sclerosis. |
| 3 Lack of effect of induction of hypothermia after acute brain injury | 2/22/01 | This study found that the practice of cooling patients after brain injury, which had been done for decades, is not beneficial. |
| 4 Initial plasma HIV-1 RNA levels and progression to AIDS in women and men | 3/8/01 | This study contradicted a common decision-making practice concerning when to start medications for HIV in women. |
| 5 The teratogenicity of anticonvulsant drugs | 4/12/01 | Two medical textbooks and one review article have doubted that anticonvulsants taken during pregnancy are more teratogenic than epilepsy itself. This large study found the opposite: anticonvulsants during pregnancy increase the risk of fetal malformation. |
| 6 Effect of early or delayed insertion of tympanostomy tubes for persistent otitis media on developmental outcomes at the age of three years | 4/19/01 | Guidelines recommended insertion of ear tubes in a child with an ear infection of greater than three months’ duration because of concerns that associated conductive hearing loss might lead to poor developmental outcomes. This study did not find any benefit in early tube placement. |
| 7 The effect of chelation therapy with succimer on neuropsychological development in children exposed to lead | 5/10/01 | This study looked at an accepted therapy for children with moderately elevated blood lead levels and found no benefit. |