Betting—assigning different probabilities to the same phenomenon—became the tangible expression of Bayesian beliefs. “Every day there’d be a half dozen bets around [Schlaifer’s group] about anything—elections and sports. Dollar bills were changing hands all the time. It was part of the ingrained way of life. You really believed this stuff,” Schleifer said.25 Schlaifer and Raiffa were developing reputations as zealots with a cause.
Schlaifer sent 700 pages of his first textbook to McGraw-Hill for publication as Probability and Statistics for Business Decisions: An Introduction to Managerial Economics under Uncertainty. Then he discovered his usual host of errors and infelicities and insisted that McGraw-Hill withdraw the first printing and replace it with a second. It was a classic case of placing intellectual rigor above business economics, and Schlaifer won. The book sold for $11.50 in 1959, and Harvard promoted its former tutor to a professorship in business administration.
Probability and Statistics for Business Decisions was the first textbook written entirely and wholeheartedly from the Bayesian point of view. Students could solve inventory, marketing, and queuing problems using simple arithmetic, slide rules, or, at most, a desk calculator. The book acknowledged few previous authorities. Schlaifer had come to his subjectivist position independently of Ramsey, de Finetti, and Savage. In turn, Savage recognized that Schlaifer had developed his ideas “wholly independently” and was more “down to earth and less spellbound by tradition.”26
Mulling over the state of Bayes’ rule, Schlaifer and Raiffa realized that Bayesians, unlike frequentists, had no bookshelves of mathematical tools ready for use. As a result, Bayesian methods were regarded as impractically complicated, particularly by business students, who were often mathematically unprepared. While theoreticians like Savage and Lindley tried to make Bayes mathematically respectable, Raiffa and Schlaifer set out in 1958 to make it fully operational and easy to use for bread-and-butter problems. Like George Box, they parodied a popular song, this one from Annie Get Your Gun, claiming that anything a frequentist could do, they could do better.
To make calculations easier, they introduced decision trees, tree-flipping, and conjugate priors. “I began using decision tree diagrams that depicted the sequential nature of decision problems faced by business managers,” Raiffa said. “Should I, as decision maker, act now or wait to collect further marketing information (by sampling or further engineering)? . . . I never made any claim to being the inventor of the decision tree but . . . I became known as Mr. Decision Tree.”27 Soon the diagrams of Bayes’ decision-making process were, like many-branched trees, rooted in undergraduate business curricula. The trees are probably the best-known practical application of Bayes’ rule.
Tree-flipping began as a simplification to help one of Raiffa’s graduate students who was interested in wildcat drilling for oil. Normally, a wildcatter decided whether to test a particular site before deciding to drill or not drill. To avoid some messy algebra, Raiffa flipped the order of the wildcatter’s decision. He dealt with the probability that test results would be positive or negative before he considered whether or not to conduct the test. Working through the diagram produced information about x’s followed by y’s. Tree-flipping put the y’s first. It amounted to using Bayes’ rule because the probability of x given y and the probability of y given x are the two critical elements of its formula.
“So you flip trees,” Raiffa said. “We didn’t call it Bayes. The worst thing you can do is to use Bayes’ theorem. It’s too complicated. Just use common sense and play around with these things, then it was pretty easy. We had people doing complicated things that could have been done by Bayes, but we didn’t do it by Bayes. We did it by tree-flipping.”28
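The flip described above can be sketched in a few lines of Python. This is only an illustration, not Raiffa's own worked example: the function name flip_tree and every number in it are assumptions invented here. The original tree branches first on whether the site holds oil and then on the test result; the flipped tree branches first on the test result and then on oil, which is exactly what Bayes' rule delivers.

```python
from fractions import Fraction as F


def flip_tree(prior, likelihood):
    """Flip a two-stage probability tree (hypothetical helper).

    prior:      {state: P(state)}
    likelihood: {state: {result: P(result | state)}}
    Returns (marginal, posterior):
      marginal:  {result: P(result)}
      posterior: {result: {state: P(state | result)}}
    """
    # Probability at the tip of every path through the original tree.
    joint = {(state, result): prior[state] * p
             for state, row in likelihood.items()
             for result, p in row.items()}

    # Regroup the tips by result: the first stage of the flipped tree.
    marginal = {}
    for (state, result), p in joint.items():
        marginal[result] = marginal.get(result, 0) + p

    # Second stage of the flipped tree: each state's share of a result's total.
    posterior = {result: {} for result in marginal}
    for (state, result), p in joint.items():
        posterior[result][state] = p / marginal[result]
    return marginal, posterior


# Hypothetical wildcatter numbers, chosen only for illustration.
prior = {"oil": F(1, 5), "dry": F(4, 5)}
likelihood = {
    "oil": {"positive": F(9, 10), "negative": F(1, 10)},
    "dry": {"positive": F(3, 10), "negative": F(7, 10)},
}

marginal, posterior = flip_tree(prior, likelihood)
for result in marginal:
    print(result, marginal[result], posterior[result])
```

Exact fractions are used so that the regrouping is visible without rounding: nothing is computed beyond multiplying along branches and adding up tips, which is the sense in which the flip needs only "common sense" rather than the formula.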
Raiffa also developed a handy shortcut for updating priors and posteriors. Called conjugate prior distributions, it used the fact that in many cases the shape or curve of a probability distribution is the same in both prior and posterior. Thus, if you start with a normal, or Gaussian, curve, you’ll end up with another normal curve. Conjugate priors paid dividends with the repeated updating called for by Bayes’ method. Albert Madansky used a similar concept for his H-bomb study. The shortcut would later become unnecessary with the adoption of Markov chain Monte Carlo methods.
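A minimal sketch of the normal case, under stated assumptions, shows why the shortcut pays off under repeated updating. It assumes a normal prior on an unknown mean and normally distributed observations whose noise variance is known; the function name update_normal and all the numbers are hypothetical. Because the posterior is again normal, each update only has to revise two numbers, a mean and a variance.

```python
def update_normal(prior_mean, prior_var, obs, noise_var):
    """One conjugate update (hypothetical helper).

    Normal prior on the unknown mean, one normal observation with
    known noise variance; the posterior is normal again.
    """
    # Precisions (inverse variances) simply add.
    post_precision = 1.0 / prior_var + 1.0 / noise_var
    post_var = 1.0 / post_precision
    # The posterior mean is a precision-weighted average of prior and data.
    post_mean = post_var * (prior_mean / prior_var + obs / noise_var)
    return post_mean, post_var


# Hypothetical numbers: a vague prior, then three observations.
mean, var = 100.0, 400.0
noise_var = 100.0
for obs in [120.0, 110.0, 130.0]:
    # Yesterday's posterior becomes today's prior.
    mean, var = update_normal(mean, var, obs, noise_var)
    print(round(mean, 2), round(var, 2))
```

With these numbers the posterior mean moves from the prior guess toward the average of the observations, and the variance shrinks with each update, all with arithmetic that fits on a desk calculator.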
In a further simplification, some business Bayesians even dropped the prior odds called for by Bayes’ rule. Schleifer said, “My take on it was to forget the priors unless there was overwhelming prior evidence that you really know a lot about the parameter you’re interested in.”29
Today, when TV and radio are filled with talking heads, it is hard to imagine that the use of expert opinion was terra incognita in the early 1960s. No one knew whether business executives would be willing to offer their opinions for incorporation into a mathematical formula. And no one was sure whether an expert’s subjective judgment would be valid. John Pratt asked his wife, Joy, whose job was promoting films in local theaters, to estimate their daily attendance. At first, her estimates fell into too narrow a range. By comparing them with actual attendance figures—hundreds of data points taken night after night at two local theaters—Joy Pratt learned to make such accurate predictions that her husband became convinced that expert opinion could be useful. Bayesians objected that Pratt and Schlaifer analyzed the data using frequentist techniques. Bayesian methods—comparing different kinds of movies, the length of time they played, the popularity of their stars, and so on—would have been too complex. The use of expert opinion for decision making later became a major field of study.
It turned out that Joy and John Pratt were right: marketing executives risked a lot of money on the basis of very little information and loved being asked for their professional judgment. Accustomed to waiting until the end of a frequentist study to voice their opinions, they actually liked having their “managerial intuition” or “feel for a situation” folded into preliminary assessments.
Raiffa and Schlaifer began exploring such nitty-gritty questions as how to interview experts and measure their expertise. DuPont, trying to decide how big a factory to build for artificial shoe leather in 1962, was delighted to assess the prior odds of the demand for its new product. Design engineers at Ford Motor Company were equally pleased that incorporating their opinions into Bayes’ prior let Ford use smaller opinion samples. Their work opened up just about any business problem to mathematical analysis. An engineering problem might have 20 sources of uncertainty; of these, perhaps 12 could be handled by single guesses; 5 needed more testing; and 2 might be so critical that experts had to be interviewed. Bayes’ rule was solving far more complex problems than Savage’s mental exercises about curling rabbit ears.
Between 1961 and 1965 an exciting weekly seminar, generally followed by drinks in Schlaifer’s office, focused on decision making under uncertainty (DUU for short). The seminar explored utility analysis, portfolio analysis, group decision processes, theory of syndicates, behavioral anomalies, and ways to ask about uncertainties and values. Said Raiffa, “We helped shape a field.”30 The seminar and two books Raiffa and Schlaifer cowrote during this period spurred the Bayesian revival of the 1960s. Raiffa was later surprised to realize that the most fertile period of his collaboration with Schlaifer had lasted only four years.
Raiffa’s and Schlaifer’s classic book for advanced statisticians, Applied Statistical Decision Theory, was published in 1961. Its careful, detailed analytical methods set the direction of Bayesian statistics for the next two decades. Today it sits on almost every decision analyst’s bookshelf.
When Pratt joined Raiffa and Schlaifer to write Introduction to Statistical Decision Theory, he soon realized that what was easy for him to do mathematically was quite difficult for Schlaifer, who could understand the mathematics but not produce it himself. By the time the book was ready for editing, Schlaifer and Raiffa had moved on to other interests. They received so many requests for their preliminary manuscript, however, that McGraw-Hill published it as a typescript in 1965. Thirty years later, Pratt and Raiffa finished it, and MIT published it as an 875-page book.
To introduce business school professors to mathematical methods, Raiffa ran an 11-month-long Ford Foundation program in 1960 and 1961. As a result, the next generation of business school deans at Harvard, Stanford, Northwestern, and elsewhere had received a heavy dose of Bayesian subjectivism for decision making, and the gospel radiated outward to schools of management. Raiffa even gave his students an 84-page handout, “An Introduction to Markov Chains,” more than 30 years before their widespread adoption by the statistical profession. By 2000 Bayesian methods were often centered in university business schools rather than statistics departments.
Raiffa and Schlaifer drifted apart after 1965. Raiffa still called himself a Bayesian who, “roughly speaking. . . . wish[es] to introduce intuitive judgments and feelings directly into the formal analysis of a decision problem.”31 Broadening what he knew about subjective probability, game theory, and Bayes’ rule, he left Harvard’s statistics department to take a joint chair in the business school and the economics department. There he pursued societal, rather than primarily statistical, issues in medicine, law, engineering, international relations, and public policy.
By any measure Raiffa’s move was a success. As a pioneer in decision analysis, he was one of four organizers of the Kennedy School of Government at Harvard; the founder and director of a joint East–West think tank to reduce Cold War tensions long before perestroika; a founder of Harvard Law School’s widely replicated role-playing course in negotiations; and scientific adviser to McGeorge Bundy, the national security assistant under Presidents Kennedy and Johnson. Raiffa also supervised more than 90 Ph.D. dissertations at Harvard in business and economics and wrote 11 books—no articles, only books—one of which has been in print for more than fifty years. As a Bayesian, Raiffa would cast a long shadow.
Ultimately, however, Raiffa and Schlaifer failed in their bold attempt to permeate business curricula, statistical theory, and American business life with Bayes’ rule. Schlaifer built managerial economics into a strong program at the Harvard Business School, but Bayesian decision analysis faded from its curriculum, and Bayes’ rule never supplanted “the old stuff” in American classrooms. Since the 1970s, when all the top business schools emphasized Bayesian decision theory, it has been compressed into a few weeks’ study. Business students no longer do their own calculations; presumably they can hire a consultant or buy a computer program.
Many theoretical statisticians also ignored Raiffa’s and Schlaifer’s contributions; they were, after all, outsiders working in a business school. From his vantage point in Britain, Lindley was astonished that the statistical community paid so little attention to Schlaifer. “I was bowled over by him. The book with Raiffa is wonderful,” and Schlaifer’s 1971 book had computer methods “in advance of their time.” Lindley considered Schlaifer “one of the most original minds that I have ever met [with] extraordinarily wide knowledge.”32
Part of their failure lay in the fact that Schlaifer remained a confirmed university theoretician. When confronted with a problem he could not solve, he set it aside and worked on something else, something business managers cannot afford to do. Nor did he consider long-term solutions and their consequences over time; he dealt in short-term results. He did little consulting work, and his lack of experience selling complicated ideas to busy executives limited the impact of Bayes’ rule on working business people. He spent weeks exploring a marketing case about cottage cheese packaging in all its abstract complexities but stripped off all the textural surroundings that most caseworkers would have brought back from a field trip to a dairy. He turned his only graduate student’s thesis about all the messy glory of IBM’s quality control problems into a dry theoretical paper about two-stage sampling. The student’s thesis ended up piled so high with abstract issues that it was not until Schlaifer went on sabbatical that Raiffa could intervene and secure the young man’s Ph.D. Schlaifer was a passionate intellectual with a deep interest in narrow topics and all the time in the world for disputation.
After Raiffa moved on to other projects, Schlaifer threw himself into designing a new introductory course for Harvard’s first-year students in managerial economics. Naturally, it would be based on Bayesian methods, a first in any business school. He wrote a text and titled it Managerial Economics Reporting Control, which he nicknamed MERC. Students hated it, called it Murk, and burned their copies on the front steps of Baker Library. When a reporter for the Harvard newspaper asked for a comment, Schlaifer replied, “Well, I’d rather be among those whose books are burned than those who burn books.”
Then Schlaifer leaned intently forward: “Tell me. There is one thing that really interests me. This book is printed on very good, very glossy paper. It must have burned very poorly. How do you burn them?”
“Well, sir,” the student answered respectfully, “we burn them page by page.”33
Schlaifer, farseeing to the last, spent the remaining years of his life trying to write computer software for practitioners, even though teams of mathematically sophisticated programmers were already taking over the field. In 1994, at the age of 79, Schlaifer died of lung cancer. After his death, Raiffa and Pratt finished the trio’s 30-year-old opus, Introduction to Statistical Decision Theory. Dedicating it to their former colleague, Pratt and Raiffa hailed Schlaifer as “an original, deep, creative, indefatigable, persistent, versatile, demanding, sometimes irascible scholar, who was an inspiration to us both.”34
12. Who Wrote The Federalist?
Alfred C. Kinsey’s explosive bestseller Sexual Behavior in the Human Male was published in 1948, the same year pollsters failed to predict Harry Truman’s victory over Thomas Dewey in the presidential election. With the public crying foul, fraud, and debauchery, social scientists feared for the future of their profession. Opinion polling was one of their basic tools, so the Social Science Research Council, representing seven professional societies, appointed statistician Frederick Mosteller of Harvard University to investigate the scandals.
Mosteller’s forthright report on Truman’s election blamed the nation’s pollsters for rejecting randomized sampling and for clinging to outdated sampling designs that underrepresented blacks, women, and the poor—all of whom voted more heavily Democratic than the population reached by the pollsters.
In the case of Kinsey’s research, powerful men—including John Foster Dulles, secretary of state under Eisenhower; Arthur Sulzberger, publisher of the New York Times; Harold W. Dodds, president of Princeton University; and Henry P. Van Dusen, president of the liberal Union Theological Seminary— were demanding an end to funding for research on human sexuality. But Mosteller underwent Kinsey’s standard interview about his sexual history and emerged impressed. Kinsey’s lack of randomized sampling was statistically damning, but his work was far better than anyone else’s in the field, and the country did not have 20 statisticians who could have done better. It was quietly arranged that when Kinsey wrote his next study, on the sexuality of women, Jerome Cornfield of the National Institutes of Health would help with the statistics.
Both scandals involved discrimination problems, also called classification problems, which struck at the heart of polling, science, social science, and statistics. Researchers tended to assign people or things to categories without being totally sure that the assignments were accurate or that the categories were well defined. Pollsters classified people as Republicans or Democrats; marketers divided consumers into users of one detergent or another; scientists classified plants in biology and skulls in anthropology; and social scientists categorized individuals according to personality.
Finishing up with the Kinsey committee, Mosteller looked around for a research topic involving classification issues. He had a feet-on-the-ground attitude, perhaps the result of having been raised by his divorced mother, who never graduated from high school but who had insisted, over the objections of her ex-husband, that Fred get an education. Mosteller had earned his bachelor’s and master’s degrees in mathematics from Carnegie Institute of Technology (now Carnegie Mellon University) in Pittsburgh and enrolled in graduate statistics at Princeton University’s highly abstract mathematics department. As the primary liaison between Princeton and Columbia statisticians working on military research, he learned that he loved working on deadline on real-world problems. After the war, in 1946, Mosteller finished his Ph.D. at Princeton and, driven by his interest in health, education, and baseball, moved to Harvard. The investigations of campaign polling and sexual research left him ripe for a problem of his own choosing.
Mosteller began looking around for a large database to use for developing ways to discriminate between two cases. He began thinking—not about Bayes’ rule—but about a minor historical puzzle: The Federalist papers. Between 1787 and 1788, three founding fathers of the United States, Alexander Hamilton, John Jay, and James Madison, anonymously wrote 85 newspaper articles to persuade New York State voters to ratify the American Constitution. Historians could attribute most of the essays, but no one agreed whether Madison or Hamilton had written 12 others.
Mosteller had learned about the problem during a summer job he held as a graduate student in 1941. Counting the number of words in each sentence of The Federalist papers with psychologist Frederick Williams, he discovered “an important empirical principle—people cannot count, at least not very high.” He also learned that, stylistically, Hamilton and Madison were practically twins, skilled in a complicated oratorical style popularized in 1700s England. Mosteller agreed to “leave general style as a poor bet and pay attention to words.”1 The job was daunting because he would need a lot of single words to supply a pool of thousands of variables. When the summer job ended and the Second World War intervened, Mosteller forgot about The Federalist.