An Accidental Statistician


by George E. P. Box


  The best thing about Madison is the friends that I have, which includes Judy, Jack, and Justin. And you, too, George. I love you, and I wish you a happy 65th!

  Bill

  Soon after coming to Madison, I started an intermediate-level course—Statistics 424—on experimental design. Later Bill taught the course to hundreds of students. One of his requirements was that every student should produce and analyze a factorial design of their own devising and draw appropriate conclusions. One student baked cakes with different ingredients. Another, who was a pilot, experimented with putting his plane into spins and measuring the factors that enabled him to escape them successfully.4 Below I discuss two examples.

  Statistics is about how to generate and use data to solve scientific problems. To do this, familiarity with science and the scientific method is essential. In science and technology, it is frequently necessary to study a number of variables. Let's call the variables you can change “inputs,” or “factors,” and the variables you measure “outputs,” or “responses.” It used to be believed that the correct way to study a system affected by a number of factors was to change one factor at a time. More than 80 years ago, R.A. Fisher showed that this procedure was extremely wasteful of experimental effort. In fact, you should change a number of factors simultaneously, in patterns called “experimental designs.” Even now, however, the one-factor-at-a-time method is still taught.
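
  To make this concrete, here is a minimal sketch (in Python, with placeholder factor names of my own) of how the runs of a two-level, three-factor design are laid out: every combination of a low (-1) and a high (+1) setting of each factor, eight runs in all, instead of varying one factor while the others are held fixed.

    from itertools import product

    # A 2^3 factorial design: every combination of a low (-1) and high (+1)
    # setting of three factors -- eight runs in total.
    factors = ["A", "B", "C"]                     # placeholder names
    design = list(product([-1, +1], repeat=len(factors)))

    for run, settings in enumerate(design, start=1):
        print(run, dict(zip(factors, settings)))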

  Here is a simple factorial design, due to Bill Hunter, for an experiment on a polymer solution for use in floor waxes. It used eight experimental runs to study the effects of three factors on three responses. The factors were 1) the amount of monomer; 2) the type of chain-length regulator; and 3) the amount of chain-length regulator. The responses were milkiness, viscosity, and yellowness.

  A designed experiment has the merit that it quite often “analyzes itself.” For this experiment, with only eight runs, it is clear that milkiness is affected only by factor 1, viscosity only by factor 3, and slight yellowness by a combination of 1 and 2.*
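
  The sense in which such a design “analyzes itself” can be sketched in a few lines of Python: each main effect is just the average response at a factor's high level minus the average at its low level. The response values below are invented for illustration; they are not the figures from Bill's experiment.

    import numpy as np
    from itertools import product

    # Eight runs of a 2^3 design in standard order (columns = factors 1, 2, 3).
    X = np.array(list(product([-1, 1], repeat=3)))
    y = np.array([11, 12, 10, 11, 18, 19, 17, 18])   # invented response values

    # Main effect of a factor = mean response at +1 minus mean response at -1.
    for j in range(3):
        effect = y[X[:, j] == 1].mean() - y[X[:, j] == -1].mean()
        print(f"factor {j + 1}: main effect = {effect:+.2f}")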

  Bill and I believed that it was important to learn by doing. We wanted the class to experience how process improvement could be achieved using statistical design. In many of our demonstrations, we used a paper helicopter because it was easy to make, modify, and test. Our basic helicopter design is shown in the figure. In the figure, the heavy lines show where to make cuts in the paper and the dotted lines show where to make folds. If you release the helicopter, it will rotate and fall slowly to the ground. The problem is to modify the design so that the helicopter will stay in the air for the longest possible time.

  For simplicity we illustrate with eight different helicopter designs arranged in a 2³ experiment.

  Over these comparatively short flight times, the effects are roughly linear and can be represented by parallel contours plotted on a cube as follows:

  The arrow indicates that a helicopter with a shorter body width (W) and a longer wing length (S) will fly longer, but a change in body length doesn't make much difference. Of course an experiment such as this may be run with more than three factors.
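
  Given the rough linearity, the eight flight times can be summarized by fitting a plane; the signs and sizes of the fitted coefficients then point along the arrow on the cube. The sketch below uses invented flight times and my own coded labels S (wing length), W (body width), and L (body length); it shows only the arithmetic, not our actual results.

    import numpy as np
    from itertools import product

    # Coded -1/+1 settings for S (wing length), W (body width), L (body length).
    X = np.array(list(product([-1, 1], repeat=3)), dtype=float)
    t = np.array([2.4, 2.5, 2.1, 2.2, 2.8, 2.9, 2.5, 2.6])   # invented flight times (s)

    # Fit t = b0 + b1*S + b2*W + b3*L by least squares.
    A = np.column_stack([np.ones(len(t)), X])
    b, *_ = np.linalg.lstsq(A, t, rcond=None)
    print("intercept, S, W, L coefficients:", np.round(b, 3))
    # A positive S coefficient and a negative W coefficient say: lengthen the
    # wings and narrow the body; a near-zero L coefficient says body length
    # matters little -- the pattern described in the text.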

  Most of the important ideas in statistics have come about because of scientific necessity, not because of mathematical manipulation.5 Some examples and the people involved are described below.

  A famous example of practice generating theory was Charles Darwin's study of plants and animals on the voyage of the Beagle. Darwin, who was pathetically deficient in mathematics, used scientific observation to develop his theory of evolution.

  An important factor in evolutionary theory was the variation of species. But Francis Galton wondered why this variation did not continually increase. He found that this was because the similarity between relatives was only partial, and that this partial similarity could be measured by the correlation coefficient.

  This in turn was taken up with enthusiasm by Karl Pearson, who realized that if we were to find out when items were significantly correlated, it was necessary to discover the distribution of the correlation coefficient.

  Pearson's methods were clumsy and were developed for large samples. Fisher, however, easily obtained the normal theory distribution and its elaborations using n-dimensional geometry. Pearson's methodology also failed to meet the practical needs of W. S. Gosset, when he came to study statistics with Pearson for a year at University College, London, in 1906. Gosset had graduated from Oxford with a degree in chemistry and had gone to work for Guinness's, following the company's policy (begun in 1893!) of recruiting scientists as brewers. He soon found himself faced with analyzing small sets of observations coming from the experimental brewery of which he was placed in charge.

  Gosset's invention of the t test is a milestone in the development of statistics because it showed how account might be taken of the uncertainty in estimated parameters. It thus paved the way for an enormous expansion of the usefulness of statistics, which could now begin to provide answers for agriculture, chemistry, biology, and many other subjects in which small rather than large data samples were the rule.
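
  The calculation Gosset made possible is short enough to show; the numbers below are invented, standing in for the kind of small brewery sample he faced, and the target value of 59.0 is an arbitrary choice for illustration.

    import numpy as np
    from scipy import stats

    # Five hypothetical small-scale brewing measurements (invented numbers).
    y = np.array([60.2, 59.1, 61.3, 58.7, 60.5])

    # Does the mean differ from a target of 59.0, allowing for the fact that
    # the standard deviation itself is estimated from only five observations?
    t, p = stats.ttest_1samp(y, popmean=59.0)
    print(f"t = {t:.2f}, two-sided p = {p:.3f}")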

  Fisher, as he always acknowledged, owed a great debt to Gosset, both for providing the initial clue as to how the general problem of small samples might be approached, and for mooting the idea of statistically designed experiments.

  When Fisher took a job at Rothamsted Agricultural Experimental Station in 1919, he was immediately confronted with a massive set of data on rainfall recorded every day, and of harvested yields every year, for over 60 years. He devised ingenious methods for analyzing these data, but he soon realized that the data that he had, although massive, did not provide much information on the important questions he needed to answer. The outcome was his invention of experimental design. Fisher considered the following question: How can experiments be conducted so that they answer the specific questions posed by the investigator? One can clearly see his many ideas developing in response to the practical necessities of field experimentation.

  Fisher left Rothamsted in 1933 and was succeeded by Yates, who made further important advances. He invented new designs, and showed how to cope when, as sometimes happened, things went wrong and there were missing or suspect data.

  Later Finney, responding to the frequent practical need to maximize the number of factors studied, introduced fractional factorial designs. These designs, together with another broad class developed independently by Plackett and Burman in response to war-time problems, have since proved of great value in industrial experimentation. An isolated example of how such a highly fractionated design could be used for screening out a source of trouble in a spinning machine had been described as early as 1934 by L. H. C. Tippett of the British Cotton Industry Research Association. This arrangement was a 1/125th fraction of a 5⁵ design and required only 25 runs!
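
  The idea behind such fractions is easiest to see in its simplest two-level form: a half fraction of a 2³ design sets the third factor equal to the product of the first two (the generator C = AB), so that three factors are studied in only four runs. The sketch below shows that toy case, not Tippett's 5⁵ arrangement.

    from itertools import product

    # Half fraction of a 2^3 design: choose the levels of A and B freely,
    # then set C = A*B (the generator).  Three factors in four runs.
    half_fraction = [(a, b, a * b) for a, b in product([-1, 1], repeat=2)]
    for run, (a, b, c) in enumerate(half_fraction, start=1):
        print(run, {"A": a, "B": b, "C": c})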

  In another example, Henry Daniels, a statistician at the Wool Industries Research Association from 1935 to 1946, solved the problem of determining how much of the variation in the woolen thread was due to each of a series of processes through which the wool passed. Variance component models, which could be used to expose those particular parts of a production process responsible for large variations, had wide application in many other industries.
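
  For the simplest balanced case, the calculation behind a variance-component analysis can be sketched as follows: several batches of thread, several tests within each batch, and the question of how much of the total variation is between batches rather than within them. The data below are invented, and the layout is my own simplification of the multi-stage problem Daniels actually faced.

    import numpy as np

    # Hypothetical thread-strength data: rows are batches, columns are repeat
    # tests within a batch (all numbers invented).
    data = np.array([
        [14.2, 14.8, 14.5],
        [16.1, 15.7, 16.4],
        [13.9, 14.3, 14.0],
        [15.2, 15.6, 15.0],
    ])
    k, n = data.shape                                  # k batches, n tests per batch

    ms_within = data.var(axis=1, ddof=1).mean()        # within-batch mean square
    ms_between = n * data.mean(axis=1).var(ddof=1)     # between-batch mean square

    sigma2_within = ms_within
    sigma2_between = max((ms_between - ms_within) / n, 0.0)
    print(f"within-batch variance:  {sigma2_within:.3f}")
    print(f"between-batch variance: {sigma2_between:.3f}")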

  Later, Henry and I would meet under strange conditions, during the time of the Iron Curtain. We were both members of a small group attending a conference in West Germany who had arranged to travel to the home of Johann Sebastian Bach, at Eisenach in East Germany. At the border, we received a glimpse of what the Iron Curtain involved: an interminable row of concrete dragons' teeth stretching far into the distance, and guards with vicious dogs on leads. The guards took our passports, and they kept them until we crossed back into West Germany. I had not known Henry well before our trip to the border, but I became quite well acquainted with him and his wife during the two hours that we waited to cross into East Germany.

  In the development of applied statistics, an important influence was the work of Walter Shewhart on quality control. This work and that on sampling inspection by Harold Dodge heralded more than a half century of statistical innovation, much of it coming from the Bell Telephone Laboratories. This included a rekindling of interest in data analysis in a much needed revolution led by John Tukey.

  Another innovator guided by practical matters was Frank Wilcoxon, an entomologist turned statistician at the Lederle Labs of the American Cyanamid Company. He said that it was simply the need for quickness that led to his famous Wilcoxon tests, the origins of much subsequent research by mathematical statisticians on nonparametric methods.

  An early contribution was by M. S. Bartlett, whose courses I sat in on while I was still at ICI. His work on the theory of transformation of data came about because he was concerned with the testing of pesticides and so with data that appeared as frequencies or proportions.

  William Beveridge's attempt to analyze time series by fitting sine waves had revealed significant oscillations at strange and inexplicable frequencies. Yule suggested that such series should be represented, not by deterministic functions, but by dynamic systems. Yule's revolutionary idea was the origin of modern time series models. Unfortunately, the practical use of these models was for some time hampered by an excessive concern with stationary processes in equilibrium about a fixed mean. Almost all of the series arising in business, economics, and manufacturing do not behave like realizations from a stationary model. Consequently, for lack of anything better, operations research workers led by Holt and Winters devised a nonstationary approach: beginning in the 1950s, they used the exponentially weighted moving average of past data and its extensions for forecasting series of this kind. This weighted average was introduced because it seemed sensible for a forecast steadily to discount the past, and it seemed to work reasonably well. However, in 1960, Muth showed that this empirical statistic was an optimal forecast for an important kind of nonstationary model. This model and its generalizations, together with Yule's contributions, later turned out to be extremely valuable for representing many kinds of practically occurring series, including seasonal series, and are the basis for so-called ARIMA models.
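
  The forecasting rule at the center of this story takes only a few lines: the next forecast is a weighted compromise between the latest observation and the previous forecast, which works out to a geometrically discounted average of the whole past. The series and the smoothing constant below are illustrative choices of mine.

    # Exponentially weighted moving average forecasting:
    #   forecast for y[t+1] = lam * y[t] + (1 - lam) * (forecast for y[t])
    # so past observations are discounted geometrically.
    def ewma_forecasts(y, lam=0.3):
        forecasts = [y[0]]                    # start the recursion at the first value
        for obs in y[1:]:
            forecasts.append(lam * obs + (1 - lam) * forecasts[-1])
        return forecasts                      # forecasts[t] is the forecast of y[t+1]

    series = [50, 52, 51, 55, 57, 56, 60]     # invented, gently drifting series
    print([round(f, 2) for f in ewma_forecasts(series)])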

  In further developments, mathematical statisticians had a theory of what they called “most powerful tests,” showing that, given their assumptions, it was impossible to outperform such a test. In particular, this led to the conclusion that for a binomial testing scheme, you should inspect a fixed number n of items, say 20, drawn at random from the batch, and if the number of duds was greater than some fixed number, say three, you failed the whole batch. Allen Wallis has described the dramatic consequence of a simple query made by a serving officer: “Suppose in such a test it should happen that the first three components tested were all duds; why would we need to test the remaining seventeen?” He and Milton Friedman were quick to see the apparent implication that “super-powerful” tests were possible!

  At the time Abraham Wald was accepted as the premier mathematical statistician, but some thought the suggestion that he be invited to work on the problem of a test that was more powerful than a most powerful test was ridiculous. To do better than a most powerful test was impossible! What the mathematicians had failed to see was that the test considered was most powerful only if it was assumed that the sample size n was fixed, and what the officer had seen was that n did not need to be fixed. This led to the important development of sequential tests that could be carried out graphically.6
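
  Wald's resolution, the sequential probability ratio test, can be sketched for the dud-counting problem: after each item, the log likelihood ratio comparing an acceptable dud rate p0 with an unacceptable rate p1 is updated, and inspection stops as soon as it crosses either of two boundaries set by the error rates one is willing to tolerate. The rates and error levels below are illustrative choices, not values from the wartime work.

    import math

    def sequential_dud_test(items, p0=0.05, p1=0.25, alpha=0.05, beta=0.10):
        """Sequential test of dud rate p0 vs p1; items is a sequence of 0/1 (1 = dud)."""
        upper = math.log((1 - beta) / alpha)    # crossing above: reject the batch
        lower = math.log(beta / (1 - alpha))    # crossing below: accept the batch
        llr = 0.0
        for n, dud in enumerate(items, start=1):
            llr += math.log(p1 / p0) if dud else math.log((1 - p1) / (1 - p0))
            if llr >= upper:
                return f"reject the batch after {n} items"
            if llr <= lower:
                return f"accept the batch after {n} items"
        return "no decision yet -- keep sampling"

    # A short run of duds can settle the matter long before 20 items are inspected.
    print(sequential_dud_test([1, 1, 1]))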

  A pioneer of graphical techniques of a different kind was Cuthbert Daniel, an industrial consultant who used his wide experience to make many contributions to statistics. An early user of unreplicated and fractionally replicated designs, he was concerned with the practical difficulty of deciding which effects were significant when there was no replication from which to estimate the size of the experimental error. In particular, he was quick to realize that higher-order interactions that were unlikely to occur could be used to estimate experimental error. His introduction of graphical analysis of factorials by plotting effects and residuals on probability paper has had major consequences. It has encouraged the development of many other graphical aids, and together with the work of John Tukey, it has contributed to the growing understanding that at the hypothesis-generation stage of the cycle of discovery, it is the imagination that needs to be stimulated, and that this is often best done by graphical methods.
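
  Daniel's device can be sketched briefly: estimate all the effects from an unreplicated factorial, order them, and plot them against normal quantiles; the inert effects fall roughly on a straight line through the origin and the real ones stand apart from it. The effect values below are invented.

    import numpy as np
    from scipy import stats
    import matplotlib.pyplot as plt

    # Invented effect estimates from an unreplicated factorial: most are small
    # noise-like effects, two are real.
    effects = np.array([0.3, -0.5, 7.1, 0.8, -0.2, 5.4, -0.9])

    ordered = np.sort(effects)
    m = len(ordered)
    quantiles = stats.norm.ppf((np.arange(1, m + 1) - 0.5) / m)   # plotting positions

    plt.plot(quantiles, ordered, "o")
    plt.xlabel("normal quantile")
    plt.ylabel("ordered effect estimate")
    plt.show()   # points well off the straight-line pattern suggest real effects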

  Obviously one could go on with other examples, but at this point, I should like to draw some interim conclusions.

  There are two important ingredients leading to statistical advance: (1) the presence of an original mind that can perceive and formulate a new problem and move to its solution, and (2) a challenging and active scientific environment for that mind, conducive to discovery.

  Gosset at Guinness's; Fisher, Yates, and Finney at Rothamsted; Tippett at the Cotton Research Institute; Youden at the Boyce Thompson Institute (with which organization Wilcoxon and Bliss were also at one time associated); Daniels and Cox at the Wool Industries Research Association; Shewhart, Dodge, Tukey, and Mallows at Bell Labs; Wilcoxon at American Cyanamid; and Cuthbert Daniel in his consulting practice: These are all examples of fortunate conjunctions leading to innovation.

  Further examples are Don Rubin's work at the Educational Testing Service, Jerry Friedman's computer-intensive methods developed at the Stanford linear accelerator, George Tiao's involvement with environmental problems, Brad Efron's interaction with Stanford Medical School, Gwilym Jenkins's applications of time series analysis to systems problems, and John Nelder's development of statistical computing at Rothamsted.

  The message seems clear: A statistician or any scientist who believes himself or herself capable of genuinely original research will find inspiration in a stimulating scientific investigational environment. In all the important scientific developments described earlier, it was the need for new methods in appropriate environments that led to their conception.

  For several years as undergraduates, students are encouraged to sit with their mouths open while their teachers pour in “knowledge.” Then those who become graduate students are expected to do something totally different: they have been regularly fed, and now they must feed themselves, but they haven't been taught how to do it. Undergraduate education must provide more opportunities for students to use their creativity—they need help in understanding the art of problem solving. Also, new graduate students tend to start by trying to solve a problem in full generality, with all the bells and whistles. I've told them, “Don't try to get a general solution all at once. Start out with n = 1 and m = 2. Once you really understand the problem in its simplest form, then you can begin to generalize.” Also, you must try to see the essence of the problem. As it says in the New Testament, “Except ye be as little children ye shall not enter the kingdom of heaven.”

  So I tell my students that it's best to try to think of problems from first principles. It is easy to miss the obvious, and sometimes there is nothing less obvious than the obvious. If you don't approach problems in this way, you may get caught in the tramlines: you think what everyone else has already thought, and you don't arrive at anything new.

  Mathematics is primarily concerned with the question: Given certain assumptions, is this true or isn't it? And in a myriad of other disciplines—physics, chemistry, engineering, and the like—mathematics is an essential tool. But statistics is concerned with finding out things that were not in the original model. For example, Albert Einstein noted that many people thought that he developed the idea of relativity from pure theory, but, he said, this was untrue—his theory of relativity was based on observation. Because of the necessity to change the model as one's understanding develops, scientific investigation can never be coherent. One important method that can result in innovation is the iterative process involving induction and deduction. As was said in Statistics for Experimenters:

  An initial idea (or model or hypothesis or theory or conjecture) leads by a process of deduction to certain necessary consequences that may be compared with data. When consequences and data fail to agree, the discrepancy can lead, by a process called induction, to modification of the model. A second cycle in the iteration may thus be initiated. The consequences of the modified model are worked out and again compared with the data (old or newly acquired), which in turn can lead to further modification and gain of knowledge. The data acquiring process may be scientific experimentation, but it could be a walk to the library or a browse on the Internet.

  The iterative inductive-deductive process, which is geared to the structure of the human brain and has been known since the time of Aristotle, is part of one's everyday experience. For example, a chemical engineer, Peter Minerex, parks his car every morning in an allocated parking space. One afternoon after leaving work, he goes through the following deductive-inductive learning sequence:

  Model: Today is like every other day.

  Deduction: My car will be in its parking place.

  Data: It isn't.

  Induction: Someone must have stolen it.

  Model: My car has been stolen.

  Deduction: My car will not be in the parking lot.

  Data: No. It's over there!

  Induction: Someone took it and brought it back.

  Model: A thief took it and brought it back.

  Deduction: My car will have been broken into.

  Data: It's unharmed and unlocked.

  Induction: Someone who had a key took it.

  Model: My wife used my car.

  Deduction: She probably left a note.

  Data: Yes. Here it is.

  Suppose you want to solve a particular problem and initial speculation produces some relevant idea [model, theory]. You will then seek data to further support or refute this theory. This could consist of some of the following: a search of your files and of the Web, a walk to the library, a brainstorming meeting with co-workers and executives, passive observation of a process, or active experimentation. In any case, the facts and data gathered sometimes confirm your conjecture, in which case you have solved your problem. Often, however, it appears that your initial idea is only partly right or perhaps totally wrong. In the latter two cases, the difference between deduction and actuality causes you to keep digging. This can point to a modified or totally different idea and to the reanalysis of your present data or to the generation of new data.

 
