Everyday Chaos
So, in the spring of 1978, Bricklin prototyped a spreadsheet on an Apple II personal computer, using a gaming controller in place of a mouse.18 With the rise of PCs and with the decision by Bricklin and his partner, Bob Frankston, not to patent the software,19 spreadsheets became a crucial way businesses understood themselves and made decisions: a company’s conceptual model of itself now could be expressed in a working model that let the business see the effects of the forces affecting it and of decisions the company was contemplating.
In a remarkably prescient article in 1984, Steven Levy wrote, “It is not far-fetched to imagine that the introduction of the electronic spreadsheet will have an effect like that brought about by the development during the Renaissance of double-entry bookkeeping.”20 He was right. “The spreadsheet is a tool, and it is also a world view—reality by the numbers,” Levy wrote.
A spreadsheet is what a business looks like to a traditional computer: quantitative information connected by rules. The rules (formulas) and some of the data, such as fixed costs, are relatively stable. But some of the data changes frequently or even constantly: sales, expenses, headcount, and so on. Personal computers running spreadsheets made keeping the working model up to date so easy and fast that a new decision-making process became feasible: a spreadsheet is a temptation to fiddle, to try out new futures by plugging in different numbers or by tweaking a relationship. This makes spreadsheets very different from most traditional models, which focus on representing unchanging relationships, whether they’re Newtonian laws or the effect that raising taxes has on savings. Spreadsheets are models that encourage play: you “run the numbers,” but then you poke at them to try out “what if this” or “what if that.” This is a model meant to be played with.
A spreadsheet thus is a simple example of a working model based on a fully understandable conceptual model. It lets you plug in data or play with the rules to see what the future might look like. Of course, spreadsheets are inexact: they can’t capture all of the relationships among all of the pieces, and their predictions may be thrown off by events that no one foresaw. Because spreadsheets are tools and not perfect encapsulations of every possible eventuality, we accept some distance between the working model and the conceptual model, and between the conceptual model and the real world. We continue to use them because, as George E. P. Box said, “[a]ll models are wrong but some are useful.”21
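To make the idea of "rules plus data" concrete, here is a minimal sketch of a spreadsheet-style working model in Python. The function names, the formulas, and every figure are hypothetical, chosen only to illustrate how changing an input and recomputing lets you try out a different future.

```python
# A toy "spreadsheet": the formulas are the stable rules, the inputs are the data.
# All names and numbers here are invented for illustration.

def project(units_sold, unit_price, unit_cost, fixed_costs):
    """Recompute every derived 'cell' from the inputs, as a spreadsheet would."""
    revenue = units_sold * unit_price
    variable_costs = units_sold * unit_cost
    profit = revenue - variable_costs - fixed_costs
    return {"revenue": revenue, "variable_costs": variable_costs, "profit": profit}

# The current state of the business (hypothetical figures).
baseline = {
    "units_sold": 1000,
    "unit_price": 20.0,
    "unit_cost": 12.0,
    "fixed_costs": 5000.0,
}

def what_if(**changes):
    """Run the model with some inputs overridden, leaving the baseline untouched."""
    return project(**{**baseline, **changes})

base = what_if()
# What if we raise the price and, as a guess, lose a tenth of our sales?
raise_price = what_if(unit_price=22.0, units_sold=900)

print("baseline profit:", base["profit"])
print("what-if profit: ", raise_price["profit"])
```

The model is only as good as its rules: the assumed drop in sales is itself a guess typed into a cell, which is exactly the gap between working model and real world that the passage describes.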
Armillary
In the Galileo Museum in Florence sits a beautiful set of nested geared rings, 6.5 feet tall.22 If we today had to guess the point of this intricate mechanism just by looking at it, we might suppose that it’s some type of clock. If we were contemporaries of it, we’d be far more likely to recognize that it shows the positions of the major heavenly bodies in Earth’s skies for any night.
Antonio Santucci finished this object, called an armillary, in 1593, after five years of work. Although fifty years earlier Nicolaus Copernicus had shown that the Earth revolves around the sun, Santucci still put the Earth at the center, circled by seven rings that display the positions of the seven known planets. An eighth ring has the fixed stars on it, as well as the markings of the zodiac. Adjust the rings on this wood-and-metal machine, and the planets and fixed stars will align relative to one another and to the Earth. Now gild it and paint in the four winds, the shield of your patron’s Medici-related family, and an image of God Himself, and you have a beautiful room-size model of the universe.23
According to no less an authority than Galileo, even Santucci eventually came around to Copernicus’s idea.24 But the armillary’s model of the universe is odd beyond its Earth-centric view. It simulates the movement of the heavenly bodies using only circles as components of the mechanism because, from the early Greeks on, it was commonly assumed that because the heavens were the realm of perfection, and circles were the perfect shape, the heavenly bodies must move in perfect circles. That makes the planets a problem, for they wander through Earth’s sky in distinctly noncircular ways; planet comes from the Greek word for wanderer. Therefore, if the armillary were to be truthful to its conceptual model, not only did it have to get the planets in the right places relative to Earth, it also had to do it the way the universe does: by using circles. So Santucci set smaller gears turning as they revolved around larger gears that were themselves turning, adding in as many as necessary to model the paths of the planets accurately.25
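The circles-on-circles idea can be sketched in a few lines of Python. This is an illustration of the general technique only, not a reconstruction of Santucci's gearing: the radii and turning rates below are arbitrary assumptions, and the Earth is simply placed at the origin. One small circle riding on a large one is already enough to make a planet appear to reverse direction in the sky, which is the "wandering" the mechanism had to reproduce.

```python
import math

def planet_position(t, big_radius=10.0, big_rate=1.0,
                    small_radius=4.0, small_rate=8.0):
    """Position of a planet carried on a small circle whose center rides
    on a large circle around the Earth (at the origin).
    All radii and rates are arbitrary illustrative values."""
    x = big_radius * math.cos(big_rate * t) + small_radius * math.cos(small_rate * t)
    y = big_radius * math.sin(big_rate * t) + small_radius * math.sin(small_rate * t)
    return x, y

def apparent_longitude(t):
    """Direction of the planet as seen from the Earth, in radians."""
    x, y = planet_position(t)
    return math.atan2(y, x)

def longitude_change(t, dt=1e-4):
    """Signed change in apparent longitude over a short interval,
    wrapped so that crossing the +/- pi boundary does not cause a jump."""
    delta = apparent_longitude(t + dt) - apparent_longitude(t)
    return math.atan2(math.sin(delta), math.cos(delta))

# With these values the planet moves forward at t = 0 and backward
# when the small circle swings it inward (around t = pi / 7).
for t in (0.0, math.pi / 7):
    direction = "forward" if longitude_change(t) > 0 else "backward"
    print(f"t = {t:.3f}: planet appears to move {direction}")
```

Adding further small circles, each with its own radius and rate, lets such a model fit more complicated paths, at the cost of a more convoluted mechanism.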
The result is a successful working model that uses a convoluted mechanism dictated by a conceptual model that has been shown to be wildly wrong.
The error in its conceptual model also happens to make the working model quite beautiful.
Tides
“Unlike the human brain, this one cannot make a mistake.”26
That’s how a 1914 article in Scientific American described a tide-predicting machine made of brass and wood that made mistakes all the time. And its creators knew it.
Newton had shown that the gravitational pull of the sun and moon accounted for the rise and fall of the tides around Earth. But his formulas only worked approximately, for, as the Scientific American article pointed out,
the earth is not a perfect sphere, it isn’t covered with water to a uniform depth, it has many continents and islands and sea passages of peculiar shapes and depths, the earth does not travel about the Sun in a circular path, and Earth, Sun and Moon are not always in line. The result is that two tides are rarely the same for the same place twice running, and that tides differ from each other enormously in both times and in amplitude.27
In his book Tides: The Science and Spirit of the Ocean, Jonathan White notes, “There are hundreds of these eccentricities, each calling out to the oceans—some loudly, some faintly, some repeating every four hours and others every twenty thousand years.” Newton knew he was ignoring these complications, but they were too complicated to account for. (It’s quite possible that he never saw an ocean himself.)28
It was Laplace who again got Newton righter than Newton did, creating formulas that included the moon’s eight-year cycle of distances from the Earth, its varying distance north and south of the equator, the effect of the shape and depth of the ocean’s basin, the texture of the ocean floor, the water’s fluctuating temperatures, and other conditions.29
This added nuance to Newton’s model, but a vast number of additional factors also affect the tides. It took about another hundred years for Lord Kelvin, in 1867, to come up with a way of predicting tides that takes all the factors into account without having to know what all of them are.30
As the 1914 Scientific American article explains it, imagine a pencil floating up and down in an ocean, creating a curve as it draws on a piece of paper scrolling past it. Imagine lots of pencils placed at uniform distances from one another. Now imagine the ocean lying still, without any bodies exerting gravitational forces on it. Finally, imagine a series of fictitious suns and moons above Earth in exactly the right spots for their gravity to pull that pencil to create exactly those curves. Wherever you have a curve that needs explaining, add another imaginary sun or moon in the right position to get the expected result. Lord Kelvin ended up with a “very respectable number” of imaginary suns and moons circling the Earth, as the article puts it. If adding sea serpents would have helped, presumably Lord Kelvin would have added them as well.31
With the assistance of George Darwin (son of Charles), Lord Kelvin computed formulas that expressed the pull of these imaginary bodies, then designed a machine that used chains and pulleys to add up all of those forces and to draw the tidal curves. By 1914, this had evolved into the beast feted in the Scientific American article: fifteen thousand parts that, combined, could draw a line showing the tides at any hour.
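What the chains and pulleys add up can be sketched numerically. In the sketch below, each imaginary sun or moon contributes one smooth, repeating rise and fall, and the predicted tide is simply their sum. The three periods are roughly a lunar half-day, a solar half-day, and one day; the amplitudes and phases are invented for illustration and do not describe any real port. The function and variable names are likewise hypothetical.

```python
import math

# Each entry stands for one imaginary body:
# (amplitude in meters, period in hours, phase in radians).
# Amplitudes and phases are made up for this example.
CONSTITUENTS = [
    (1.20, 12.42, 0.0),
    (0.45, 12.00, 0.8),
    (0.25, 23.93, 2.1),
]

def tide_height(t_hours, constituents=CONSTITUENTS, mean_level=0.0):
    """Sum the pull of every imaginary body at time t_hours.
    This is the addition the machine performed mechanically."""
    return mean_level + sum(
        amplitude * math.cos(2.0 * math.pi * t_hours / period + phase)
        for amplitude, period, phase in constituents
    )

# Trace the curve every three hours over one day.
curve = [(hour, round(tide_height(hour), 2)) for hour in range(0, 25, 3)]
for hour, height in curve:
    print(f"hour {hour:2d}: {height:+.2f} m")
```

Fitting the model to a real harbor means choosing amplitudes and phases so the summed curve matches observed tides; adding another entry to the list is the numerical counterpart of adding another imaginary sun or moon.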
Lord Kelvin was in fact not the first to imagine a science-fiction Earth circled by multiple suns and moons that create the wrinkled swells and ebbs of tides caused by the vagaries of the Earth’s geography, topography, weather, and hundreds of other factors. Laplace himself “imagined a stationary Earth with these tide components circling as satellites.”32 Lord Kelvin’s machine and its iterations took this to further levels of detail, while accepting that the actual tides are subject to still more factors that simply could not be captured in the machine’s model: the influx of melted snow from a particularly long winter, the effect of storms, and all the other influences Earth is heir to. The Scientific American article could claim the machine never makes a mistake because Kelvin’s machine was as accurate as the tools and data of the time allowed, so it became the accuracy we counted as acceptable … all while relying on a fictitious model.
It set this level of accuracy by building a working model that is knowingly, even wildly, divorced from its conceptual model.
The River
In 1943, the US Army Corps of Engineers set Italian and German prisoners of war to work building the largest scale model in history: two hundred acres representing the 41 percent of the United States that drains into the Mississippi River. By 1949 the model was being used to run simulations to determine what would happen to cities and towns along the way if water flooded in. It’s credited with preventing $65 million in damage from a flood in Omaha in 1952.33 In fact, some claim its simulations are more accurate than the existing digital models.34
Water was at the heart of another type of physical model: the MONIAC (Monetary National Income Analogue Computer) economic simulator built in 1949 by the New Zealand economist Alban William Housego Phillips.35 The MONIAC used colored water in transparent pipes to simulate the effects of Keynesian economic policies. Tanks of water represented “households, business, government, exporting and importing sectors of the economy,” measuring income, spending, and GDP.36
It worked, given its limitations. The number of variables it could include was constrained by the number of valves, tubes, and tanks that could fit in a device about the size of a refrigerator.37 But because it only took account of a relative handful of the variables that influence the state of a national economy, it was far less accurate than the Mississippi River simulator. Yet the flow of water through a river the size of the Mississippi is also affected by more variables than humans can list. So how could the Mississippi model get predictions so right?
The Mississippi had the advantage of not requiring its creators to have a complete conceptual model of how a river works. For example, if you want to predict what will happen if you place a boulder in a rapids, you don’t have to have a complete model of fluid dynamics; you can just build a working scale model that puts a small rock into a small flow. So long as scale doesn’t matter, your model will give you your answer. As Stanford Gibson, a senior hydraulic engineer in the Army Corps of Engineers, said about the Mississippi basin project, “The physical model will simulate the processes on its own.”38
So this working model can deal with more complexity because it doesn’t have a conceptual model: it puts the actual forces to use in a controlled and adjustable way. Because the model is not merely a symbolic one—real water is rolling past a real, scaled-down boulder—the results aren’t limited by what we know to factor in. That’s the problem with the MONIAC: it sticks with factors that we know about. It’s like reducing weather to seven known factors.
Still, the Mississippi River basin model may seem to make no assumptions about what affects floods, but of course it does. It assumes that what happens at full scale also happens at 1/2000 scale, which is not completely accurate for the dynamics of water; for example, the creators of a model of San Francisco Bay purposefully distorted the horizontal and vertical scales by a factor of ten in order to get the right flow over the tidal flats.39 Likewise, the Mississippi model does not simulate the gravitational pull of the sun and the moon. Nor does it grow miniature crops in the fields. The model assumes those factors are not relevant to the predictions it was designed to enable. Using the Mississippi model to simulate the effects of climate change or the effect of paddle wheelers on algae growth probably wouldn’t give reliable results, for those phenomena are affected by factors not in the model and are sensitive to scale.
The Mississippi model wasn’t constructed based on an explicit conceptual model of the Mississippi River basin, and it works without yielding one. Indeed, it works because it doesn’t require us to understand the Mississippi River: it lets the physics of the simulation do its job without imposing the limitations of human reason on it. The result is a model that is more accurate than one like the MONIAC that was constructed based on human theory and understanding. So the advent of machine learning is not the first time we have been presented with working models for which we have no conceptual model.
But, as we’ll see, machine learning is making clear a problem with the very idea of conceptual models. Suppose our concepts and the world they model aren’t nearly as alike as we’ve thought? After all, when it comes to the Mighty Mississippi, the most accurate working model lets physical water flow deeper than our conceptual understanding.
* * *
Despite the important differences among all these models—from spreadsheets to the Mississippi—it’s the similarities that tell us the most about how we have made our way in a wildly unpredictable world.
In all these cases, models stand in for the real thing: the armillary is not the heavenly domain, the spreadsheet is not the business, the tubes filled with colored water are not the economy. They do so by simplifying the real-world version. A complete tidal model would have to include a complete weather model, which would have to include a complete model of industrial effects on the climate, until the entire world and heavens have been included. Models simplify systems until they yield acceptably accurate predictions.
Models thereby assume that we humans can identify the elements that are relevant to the thing we are modeling: the factors, rules, and principles that determine how it behaves. Even the model of the Mississippi, which does not need to understand the physics of fluid dynamics, assumes that floods are affected by the curves and depths of the river and not by whether the blue vervain growing along the sides of the river are in flower. This also implies that models assume some degree of regularity. The armillary assumes that the heavenly bodies will continue to move across the skies in their accustomed paths; the tidal machine assumes the gravitational mass of the sun and moon will remain constant; the spreadsheet assumes that sales are always going to be added to revenues.
Because the simplification process is done by human beings, models reflect our strengths and our weaknesses. The strengths include our ability to see the order beneath the apparent flux of change. But we are also inevitably prone to using unexamined assumptions, have limited memories and inherent biases, and are willing to simplify our world to the point where we can understand it.
Despite models’ inescapable weaknesses due to our own flawed natures, they have been essential to how we understand and control our world. They have become the stable frameworks that enable us to predict and explain the ever-changing and overwhelming world in process all around us.
Beyond Explanation
We are transitioning to a new type of working model, one that does not require knowing how a system works and that does not require simplifying it, at least not to the degree we have in the past. This makes the rise of machine learning one of the most significant disruptions in our history.40
In the introduction, we talked about Deep Patient, a machine learning system that researchers at Mount Sinai Hospital in New York fed hundreds of pieces of medical data about seven hundred thousand patients. As a result, it was able to predict the onset of diseases that have defied human diagnostic abilities. Likewise, a Google research project analyzed the hospital health records of 216,221 adults. From the forty-six billion data points, it was able to predict the length of a patient’s stay in the hospital, the probability that the patient would exit alive, and more.41
These systems work: they produce probabilistically accurate outcomes. But why?
Both of these examples use deep learning, a type of machine learning that looks for relationships among the data points without being instructed what to look for. The system connects the nodes into a web of probabilistic dependencies, and then uses that web (an “artificial neural network”) to refine the relationships again and again. The result is a network of data nodes, each with a “weight” that is used to determine whether the nodes it is connected to will activate; in this way, artificial neural networks are like the brain’s very real neural network.
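A toy version of such a network fits in a few lines of Python. This sketch is not Deep Patient's architecture or anything like its scale: the two-layer layout, the weights, and the input values are all hand-picked assumptions, and the training process that would normally set the weights is omitted. It shows only how weighted connections determine how strongly each node activates.

```python
import math

def sigmoid(x):
    """Squash any number into the range 0..1, read here as 'how active'."""
    return 1.0 / (1.0 + math.exp(-x))

def node_activation(inputs, weights, bias):
    """One node: weigh each incoming value, add them up, then squash."""
    total = sum(value * weight for value, weight in zip(inputs, weights)) + bias
    return sigmoid(total)

def forward(inputs, layers):
    """Pass values through the network one layer at a time.
    Each layer is a list of (weights, bias) pairs, one pair per node."""
    values = inputs
    for layer in layers:
        values = [node_activation(values, weights, bias) for weights, bias in layer]
    return values

# Hypothetical weights; a real system would learn these from data.
toy_network = [
    [([0.8, -0.4, 0.3], 0.1), ([-0.5, 0.9, 0.2], -0.2)],  # hidden layer, two nodes
    [([1.2, -0.7], 0.05)],                                # output layer, one node
]

# Three made-up input measurements for one patient, scaled to 0..1.
risk = forward([0.6, 0.1, 0.9], toy_network)[0]
print(f"predicted probability: {risk:.3f}")
```

Even in this tiny example the output depends on every weight at once, which hints at why a network with millions of them resists a simple explanation.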
These networks can be insanely complicated. For example, Deep Patient looked at five hundred factors for each of the hundreds of thousands of patients whose records it analyzed, creating a final data set of two hundred million pieces of data. To check on a particular patient’s health, you run her data through that network and get back probabilistic predictions about the medical risks she faces. For example, Deep Patient is unusually good at telling which patients are at risk of developing schizophrenia, a condition that is extremely hard for human doctors to predict.42
But the clues the system uses to make these predictions are not necessarily like the signs doctors typically use, the way tingling and numbness can be an early sign of multiple sclerosis, or sudden thirst sometimes indicates diabetes. In fact, if you asked Deep Patient how it came to classify people as likely to develop schizophrenia, there could be so many variables arranged in such a complex constellation that we humans might not be able to see the patterns in the data even if they were pointed out to us. Some factor might increase the probability of a patient becoming schizophrenic but only in conjunction with other factors, and the set of relevant factors may itself vary widely, just as your spouse dressing more formally might mean nothing alone but, in conjunction with one set of “tells,” might be a sign that she is feeling more confident about herself and, with other sets, might mean that she is aiming for a promotion at work or is cheating on you. The number and complexity of contextual variables mean that Deep Patient simply cannot always explain its diagnoses as a conceptual model that its human keepers can understand.