The Future of Everything: The Science of Prediction

by David Orrell


  A curious feature of prediction of any sort is that there is little correlation between the forecast accuracy and the amount that people are willing to pay. An acquaintance who used to work at Enron told me that some power companies even paid for forecasts a year in advance. (It would have been cheaper to buy the almanac.) As Cato the Censor said of ancient Rome, “I wonder how one augur can keep from laughing when he passes another.”47

  So what’s going on? Can chaos be responsible for all this error? Or could the problem lie deeper?

  COMPLICATIONS

  According to that great savant, U.S. Secretary of Defense Donald Rumsfeld, errors can arise because of both the known unknowns and the unknown unknowns.48 In weather forecasting, the known unknowns are the errors in the initial condition. We know that a particular cell in a grid has a certain average temperature, pressure, and so on, but we also know that our estimate of those quantities—the analysis—is in error. The measurement devices are accurate only to a certain tolerance; the temperature, or any other quantity, may vary widely over a single cell; and finally, there are many areas of the globe, like the oceans, where measurements are quite sparse. In these areas, the data-assimilation scheme must interpolate from the few measurements that exist, a procedure that is prone to error. However, we at least know that the errors exist, and therefore can estimate their magnitude.

  The second source of errors is the unknown unknowns—the things we don’t know that we don’t know. The actual weather prediction is carried out by a GCM that’s based in part on physical laws, such as conservation of energy. However, it still has to make a large number of approximations. For example, the coarse resolution means that the weather over a single grid cell is averaged out, and anything smaller than a cell, such as a single cloud, a small storm, or a detail of a coastline, doesn’t appear. Turbulent eddies in the atmosphere have the effect of transferring energy from large-scale (kilometre-size) to small-scale flows.49 As Lewis Fry Richardson put it, paraphrasing Jonathan Swift, “Big whirls have little whirls that feed on their velocity, and little whirls have lesser whirls and so on to viscosity—in the molecular sense.” Models resolve only big whirls, and must estimate the effect of small whirls. Experiments using cellular automata, which simulate idealized local interactions between parcels of air, show that turbulent flow is not generally sensitive to initial condition, but it still eludes computation using equations.50 The atmospheric motion near the ground is particularly difficult to model, since the flow interacts with the planet’s surface, is highly turbulent, and must take into account local heat fluxes (which depend on factors such as soil type, vegetation coverage, and so on).

  The biggest contribution to error, though, is water. From a dynamical point of view, the most important substance in the atmosphere is not air but the water that air contains. There isn’t that much of it: if all the water in the sky were to fall to the ground as rain, it would form a layer about an inch high.51 But the high thermal capacity of water means that it can contain more than its fair share of energy in the form of heat. Indeed, the top few feet of water in the earth’s oceans contain as much heat as the whole atmosphere (and this drives El Niño). The release of latent heat energy, which occurs when water vapour condenses into clouds, is one of the major drivers behind atmospheric circulation. Clouds in turn reflect solar radiation, cause precipitation, and determine much of what we call the weather. These processes are poorly understood and cannot be modelled from first principles, so modellers are forced to use parameterizations: ad hoc formulas that attempt to fit the available data.

  Take clouds. In a GCM, one cell of the grid might be assigned a certain degree of “cloudiness” to reflect the average cloud cover. Variables such as temperature or pressure are by definition averages over a large number of molecules, so it makes sense to average them over a cell. Clouds, however, are entities with a high degree of structure. Leonardo found that a human body would not fit easily into a square, and clouds do not fit into a Cartesian box.

  A common game for children is to look for clouds that resemble animals, and it is perhaps fitting that the first system for categorizing clouds was invented in 1801 by the French botanist and zoologist Jean Lamarck, whose theory of evolution preceded Darwin’s. The system that was eventually adopted worldwide was proposed independently in 1803 by the Englishman Luke Howard. It divided clouds into three basic types: cirrus, cumulus, and stratus. Cirrus clouds are high, wispy clouds that consist almost entirely of ice crystals, cumulus clouds are mid-level, and stratus are the closest to the ground. There are numerous subcategories, from the cauliflower-like cumulus congestus to the threatening, anvil-shaped cumulonimbus. Clouds also come in a huge range of sizes and have a fractal, scale-free property: wisps of cloud viewed up close from an airplane window can look similar to massive cloud systems in satellite pictures. You can’t talk about an average cloud, or even an average-size cloud, in the same way you can talk about an average molecule of air.

  The formation and dissipation of clouds is also a complex dynamical process. Clouds are a mixture of minute water droplets and ice crystals. Their growth depends on a wide variety of factors, such as humidity, temperature, and the presence of small particles in the air. These particles—which include dust, smoke, sea salt, droplets of sulphuric acid and ammonium sulphate caused by algae emissions,52 plant fragments, industrial chemicals, spores, pollen, and even human dandruff53—act as nuclei on which water can condense. (Dubious attempts at weather control often involve shooting such particles into the atmosphere to “seed” clouds.) Because of the intrinsically social nature of water molecules, the rate of condensation on a droplet depends on the size and, particularly, the radius of the cluster already there. Like people choosing a restaurant, water molecules prefer a spot that is already well attended. As a result of this positive feedback, the growth of each droplet depends in a non-linear way on its size; in any given cloud, there will be a range of droplet sizes.
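
  To see how such a positive feedback stretches out a droplet population, here is a minimal numerical sketch. It is not real cloud microphysics: the growth law (rate proportional to current radius), the constants, and the initial spread of radii are all arbitrary assumptions chosen only to illustrate how small initial differences in size get amplified over time.

```python
import numpy as np

# Toy positive-feedback growth: each droplet's growth rate is assumed
# proportional to its current radius, so bigger droplets pull away from
# smaller ones. All numbers are arbitrary illustration values.
rng = np.random.default_rng(0)
radii = rng.uniform(1.0, 1.2, size=1000)   # initial radii (arbitrary units)
k, dt = 0.05, 0.1                          # assumed growth constant and time step

initial_spread = radii.max() - radii.min()
for _ in range(200):
    radii += k * radii * dt                # dr/dt = k * r (toy feedback law)

print(f"spread of radii grew from {initial_spread:.2f} "
      f"to {radii.max() - radii.min():.2f} (arbitrary units)")
```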

  Trying to model a cloud is about as easy as trying to hold one in your hands. Despite the heroic efforts of meteorologists, the best that a GCM can do is to assign rough values for cloud properties for each cell, making them vary in some plausible way to account for things like temperature and humidity. Such parameterizations may be based to a degree on physics, but they are a long way from Newton’s laws of motion. (There may be a law of gravity, but there isn’t a law of clouds.) They are a major source of error, especially since estimates of cloud cover affect the calculations of temperature, humidity, and so on, which are the calculations used to make the estimates in the first place. To predict how clouds change and evolve with time, details matter. And because clouds exist over a huge range of scales, there is no particular grid size that is small enough to capture all the information. Even if the resolution is improved, new parameterizations will be needed to model the fine-scale physics. The number of model variables will therefore explode, and forecast accuracy may actually get worse.54
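
  To make the flavour of a parameterization concrete, here is a minimal sketch of a simple diagnostic scheme of the kind used in large-scale models, which guesses a grid cell’s cloud cover from its relative humidity once a critical threshold is exceeded. The functional form and the threshold value of 0.8 are illustrative assumptions, not the formula used by any particular GCM.

```python
def cloud_fraction(relative_humidity: float, rh_crit: float = 0.8) -> float:
    """Diagnose grid-cell cloud cover (0 to 1) from relative humidity.

    A Sundqvist-style relative-humidity scheme: no cloud below a critical
    humidity, full cover at saturation, and a smooth curve in between.
    The critical value rh_crit is a tunable knob, which is exactly what
    makes such a formula a parameterization rather than a physical law.
    """
    rh = min(max(relative_humidity, 0.0), 1.0)
    if rh <= rh_crit:
        return 0.0
    return 1.0 - ((1.0 - rh) / (1.0 - rh_crit)) ** 0.5


# Cloud cover rises steeply as the cell approaches saturation.
for rh in (0.7, 0.85, 0.95, 1.0):
    print(f"relative humidity {rh:.2f} -> cloud fraction {cloud_fraction(rh):.2f}")
```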

  In many ways, it makes more sense to view clouds as emergent properties of a complex system, rather than as something that can be computed from first principles. In a paper about human cognition, psychologists Esther Thelen and Linda Smith compare the formation of clouds and storms to the development of cognition: “There is a clear order and directionality to the way thunderheads emerge over time. . . . But there is no design written anywhere in a cloud. . . . There is no set of instructions that causes a cloud . . . to change form in a particular way. There are only a number of complex physical . . . systems interacting over time, such that the precise nature of their interactions leads inevitably to a thunderhead. . . . We suggest that action and cognition are also emergent and not designed.”55

  As will be discussed further in Chapter 7, there are countless other sources of error in the prediction of weather, many of which relate to the complex positive and negative feedback loops that regulate the climate. In fact, while the GCM is a huge and sophisticated project—perhaps the most complex mathematical model yet produced—it is still only a crude approximation of the ocean-atmosphere system.

  GETTING THE DRIFT

  While errors in the initial condition can grow exponentially (which is not the same as rapidly) in chaotic systems, the errors in the model are even harder to predict because they grow in a cumulative and dynamic manner. The model error at the initial time is different from the model error a while later. And since the model is an ODE, which specifies the rate of change of variables with time, the errors are expressed not in terms of atmospheric quantities like temperature and pressure, but in terms of their rate of change. In the Pythagorean scheme, they are not at rest, but in motion.

  When I first started researching model error as a Ph.D. student in 1999, the consensus opinion among experts was that it was a small effect compared with chaos. As the story went, “In the early years of NWP [Numerical Weather Prediction], forecast errors due to simplified model formulations dominated the total error growth. . . . By now, however, models have become much more sophisticated and it is the errors that arise due to [chaotic] instabilities in the atmosphere (even in case of small initial errors) that dominate forecast errors.”56 Some even thought model error wasn’t worth investigating. An anonymous reviewer of a paper said that “many people at the operational [weather] centers are not convinced that the effect of model error on forecasting is such that it warrants a great deal of research effort.”57

  As a “mature” student who had already worked several years on the design and testing of magnet systems for particle accelerators, I found this position a little strange. As any engineer knows, there can be a big difference between theory and reality. Even if you have complete knowledge and control of the materials used in construction, they can still behave in unexpected ways when they are all put together in a structure. This is what happened with London’s Millennium Bridge, better known as the wobbly bridge because of the way it started to weave and shake the first day a crowd walked over it. An engineer from the firm that designed it told me they had done three detailed analyses of its motion, but none had revealed a potential problem. Perhaps engineers are more concerned with model error than forecasters are because the latter never get sued (though people have tried).58

  My Ph.D. supervisor was Lenny Smith from Oxford University (now also at the London School of Economics), who is an expert on chaos theory and non-linear systems, as well as an inspiring mentor.59 He suggested model error as a topic for the very reason that it would be new, uncharted territory. There were only a handful of experiments that compared the predictions of two different models—say, an American one and a European one—with the actual weather.

  These experiments showed that the difference between the model and the weather was similar for either model, so it was assumed that the sensitivity of forecasts to the choice of model was small. However, the fact that two models are wrong by a similar amount does not mean they are both right: they may just be wrong in similar ways. Also, while Americans and Europeans are dissimilar, their models are quite alike. They are each written by meteorologists who read the same textbooks and attend the same conferences and incorporate one another’s improvements.

  The easiest way to measure the difference between two models is to compare them to each other. In collaboration with ECMWF, we set out to compare a low-resolution model (which would have been the best available just a few years earlier) and a more recent high-resolution model. The latter would play the role of “truth”—the real weather— while the former would be the “model,” with the error being the difference between the two. The idea was to see how close the model versions were to each other: if errors were large, it implied that the models had not converged. Later, we would compare the operational model, used in everyday forecasting, to actual observations of the weather. Neither of these basic reality checks had ever been performed before.

  Model error was measured using two methods. The first was to look for shadows. If error is a result of chaos rather than a problem with the model, it should be possible to perturb the initial condition slightly so that the resulting model trajectory, known as a shadow, stays close to truth.60 On the other hand, if sensitivity to initial condition is low compared with model error, then no small adjustment will help. It is like an archer trying to hit a target in a strong wind. Shadows have a fairly long history in the field of non-linear dynamics, but the technique had not been used with weather models.61 The models have millions of variables, and a perturbation to the initial condition can be made to any or all of them. Computing shadows therefore involves a search over many possible choices, but it can be done using a suitable optimization program.
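
  The following sketch shows the shadowing idea on a toy system rather than a weather model. The Lorenz-63 equations stand in for “truth,” the same equations with one slightly wrong parameter stand in for the imperfect model, and an off-the-shelf optimizer (Nelder-Mead) searches over small perturbations of the initial condition for the run that diverges least from truth. The equations, parameter values, tolerance, and choice of optimizer are all assumptions for illustration; they are not the model or the optimization program used in the actual experiments.

```python
import numpy as np
from scipy.optimize import minimize

def lorenz_trajectory(x0, r, n_steps, dt=0.01, sigma=10.0, b=8.0 / 3.0):
    """Integrate the Lorenz-63 system with simple Euler steps (toy only)."""
    states = [np.asarray(x0, dtype=float)]
    for _ in range(n_steps):
        x, y, z = states[-1]
        deriv = np.array([sigma * (y - x), x * (r - z) - y, x * y - b * z])
        states.append(states[-1] + dt * deriv)
    return np.array(states)

N = 400
truth = lorenz_trajectory([1.0, 1.0, 20.0], r=28.0, n_steps=N)      # "truth"

def max_divergence(perturbation):
    """Largest distance between the imperfect-model run and truth."""
    model = lorenz_trajectory(truth[0] + perturbation, r=27.0, n_steps=N)
    return np.linalg.norm(model - truth, axis=1).max()

def shadow_steps(perturbation, tolerance=2.0):
    """How many steps the imperfect-model run stays within the tolerance of truth."""
    model = lorenz_trajectory(truth[0] + perturbation, r=27.0, n_steps=N)
    outside = np.nonzero(np.linalg.norm(model - truth, axis=1) > tolerance)[0]
    return N if outside.size == 0 else int(outside[0])

# Search over initial-condition perturbations for the best shadow.
best = minimize(max_divergence, x0=np.zeros(3), method="Nelder-Mead")

print("steps shadowed from the true initial condition:", shadow_steps(np.zeros(3)))
print("steps shadowed from the optimized perturbation:", shadow_steps(best.x))
```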

  FIGURE 4.7. Simplified schematic diagram of a shadow orbit. When the model is initiated at the “true” initial condition, it soon exceeds the prescribed tolerance (rs). The shadow trajectory starts at a perturbed initial condition and shadows (stays within the tolerance) for a longer time.

  The second method, calculating the model drift, is a simple way of estimating model error, but it has the benefit of being easy and fast to apply. The calculation is performed by making a large number of short forecasts at regular intervals—say, every six hours—along the true trajectory. The forecast errors are then summed together, and the magnitude of the sum is the drift. Because the forecasts are constantly being set back to truth, this filters out the effects of chaos, so the drift approximates the error resulting from the model alone.62 It adds up all the small, moment-by-moment errors that push the model away from the truth and can be used to estimate the expected shadow performance.
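
  Here is a minimal sketch of that bookkeeping, using made-up arrays rather than a real forecast model. The toy “atmosphere,” the imperfect model, and the interval count are all invented for illustration; the point is only the structure of the calculation: restart from truth at every interval, keep the one-interval errors as vectors, and take the magnitude of their sum at the end.

```python
import numpy as np

def drift(truth, model_step):
    """Magnitude of the summed short-forecast errors along the true trajectory."""
    total_error = np.zeros_like(truth[0])
    for i in range(len(truth) - 1):
        forecast = model_step(truth[i])         # restart from truth each interval
        total_error += forecast - truth[i + 1]  # error of this short forecast
    return np.linalg.norm(total_error)

# Toy "truth": a damped random walk sampled every six hours for three days.
rng = np.random.default_rng(1)
state, truth = rng.standard_normal(5), []
for _ in range(13):                             # twelve 6-hour intervals
    truth.append(state.copy())
    state = 0.95 * state + 0.1 * rng.standard_normal(5)

# An imperfect model that damps a little too strongly.
print("drift over three days:", drift(np.array(truth), lambda s: 0.90 * s))
```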

  Both the drift and the shadow calculations were in agreement. They showed that most error was the result of differences in the models. Since we were comparing two models of different resolution, this meant that they had not converged on a single version of reality.63

  CURVING UP OR CURVING DOWN

  Of course, what we really cared about was how the operational forecast model, used every day to make predictions, compared to the analysis. The results here were even more striking. A drift calculation showed that nearly all the error over the first three days was the result of the model. Observational errors obviously also played a role; however, this was reduced in part because of the way the data assimilation was being performed. Chaos appeared to cause only a slight amplification of the model errors. Even then, the shape of the error curve showed no sign of the exponential growth that we would expect to see in a strongly chaotic system. In fact, over the first couple of days, it grew roughly with the square root of time, so by day two the error was about twice what it was at twelve hours (as seen in figure 4.8 on page 164).64

  The reason, we realized, was that the model errors were behaving like a random walk. This term, commonly used in economics to describe price fluctuations caused by external shocks, was first introduced in a 1905 paper in the journal Nature. The paper aimed to determine how far a drunken man walking randomly in an open field could be expected to travel.65 The man takes a step in a random direction, then another step in a random direction, and so on, gradually getting farther and farther away from his starting point (which is bad if he is looking for his keys). The expected distance travelled grows with the square root of time. In weather forecasting, the model was like a sober person marching in a straight line, while the weather was the drunken friend weaving randomly from side to side and getting gradually farther away.
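
  The square-root scaling is easy to check numerically. The sketch below sends a crowd of random walkers off across a plane and measures their root-mean-square distance from the start; the numbers of walkers and steps are arbitrary, but quadrupling the number of steps should roughly double the distance, matching the factor of two between twelve hours and day two in the forecast errors.

```python
import numpy as np

# Root-mean-square distance of many 2-D random walkers from their start.
rng = np.random.default_rng(2)
n_walkers, n_steps = 10_000, 48

angles = rng.uniform(0.0, 2.0 * np.pi, size=(n_walkers, n_steps))
steps = np.stack([np.cos(angles), np.sin(angles)], axis=-1)   # unit-length steps
positions = np.cumsum(steps, axis=1)                          # path of each walker
rms = np.sqrt((positions ** 2).sum(axis=-1).mean(axis=0))     # rms distance per step

# Four times as many steps gives roughly twice the distance.
print("rms distance after 12 steps:", round(float(rms[11]), 2))
print("rms distance after 48 steps:", round(float(rms[47]), 2))
```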

  In many ways, square-root error growth is the opposite of the exponential growth in figure 4.5. The former grows most quickly for short times, the latter for long times. The former curves down with time, while the latter curves up. It is impossible to confuse the two. So when I presented the results in a seminar at ECMWF, I was surprised when I put up a plot of error growth, and someone interrupted my talk to say that the plot must be wrong, since error growth has positive curvature, not negative. Others in the audience murmured in agreement.

  FIGURE 4.8. Plot of error growth in total energy for weather model versus analysis. Errors grow roughly with the square root of time over the first couple of days.

  I was naturally disconcerted, and though I continued with my talk, I wondered if I had made a mistake in the calculations I had performed to produce the figures. My apprehension increased the next day, when I received an e-mail from one of the top research heads at ECMWF, which said that he had checked a plot of wind errors, and in stark contrast to mine, his plot “certainly [did] show positive curvature.”66 We therefore decided that someone there should try to reproduce my results by plotting the errors as a function of time. Either they would curve up, or they would curve down.

  That weekend, I nervously looked through old ECMWF reports showing plots of error growth. Although none of the experiments had been performed in a global metric like total energy, and all were only incomplete snapshots of the errors, they still seemed in agreement with my results, and there was no sign of exponential growth. Also, I noticed that the authors of the reports, almost in an act of self-censorship, had consistently omitted the first couple of time points, which made the negative curvature less obvious. But surely ECMWF, which had perhaps the best weather models in the world, would know what error plots looked like.

  The next week, I received an e-mail containing the recalculated error results. They were exactly the same as the ones I had presented. So it hadn’t been some numerical mistake on my part—the errors really did grow with negative curvature. It seemed that the desire to blame chaos had affected people’s basic shape-recognition skills. I thought this would settle the matter, but it turned out to be only the beginning. The weather centres refused to accept that the error could be caused by the model, and they advanced a number of theories (but no more experiments) about how the results were compatible with exponential growth.

 
