This is what happens when a system becomes large enough. It interacts with the user, other systems, and itself in unexpected ways. In fact, it is estimated that when a software project becomes twice as large, you don’t just get twice as many problems; you get more than twice as many, because the number of errors per thousand lines of code increases. So a program of 10,000 lines, say, could have four times as many errors as one with 5,000 lines.
This unpredictability and fragility is actually a hallmark of the complex systems that we build. While complicated systems are often incredibly robust to shocks that are anticipated—that is, ones they have been designed for—their complexity can be a liability in the face of the unanticipated.
A mathematical model has even been devised for understanding this specific situation, known as highly optimized tolerance. Our systems are optimized to tolerate a wide variety of situations, but anything new can lead them into a catastrophic spiral of failure. Take the Boeing 777, a massive machine of an airplane that contains 150,000 subsystem modules designed to ensure it flies well and adapts to numerous situations. But it can’t handle every possible contingency. According to two scientists, “The 777 is robust to large-scale atmospheric disturbances, variations in cargo loads and fuels, turbulent boundary layers, and inhomogeneities and aging of materials, but could be catastrophically disabled by microscopic alterations in a handful of very large-scale integrated chips or by software failures.” In other words, as a system becomes more complex, the tiniest stimulus could potentially be a catastrophic disruption. We simply don’t understand what might happen.
In fact, these unexpected consequences are related to the edge cases and exceptions discussed earlier. The world is complicated, necessitating a complicated system to handle it. But many of these complications are only rarely encountered. As rare as they are individually, though, these uncommon situations can cause problems for technological systems because there are too many of them to test for properly. For example, returning to vehicles with their sophisticated software, it isn’t possible to actually test them exhaustively. As the computer scientist Philip Koopman has noted, “Vehicle testing simply can’t find all uncommon failures.” Even a human lifetime is not long enough to examine everything.
When the world we have created is too complicated for our humble human brains, the nightmare scenario is not Skynet—the self-aware network declaring war on humanity—but messy systems so convoluted that nearly any glitch you can think of (and many you can’t) can and will happen. Complexity brings the unexpected, but we realize it only when something goes wrong.
The Symptoms of the Entanglement
Kate Ascher, a professor of urban development at Columbia University, has released a series of books about how cities, transportation networks, and individual buildings work, and the complexities underlying their respective systems. The books are lush with diagrams and fascinating in their details. They are also somewhat overwhelming. These systems have accreted over decades, sometimes centuries, with layer upon layer being added over time, from road networks to the methods by which energy and other necessities are distributed to the buildings of a city. For example, to provide each of our homes and places of business with water is an enormously complex affair. To give a sense of the enormous scale of just the removal of wastewater, this process involves more than 6,000 miles of underground pipes in New York City alone, part of an elaborate construction that handles over one billion gallons of sewage per day.
However, we usually recognize this complexity only when things go wrong. In spring 2010, the population of metropolitan Boston was treated to a crash course in how water is managed and distributed. On the first day of May that year, a water main broke in Weston, Massachusetts, one that carried water from the Quabbin Reservoir. For several days, the residents of many communities (including Brookline, where I was living at the time) were advised to boil their water, as they were now receiving water from backup sources. However, the residents of Cambridge—just across the river and surrounded by the towns affected—were fine, as their water arrived from its own distinct source. While those working for the city were certainly aware of the complexity of the water system and its idiosyncrasies, much of the metropolitan population likely became aware of these facts only once they experienced the system’s failure.
Andrew Blum, in Tubes, a book exploring the physical infrastructure of the Internet, begins with a similar realization: the sheer interconnected physicality of the Internet—a tangible network that crisscrosses the globe—is revealed to him only when his Internet connection stops working because wiring in his backyard has been chewed on by squirrels.
A truism in the open source software development world is known as Linus’s law (after Linus Torvalds, the creator of Linux): “Given enough eyeballs, all bugs are shallow.” In other words, with a large enough group of people examining a piece of technology, any glitch—no matter how complicated it is and how difficult it may seem to remedy—can be fixed, because the chances are high that someone will find a way to overcome the problem.
But as our systems become more and more complicated, this may not be true anymore. All the bugs cannot be eradicated: the likelihood that someone will spot—and eliminate—every bug is far lower when we are reckoning with complex and interconnected systems. Furthermore, each fix can, in turn, lead to new problems. This sounds pretty depressing, and in a way it is. But there is at least a partial way out of this funk.
These technological werewolves are not simply markers of the new era we find ourselves in; they can also point us toward a new way of managing our systems. Just as the water crisis around Boston taught many of us where our water actually comes from, examining bugs is one of the few options we have for learning about our world and thriving in the Entanglement.
What Glitches Can Teach Us
Several years ago, Gmail—Google’s email service—suffered an outage, going down for many users for a total of about eighteen minutes. It was eventually determined that a slight error in an update to one piece of Google’s software (the software that balances processing load so that no part of the system becomes overwhelmed) caused a large number of servers to be considered unavailable, even though they were working properly. This error didn’t affect many of Google’s other services, but Gmail turned out to depend on specific data center information, and so it went down. The cascade was triggered by such a small problem that few might have thought it could cause such a major meltdown. The bug revealed an interconnectivity between certain systems that would have remained hidden if the glitch hadn’t occurred.
When you debug a piece of technology—that is, attempt to root out errors—you learn the difference between how you expected the system to operate and how it actually works. Sometimes these bugs, whether in automobile software, Internet security, or our urban infrastructure, are simple, easily understood and fixed. At other times they are incredibly frustrating, and can be nearly impossible to diagnose and remedy. But the cataloging of these failures is the first step to learning about a part of the complex system that we’re looking at. This natural history of our technological world is vital. Just as naturalists go out into the natural world and study it, cataloging its variety and its complexity, we need a similar approach to our technologies.
We are going to continue to need this sort of “technological natural history” more and more. Take another example from programming. Let’s say you’re thinking of a number between 1 and 100 and I have to guess it. I first ask you if it’s greater than 50. If it is, then I ask you if it’s bigger or smaller than 75, and I keep on dividing the remaining numbers in half until I find your number. This method is known as binary search, because of the division into two groups, and is a highly efficient method for finding what you are looking for in a large sorted list.
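To make the halving concrete, here is a minimal sketch of that guessing game in Java; the 1-to-100 range, the class name, and the method names are chosen purely for illustration.

```java
// A minimal illustration of the number-guessing game as binary search:
// each question discards half of the remaining candidates in 1..100.
public class GuessNumber {

    // Returns the secret number by repeatedly halving the candidate range.
    static int guess(int secret) {
        int low = 1, high = 100;
        while (low < high) {
            int mid = (low + high) / 2;   // ask: "is it greater than mid?"
            if (secret > mid) {
                low = mid + 1;            // yes: discard the lower half
            } else {
                high = mid;               // no: discard the upper half
            }
        }
        return low;                       // low == high == the secret number
    }

    public static void main(String[] args) {
        System.out.println(guess(73));    // finds 73 in at most seven questions
    }
}
```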
Implementations of binary search are found throughout the world of software. Therefore, it was more than a little disturbing to read the title of a blog post from Google in 2006 discussing a bug in many implementations: “Extra, Extra—Read All About It: Nearly All Binary Searches . . . Are Broken.”
While code implementing binary search in our software does generally work well, it turns out that many versions can fail under conditions that involve huge amounts of data: “This bug can manifest itself for arrays whose length (in elements) is 2^30 or greater (roughly a billion elements). This was inconceivable back in the ’80s, when Programming Pearls [a classic book on computer programming, whose binary search implementation contains this error] was written, but it is common these days at Google and other places,” according to the blog post.
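The flaw the post describes is an integer overflow in the line that computes the midpoint between two array indices. The sketch below, assuming Java-style 32-bit integers, shows how the classic formula fails at that scale and how one common rewrite avoids it; the specific index values are invented simply to trigger the overflow.

```java
// The failure mode from the 2006 post: with 32-bit integers, low + high
// overflows once the indices are large enough, yielding a negative "midpoint"
// and, in a real binary search, an ArrayIndexOutOfBoundsException.
public class MidpointOverflow {
    public static void main(String[] args) {
        int low = 1 << 30;                    // plausible indices when searching
        int high = (1 << 30) + 2;             // an array of about a billion elements

        int broken = (low + high) / 2;        // low + high exceeds Integer.MAX_VALUE
        int fixed = low + (high - low) / 2;   // equivalent arithmetic, no overflow

        System.out.println("broken midpoint: " + broken);  // a negative number
        System.out.println("fixed midpoint:  " + fixed);   // 2^30 + 1, as intended
    }
}
```

The two formulas are mathematically identical; they differ only in whether an intermediate result can exceed what a 32-bit integer can hold.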
Only in today’s world of huge datasets could we have discovered this error and learned more about this particular system that we have built. The bug is a window into how this technology actually works, rather than how we intended it to operate.
There are many other instances when unexpected behaviors provide a mechanism for learning how a technology really works. For example, at the beginning of 1982, the Vancouver Stock Exchange unveiled its own stock index, similar to the S&P 500 or the Dow Jones Industrial Average. It was initially pegged to a value of 1,000 points and then, over nearly two years, steadily dropped. Near the end of 1983, it sat at about half its original value. But this didn’t make any sense. There was a bull market in the early 1980s; how could the index be declining? Thus began an investigation into what was happening, culminating in the discovery that the calculations of the index were wrong. Instead of taking the index’s value and rounding it to three decimal places, the algorithm responsible for the index’s calculation was simply lopping off everything after the third decimal. For example, if the index was calculated as 382.4527, it would be stored as 382.452, even though the final 2 in the truncated number should have been rounded up, for a value of 382.453. When this is done thousands of times a day, actual value is lost—in this case, a lot of value. The error was finally corrected in November 1983, when the index closed around 500 on a Friday, only to open the following Monday at over 1,000, with the lost value added back. This deeper problem in calculation—the flaw in the algorithm—was noticed because of an anomaly (in hindsight, not a particularly subtle one): the index was going down while the market was going up in value.
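A toy calculation, sketched below in Java, shows the mechanism: feed the same stream of small updates to two running values, keeping three decimal places by rounding in one case and by truncation in the other. The update sizes, counts, and seed are invented for illustration and are not the exchange’s actual data.

```java
import java.util.Random;

// Toy model of the index error: apply the same small updates to two running
// values, keeping three decimal places by rounding in one and by chopping
// (truncation) in the other, and watch them drift apart.
public class TruncationDrift {
    public static void main(String[] args) {
        double rounded = 1000.0;
        double truncated = 1000.0;
        Random rng = new Random(1);   // arbitrary seed; the drift appears for any seed

        // Illustrative scale only: roughly 3,000 updates a day,
        // about 20 trading days a month, for 22 months.
        for (int i = 0; i < 3_000 * 20 * 22; i++) {
            double change = (rng.nextDouble() - 0.5) * 0.1;   // small, roughly zero-mean moves
            rounded   = Math.round((rounded + change) * 1000.0) / 1000.0;   // keep 3 decimals by rounding
            truncated = Math.floor((truncated + change) * 1000.0) / 1000.0; // keep 3 decimals by chopping
        }

        System.out.printf("rounded:   %.3f%n", rounded);   // stays in the neighborhood of 1,000
        System.out.printf("truncated: %.3f%n", truncated); // ends up hundreds of points lower
    }
}
```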
We see similar examples of failures becoming our teachers when companies and governments use sophisticated machine-learning algorithms to comb through huge datasets. For the most part, these learning systems remain impenetrable black boxes for both the general public and, increasingly, the people working with them. Occasionally, however, these huge systems spit out strange or even worrisome results that inadvertently provide us with a hint of what is happening inside the system.
Take Microsoft’s artificial intelligence chatbot, Tay, which was designed to interact with users in the style of a nineteen-year-old woman. Less than a day after Microsoft launched the bot on Twitter, the combination of the bot’s algorithms and the input it received from a particularly ruthless Internet apparently turned Tay into a racist. Among its tweets, Tay agreed with a white supremacist slogan, expressed support for genocide, and noted that Ricky Gervais “learned totalitarianism from adolf hitler, the inventor of atheism.” Clearly, Tay’s turn to bigotry was not meant to happen. Through this failure, the designers at Microsoft became better aware of how their program could interact with the raw, unfiltered id of the Internet and produce such hateful output, of a kind they likely were not even aware was possible.
But waiting for unexpected behavior to reveal bugs is not enough. Many developers of technology actively search out bugs and collect them, placing them in a database so they can be addressed in a systematic way.
What’s more, while software is in development, people try to actively break it, testing all the edge cases and weird things that users might do in real life, as opposed to the pristine manner in which programmers envision their creation being used. At Netflix, this strategy has even been taken to its logical conclusion in a piece of software known as Chaos Monkey. Chaos Monkey’s function is simple: it unexpectedly takes Netflix systems out of service. Only by seeing how the vast Netflix system responds to these intentional failures can its engineers make it robust enough to withstand the unexpectedness that the messy real world might throw at it. The hope is that once Chaos Monkey has done its job there will no longer be any mismatch between how the engineers thought the system worked and how it actually does work.
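The mechanism can be sketched in a few lines. The following is a cartoon of the idea rather than Netflix’s actual tool, and the Instance interface and terminate method are hypothetical stand-ins for whatever “taking a system out of service” means in a given environment.

```java
import java.util.List;
import java.util.Random;

// A cartoon of Chaos-Monkey-style fault injection, not Netflix's actual tool:
// pick one running instance at random and take it out of service, then watch
// whether the rest of the system copes the way its engineers expect.
public class ChaosSketch {

    // Hypothetical stand-in for whatever a "service instance" is in a given system.
    interface Instance {
        String name();
        void terminate();
    }

    private final Random rng = new Random();

    // Terminate one randomly chosen instance from the fleet, if any are running.
    void unleash(List<Instance> runningInstances) {
        if (runningInstances.isEmpty()) {
            return;
        }
        Instance victim = runningInstances.get(rng.nextInt(runningInstances.size()));
        System.out.println("Chaos: terminating " + victim.name());
        victim.terminate();
        // The point is not the kill itself but what happens next: does traffic
        // reroute, do replicas take over, do the right alerts fire?
    }
}
```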
Learning from bugs is an important mechanism for understanding any complex system. If we look at the history of science, we see that naturalists have been taking this approach to studying the complex systems of nature for centuries.
Naturalists for Technology
When I was young, the most incredible word for me—it looked so improbable it astonished me with its very existence—was “miscellaneous.” This word was fascinating. It looked like it was cobbled together from so many different linguistic bits. I didn’t know how to pronounce it, but it was wonderful nonetheless.
The magic of the word is not just in its sound and appearance. The cluttered spelling betrays its meaning: that there is a place in life for the grab bag. “Miscellaneous” means that even the messy and disordered is a category, a way of being organized. The existence of the miscellaneous is an affirmation that messiness—sprawling and complicated though it may be—can be tolerated and embraced.
Being comfortable with the miscellaneous—embracing a spirit of miscellany—is something that not everyone is good at. When we look at a complex situation, many of us, myself included, have a first instinct to somehow simplify it, to brush away all the complication and find the underlying elegance. When this works it can be incredibly satisfying, such as when we find a single cause for a failure. But when it doesn’t work and we are left with a muddle, that’s when many people become overwhelmed and unable to respond productively.
Naturalists who examine the natural history around us have long been comfortable with the miscellaneous. Sometimes they detect an order in the living habits and behavior of the animals and plants they are observing, but even without some sort of theoretical order, their observations serve a purpose: they allow naturalists to understand and record the details, even if they don’t yet have a complete mental framework for every living thing they see. The physicist Enrico Fermi, when asked to name one of the many particles studied in particle physics, replied, “Young man, if I could remember the names of these particles, I would have been a botanist.” Naturalists—like John James Audubon, who, among other things, chronicled and illustrated the birds of the United States—recognize that it is important to know the details, and sometimes even the names, of these different individual pieces, whether or not we know how they fit together.
In the natural world, it is the bugs and glitches in biology, from mutations to diseases, that can teach us how living systems work. Errors in gene replication, from large aberrations in chromosomes to a single incorrect letter in DNA, and the visible differences or defects they produce, are how we can learn about the functions of genes. Mutations in fruit flies have helped to provide insight into how organisms develop from a single cell and a genetic blueprint into full-grown living things. For example, one way that biologists learned about key genetic sequences that govern body form was through the monstrous Antennapedia mutants: flies that have legs sprouting from their heads in place of antennae.
In technology, we need the same sort of approach. The digital universe, to borrow a term from the historian of technology George Dyson, is expanding beyond our control. According to Dyson, there were only 53 kilobytes of high-speed RAM on the entire planet in March 1953. A single personal computer can now have more than a hundred thousand times this much RAM. The digital universe has become unimaginably richer, larger, and more interconnected. And it has become increasingly independent of humanity. Messages speed their way around the globe faster than we can recognize them, intersecting and interacting in often wonderfully unexpected ways.
While we cannot understand all the interrelationships of these systems, we can act as technological naturalists, chronicling and cataloging the diversity of the systems and parts of systems that we encounter. We can examine the anomalies and malfunctions to gain insights, even if we don’t fully understand the system as a whole.
Chapter 5
THE NEED FOR BIOLOGICAL THINKING
In the mid-seventeenth century, an English physician named Nathanael Fairfax published some articles in Philosophical Transactions, the scientific journal of the Royal Society, the British national scientific society. He observed several intriguing phenomena, and decided to communicate them to the scientists of the age.
In one paper, on “Divers Instances of Peculiarities of Nature, Both in Men and Brutes,” Fairfax told the story of a man of about forty years old who was accustomed to drinking hot beer. When he eventually drank a cup of cold beer, he became sick and died within a couple of days. This observation led Fairfax to speculate about the agreeability of certain temperatures to the stomach. Fairfax also wrote of a woman who was struck with a sort of nausea every time she heard thunder. He did not speculate about why thunder might have this effect on a woman, noting only, “And thus it hath been with this Gentlewoman from a Girl.”
One gets the sense that while Fairfax wanted to eventually learn something from these observations and facts he recorded, the act of making the observations was itself enough, at least as a first step toward understanding.