
Overcomplicated


by Samuel Arbesman


  Rare words, then, are quite important as a group; they permeate our language. If we’re building a computer program to model language, it’s tempting to abstract away rare words or odd grammatical structures as outliers. But as a category, if not individually, they make up a large portion of language. Abstracting them out will cause our model to be woefully incomplete. To avoid losing our exceptions and edge cases, we need models that can handle the complexity of these exceptions and details. As Peter Norvig, Google’s director of research, put it, “What constitutes a language is not an eternal ideal form, represented by the settings of a small number of parameters, but rather is the contingent outcome of complex processes.”
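
  To make that concrete, here is a toy sketch in Python; the Zipf-style frequency law, the vocabulary size, and the cutoff for "common" words are all assumptions chosen for illustration, not measurements from a real corpus. Even though each rare word is individually negligible, the rare words together cover a substantial slice of everything we say.

    # Toy illustration: under a Zipf-like frequency law, the many rare
    # words collectively account for a large share of the words we use.
    VOCAB_SIZE = 100_000     # assumed vocabulary size
    COMMON_CUTOFF = 1_000    # treat the 1,000 most frequent words as "common"

    # Zipf's law with exponent 1: the r-th most common word has weight 1/r.
    weights = [1 / rank for rank in range(1, VOCAB_SIZE + 1)]
    total = sum(weights)

    common_share = sum(weights[:COMMON_CUTOFF]) / total
    print(f"Top {COMMON_CUTOFF:,} words: about {common_share:.0%} of running text")
    print(f"Remaining {VOCAB_SIZE - COMMON_CUTOFF:,} rare words: about {1 - common_share:.0%}")

  Under these toy assumptions, the words outside the top thousand still account for more than a third of all running text.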

  So, computational linguists incorporate edge cases and try to build a robust and rich technological model of a complex system—in this case, language. What do they end up with? A complex technological system.

  For a clear example of the necessary complexity of a machine model for language, we need only look at how computers are used to translate one language into another. Take this great, though apocryphal, story: During the Cold War, scientists began working on a computational method for translating between English and Russian. When they were ready to test their system, they chose a rather nuanced sentence as their test case: “The spirit is willing, but the flesh is weak.” They converted it into Russian, and then ran the resulting Russian translation back again through the machine into English. The result was something like “The whiskey is strong, but the meat is terrible.”

  Machine translation, as this computational task is more formally known, is not easy. Google Translate’s results can be imprecise, though interesting in their own way. But scientists have made great strides.

  What techniques are used by experts in machine translation? One early approach was to use the structured grammatical scaffolding of language I mentioned above. Linguists hard-coded the linguistic properties of language into a piece of software in order to translate from one language to another. But it’s one thing to deal with relatively straightforward sentences, and another to assume that such grammars can handle the diversity of language in the wild. For instance, imagine you create a rule that handles straightforward infinitives, but then doesn’t account for split ones, such as “To boldly go where no one has gone before.” And what about regional phrases, like the Pittsburghese utterance “The car needs washed” (skipping over “to be”)? The rules will cower in fear before such regionalisms.
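
  A deliberately naive sketch in Python shows how quickly a hand-coded rule runs into trouble; the rule below is invented for illustration, and no real parser is this crude.

    # Toy grammar rule: after "needs", expect either "to be" + participle
    # or a noun phrase. Standard sentences pass; the regionalism fails.
    def needs_clause_is_grammatical(tokens):
        if "needs" not in tokens:
            return True                        # rule doesn't apply
        rest = tokens[tokens.index("needs") + 1:]
        if rest[:2] == ["to", "be"]:
            return True                        # "needs to be washed"
        if rest and not rest[0].endswith("ed"):
            return True                        # crude stand-in for a noun phrase
        return False

    print(needs_clause_is_grammatical("the car needs to be washed".split()))  # True
    print(needs_clause_is_grammatical("the car needs a wash".split()))        # True
    print(needs_clause_is_grammatical("the car needs washed".split()))        # False, yet fine Pittsburghese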

  Using grammatical models to process language for translation simply doesn’t work that well. Language is too complex and quirky for these elegant rules to work when translating a text. There are too many edge cases. Into this gap have stepped numerous statistical approaches from the world of machine learning, in which computers ingest huge amounts of translated texts and then translate new ones based on a set of algorithms, without ever actually trying to understand or parse what the sentences mean. For example, instead of a rule saying that placing the suffix “-s” onto a word makes it plural, the machine might know that “-s” creates a plural word, say, 99.9 percent of the time, whereas 0.1 percent of the time it doesn’t, as with words like “sheep” and “deer” that are their own plurals, or irregular plurals such as “men” or “feet” or even “kine.” Now do similar calculations for the countless other exceptions in the language.
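
  A minimal sketch of that statistical mindset in Python, using a handful of invented word pairs rather than the enormous corpora real systems train on: instead of asserting a rule, the program simply tallies how often a pattern holds and keeps the exceptions around as data.

    from collections import Counter

    # (singular, plural) pairs a system might observe in training text.
    observed_pairs = [
        ("dog", "dogs"), ("car", "cars"), ("idea", "ideas"), ("book", "books"),
        ("sheep", "sheep"), ("deer", "deer"),                # their own plurals
        ("man", "men"), ("foot", "feet"), ("cow", "kine"),   # irregular plurals
    ]

    patterns = Counter()
    for singular, plural in observed_pairs:
        patterns["add -s" if plural == singular + "s" else "exception"] += 1

    total = sum(patterns.values())
    for pattern, count in patterns.items():
        print(f"{pattern}: {count}/{total} of observed pairs")

    # The exceptions aren't discarded; they're stored explicitly.
    exceptions = {s: p for s, p in observed_pairs if p != s + "s"}
    print(exceptions)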

  Out of the chaos comes order—but at a price. The most effective translation program ends up being not a simple model but a massive computer system with a large number of parameters, all fit to handle the countless edge cases and oddities of language. These kinds of models “based on millions of specific features perform better than elaborate models that try to discover general rules,” in the words of a team of Google researchers. Exceptions must be cherished, rather than discarded, for exceptions or rare instances contain a large amount of information.

  The sophisticated machine learning techniques used in linguistics—employing probability and a large array of parameters rather than principled rules—are increasingly being used in numerous other areas, both in science and outside it, from criminal detection to medicine, as well as in the insurance industry. Even our aesthetic tastes are rather complicated, as Netflix discovered when it awarded a prize for improvements in its recommendation engine to a team whose solution was cobbled together from a variety of different statistical techniques. The contest seemed to demonstrate that no simple algorithm could provide a significant improvement in recommendation accuracy; the winners needed to use a more complex suite of methods in order to capture and predict our personal and quirky tastes in films.
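
  The flavor of such a blended solution fits in a few lines of Python; the component models, numbers, and weights below are invented for illustration and bear no relation to the actual prize-winning system.

    # Combine several imperfect predictors into one blended rating.
    def blend(predictions, weights):
        return sum(w * p for w, p in zip(weights, predictions)) / sum(weights)

    # Hypothetical models' predicted star ratings for one viewer and film.
    neighborhood_model = 3.2    # "people who liked X also liked Y"
    latent_factor_model = 3.8   # matrix-factorization-style estimate
    recency_model = 3.5         # leans on recent viewing habits

    rating = blend([neighborhood_model, latent_factor_model, recency_model],
                   [0.3, 0.5, 0.2])
    print(f"Blended prediction: {rating:.2f} stars")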

  This phenomenon occurs in all types of technology. When building software more generally, the computer scientist Frederick P. Brooks Jr. has noted, “The complexity of software is an essential property, not an accidental one.”

  Even the complex systems that make up the law are subject to the rule of exceptions and edge cases. While we think of the boundary between what is legal and what is not as a clear dividing line, it is far from being so. Rather, the boundary becomes further and further indented and folded over time, yielding a jagged and complicated border, rather than a clear straight line. In the end, the law turns out to look like a fractal: no matter how much you zoom in on such a shape, there is always more unevenness, more detail to observe. Any general rule must end up dealing with exceptions, which in turn split into further exceptions and rules, yielding an increasingly complicated, branching structure. The legal scholar Jack Balkin discusses this in an article evocatively titled “The Crystalline Structure of Legal Thought”:

  We might consider whether under an objective standard of negligence, there is an exception for children, or a different standard for insane persons, or for those who are blind, or intoxicated, and so forth. This leads us to further rule choices, each of which leads to additional branches of doctrinal development. Assume, for example, that we follow one of these branches of doctrinal development and create an exception for children (which is now the majority rule). We might consider if there is an exception to that exception when the child engages in an adult activity (this, too, is the case now generally). We might then go on to ask if operating a motorcycle is an adult activity within the meaning of that rule, and if so, whether operating a motorscooter is also an adult activity. Put together, we have a descending series of rule choices of increasing factual complexity and specificity. . . .

  The law professor David Post and the biologist Michael Eisen teamed up to examine this as well, and while they admit they can’t prove that a legal statement can always branch further, and that it’s “turtles all the way down,” they do note that “we have never met a legal question that could not be decomposed into subquestions.” Post and Eisen even show through simulations that certain types of branching structures that mimic legal systems actually can have a fractal structure. Testing this, they find features indicative of fractals when looking at actual legal citations of court case opinions. The fractal complexity of the law might be more than an evocative metaphor.
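
  A toy simulation in the spirit of this branching picture, though not Post and Eisen's actual model (the branching factor and depth here are arbitrary choices), shows how fast a doctrine grows when every rule spawns its own exceptions:

    # Each rule raises a few sub-questions; each of those raises its own.
    BRANCHING = 3   # assumed sub-questions per rule
    DEPTH = 6       # levels of exceptions-to-exceptions to follow

    def count_rules(depth, branching):
        """Total rules and exceptions in a tree of the given depth."""
        if depth == 0:
            return 1
        return 1 + branching * count_rules(depth - 1, branching)

    for d in range(DEPTH + 1):
        print(f"depth {d}: {count_rules(d, BRANCHING):,} rules and exceptions")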

  As the scholars Mark Flood and Oliver Goodenough recognize, “Much of the value of good contracts and good lawyering derives from the seemingly tedious planning for all the ways that a relationship might run off the rails.” In other words, legal complexity is often derived from exceptions and their complications.

  Whichever technological system we look at—whether it be a legal system, a piece of software, an appliance, a scientific model, or whatever else that we have built—each is driven to become more complex and more kluge-like because of exceptions and edge cases, alongside the twin forces of accretion and interaction.

  The Imperfect Nature of These Forces

  While I’ve described these forces of increasing complexity as inexorable, they are not always impossible to resist. However, we can see their true strength by looking at our attempts to work against these forces, and how often we fail to eradicate the kluge.

  There are some strategies that appear to bring order and logic to our technologies. If we can build and design our systems differently, or modify and rebuild them, perhaps we have a hope of taming these technologies. For example, we could try to uncouple certain systems—breaking them apart into smaller pieces—so things stay relatively simple and manageable. The physics-trained sociologist Duncan Watts, a principal researcher at Microsoft Research, has argued that in the financial realm, one solution to failures that arise from complexity is to simply remove the coupled complications: if a firm becomes too big and its failure is expected to cause a cascading shock, it must be divided up or shrunk.

  Similarly, some scholars have spoken of finding the optimal levels of interoperability for a large system. An optimal level would be one that allows powerful systems to operate well without the downside brought by high unpredictability. Optimal interoperability, then, rather than maximum interoperability, is the goal. Of course, this is not so easily achieved. It is one thing to want the right level of interoperability or interaction, and another to know how to build it. One way to do this involves using certain design principles, such as building understandability and modularity into our creations.

  As discussed earlier, when a system has a great degree of interconnectivity, it is often difficult to pull apart the pieces and see what’s going on. But sometimes there are parts of a large system that are more tightly connected among themselves than they are with other parts. In other words, there are modules, parts of a system that are tightly interconnected and reasonably self-contained. We see many modules in biology, with integrated parts that act in concert, on scales from mitochondria to the human heart. These modules are still intimately connected to the rest of the system, whether through other parts of the body or through chemical signals—I do not recommend trying to remove your heart—but they are relatively distinct and can be understood, at least to some degree, by themselves.

  We see modularity in technology, too, such as when a piece of software is made up of many independent functions or pieces; or when you can swap out different applications that do the same thing, but in different ways (think exchanging Microsoft PowerPoint for Keynote on the Mac); or when you examine particular, relatively distinct sections of the United States Code of federal legislation. Modularity embodies the principle of abstraction, allowing a certain amount of managed complexity through compartmentalization.
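
  In code, modularity often looks like a shared interface behind which implementations can be swapped. Here is a generic Python sketch; the classes are invented stand-ins, not real products.

    from typing import Protocol

    class SlideTool(Protocol):
        def present(self, slides: list) -> None: ...

    class ToolA:
        def present(self, slides: list) -> None:
            print("Tool A showing:", ", ".join(slides))

    class ToolB:
        def present(self, slides: list) -> None:
            print("Tool B showing:", " | ".join(slides))

    def give_talk(tool: SlideTool, slides: list) -> None:
        # The rest of the program never needs to know which tool it got.
        tool.present(slides)

    give_talk(ToolA(), ["Intro", "Edge cases", "Conclusion"])
    give_talk(ToolB(), ["Intro", "Edge cases", "Conclusion"])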

  Unfortunately, understanding individual modules—or building them to begin with—doesn’t always yield the kinds of expected behaviors we might hope for. If each module has multiple inputs and multiple outputs, when they are connected the resulting behavior can still be difficult to comprehend or to predict. We often end up getting a combinatorial explosion of interactions: so many different potential interactions that the number of combinations balloons beyond our ability to handle them all. For example, if each module in a system has a total of six distinct inputs and outputs, and we have only ten modules, there are more ways of connecting all these modules together than there are stars in the universe.
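
  The arithmetic behind that comparison can be checked in a few lines, under one assumed way of counting: each module's six ports split into three outputs and three inputs, every output wired to some input, and a rough figure of 10^24 stars in the observable universe.

    import math

    modules = 10
    outputs_per_module = 3                          # assumption: 6 ports = 3 in + 3 out
    total_outputs = modules * outputs_per_module    # 30 outputs to pair with 30 inputs

    wirings = math.factorial(total_outputs)         # ways to match outputs to inputs
    stars = 10 ** 24                                # rough estimate of stars in the observable universe

    print(f"Possible wirings: {float(wirings):.2e}")   # about 2.7e32
    print(f"Rough star count: {float(stars):.1e}")
    print("More wirings than stars:", wirings > stars)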

  In some realms that can be heavily regulated, such as finance or corporate structures, our dreams of increasing modularity or finding the ideal level of interoperability might work. We could, for instance, mandate the breaking up of certain institutions if they reach a given size. But in most other types of technological systems, these ramifying interconnections will often continue apace, no matter our desires. With a relatively small system, building it modularly or piecewise is possible, but as things grow, this kind of clear modularity becomes less likely. As a result of social pressures and the legacy structures of these systems, we continue interconnecting and muddling these systems over time. They grow and complicate despite our desire for simplicity.

  We can try to build better-designed systems, and for a while that might even work. For example, there are good computer science and engineering practices, essentially “engineering hygiene,” that can drastically reduce the complexity of a system, such as avoiding certain types of variables in one’s computer programs. If Toyota had followed these practices, the overall complexity of its system would have been far lower. In addition, professional software development includes methods that can reduce the number of bugs in the programs that are created, bringing rates as low as 0.06 defects per 1,000 lines of code (a very low number); and there are specific practices for managing teams when people work together to construct, operate, and maintain a complex technology, helping to reduce many problems we might encounter in our systems. But in the long term, accretion, interaction, and the edge cases often swamp these attempts at simplification.
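
  One such hygiene rule is avoiding shared global state. The contrast looks something like this generic Python sketch, which illustrates the principle only and is not anyone's actual code.

    # Risky style: a mutable global that any function can quietly change.
    throttle = 0.0

    def bump_throttle(amount):
        global throttle
        throttle += amount          # hard to know what else touches this

    bump_throttle(0.2)
    print(throttle)                 # changed from afar: 0.2

    # Safer style: state is passed in and returned, so every change is visible.
    def bumped_throttle(current, amount):
        return current + amount

    level = 0.0
    level = bumped_throttle(level, 0.2)
    print(level)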

  As our systems become more complex over time, a gap begins to grow between the structure of these complex systems and what our brains can handle. Whether it’s the entirety of the Internet or other large pieces of infrastructure, understanding the whole is no longer even close to possible.

  But why must this be so? We next turn to the social and biological limits of human comprehension: the reasons why our brains—and our societies—are particularly bad at dealing with these complex systems, no matter how hard we try.

  Chapter 3

  LOSING THE BUBBLE

  In 1985, a patient entered a clinic to undergo radiation treatment for cancer of the cervix. The patient was prepared for treatment, and the operator of the large radiation machine known as the Therac-25 proceeded with radiation therapy. The machine responded with an error message, as well as noting that “no dose” had been administered. The operator tried again, with the same result. The operator tried three more times, for a total of five attempts, and each time the machine returned an error and responded that no radiation dosage had been delivered. After the treatment, the patient complained of a burning sensation around her hip and was admitted to the hospital.

  Several months later, the patient died of her cancer. It was discovered that she had suffered horrible radiation overexposure—her hip would have needed to be replaced—despite the machine’s having indicated that no dose of radiation was delivered.

  This was not the only instance of this radiation machine malfunctioning. In the 1980s, the Therac-25 failed for six patients, irradiating them with many times the dose they should have received. Damage from the massive radiation overdoses killed some of these people. These overdoses were considered the worst failures in the history of this type of machine.

  Could these errors have been prevented, or at least minimized? If you look at a 1983 safety analysis of these machines by the manufacturer, one of the reasons for the failure becomes clear. The individuals involved in designing and testing these machines looked only at hardware errors and essentially ignored the software, since “software does not degrade due to wear, fatigue, or reproduction process.” While this is a true statement, it completely ignores the fact that software is complex and can fail in many different ways. This report implies a lack of awareness on the part of its makers that software could have a deadly complexity and be responsible for a radiation overdose. Software bugs are a fact of life, and yet the safety analysis almost completely ignored the risks they present.

  The people responsible for ensuring the safety of the Therac-25 misunderstood technological complexity, with lethal consequences. In hindsight it’s almost easy to see where they went wrong: they downplayed the importance of whole portions of the constructed system, and the result was a catastrophic failure. However, it’s more and more difficult to diagnose these kinds of problems in new technology. No matter how hard we try to construct well-built, logical technologies, there will always be some part that is beyond our complete understanding. The reason for this is simple: we are human. There is a fundamental mismatch between how we think and how complex systems operate; the ways in which they are built make them hard—or impossible—to think about.

  One of the first things you learn when programming is to count differently. This doesn’t mean counting in binary or even in hexadecimal (16 different digits, rather than the usual 10); for most programmers, this is an interesting but unnecessary skill. What I mean here is counting from zero, with the first object in a list always being the “zeroth.” Computer programmers count from zero rather than one because that’s the way machines count. As the writer Scott Rosenberg notes, the space between machine counting and human counting is an area where we make adjustments in computer code, but it’s also where errors and bugs originate. We have to adjust counts by one, incrementing or decrementing over and over, in order to adjust for the differences between how humans intuitively number the world and how machines enumerate their variables. When we fail to adjust, the errors multiply.
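
  The mismatch fits in a few lines of Python; this is a generic example of the adjustment just described, not drawn from any particular program.

    days = ["Mon", "Tue", "Wed", "Thu", "Fri"]

    human_position = 3                   # a person means "the third day": Wed
    machine_index = human_position - 1   # the adjustment by one

    print(days[machine_index])    # "Wed" -- correct
    print(days[human_position])   # "Thu" -- the classic off-by-one slip

    # Forgetting the adjustment at the end of the list fails outright:
    # days[len(days)] raises an IndexError, because the last valid index is len(days) - 1.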

  The fact that we don’t count from zero, but our computers do, is emblematic of the larger rift between human thought patterns and how large systems are constructed and how they operate. We can’t keep track of all the parts in these systems, how they all interact, and how each interaction leads to a new set of consequences. Our human brains are just not equipped to encompass this kind of complexity. In this space between how complex systems are built and how humans think, we find the complications that lead to reduced understanding and to unanticipated consequences and problems.

  In the military, soldiers are often confronted with complicated situations that require holding a great deal of information in their heads simultaneously, while maintaining the capacity for rapid response. But sometimes a situation is just too complicated, too stressful, and too messy. A soldier gets overwhelmed and loses the capacity to manage the rush of events. When this happens, a soldier is said to “lose the bubble.” As Thomas Homer-Dixon describes the experience, “the comprehensible and predictable suddenly become opaque and bewildering.” Awareness of the situation and system drops precipitously, and the soldier is left unable to process the barrage of stimuli and act on it.

 
