Interaction
Students are currently inundated by exhortations to learn how to code. Computer programming is the future, educators and technologists tell us. It is used in the operation of everything around us, from our cars to our ovens. And in certain circumscribed ways, the experts are correct. Programming can teach you a structured way of thinking, and it can also provide a guide to what is truly possible with technology. If I say there’s a new app that can do X and you’ve coded, even a little bit, you’re that much closer to knowing whether my claim is reasonable or utter nonsense.
If you’ve coded before, you’ve also glimpsed how computer programs resist our efforts at simplification. As computer programs become larger and more sophisticated, they become more complicated, with more parts that all come together in ways that can be hard to grasp. We have a wide variety of techniques and technologies to make sure that these large programs are still manageable—things such as version control, bug-tracking software, or tools for communication across a team—but these are often rearguard efforts in a losing battle. For not only does software code accrete, but each of these components interacts with all the others. Far from each new addition existing in a pristine vacuum, a computer program is a massively interconnected system: it interacts with itself and with other pieces of software. We add layer upon layer to older code, using it in new and unexpected ways, stitching the layers all together.
Sometimes this process of interaction (and unexpected interaction) is exacerbated by the programming languages themselves. One problem even beginning programmers are familiar with is that of the command GOTO. In the programming language BASIC, GOTO lets a program jump from any line to any other, redirecting the flow of execution from one point in the code to another. Making this happen is simple, and has even been described as a “godsend” for those from the liberal arts who are testing their abilities in computational thinking.
In a small program, using GOTO to jump around is fine. But as a program becomes bigger, using GOTO ends up tying the code into complicated knots that are very hard for even a skilled programmer to untangle. You end up with what is known as spaghetti code because it is looped together, difficult to unravel, and hard to understand completely. It becomes nearly impossible to figure out the order of the instructions that will be executed in the computer program, leading to unexpected and incomprehensible behavior.
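To see why, consider a sketch of the problem. Python has no GOTO statement, so the snippet below simulates one with a toy dispatch loop; the numbered “program” is invented for illustration:

```python
# A minimal sketch of GOTO-style control flow, simulated in Python.
# Each numbered step of a tiny BASIC-like program names the step to
# run next, so execution order is determined by data, not by reading
# the listing from top to bottom.

program = {
    10: ("print", "start", 20),
    20: ("goto", None, 50),       # jump forward, skipping line 30
    30: ("print", "middle", 60),
    50: ("print", "detour", 30),  # jump backward to line 30
    60: ("print", "end", None),
}

line = 10
while line is not None:
    op, arg, nxt = program[line]
    if op == "print":
        print(arg)
    line = nxt  # control passes wherever the current step says
```

Reading the listing in numerical order (10, 20, 30, 50, 60) tells you nothing about the order in which the steps actually run (10, 20, 50, 30, 60). With five lines the tangle is still traceable; with thousands, it is spaghetti.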
A simplifying construction, while it works in isolation and for small situations, has a way of escaping its purposes. A system or component gets used more often and in different ways than we could have possibly anticipated, leading to a far more complicated situation than initially envisioned. GOTO goes from being a wonderful shortcut to something considered not only inelegant, but actually “harmful.”
There are many methods designed to help make sense of and impose order on the systems that we build, including the use of more sophisticated computer languages. In most professionally constructed systems, spaghetti code should be a thing of the past. However, interaction continues to occur because of the ease of interconnection and the accretion of the various pieces. The dynamics of a highly interconnected system—how information flows through it and which parts will interact with others—become incredibly complicated and unpredictable. To return to the software inside Toyota vehicles: according to one metric, numerous pieces of code within these vehicles are, because of this interconnection and interaction, effectively beyond the reach of testing.
We see this same process of increased interconnection in other types of technological systems that we build. In our legal system, because each new law or regulation interacts with those that have come before it, we find ourselves in situations where it becomes difficult to predict the effects of individual laws.
The lawyer Philip K. Howard has examined the story of the Bayonne Bridge, which crosses a channel between New York and New Jersey. This nearly hundred-year-old bridge is too low to allow modern container ships to pass beneath it in order to reach the port of Newark, a critical commercial hub. So what can be done? Among the proposed solutions to this problem was one that involved retrofitting, essentially raising the bridge to the necessary height. It was the cheapest solution and the one that won out, back in 2009. But construction did not commence for several years, because of a combination of accretion and interaction. The rules and regulations dictating the procedure for the bridge’s renovation involved forty-seven permits from nineteen different governmental entities—everything from environmental impact statements to historical surveys. Similar cases occur elsewhere, with some public projects taking around ten years to be approved because of the number of rules and processes that bear on such situations. This is particularly concerning when, as Howard notes, replacing decaying infrastructure can save lives.
Michael Mandel and Diana Carew of the Progressive Policy Institute in Washington, DC, have referred to this growth of rule systems as “regulatory accumulation,” wherein we keep adding more and more rules and regulations over time. Each law or rule individually might make sense, but taken together they can be debilitating because of their interaction, and can even collide in surprising and unexpected ways.
We increasingly interconnect not only the components of a single piece of technology, but also different pieces of software and technology—a higher-order type of interconnection. This is the concept of interoperability.
It is often a good thing to make technology interoperable, able to interact and pass information between systems. For example, the Internet is only as powerful as it is because of the huge number of connections it has and the messages it can pass between the countless machines it stitches together. When you ask Siri what the population of the world is and your iPhone gets the result via the Wolfram|Alpha service, or when you use Google Maps and it shows you how much an Uber might cost, that’s interoperability. But when we make systems interoperable, we risk incurring massive amounts of complexity as a downside. We are now not just building interconnected systems—such as each of the individual machines and devices that make up the Internet—but interconnected systems of systems.
In addition to this kind of interoperability, we are also building interdependence between different kinds of technologies, such as the Internet coupled with the power grid. Researchers have studied these sorts of systems to understand their strengths and weaknesses and have found that certain types can and will fail under many conditions; for example, a cascade can start if only a relatively small fraction of the power grid goes down. One response would be to pull back on making these interconnections between technological systems. But that’s nearly impossible. The cost of construction of interconnected systems is too low: in an age of interoperability, where engineers and designers purposefully create interfaces for each system, it’s too easy to build and connect systems together.
When we build something new, there is a tradeoff between the cost of failure and the cost of construction. The cost of failure is how bad it will be if something goes wrong. If your word processor application crashes, you lose any unsaved work. Nobody really wants that outcome, but it’s a relatively low cost of failure. However, if a problem in the electrical grid renders a large portion of the United States without power, that’s an extremely costly failure. For example, the Northeast Blackout in 2003 affected 50 million people, contributed to the deaths of eleven people, and cost an estimated $6 billion.
Each failure—and its cost—can be balanced against the cost of building a system. Historically, the more important systems we have constructed have cost a lot to build; it takes more resources to construct the infrastructure for a banking system than for a chat program. Therefore, it has been worthwhile to ensure that costly systems are resistant to failure (which often adds yet more to the cost of construction). In other words, the high cost of construction has made it vital that the cost of failure be reduced through lots of checking and effort.
For a long time, this approach worked: the cost of construction appeared to outweigh the cost of failure, with important systems that we rely on for the vital functions of society costing a lot of money to build. But things have begun to change. The cost of construction has gone down drastically, thanks to off-the-shelf tools and components, resources available in the cloud, and much more. Tech start-ups no longer need much initial funding: you can build and market-test a sophisticated tool quickly and cheaply.
Simultaneously, and thanks to some of the same trends, the cost of failure associated with interconnection has gone way up. It has now become easy and cheap to make the types of interconnected systems that incur huge costs when something goes wrong. When digital maps are connected to software that provides directions, small errors can be disastrous (for example, Apple Maps mislabeled a supermarket as a hospital when it was first unveiled). In an age when we can conceive of synthetically generating microbes by sending information over the Internet, the risk of some sort of biological disaster grows much higher. The poliovirus has been reconstructed in a lab using mail-order biological components; there are now start-ups working to allow biology experiments to be run remotely; and it is not hard to imagine, in our increasingly automated world, that a biological agent generated by software could inadvertently be unleashed upon the world. When the cost of construction plunges and the cost of failure rises, we enter a realm of technological complexity that should give everyone pause.
In general, interaction within and between systems keeps growing, and with it the complexity of our overall systems. This increasing interconnection is, according to some, virtually a basic imperative of technology. Technology ultimately connects, interacts, and converges. And when it does, it acts as a further force moving us toward complication.
While these trends have been happening for as long as we have been building technology and large systems, they’ve become more powerful in recent years. As noted in this book’s introduction, the computer scientist Edsger Dijkstra recognized the radical novelty of our large systems, specifically our computational ones. In 1988, Dijkstra noted that programming a computer requires the ability to jump across massive differences in scale, something no one had really had to handle before computers. To illustrate this traversal of a massive hierarchy, Dijkstra gave the example of intellectually navigating from the scale of a single bit in a program or machine up to several hundred megabytes of storage, leaping between the very small and very large. This involves jumping up nearly a billion times in size, a change in scale far more extreme than anything anyone had grappled with before. It has only become more extreme since: everyday users of technology must now be familiar with prefixes such as giga- and tera-, which make us responsible for differences in scale so huge that they border on the astronomical.
Only in the past several decades have large systems become so big and so interconnected that we have found ourselves with, to use Dijkstra’s phrasing, “conceptual hierarchies that are much deeper than a single mind ever needed to face before.”
However, even if we could prevent our systems from accreting and interacting, there’s another reason our systems become more complex over time. And it’s one that we are even less able to untangle.
The Edge Case
Imagine you want to create a calendar software application. Sounds straightforward, right? In many ways it is. The calculation of the number of days in the year is relatively easy. It’s 365, unless it’s a leap year, and then we add a single day to the end of February. But how do we calculate a leap year? Well, you look at the year. If the year is divisible by 4, but not by 100 (unless it’s also divisible by 400), then you get a leap year. Easy. Sort of.
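“Sort of” is right. The rule itself fits in one line of Python; here is a minimal sketch (the function name is mine, though the standard library ships the same test as calendar.isleap):

```python
import calendar

def is_leap_year(year: int) -> bool:
    """Divisible by 4, except century years, which must be divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

assert is_leap_year(2024)        # divisible by 4
assert not is_leap_year(1900)    # century year not divisible by 400
assert is_leap_year(2000)        # century year divisible by 400
assert is_leap_year(2000) == calendar.isleap(2000)  # matches the stdlib
```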
But you also want this program to handle changes in time zones, right? Well, that shouldn’t be hard. You simply use the geographical coordinates from GPS and determine which time zone you are in. You compile a list of the states according to their time zones. However, time zones don’t always divide neatly along state boundaries. Now you need something even more fine-grained, down to fairly small regions. Also, don’t forget Arizona, a state that for the most part doesn’t use daylight saving time, placing it in its own strange temporal realm.
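The time zone database most software leans on (the IANA tz database, which Python exposes through the standard zoneinfo module) already encodes Arizona’s quirk. A small sketch of what it knows:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# Compare UTC offsets on a winter date and a summer date
# in two neighboring Mountain-time zones.
for zone in ("America/Denver", "America/Phoenix"):
    jan = datetime(2024, 1, 15, 12, tzinfo=ZoneInfo(zone))
    jul = datetime(2024, 7, 15, 12, tzinfo=ZoneInfo(zone))
    print(zone, jan.strftime("%z"), jul.strftime("%z"))

# America/Denver  -0700 -0600   (shifts for daylight saving time)
# America/Phoenix -0700 -0700   (most of Arizona does not shift)
```

Note what the sketch does not do: map a GPS coordinate to a zone name. That requires polygon data for every zone boundary, which is exactly the fine-grained mess the naive state-by-state list fails to capture.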
Do we need our calendar to handle holidays, too? Of course, and since those are straightforwardly defined they should be easy to add. Well, they are, most of the time. Thanksgiving is the fourth Thursday in November and Veterans Day is always November 11. But what about Passover? It seems we now need to make sure the calendar app can integrate with another calendar that is based on lunar information, since Passover starts the evening of the fifteenth day of a lunar month, as defined by the Hebrew calendar. So we need a bit more information to be included than we initially anticipated.
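The “easy” holidays really are easy. A minimal sketch in Python (the helper function is hypothetical, not drawn from any particular library):

```python
from datetime import date, timedelta

def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """Date of the nth given weekday (Monday=0) in a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))

thanksgiving = nth_weekday(2024, 11, 3, 4)  # fourth Thursday of November
veterans_day = date(2024, 11, 11)           # always November 11
print(thanksgiving)  # 2024-11-28
```

Passover breaks the pattern: Python’s standard library has no Hebrew calendar, so a lunar holiday means pulling in outside data or a third-party package, which is precisely the extra information we did not anticipate needing.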
Do you want the calendar to handle other time periods, and be accurate in the past as well as the future? If we go back into the nineteenth century, then individual towns each had their own times, prior to standardized time zones, and this information might need to be hard-coded into our app as well. Similarly, recall that over the course of several centuries there was a flurry of switching from the Julian calendar to the Gregorian calendar across the globe, but the more accurate calendar was adopted unevenly. For example, the reason the Russian October Revolution is commemorated in November is that at the time the revolution occurred, Russia was still using the Julian calendar, and its dates differed by more than a week from parts of the West; the revolution happened at the end of October according to the Julian calendar. If we want to be thorough, we might want all this information in our calendar as well.
We could go on.
Systems we build to reflect the world end up being complicated, because the world itself is complicated. It’s relatively straightforward to handle the vast majority of the complexity through a simple model, such as knowing that the calendar is either 365 days or 366 days, with a simple algorithm to keep track of when it’s which. But if you crave accuracy—whether in making sure you never miss an appointment, or in building a self-driving vehicle that not only won’t get lost but also won’t injure anyone—things suddenly become a good deal more complicated.
These kinds of complications are known as edge cases: the exceptions that nonetheless have to be dealt with, otherwise our technologies will fail. Edge cases range from the problem of the leap year to how to program database software to handle people’s names that have an apostrophe in them. Edge cases are far from common, but they occur often enough that they must be acknowledged and managed properly. But the process of doing so robs our technologies of simplicity and makes them much more complex. This can be seen quite clearly in the ways some of our scientific models—which are a type of technology—have changed over time. We turn to an example from the social sciences: linguistics.
Common Rarities
My middle school English teacher taught me grammar. At a young age, I memorized how to conjugate the irregular verb “to be,” as well as a list of prepositions in the English language, and learned how to diagram sentences. A sentence diagrammed—stripped to its logical skeleton—is magnificent to behold. You reduce language to its atomic features, like nouns, verbs, and adjectives, and show how they all hang together.
Though languages do not adhere to a set of equations, grammar maintains a distinct sort of beauty and order. Nonetheless, building a system to process language is not an easy task. Languages have idiomatic expressions, words whose connotations are often far more slippery than we would like, and an informality that makes grammar a rule set that is more nodded to than obeyed. These complications are all edge cases of a sort, the kind that prevents us from building a simple rule on the assumption that every sentence is some variant of Subject–Verb–Direct Object. We can better understand these edge cases of language by looking at things known as hapax legomena, what we might call common rarities.
Ever used the word “snowcrie”? I doubt it. In fact, “snowcrie” doesn’t even have a definition. As far as we know, it was a typo of sorts. According to the Oxford English Dictionary, the word occurred in a line in a poem from 1402: “Not in Goddis gospel, but in Sathanas pistile, wher of sorowe and of snowcrie noon is to seken.” Scholars think it might have been an error, likely meant to be “sorcerie.”
Whatever its true nature, “snowcrie” is what is known as a hapax legomenon, a word that only occurs once in a given corpus—a massive, often complete, collection of texts—such as from an entire language or time period. In this case, the corpus consists of everything in English from printed sources available to the dictionary editors. But the body of text doesn’t have to be so large. Within the Shakespearean corpus—all the writings of William Shakespeare—there are numerous hapax words, such as “honorificabilitudinitatibus,” which essentially seems to mean “of honor.”
When a corpus is all (or nearly all) we have of an entire language, such as the Hebrew Bible in the case of biblical Hebrew, hapax words can sometimes be quite vexing, since we might have little idea of their meaning. But hapax legomena aren’t strange statistical flukes or curiosities. Not only are they more common as a category than we might realize, but their existence is related to certain mathematical rules of language. The frequency of words in a language is described by what is known as a power law or, more commonly, a long tail. These types of distributions, unlike the bell curves we are used to for such quantities as human height, have values that extend far out into the upper reaches of the scale, allowing both for exceedingly common words such as “the” and for much rarer words like “flother.”
Often about half of the words in a corpus turn out to have only a single occurrence, making them hapax legomena. They occupy the “long” part of the long tail. While it is rare that you will encounter a specific hapax word, it is likely you will encounter them as a category. To translate this into the world of movies, it’s rare to find someone who has seen The Adventures of Buckaroo Banzai Across the 8th Dimension, but it’s not rare to find someone who has seen at least one weird cult film.
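That “about half” figure is easy to check for yourself. Here is a minimal sketch (the tokenizer is deliberately crude, and the exact fraction varies with the corpus):

```python
from collections import Counter
import re

def hapax_fraction(text: str) -> float:
    """Fraction of distinct words occurring exactly once in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / len(counts)

# Fed any sizable corpus (say, a public-domain novel from Project
# Gutenberg), this typically returns a value in the neighborhood of 0.5.
```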