Algorithms to Live By


by Brian Christian and Tom Griffiths


  Before Danny Hillis founded the Thinking Machines corporation, before he invented the famous Connection Machine parallel supercomputer, he was an MIT undergraduate, living in the student dormitory, and horrified by his roommate’s socks.

  What horrified Hillis, unlike many a college undergraduate, wasn’t his roommate’s hygiene. It wasn’t that the roommate didn’t wash the socks; he did. The problem was what came next.

  The roommate pulled a sock out of the clean laundry hamper. Next he pulled another sock out at random. If it didn’t match the first one, he tossed it back in. Then he continued this process, pulling out socks one by one and tossing them back until he found a match for the first.

  With just 10 different pairs of socks, following this method will take on average 19 pulls merely to complete the first pair, and 17 more pulls to complete the second. In total, the roommate can expect to go fishing in the hamper 110 times just to pair 20 socks.
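
  If you’d like to see where those numbers come from, the little simulation below plays out the roommate’s ritual a few thousand times and averages the results. (It’s a sketch in Python, purely for illustration, counting every trip to the hamper, including the pull that starts each pair.)

    import random

    def average_pulls(num_pairs, trials=20_000):
        """Simulate the roommate's procedure: pull a starter sock, then keep
        drawing (and tossing back mismatches) until its partner turns up."""
        total = 0
        for _ in range(trials):
            hamper = [color for color in range(num_pairs) for _ in (0, 1)]
            pulls = 0
            while hamper:
                starter = hamper.pop(random.randrange(len(hamper)))
                pulls += 1
                while True:
                    i = random.randrange(len(hamper))
                    pulls += 1
                    if hamper[i] == starter:   # a match: this pair is done
                        hamper.pop(i)
                        break
                    # no match: the sock goes back in the hamper
            total += pulls
        return total / trials

    print(average_pulls(10))  # hovers right around 110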

  It was enough to make any budding computer scientist request a room transfer.

  Now, just how socks should be sorted is a good way to get computer scientists talking at surprising length. A question about socks posted to the programming website Stack Overflow in 2013 prompted some twelve thousand words of debate.

  “Socks confound me!” confessed legendary cryptographer and Turing Award–winning computer scientist Ron Rivest to the two of us when we brought up the topic.

  He was wearing sandals at the time.

  The Ecstasy of Sorting

  Sorting is at the very heart of what computers do. In fact, in many ways it was sorting that brought the computer into being.

  In the late nineteenth century, the American population was growing by 30% every decade, and the number of “subjects of inquiry” in the US Census had gone from just five in 1870 to more than two hundred in 1880. The tabulation of the 1880 census took eight years—just barely finishing by the time the 1890 census began. As a writer at the time put it, it was a wonder “the clerks who toiled at the irritating slips of tally paper … did not go blind and crazy.” The whole enterprise was threatening to collapse under its own weight. Something had to be done.

  Inspired by the punched railway tickets of the time, an inventor by the name of Herman Hollerith devised a system of punched manila cards to store information, and a machine, which he called the Hollerith Machine, to count and sort them. Hollerith was awarded a patent in 1889, and the government adopted the Hollerith Machine for the 1890 census. No one had ever seen anything like it. Wrote one awestruck observer, “The apparatus works as unerringly as the mills of the Gods, but beats them hollow as to speed.” Another, however, reasoned that the invention was of limited use: “As no one will ever use it but governments, the inventor will not likely get very rich.” This prediction, which Hollerith clipped and saved, would not prove entirely correct. Hollerith’s firm merged with several others in 1911 to become the Computing-Tabulating-Recording Company. A few years later it was renamed—to International Business Machines, or IBM.

  Sorting continued to spur the development of the computer through the next century. The first code ever written for a “stored program” computer was a program for efficient sorting. In fact, it was the computer’s ability to outsort IBM’s dedicated card-sorting machines that convinced the US government their enormous financial investment in a general-purpose machine was justified. By the 1960s, one study estimated that more than a quarter of the computing resources of the world were being spent on sorting. And no wonder—sorting is essential to working with almost any kind of information. Whether it’s finding the largest or the smallest, the most common or the rarest, tallying, indexing, flagging duplicates, or just plain looking for the thing you want, they all generally begin under the hood with a sort.

  But sorting is more pervasive, even, than this. After all, one of the main reasons things get sorted is to be shown in useful form to human eyes, which means that sorting is also key to the human experience of information. Sorted lists are so ubiquitous that—like the fish who asks, “What is water?”—we must consciously work to perceive them at all. And then we perceive them everywhere.

  Our email inbox typically displays the top fifty messages of potentially thousands, sorted by time of receipt. When we look for restaurants on Yelp we’re shown the top dozen or so of hundreds, sorted by proximity or by rating. A blog shows a cropped list of articles, sorted by date. The Facebook news feed, Twitter stream, and Reddit homepage all present themselves as lists, sorted by some proprietary measure. We refer to things like Google and Bing as “search engines,” but that is something of a misnomer: they’re really sort engines. What makes Google so dominant as a means of accessing the world’s information is less that it finds our text within hundreds of millions of webpages—its 1990s competitors could generally do that part well enough—than that it sorts those webpages so well, and only shows us the most relevant ten.

  The truncated top of an immense, sorted list is in many ways the universal user interface.

  Computer science gives us a way to understand what’s going on behind the scenes in all of these cases, which in turn can offer us some insight for those times when we are the one stuck making order—with our bills, our papers, our books, our socks, probably more times each day than we realize. By quantifying the vice (and the virtue) of mess, it also shows us the cases where we actually shouldn’t make order at all.

  What’s more, when we begin looking, we see that sorting isn’t just something we do with information. It’s something we do with people. Perhaps the place where the computer science of establishing rank is most unexpectedly useful is on the sporting field and in the boxing ring—which is why knowing a little about sorting might help explain how human beings are able to live together while only occasionally coming to blows. That is to say, sorting offers some surprising clues about the nature of society—that other, larger, and more important kind of order that we make.

  The Agony of Sorting

  “To lower costs per unit of output, people usually increase the size of their operations,” wrote J. C. Hosken in 1955, in the first scientific article published on sorting. This is the economy of scale familiar to any business student. But with sorting, size is a recipe for disaster: perversely, as a sort grows larger, “the unit cost of sorting, instead of falling, rises.” Sorting involves steep diseconomies of scale, violating our normal intuitions about the virtues of doing things in bulk. Cooking for two is typically no harder than cooking for one, and it’s certainly easier than cooking for one person twice. But sorting, say, a shelf of a hundred books will take you longer than sorting two bookshelves of fifty apiece: you have twice as many things to organize, and there are twice as many places each of them could go. The more you take on, the worse it gets.

  This is the first and most fundamental insight of sorting theory. Scale hurts.

  From this we might infer that minimizing our pain and suffering when it comes to sorting is all about minimizing the number of things we have to sort. It’s true: one of the best preventives against the computational difficulty of sock sorting is just doing your laundry more often. Doing laundry three times as frequently, say, could reduce your sorting overhead by a factor of nine. Indeed, if Hillis’s roommate stuck with his peculiar procedure but went thirteen days between washes instead of fourteen, that alone would save him twenty-eight pulls from the hamper. (And going just a single day longer between washes would cost him thirty pulls more.)
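
  The arithmetic behind those numbers is tidy: when k pairs remain in the hamper, completing one pair costs 2k pulls on average—one pull for the starter sock, plus an expected 2k - 1 draws to find its match—so pairing n pairs costs n(n + 1) pulls in all. A few lines of Python (assuming, as here, one pair of socks worn per day) confirm the savings:

    def expected_pulls(num_pairs):
        # When k pairs remain, a pair costs 2k pulls on average:
        # 1 starter pull plus an expected 2k - 1 draws to find its match.
        return sum(2 * k for k in range(1, num_pairs + 1))  # equals n * (n + 1)

    print(expected_pulls(10))                       # 110, as above
    print(expected_pulls(14) - expected_pulls(13))  # 28 pulls saved
    print(expected_pulls(15) - expected_pulls(14))  # 30 pulls added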

  Even at such a modest, fortnightly scope we can see the scale of sorting beginning to grow untenable. Computers, though, must routinely sort millions of items in a single go. For that, as the line from Jaws puts it, we’re going to need a bigger boat—and a better algorithm.

  But to answer the question of just how we ought to be sorting, and which methods come out on top, we need to figure out something else first: how we’re going to keep score.

  Big-O: A Yardstick for the Worst Case

  The Guinness Book of World Records attributes the record for sorting a deck of cards to the Czech magician Zdeněk Bradáč. On May 15, 2008, Bradáč sorted a 52-card deck in just 36.16 seconds.* How did he do it? What sorting technique delivered him the title? Though the answer would shed interesting light on sorting theory, Bradáč declined to comment.

  While we have nothing but respect for Bradáč’s skill and dexterity, we are 100% certain of the following: we can personally break his record. In fact, we are 100% certain that we can attain an unbreakable record. All we need are about 80,658,175,170,943,878,571,660,636,856,403,766,975,289,505,440,883,277,824,000,000,000,000 attempts at the title. This number, a bit over 80 unvigintillion, is 52 factorial, or “52!” in mathematical notation—the number of ways that a deck of 52 cards can possibly be ordered. By taking roughly that many attempts, sooner or later we are bound to start with a shuffled deck that is in fact completely sorted by chance. At that point we can proudly enter Christian-Griffiths into The Guinness Book alongside a not-too-shabby sort time of 0m00s.
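
  That figure is no exaggeration—one line of Python reproduces it:

    import math

    print(math.factorial(52))
    # 80658175170943878571660636856403766975289505440883277824000000000000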

  To be fair, we’d almost certainly be trying until the heat death of the universe before we got our perfect record attempt. Nonetheless, this highlights the biggest fundamental difference between record keepers and computer scientists. The fine folks at Guinness care only about best-case performance (and beer). They’re hardly blameworthy, of course: all records in sports reflect the single best performance. Computer science, however, almost never cares about the best case. Instead, computer scientists might want to know the average sort time of someone like Bradáč: get him to sort all of the 80 unvigintillion deck orders, or a reasonably sized sample, and score him on his average speed across all attempts. (You can see why they don’t let computer scientists run these things.)

  Moreover, a computer scientist would want to know the worst sort time. Worst-case analysis lets us make hard guarantees: that a critical process will finish in time, that deadlines won’t be blown. So for the rest of this chapter—and the rest of this book, actually—we will be discussing only algorithms’ worst-case performance, unless noted otherwise.

  Computer science has developed a shorthand specifically for measuring algorithmic worst-case scenarios: it’s called “Big-O” notation. Big-O notation has a particular quirk, which is that it’s inexact by design. That is, rather than expressing an algorithm’s performance in minutes and seconds, Big-O notation provides a way to talk about the kind of relationship that holds between the size of the problem and the program’s running time. Because Big-O notation deliberately sheds fine details, what emerges is a schema for dividing problems into different broad classes.

  Imagine you’re hosting a dinner party with n guests. The time required to clean the house for their arrival doesn’t depend on the number of guests at all. This is the rosiest class of problems there is: called “Big-O of one,” written O(1), it is also known as “constant time.” Notably, Big-O notation doesn’t care a whit how long the cleaning actually takes—just that it’s always the same, totally invariant of the guest list. You’ve got the same work to do if you have one guest as if you have ten, a hundred, or any other n.

  Now, the time required to pass the roast around the table will be “Big-O of n,” written O(n), also known as “linear time”—with twice the guests, you’ll wait twice as long for the dish to come around. And again, Big-O notation couldn’t care less about the number of courses that get served, or whether they go around for second helpings. In each case, the time still depends linearly on the guest list size—if you drew a graph of the number of guests vs. the time taken, it would be a straight line. What’s more, the existence of any linear-time factors will, in Big-O notation, swamp all constant-time factors. That is to say, passing the roast once around the table, or remodeling your dining room for three months and then passing the roast once around the table, are both, to a computer scientist, effectively equivalent. If that seems crazy, remember that computers deal with values of n that could easily be in the thousands, millions, or billions. In other words, computer scientists are thinking about very, very big parties. With a guest list in the millions, passing the roast once around would indeed dwarf the home remodel to the point of insignificance.

  What if, as the guests arrived, each one hugged the others in greeting? Your first guest hugs you; your second guest has two hugs to give; your third guest, three. How many hugs will there be in total? This turns out to be “Big-O of n-squared,” written O(n²) and also known as “quadratic time.” Here again, we only care about the basic contours of the relationship between n and time. There’s no O(2n²) for two hugs apiece, or O(n² + n) for hugs plus passing the food around, or O(n² + 1) for hugs plus home cleaning. It’s all quadratic time, so O(n²) covers everything.

  Constant time, written O(1); linear time, written O(n); and quadratic time, written O(n²).
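
  To watch the three classes pull apart, here is a toy sketch in Python; the “steps” are arbitrary units of work, not seconds, and the chores are stand-ins for whatever an algorithm actually does:

    def cleaning_steps(n):
        return 1              # O(1): the house is the house, whoever comes

    def roast_steps(n):
        return n              # O(n): the dish makes one stop per guest

    def hug_steps(n):
        # O(n^2): guest k arrives and gives k hugs, so n(n + 1)/2 in total
        return sum(k for k in range(1, n + 1))

    for n in (10, 100, 1_000):
        print(n, cleaning_steps(n), roast_steps(n), hug_steps(n))
    # 10 1 10 55
    # 100 1 100 5050
    # 1000 1 1000 500500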

  It gets worse from there. There’s “exponential time,” O(2ⁿ), where each additional guest doubles your work. Even worse is “factorial time,” O(n!), a class of problems so truly hellish that computer scientists only talk about it when they’re joking—as we were in imagining shuffling a deck until it’s sorted—or when they really, really wish they were.

  The Squares: Bubble Sort and Insertion Sort

  When then-senator Obama visited Google in 2007, CEO Eric Schmidt jokingly began the Q&A like a job interview, asking him, “What’s the best way to sort a million thirty-two-bit integers?” Without missing a beat, Obama cracked a wry smile and replied, “I think the Bubble Sort would be the wrong way to go.” The crowd of Google engineers erupted in cheers. “He had me at Bubble Sort,” one later recalled.

  Obama was right to eschew Bubble Sort, an algorithm which has become something of a punching bag for computer science students: it’s simple, it’s intuitive, and it’s extremely inefficient.

  Imagine you want to alphabetize your unsorted collection of books. A natural approach would be just to scan across the shelf looking for out-of-order pairs—Wallace followed by Pynchon, for instance—and flipping them around. Put Pynchon ahead of Wallace, then continue your scan, looping around to the beginning of the shelf each time you reach the end. When you make a complete pass without finding any more out-of-order pairs on the entire shelf, then you know the job is done.

  This is Bubble Sort, and it lands us in quadratic time. There are n books out of order, and each scan through the shelf can move each one at most one position. (We spot a tiny problem, make a tiny fix.) So in the worst case, where the shelf is perfectly backward, at least one book will need to be moved n positions. Thus a maximum of n passes through n books, which gives us O(n²) in the worst case.* It’s not terrible—for one thing, it’s worlds better than our O(n!) shuffle-till-it’s-sorted idea from earlier (in case you needed computer science to confirm that). But all the same, that squared term can get daunting quickly. For instance, it means that sorting five shelves of books will take not five times as long as sorting a single shelf, but twenty-five times as long.
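
  For the curious, here is the shelf-scanning procedure as a minimal Python sketch—an illustration of the idea, not an endorsement:

    def bubble_sort(shelf):
        """Repeatedly scan the shelf, flipping adjacent out-of-order pairs,
        until a full pass finds nothing to fix: O(n^2) in the worst case."""
        books = list(shelf)                  # leave the caller's shelf alone
        swapped = True
        while swapped:
            swapped = False
            for i in range(len(books) - 1):
                if books[i] > books[i + 1]:  # out-of-order pair: flip it
                    books[i], books[i + 1] = books[i + 1], books[i]
                    swapped = True
        return books

    print(bubble_sort(["Wallace", "Pynchon", "Nabokov"]))
    # ['Nabokov', 'Pynchon', 'Wallace']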

  You might take a different tack—pulling all the books off the shelf and putting them back in place one by one. You’d put the first book in the middle of the shelf, then take the second and compare it to the first, inserting it either to the right or to the left. Picking up the third book, you’d run through the books on the shelf from left to right until you found the right spot to tuck it in. Repeating this process, gradually all of the books would end up sorted on the shelf and you’d be done.

  Computer scientists call this, appropriately enough, Insertion Sort. The good news is that it’s arguably even more intuitive than Bubble Sort and doesn’t have quite the bad reputation. The bad news is that it’s not actually that much faster. You still have to do one insertion for each book. And each insertion still involves moving past about half the books on the shelf, on average, to find the correct place. Although in practice Insertion Sort does run a bit faster than Bubble Sort, again we land squarely, if you will, in quadratic time. Sorting anything more than a single bookshelf is still an unwieldy prospect.
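
  Rendered in code, Insertion Sort might look like the sketch below. (Real implementations shift books within a single shelf rather than building a second one, but the quadratic cost—each book compared against up to n others—is the same.)

    def insertion_sort(shelf):
        """Place each book one at a time, scanning left to right along the
        growing sorted shelf for its spot: also O(n^2) in the worst case."""
        sorted_shelf = []
        for book in shelf:
            i = 0
            while i < len(sorted_shelf) and sorted_shelf[i] < book:
                i += 1                        # walk past books that come earlier
            sorted_shelf.insert(i, book)      # tuck the book into place
        return sorted_shelf

    print(insertion_sort(["Wallace", "Pynchon", "Nabokov"]))
    # ['Nabokov', 'Pynchon', 'Wallace']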

 
  Breaking the Quadratic Barrier: Divide and Conquer

  At this point, having seen two entirely sensible approaches fall into unsustainable quadratic time, it’s natural to wonder whether faster sorting is even possible.

  The question sounds like it’s about productivity. But talk to a computer scientist and it turns out to be closer to metaphysics—akin to thinking about the speed of light, time travel, superconductors, or thermodynamic entropy. What are the universe’s fundamental rules and limits? What is possible? What is allowed? In this way computer scientists are glimpsing God’s blueprints every bit as much as the particle physicists and cosmologists. What is the minimum effort required to make order?

  Could we find a constant-time sort, O(1), one that (like cleaning the house before the bevy of guests arrive) can sort a list of any size in the same amount of time? Well, even just confirming that a shelf of n books is sorted cannot be done in constant time, since it requires checking all n of them. So actually sorting the books in constant time seems out of the question.
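
  Even that humbler task—merely verifying order—takes a pass over every book:

    def is_sorted(shelf):
        # n - 1 comparisons in a single pass: confirming order is O(n),
        # so actually producing order can't possibly be O(1).
        return all(shelf[i] <= shelf[i + 1] for i in range(len(shelf) - 1))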

  What about a linear-time sort, O(n), as efficient as passing a dish around a table, where doubling the number of items to sort merely doubles the work? Thinking about the examples above, it’s tough to imagine how that might work either. The n² in each case comes from the fact that you need to move n books, and the work required in each move scales with n as well. How would we get from n moves of size n down to just n by itself? In Bubble Sort, our O(n²) running time came from handling each of the n books and moving them as many as n places each. In Insertion Sort, quadratic running time came from handling each of the n books and comparing them to as many as n others before inserting them. A linear-time sort means handling each book for constant time regardless of how many others it needs to find its place among. Doesn’t seem likely.

  So we know that we can do at least as well as quadratic time, but probably not as well as linear time. Perhaps our limit lies somewhere between linear time and quadratic time. Are there any algorithms between linear and quadratic, between n and n × n?

 
