The Speed Bump
Considering he spent much of his life thinking about how to most efficiently complete a sequence of tasks, Lawler took an intriguingly circuitous route to his own career. He studied mathematics at Florida State University before beginning graduate work at Harvard in 1954, though he left before finishing a doctorate. After time in law school, the army, and (thematically enough) working in a machine shop, he went back to Harvard in 1958, finishing his PhD and taking a position at the University of Michigan. Visiting Berkeley on sabbatical in 1969, he was arrested at a notorious Vietnam War protest. He became a member of the faculty at Berkeley the following year, and acquired a reputation there for being “the social conscience” of the computer science department. After his death in 1994, the Association for Computing Machinery established an award in Lawler’s name, honoring people who demonstrate the humanitarian potential of computer science.
Lawler’s first investigation into precedence constraints suggested that they could be handled quite easily. For instance, take the Earliest Due Date algorithm that minimizes the maximum lateness of a set of tasks. If your tasks have precedence constraints, that makes things trickier—you can’t just plow forward in order of due date if some tasks can’t be started until others are finished. But in 1968, Lawler proved that this is no trouble as long as you build the schedule back to front: look only at the tasks that no other tasks depend on, and put the one with the latest due date at the end of the schedule. Then simply repeat this process, again considering at each step only those tasks that no other (as-yet unscheduled) tasks depend upon as a prerequisite.
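The back-to-front procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not Lawler's own formulation: the function names, the dictionary layout, and the sample tasks are assumptions made for the example, and ties between equal due dates are broken alphabetically.

```python
def schedule_back_to_front(tasks, prerequisites):
    """Order tasks to minimize maximum lateness under precedence constraints.

    tasks: {name: (duration, due_date)}
    prerequisites: {name: set of tasks that must finish before it starts}
    (Both layouts are assumptions made for this sketch.)
    """
    unscheduled = set(tasks)
    back_to_front = []
    while unscheduled:
        # Collect every task that some as-yet unscheduled task still depends on.
        blocked = set()
        for other in unscheduled:
            blocked.update(prerequisites.get(other, ()))
        free = unscheduled - blocked
        if not free:
            raise ValueError("precedence constraints contain a cycle")
        # Of the tasks nothing else is waiting on, the latest due date goes last.
        latest = max(free, key=lambda name: (tasks[name][1], name))
        back_to_front.append(latest)
        unscheduled.remove(latest)
    return back_to_front[::-1]


def max_lateness(order, tasks):
    """Largest (completion time - due date) when tasks run in the given order."""
    clock = 0
    worst = float("-inf")
    for name in order:
        duration, due = tasks[name]
        clock += duration
        worst = max(worst, clock - due)
    return worst


# Hypothetical example: the review cannot start until the draft is finished.
tasks = {"draft": (2, 10), "review": (1, 3), "slides": (3, 6)}
prerequisites = {"review": {"draft"}}
order = schedule_back_to_front(tasks, prerequisites)
print(order, max_lateness(order, tasks))
```

Plain Earliest Due Date would try to start with the review, which is not allowed here; building from the back places the draft first and still finishes every task on time.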
But as Lawler looked more deeply into precedence constraints, he found something curious. The Shortest Processing Time algorithm, as we saw, is the optimal policy if you want to cross off as many items as quickly as possible from your to-do list. But if some of your tasks have precedence constraints, there isn’t a simple or obvious tweak to Shortest Processing Time to adjust for that. Although it looked like an elementary scheduling problem, neither Lawler nor any other researcher seemed to be able to find an efficient way to solve it.
In fact, it was much worse than this. Lawler himself would soon discover that this problem belongs to a class that most computer scientists believe has no efficient solution—it’s what the field calls “intractable.”* Scheduling theory’s first speed bump turned out to be a brick wall.
As we saw with the “triple or nothing” scenario for which optimal stopping theory has no sage words, not every problem that can be formally articulated has an answer. In scheduling, it’s clear by definition that every set of tasks and constraints has some schedule that’s the best, so scheduling problems aren’t unanswerable, per se—but it may simply be the case that there’s no straightforward algorithm that can find you the optimal schedule in a reasonable amount of time.
This led researchers like Lawler and Lenstra to an irresistible question. Just what proportion of scheduling problems was intractable, anyway? Twenty years after scheduling theory was kick-started by Selmer Johnson’s bookbinding paper, the search for individual solutions was about to become something much grander and more ambitious by far: a quest to map the entire landscape of scheduling theory.
What the researchers found was that even the subtlest change to a scheduling problem often tips it over the fine and irregular line between tractable and intractable. For example, Moore’s Algorithm minimizes the number of late tasks (or rotten fruits) when they’re all of equal value—but if some are more important than others, the problem becomes intractable and no algorithm can readily provide the optimal schedule. Likewise, having to wait until a certain time to start some of your tasks makes nearly all of the scheduling problems for which we otherwise have efficient solutions into intractable problems. Not being able to put out the trash until the night before collection might be a reasonable municipal bylaw, but it will send your calendar headlong into intractability.
The drawing of the borders of scheduling theory continues to this day. A recent survey showed that the status of about 7% of all problems is still unknown, scheduling’s terra incognita. Of the 93% of the problems that we do understand, however, the news isn’t great: only 9% can be solved efficiently, and the other 84% have been proven intractable.* In other words, most scheduling problems admit no ready solution. If trying to perfectly manage your calendar feels overwhelming, maybe that’s because it actually is. Nonetheless, the algorithms we have discussed are often the starting point for tackling those hard problems—if not perfectly, then at least as well as can be expected.
Drop Everything: Preemption and Uncertainty
The best time to plant a tree is twenty years ago. The second best time is now.
—PROVERB
So far we have considered only factors that make scheduling harder. But there is one twist that can make it easier: being able to stop one task partway through and switch to another. This property, “preemption,” turns out to change the game dramatically.
Minimizing maximum lateness (for serving customers in a coffee shop) or the sum of completion times (for rapidly shortening your to-do list) both cross the line into intractability if some tasks can’t be started until a particular time. But they return to having efficient solutions once preemption is allowed. In both cases, the classic strategies—Earliest Due Date and Shortest Processing Time, respectively—remain the best, with a fairly straightforward modification. When a task’s starting time comes, compare that task to the one currently under way. If you’re working by Earliest Due Date and the new task is due even sooner than the current one, switch gears; otherwise stay the course. Likewise, if you’re working by Shortest Processing Time, and the new task can be finished faster than the current one, pause to take care of it first; otherwise, continue with what you were doing.
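Both preemptive rules can be illustrated with one small simulator. The sketch below assumes a single machine, whole-number start times and durations, and time that advances in unit steps; the function name `simulate`, the job tuples, and the tie-breaking in favor of the task already under way are assumptions of the example rather than part of the original algorithms.

```python
def simulate(jobs, rule):
    """Run jobs on one machine in unit time steps, allowing preemption.

    jobs: list of (name, start_time, duration, due_date), durations >= 1.
    rule: "edd" picks the earliest due date among available jobs,
          "spt" picks the least remaining processing time.
    Returns {name: completion_time}.
    """
    start = {name: s for name, s, _, _ in jobs}
    due = {name: d for name, _, _, d in jobs}
    left = {name: dur for name, _, dur, _ in jobs}
    finished = {}
    running = None
    clock = 0
    while left:
        ready = [n for n in left if start[n] <= clock]
        if ready:
            def rank(n):
                score = due[n] if rule == "edd" else left[n]
                # On a tie, stay with the task already under way.
                return (score, n != running, n)
            running = min(ready, key=rank)
            left[running] -= 1
        clock += 1
        if ready and left[running] == 0:
            finished[running] = clock
            del left[running]
            running = None
    return finished


# Hypothetical workload: (name, start_time, duration, due_date)
jobs = [("report", 0, 5, 10), ("email", 1, 2, 20), ("urgent", 2, 1, 4)]
print(simulate(jobs, "edd"))  # urgent interrupts report when it arrives
print(simulate(jobs, "spt"))  # email interrupts report; report finishes last
```

On this workload the due-date rule finishes everything early, while the processing-time rule gives a smaller sum of completion times (15 versus 17), which matches the metric each rule is meant to optimize.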
Now, on a good week a machine shop might know everything expected of them in the next few days, but most of us are usually flying blind, at least in part. We might not even be sure, for instance, when we’ll be able to start a particular project (when will so-and-so give me a solid answer on the such-and-such?). And at any moment our phone can ring or an email can pop up with news of a whole new task to add to our agenda.
It turns out, though, that even if you don’t know when tasks will begin, Earliest Due Date and Shortest Processing Time are still optimal strategies, able to guarantee you (on average) the best possible performance in the face of uncertainty. If assignments get tossed on your desk at unpredictable moments, the optimal strategy for minimizing maximum lateness is still the preemptive version of Earliest Due Date—switching to the job that just came up if it’s due sooner than the one you’re currently doing, and otherwise ignoring it. Similarly, the preemptive version of Shortest Processing Time—compare the time left to finish the current task to the time it would take to complete the new one—is still optimal for minimizing the sum of completion times.
In fact, the weighted version of Shortest Processing Time is a pretty good candidate for best general-purpose scheduling strategy in the face of uncertainty. It offers a simple prescription for time management: each time a new piece of work comes in, divide its importance by the amount of time it will take to complete. If that figure is higher than for the task you’re currently doing, switch to the new one; otherwise stick with the current task. This algorithm is the closest thing that scheduling theory has to a skeleton key or Swiss Army knife, the optimal strategy not just for one flavor of problem but for many. Under certain assumptions it minimizes not just the sum of weighted completion times, as we might expect, but also the sum of the weights of the late jobs and the sum of the weighted lateness of those jobs.
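The importance-per-unit-time rule is simple enough to write down directly. In this sketch the function names, the (importance, time needed) pairs, and the sample tasks are all made up for illustration, and "time" for the current task is taken to mean the time still needed to finish it.

```python
def density(importance, time_needed):
    """Importance delivered per unit of time spent."""
    return importance / time_needed


def should_switch(current, incoming):
    """True if the incoming task beats the one under way.

    Each argument is an (importance, time_needed) pair.
    """
    return density(*incoming) > density(*current)


def order_by_density(tasks):
    """Sort task names from highest to lowest importance per unit time."""
    return sorted(tasks, key=lambda name: density(*tasks[name]), reverse=True)


# Hypothetical to-do list: name -> (importance, hours needed)
tasks = {"bug fix": (8, 2), "slides": (3, 3), "email": (1, 0.5)}
print(order_by_density(tasks))
print(should_switch(current=tasks["slides"], incoming=tasks["bug fix"]))
```

Note that the short email outranks the slides even though it matters less in absolute terms: at half an hour it delivers two units of importance per hour against the slides' one.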
Intriguingly, optimizing all these other metrics is intractable if we know the start times and durations of jobs ahead of time. So considering the impact of uncertainty in scheduling reveals something counterintuitive: there are cases where clairvoyance is a burden. Even with complete foreknowledge, finding the perfect schedule might be practically impossible. In contrast, thinking on your feet and reacting as jobs come in won’t give you as perfect a schedule as if you’d seen into the future, but the best you can do is much easier to compute. That’s some consolation. As business writer and coder Jason Fried says, “Feel like you can’t proceed until you have a bulletproof plan in place? Replace ‘plan’ with ‘guess’ and take it easy.” Scheduling theory bears this out.
When the future is foggy, it turns out you don’t need a calendar—just a to-do list.
Preemption Isn’t Free: The Context Switch
The hurrieder I go / The behinder I get
—NEEDLEPOINT SEEN IN BOONVILLE, CA
Programmers don’t talk because they must not be interrupted.… To synchronize with other people (or their representation in telephones, buzzers and doorbells) can only mean interrupting the thought train. Interruptions mean certain bugs. You must not get off the train.
—ELLEN ULLMAN
Scheduling theory thus tells a reasonably encouraging story after all. There are simple, optimal algorithms for solving many scheduling problems, and those problems are tantalizingly close to situations we encounter daily in human lives. But when it comes to actually carrying out single-machine scheduling in the real world, things get complicated.
First of all, people and computer operating systems alike face a curious challenge: the machine that is doing the scheduling and the machine being scheduled are one and the same. Which makes straightening out your to-do list an item on your to-do list—needing, itself, to get prioritized and scheduled.
Second, preemption isn’t free. Every time you switch tasks, you pay a price, known in computer science as a context switch. When a computer processor shifts its attention away from a given program, there’s always a certain amount of necessary overhead. It needs to effectively bookmark its place and put aside all of its information related to that program. Then it needs to figure out which program to run next. Finally it must haul out all the relevant information for that program, find its place in the code, and get in gear.
None of this switching back and forth is “real work”—that is, none of it actually advances the state of any of the various programs the computer is switching between. It’s metawork. Every context switch is wasted time.
Humans clearly have context-switching costs too. We feel them when we move papers on and off our desk, close and open documents on our computer, walk into a room without remembering what had sent us there, or simply say out loud, “Now, where was I?” or “What was I saying?” Psychologists have shown that for us, the effects of switching tasks can include both delays and errors—at the scale of minutes rather than microseconds. To put that figure in perspective, anyone you interrupt more than a few times an hour is in danger of doing no work at all.
Personally, we have found that both programming and writing require keeping in mind the state of the entire system, and thus carry inordinately large context-switching costs. A friend of ours who writes software says that the normal workweek isn’t well suited to his workflow, since for him sixteen-hour days are more than twice as productive as eight-hour days. Brian, for his part, thinks of writing as a kind of blacksmithing, where it takes a while just to heat up the metal before it’s malleable. He finds it somewhat useless to block out anything less than ninety minutes for writing, as nothing much happens in the first half hour except loading a giant block of “Now, where was I?” into his head. Scheduling expert Kirk Pruhs, of the University of Pittsburgh, has had the same experience. “If it’s less than an hour I’ll just do errands instead, because it’ll take me the first thirty-five minutes to really figure out what I want to do and then I might not have time to do it.”
Rudyard Kipling’s celebrated 1910 poem “If—” ends with an exuberant call for time management: “If you can fill the unforgiving minute / With sixty seconds’ worth of distance run…”
If only. The truth is, there’s always overhead—time lost to metawork, to the logistics of bookkeeping and task management. This is one of the fundamental tradeoffs of scheduling. And the more you take on, the more overhead there is. At its nightmarish extreme, this turns into a phenomenon called thrashing.
Thrashing
Gage: Mr. Zuckerberg, do I have your full attention?…
Zuckerberg: You have part of my attention—you have the minimum amount.
—THE SOCIAL NETWORK
Computers multitask through a process called “threading,” which you can think of as being like juggling a set of balls. Just as a juggler only hurls one ball at a time into the air but keeps three aloft, a CPU only works on one program at a time, but by swapping between them quickly enough (on the scale of ten-thousandths of a second) it appears to be playing a movie, navigating the web, and alerting you to incoming email all at once.
In the 1960s, computer scientists began thinking about how to automate the process of sharing computer resources between different tasks and users. It was an exciting time, recounts Peter Denning, now one of the top experts on computer multitasking, who was then working on his doctorate at MIT. Exciting, and uncertain: “How do you partition a main memory among a bunch of jobs that are in there when some of them want to grow and some might want to shrink and they’re going to interact with each other, trying to steal [memory] and all these kinds of things?… How do you manage that whole set of interactions? Nobody knew anything about that.”
Not surprisingly, given that the researchers didn’t really know yet what they were doing, the effort encountered difficulties. And there was one in particular that caught their attention. As Denning explains, under certain conditions a dramatic problem “shows up as you add more jobs to the multiprogramming mix. At some point you pass a critical threshold—unpredictable exactly where it is, but you’ll know it when you get there—and all of a sudden the system seems to die.”
Think again about our image of a juggler. With one ball in the air, there’s enough spare time while that ball is aloft for the juggler to toss some others upward as well. But what if the juggler takes on one more ball than he can handle? He doesn’t drop that ball; he drops everything. The whole system, quite literally, goes down. As Denning puts it, “The presence of one additional program has caused a complete collapse of service.… The sharp difference between the two cases at first defies intuition, which might lead us to expect a gradual degradation of service as new programs are introduced into crowded main memory.” Instead, catastrophe. And while we can understand a human juggler being overwhelmed, what could cause something like this to happen to a machine?
Here scheduling theory intersects caching theory. The whole idea of caches is to keep the “working set” of needed items available for quick access. One way this is done is by keeping the information the computer is currently using in fast memory rather than on the slow hard disk. But if a task requires keeping track of so many things that they won’t all fit into memory, then you might well end up spending more time swapping information in and out of memory than doing the actual work. What’s more, when you switch tasks, the newly active task might make space for its working set by evicting portions of other working sets from memory. The next task, upon reactivation, would then reacquire parts of its working set from the hard disk and muscle them back into memory, again displacing others. This problem—tasks stealing space from each other—can get even worse in systems with hierarchies of caches between the processor and the memory. As Peter Zijlstra, one of the head developers on the Linux operating system scheduler, puts it, “The caches are warm for the current workload, and when you context switch you pretty much invalidate all caches. And that hurts.” At the extreme, a program may run just long enough to swap its needed items into memory, before giving way to another program that runs just long enough to overwrite them in turn.
This is thrashing: a system running full-tilt and accomplishing nothing at all. Denning first diagnosed this phenomenon in a memory-management context, but computer scientists now use the term “thrashing” to refer to pretty much any situation where the system grinds to a halt because it’s entirely preoccupied with metawork. A thrashing computer’s performance doesn’t bog down gradually. It falls off a cliff. “Real work” has dropped to effectively zero, which also means it’s going to be nearly impossible to get out.
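The cliff can be reproduced with a toy model. The sketch below is not Denning's analysis; it assumes a memory that evicts the least recently used page, tasks that take turns in a fixed rotation, and a task that touches every page of its working set on each turn. The function name and all the numbers are chosen for illustration.

```python
from collections import OrderedDict


def miss_rate(num_tasks, pages_per_task, memory_pages, rounds=50):
    """Fraction of page accesses that have to go to the slow disk.

    Memory holds at most memory_pages pages and evicts the least
    recently used one when it is full (an assumed policy).
    """
    memory = OrderedDict()  # oldest entry first, newest last
    misses = 0
    accesses = 0
    for _ in range(rounds):
        for task in range(num_tasks):
            # On its turn, a task touches its whole working set once.
            for page in range(pages_per_task):
                key = (task, page)
                accesses += 1
                if key in memory:
                    memory.move_to_end(key)  # mark as recently used
                else:
                    misses += 1
                    memory[key] = True
                    if len(memory) > memory_pages:
                        memory.popitem(last=False)  # evict the oldest page
    return misses / accesses


for n in (3, 4, 5):
    print(n, "tasks:", miss_rate(n, pages_per_task=10, memory_pages=40))
```

With three or four tasks the combined working sets fit in the 40 pages, and only the first touch of each page misses (a rate of 0.02 over 50 rounds). Adding a fifth task pushes the total to 50 pages, and under these assumptions every single access misses: each task evicts exactly the pages the next one is about to need.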
Thrashing is a very recognizable human state. If you’ve ever had a moment where you wanted to stop doing everything just to have the chance to write down everything you were supposed to be doing, but couldn’t spare the time, you’ve thrashed. And the cause is much the same for people as for computers: each task is a draw on our limited cognitive resources. When merely remembering everything we need to be doing occupies our full attention—or prioritizing every task consumes all the time we had to do them—or our train of thought is continually interrupted before those thoughts can translate to action—it feels like panic, like paralysis by way of hyperactivity. It’s thrashing, and computers know it well.
If you’ve ever wrestled with a system in a state of thrashing—and if you’ve ever been in such a state—then you might be curious about the computer science of getting out. In his landmark 1960s paper on the subject, Denning noted that an ounce of prevention is worth a pound of cure. The easiest thing to do is simply to get more memory: enough RAM, for instance, to fit the working sets of all the running programs into memory at once and reduce the time taken by a context switch. But preventive advice for thrashing doesn’t help you when you find yourself in the midst of it. Besides, when it comes to human attention, we’re stuck with what we’ve got.
Another way to avert thrashing before it starts is to learn the art of saying no. Denning advocated, for instance, that a system should simply refuse to add a program to its workload if it didn’t have enough free memory to hold its working set. This prevents thrashing in machines, and is sensible advice for anyone with a full plate. But this, too, might seem like an unattainable luxury to those of us who find ourselves already overloaded—or otherwise unable to throttle the demands being placed on us.
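As a rough sketch of that refusal rule, assuming working-set sizes are known in advance and measured in pages (the function name and the figures are hypothetical):

```python
def can_admit(running, memory_pages, new_working_set):
    """Admit a new program only if its working set fits in free memory.

    running: {program_name: working_set_size_in_pages}
    """
    free = memory_pages - sum(running.values())
    return new_working_set <= free


running = {"editor": 10, "browser": 20}
print(can_admit(running, memory_pages=40, new_working_set=10))  # fits exactly
print(can_admit(running, memory_pages=40, new_working_set=11))  # one page too many
```

In practice a working set is something a system has to estimate as a program runs, so a real admission policy would be working from measurements rather than the fixed sizes assumed here.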