In these cases there’s clearly no way to work any harder, but you can work … dumber. Along with considerations of memory, one of the biggest sources of metawork in switching contexts is the very act of choosing what to do next. This, too, can at times swamp the actual doing of the work. Faced with, say, an overflowing inbox of n messages, we know from sorting theory that repeatedly scanning it for the most important one to answer next will take O(n2) operations—n scans of n messages apiece. This means that waking up to an inbox that’s three times as full as usual could take you nine times as long to process. What’s more, scanning through those emails means swapping every message into your mind, one after another, before you respond to any of them: a surefire recipe for memory thrashing.
In a thrashing state, you’re making essentially no progress, so even doing tasks in the wrong order is better than doing nothing at all. Instead of answering the most important emails first—which requires an assessment of the whole picture that may take longer than the work itself—maybe you should sidestep that quadratic-time quicksand by just answering the emails in random order, or in whatever order they happen to appear on-screen. Thinking along the same lines, the Linux core team, several years ago, replaced their scheduler with one that was less “smart” about calculating process priorities but more than made up for it by taking less time to calculate them.
If you still want to maintain your priorities, though, there’s a different and even more interesting bargain you can strike to get your productivity back.
Interrupt Coalescing
Part of what makes real-time scheduling so complex and interesting is that it is fundamentally a negotiation between two principles that aren’t fully compatible. These two principles are called responsiveness and throughput: how quickly you can respond to things, and how much you can get done overall. Anyone who’s ever worked in an office environment can readily appreciate the tension between these two metrics. It’s part of the reason there are people whose job it is to answer the phone: they are responsive so that others may have throughput.
Again, life is harder when—like a computer—you must make the responsiveness/throughput tradeoff yourself. And the best strategy for getting things done might be, paradoxically, to slow down.
Operating system schedulers typically define a “period” in which every program is guaranteed to run at least a little bit, with the system giving a “slice” of that period to each program. The more programs are running, the smaller those slices become, and the more context switches are happening every period, maintaining responsiveness at the cost of throughput. Left unchecked, however, this policy of guaranteeing each process at least some attention every period could lead to catastrophe. With enough programs running, a task’s slice would shrink to the point that the system was spending the entire slice on context switching, before immediately context-switching again to the next task.
The culprit is the hard responsiveness guarantee. So modern operating systems in fact set a minimum length for their slices and will refuse to subdivide the period any more finely. (In Linux, for instance, this minimum useful slice turns out to be about three-quarters of a millisecond, but in humans it might realistically be at least several minutes.) If more processes are added beyond that point, the period will simply get longer. This means that processes will have to wait longer to get their turn, but the turns they get will at least be long enough to do something.
Establishing a minimum amount of time to spend on any one task helps to prevent a commitment to responsiveness from obliterating throughput entirely: if the minimum slice is longer than the time it takes to context-switch, then the system can never get into a state where context switching is the only thing it’s doing. It’s also a principle that is easy to translate into a recommendation for human lives. Methods such as “timeboxing” or “pomodoros,” where you literally set a kitchen timer and commit to doing a single task until it runs out, are one embodiment of this idea.
But what slice size should you aim for? Faced with the question of how long to wait between intervals of performing a recurring task, like checking your email, the answer from the perspective of throughput is simple: as long as possible. But that’s not the end of the story; higher throughput, after all, also means lower responsiveness.
For your computer, the annoying interruption that it has to check on regularly isn’t email—it’s you. You might not move the mouse for minutes or hours, but when you do, you expect to see the pointer on the screen move immediately, which means the machine expends a lot of effort simply checking in on you. The more frequently it checks on the mouse and keyboard, the quicker it can react when there is input, but the more context switches it has to do. So the rule that computer operating systems follow when deciding how long they can afford to dedicate themselves to some task is simple: as long as possible without seeming jittery or slow to the user.
When we humans leave the house to run a quick errand, we might say something like, “You won’t even notice I’m gone.” When our machines context-switch into a computation, they must literally return to us before we notice they’re gone. To find this balancing point, operating systems programmers have turned to psychology, mining papers in psychophysics for the exact number of milliseconds of delay it takes for a human brain to register lag or flicker. There is no point in attending to the user any more often than that.
Thanks to these efforts, when operating systems are working right you don’t even notice how hard your computer is exerting itself. You continue to be able to move your mouse around the screen fluidly even when your processor is hauling full-tilt. The fluidity is costing you some throughput, but that’s a design tradeoff that has been explicitly made by the system engineers: your system spends as much time as it possibly can away from interacting with you, then gets around to redrawing the mouse just in time.
And again, this is a principle that can be transferred to human lives. The moral is that you should try to stay on a single task as long as possible without decreasing your responsiveness below the minimum acceptable limit. Decide how responsive you need to be—and then, if you want to get things done, be no more responsive than that.
If you find yourself doing a lot of context switching because you’re tackling a heterogeneous collection of short tasks, you can also employ another idea from computer science: “interrupt coalescing.” If you have five credit card bills, for instance, don’t pay them as they arrive; take care of them all in one go when the fifth bill comes. As long as your bills are never due less than thirty-one days after they arrive, you can designate, say, the first of each month as “bill-paying day,” and sit down at that point to process every bill on your desk, no matter whether it came three weeks or three hours ago. Likewise, if none of your email correspondents require you to respond in less than twenty-four hours, you can limit yourself to checking your messages once a day. Computers themselves do something like this: they wait until some fixed interval and check everything, instead of context-switching to handle separate, uncoordinated interrupts from their various subcomponents.*
On occasion, computer scientists notice the absence of interrupt coalescing in their own lives. Says Google director of research Peter Norvig: “I had to go downtown three times today to run errands, and I said, ‘Oh, well, that’s just a one-line bug in your algorithm. You should have just waited, or added it to the to-do queue, rather than executing them sequentially as they got added one at a time.’”
At human scale, we get interrupt coalescing for free from the postal system, just as a consequence of their delivery cycle. Because mail gets delivered only once a day, something mailed only a few minutes late might take an extra twenty-four hours to reach you. Considering the costs of context switching, the silver lining to this should by now be obvious: you can only get interrupted by bills and letters at most once a day. What’s more, the twenty-four-hour postal rhythm demands minimal responsiveness from you: it doesn’t make any difference whether you mail your reply five minu
tes or five hours after receiving a letter.
In academia, holding office hours is a way of coalescing interruptions from students. And in the private sector, interrupt coalescing offers a redemptive view of one of the most maligned office rituals: the weekly meeting. Whatever their drawbacks, regularly scheduled meetings are one of our best defenses against the spontaneous interruption and the unplanned context switch.
Perhaps the patron saint of the minimal-context-switching lifestyle is the legendary programmer Donald Knuth. “I do one thing at a time,” he says. “This is what computer scientists call batch processing—the alternative is swapping in and out. I don’t swap in and out.” Knuth isn’t kidding. On January 1, 2014, he embarked on “The TeX Tuneup of 2014,” in which he fixed all of the bugs that had been reported in his TeX typesetting software over the previous six years. His report ends with the cheery sign-off “Stay tuned for The TeX Tuneup of 2021!” Likewise, Knuth has not had an email address since 1990. “Email is a wonderful thing for people whose role in life is to be on top of things. But not for me; my role is to be on the bottom of things. What I do takes long hours of studying and uninterruptible concentration.” He reviews all his postal mail every three months, and all his faxes every six.
But one does not need to take things to Knuth’s extreme to wish that more of our lives used interrupt coalescing as a design principle. The post office gives it to us almost by accident; elsewhere, we need to build it, or demand it, for ourselves. Our beeping and buzzing devices have “Do Not Disturb” modes, which we could manually toggle on and off throughout the day, but that is too blunt an instrument. Instead, we might agitate for settings that would provide an explicit option for interrupt coalescing—the same thing at a human timescale that the devices are doing internally. Alert me only once every ten minutes, say; then tell me everything.
*Ironically enough, Pathfinder software team leader Glenn Reeves would blame the bug on “deadline pressures,” and on the fact that fixing this particular issue during development had been deemed a “lower priority.” So the root cause, in a sense, mirrored the problem itself.
*We will discuss “intractable” problems in more detail in chapter 8.
*Things aren’t quite as bad as this number might make them seem, though, since it includes scheduling problems involving multiple machines—which is more like managing a group of employees than managing your calendar.
*Given that many computers tend to brashly pop up error messages and cursor-stealing dialogue boxes whenever they want something from us, their behavior is somewhat hypocritical. The user interface demands the user’s attention in a way that the CPU itself would rarely tolerate.
6 Bayes’s Rule
Predicting the Future
All human knowledge is uncertain, inexact, and partial.
—BERTRAND RUSSELL
The sun’ll come out tomorrow. You can bet your bottom dollar there’ll be sun.
—ANNIE
In 1969, before embarking on a doctorate in astrophysics at Princeton, J. Richard Gott III took a trip to Europe. There he saw the Berlin Wall, which had been built eight years earlier. Standing in the shadow of the wall, a stark symbol of the Cold War, he began to wonder how much longer it would continue to divide the East and West.
On the face of it, there’s something absurd about trying to make this kind of prediction. Even setting aside the impossibility of forecasting geopolitics, the question seems mathematically laughable: it’s trying to make a prediction from a single data point.
But as ridiculous as this might seem on its face, we make such predictions all the time, by necessity. You arrive at a bus stop in a foreign city and learn, perhaps, that the other tourist standing there has been waiting seven minutes. When is the next bus likely to arrive? Is it worthwhile to wait—and if so, how long should you do so before giving up?
Or perhaps a friend of yours has been dating somebody for a month and wants your advice: is it too soon to invite them along to an upcoming family wedding? The relationship is off to a good start—but how far ahead is it safe to make plans?
A famous presentation made by Peter Norvig, Google’s director of research, carried the title “The Unreasonable Effectiveness of Data” and enthused about “how billions of trivial data points can lead to understanding.” The media constantly tell us that we’re living in an “age of big data,” when computers can sift through those billions of data points and find patterns invisible to the naked eye. But often the problems most germane to daily human life are at the opposite extreme. Our days are full of “small data.” In fact, like Gott standing at the Berlin Wall, we often have to make an inference from the smallest amount of data we could possibly have: a single observation.
So how do we do it? And how should we?
The story begins in eighteenth-century England, in a domain of inquiry irresistible to great mathematical minds of the time, even those of the clergy: gambling.
Reasoning Backward with the Reverend Bayes
If we be, therefore, engaged by arguments to put trust in past experience, and make it the standard of our future judgement, these arguments must be probable only.
—DAVID HUME
More than 250 years ago, the question of making predictions from small data weighed heavily on the mind of the Reverend Thomas Bayes, a Presbyterian minister in the charming spa town of Tunbridge Wells, England.
If we buy ten tickets for a new and unfamiliar raffle, Bayes imagined, and five of them win prizes, then it seems relatively easy to estimate the raffle’s chances of a win: 5/10, or 50%. But what if instead we buy a single ticket and it wins a prize? Do we really imagine the probability of winning to be 1/1, or 100%? That seems too optimistic. Is it? And if so, by how much? What should we actually guess?
For somebody who has had such an impact on the history of reasoning under uncertainty, Bayes’s own history remains ironically uncertain. He was born in 1701, or perhaps 1702, in the English county of Hertfordshire, or maybe it was London. And in either 1746, ’47, ’48, or ’49 he would write one of the most influential papers in all of mathematics, abandon it unpublished, and move on to other things.
Between those two events we have a bit more certainty. The son of a minister, Bayes went to the University of Edinburgh to study theology, and was ordained like his father. He had mathematical as well as theological interests, and in 1736 he wrote an impassioned defense of Newton’s newfangled “calculus” in response to an attack by Bishop George Berkeley. This work resulted in his election in 1742 as a Fellow of the Royal Society, to whom he was recommended as “a Gentleman … well skilled in Geometry and all parts of Mathematical and Philosophical Learning.”
After Bayes died in 1761, his friend Richard Price was asked to review his mathematical papers to see if they contained any publishable material. Price came upon one essay in particular that excited him—one he said “has great merit, and deserves to be preserved.” The essay concerned exactly the kind of raffle problem under discussion:
Let us then imagine a person present at the drawing of a lottery, who knows nothing of its scheme or of the proportion of Blanks to Prizes in it. Let it further be supposed, that he is obliged to infer this from the number of blanks he hears drawn compared with the number of prizes; and that it is enquired what conclusions in these circumstances he may reasonably make.
Bayes’s critical insight was that trying to use the winning and losing tickets we see to figure out the overall ticket pool that they came from is essentially reasoning backward. And to do that, he argued, we need to first reason forward from hypotheticals. In other words, we need to first determine how probable it is that we would have drawn the tickets we did if various scenarios were true. This probability—known to modern statisticians as the “likelihood”—gives us the information we need to solve the problem.
For instance, imagine we bought three tickets and all three were winners. Now, if the raffle was of the particularly generous sort where all the tickets are winner
s, then our three-for-three experience would of course happen all of the time; it has a 100% chance in that scenario. If, instead, only half of the raffle’s tickets were winners, our three-for-three experience would happen 1⁄2 × 1⁄2 × 1⁄2 of the time, or in other words 1⁄8 of the time. And if the raffle rewarded only one ticket in a thousand, our outcome would have been incredibly unlikely: 1⁄1,000 × 1⁄1,000 × 1⁄1,000, or one in a billion.
Bayes argued that we should accordingly judge it to be more probable that all the raffle tickets are winners than that half of them are, and in turn more probable that half of them are than that only one in a thousand is. Perhaps we had already intuited as much, but Bayes’s logic offers us the ability to quantify that intuition. All things being equal, we should imagine it to be exactly eight times likelier that all the tickets are winners than that half of them are—because the tickets we drew are exactly eight times likelier (100% versus one-in-eight) in that scenario. Likewise, it’s exactly 125 million times likelier that half the raffle tickets are winners than that there’s only one winning ticket per thousand, which we know by comparing one-in-eight to one-in-a-billion.
This is the crux of Bayes’s argument. Reasoning forward from hypothetical pasts lays the foundation for us to then work backward to the most probable one.
It was an ingenious and innovative approach, but it did not manage to provide a full answer to the raffle problem. In presenting Bayes’s results to the Royal Society, Price was able to establish that if you buy a single raffle ticket and it’s a winner, then there’s a 75% chance that at least half the tickets are winners. But thinking about the probabilities of probabilities can get a bit head-spinning. What’s more, if someone pressed us, “Well, fine, but what do you think the raffle odds actually are?” we still wouldn’t know what to say.
Algorithms to Live By Page 16