by Martin Erwig
Figure 6.5 Mergesort splits a list into two sublists of equal size, sorts them, and merges the sorted results into one sorted list. The parentheses indicate the order in which lists must be merged. In line 4 the decomposition is complete when only single-element lists are obtained. In line 5 three pairs of single-element lists have been merged into three sorted two-element lists, and in line 6 those lists are merged again into one four-element and one three-element list, which are merged in the last step to produce the final result.
On the face of it, mergesort looks more complicated than quicksort, but this may be due to the fact that some steps were skipped in the description of quicksort. But still it seems that the repeated merging of longer and longer lists is inefficient. However, this intuition is deceptive. Since the decomposition is systematic and halves the size of lists in every step, the overall runtime performance is quite good: in the worst case the runtime of mergesort is linearithmic. This can be seen as follows. First, since we always split lists in half, the number of times lists need to be split is logarithmic. Second, the merging on each level takes only linear time because we have to process each element once (see figure 6.5). Finally, since merging happens on each level once, we altogether obtain linearithmic runtime.
Mergesort bears some similarity to quicksort. In particular, both algorithms have a phase for splitting lists, followed by a recursive sorting step for each of the smaller lists, and finally a phase for combining the sorted sublists into longer sorted lists. In fact, both quicksort and mergesort are examples of divide-and-conquer algorithms, which all are instances of the following general schema:
If the problem is trivial, solve it directly.
Otherwise
(1) Decompose the problem into subproblems.
(2) Solve the subproblems.
(3) Combine the solutions for the subproblems into a solution for the problem.
The case for solving a nontrivial problem shows how divide-and-conquer works. The first step is the divide step; it reduces the problem’s complexity. The second step is the recursive application of the method to the subproblems. If the subproblems thus obtained are small enough, they can be solved directly. Otherwise, they will be decomposed further until they are small enough to be solved directly. The third step is the merge step; it is assembling a solution for the problem from the solutions of the subproblems.
Quicksort does most of its work in the divide step, where all the element comparisons happen: by ensuring that the elements in the two lists are separated by the pivot, it allows the merge step to be a simple concatenation of lists. In contrast, mergesort’s divide step is very simple and does not contain any element comparisons. Most of mergesort’s work happens in the merge step, where the sorted lists are combined, similar to a zipper.
The Quest Is Over: No Better Sorting Algorithm, Ever
Mergesort is the most efficient sorting method discussed so far, but is it possible to sort even faster? The answer is yes and no. Although we cannot sort faster in general, we can do better under certain assumptions about the elements to be sorted. For example, if we know the list to be sorted contains only numbers between 1 and 100, we can create an array of 100 cells, one for each potential number in the list. This approach is similar to bucket sort, where there is one pile for each letter of the alphabet. Here each array cell corresponds to a pile that contains a particular number. The array cells are indexed from 1 to 100. We use each cell with index i to count how often i occurs in the list. First, we store 0 in each array cell, since we don’t know which numbers are in the list to be sorted. Then we traverse the list, and for each encountered element i, we increment the number stored in the array cell with index i. Finally, we traverse the array in order of increasing indices and put each index into the result list as many times as the cell’s counter indicates. For example, if the list to be sorted is 4→2→5→4→2→6, we end up with the following array after traversing the list:
By traversing the array we find that 1 does not occur in the list, since its counter is still 0. A 1 will therefore not be part of the sorted result list. In contrast, the 2 occurs twice and will therefore be put twice into the list, and so on. The resulting list will be 2→2→4→4→5→6. This method is called counting sort, since the array maintains a counter for how often an element occurs in the list to be sorted. The runtime of counting sort results from the combined cost of traversing the list and array. Since either step is linear in the size of the respective data structure (list and array), counting sort runs in linear time in the size of the list or array, whichever is larger.
The downside of counting sort is that it can waste a lot of space. In the example all cells with index 7 through 100 are never used. Moreover, it only works if the elements can be used to index an array and if the range of the elements of the list is known and is not too large. For example, we cannot sort a list of names using counting sort because names are sequences of characters and cannot be used to index arrays.5 There are other specialized sorting algorithms for lists of strings. For example, the trie data structure (see chapter 5) can be used to sort lists of strings, but those methods make other assumptions about the elements to be sorted.
As Good as It Gets
Without exploiting special properties of the data, we cannot sort faster than mergesort. While this fact may seem disappointing at first, it is also good news because it gives us the assurance that we have found the fastest algorithm possible. In other words, mergesort is the best possible solution for the problem of sorting. It is therefore an optimal algorithm. Computer science researchers can consider the problem solved and spend their time and energy on solving other problems.
The optimality of mergesort rests on two related but distinct facts. Mergesort has linearithmic runtime, and any sorting algorithm for the general case must have at least linearithmic runtime. It is this second part that justifies the conclusion about mergesort’s optimality and, for that matter, about the optimality of any other sorting algorithm that has linearithmic runtime in the worst case. The ultimate goal in designing algorithms and data structures is to find an optimal algorithm for a problem, that is, an algorithm whose worst-case runtime complexity matches the intrinsic complexity of the problem it solves. Any such algorithm could be considered the Holy Grail for that problem. And like Indiana Jones in The Last Crusade, John von Neumann found in mergesort the Holy Grail of sorting.6
It is important to distinguish between the runtime of an algorithm and the complexity of a problem. The latter says that a correct solution must take at least that many steps. In contrast, the runtime of an algorithm says that the particular algorithm takes at most that many steps. A statement about the minimal complexity of a problem is called the problem’s lower bound. A lower bound provides an estimate of how much work a problem will need at minimum and thus characterizes the problem’s inherent complexity. It is similar to the geometric distance between two points, which is a lower bound for any path connecting the points. Any such path may be longer than the distance because of obstacles, but it cannot be shorter than the distance. The possible disappointment one may feel about the lower bound of sorting and the related limitation of sorting algorithms should be offset by the deep insight that this knowledge provides about the problem of sorting. Compare this with similar results from other disciplines. For example, in physics we know that we cannot travel faster than light and that we cannot create energy out of nothing.
But how can we be so certain about the lower bound of sorting? Maybe there is an algorithm no one has thought of yet that actually does run faster than linearithmic time. Proving a negative is not easy, since it requires us to show that any algorithm, existing or yet to be invented, has to perform a certain minimum number of steps. The argument for the lower bound of sorting counts the number of possible lists of a specific length7 and shows that the number of comparisons required to identify the sorted list is linearithmic.8 Since every algorithm needs to perform this minimum number of comparisons, it follows th
at any algorithm requires at least a linearithmic number of steps, which proves the lower bound.
The reasoning about the runtime of algorithms and lower bounds makes assumptions about the capabilities of the computer that is executing algorithms. For example, a typical assumption is that the steps of an algorithm are carried out in sequence and that it takes one time unit to perform one computational step. These assumptions also underlie the analysis of the sorting algorithms discussed in this chapter. If we assume, however, that we can perform comparisons in parallel, the analysis changes, and we obtain different results for runtimes and lower bounds.
Computation Preservation
It seems that Indiana Jones is pretty organized about going on an adventure: he always packs his hat and whip into his travel bag. But instead of making a comprehensive plan of all the steps, he could also approach an adventure by determining his next step just right before it is necessary to do so. Both approaches, planning ahead and living in the moment, have their respective advantages and disadvantages. A trip to the North Pole requires different clothing and equipment than one to the Amazon. Here planning ahead seems like a good idea. On the other hand, changing circumstances can make prior plans obsolete and render the planning effort useless. In particular, during an adventure the unexpected may happen, which often requires taking different actions than anticipated.
If Indiana Jones’s strategy for finding the Lost Ark were to determine the next action whenever it is needed, he would essentially be performing a selection sort and would always search for the smallest element among the actions left to be performed, because smallest here means “has to come before all other actions.” As discussed, this is not very efficient, since selection sort is a quadratic algorithm. Indiana can do significantly better if he creates a plan in advance by employing a linearithmic sorting algorithm such as mergesort. This strategy of computing information ahead of time is called precomputation. In the case of the plan for locating the Lost Ark the precomputed information is not the set of individual steps but the arrangement of the steps into the correct order.
The ordering information is kept in a sorted list. The crucial aspect of precomputation is that computational effort is expended at one time and the computed result is used at a later time. The precomputed result is preserved in a data structure—in this case a sorted list. This sorted list acts like a computational battery that can be charged through sorting. The runtime spent by an algorithm to establish the sorted list amounts to the energy used to charge the data structure battery. This energy can be used to power computation, such as finding the next smallest element. Here, to power means to speed up: without the sorted-list battery, it takes linear time to find the next smallest element; with the battery, it only takes constant time.
Some ways are more efficient than others in charging data structure batteries, which is reflected in the different runtimes of the different sorting algorithms. For example, insertion sort is less efficient than mergesort, and the fact that mergesort is an optimal sorting algorithm means that a most efficient method exists for charging the sorted-list battery. Unlike electrical batteries, whose energy can be spent only once, data structure batteries have the nice property that, once charged, they can be discharged repeatedly without ever having to recharge them. This feature is an important point in favor of precomputing. Situations in which one can expect to make use of a data structure repeatedly provide a strong incentive to expend the precomputing effort because the cost can be amortized over several uses. On the other hand, a data structure battery must be fully charged to work properly. An almost sorted list is not sufficient because it does not guarantee that the smallest element will be at the front and can therefore produce incorrect results. Essentially, this means that a data structure battery has two states: it is either completely charged and useful, or not.
Precomputation seems like a great idea—like the squirrel that gathers nuts to save them for the winter. There are many situations, however, when it’s not clear whether the precomputation effort will pay off. We know many circumstances when acting early can be beneficial, but this isn’t guaranteed and could actually turn out to be a disadvantage. If you buy a plane ticket early or book a hotel room at a nonreimbursable hotel rate, you may get a good deal, but if you fall ill and can’t make the trip, you can lose your money and do worse than you would have by waiting with the purchase.
In cases like these, the value of acting early, or precomputating, is called into question by uncertainty about the future. Since the benefit of precomputation depends on a specific outcome of future events, it reflects a rather optimistic computation attitude with a confident outlook on the future. But what if you’re skeptical about the future? You may think that people who file their tax returns early are masochists, and you always delay yours until April 14, because the IRS could be dissolved any day now, or you could die before. However, if, like Ben Franklin, you are certain that taxes are as certain as death, then precomputation may seem a prudent thing to do.
A skeptical attitude toward the future calls for a radically different strategy for scheduling computation, namely, a strategy that tries to delay costly operations as much as possible until they cannot be avoided anymore. The hope or expectation is that something might happen that makes the costly operation obsolete and thus saves its runtime (and potentially other resources). In real life this behavior would be called procrastination; in computer science it is called lazy evaluation. Lazy evaluation can save computation effort whenever the information that would have been obtained by the saved computation is not needed anymore. In the case of Indiana Jones’s adventures, it happens quite often that an initial plan has to be changed or abandoned because of unforeseen events or complications. In such cases, all the effort that went into creating the plan is wasted and could have been saved by not making a plan in the first place.
Whereas the precomputing advocate’s motto is “A stitch in time saves nine” or “The early bird catches the worm,” the champion of lazy evaluation might respond with “Yes, but the second mouse gets the cheese.” While lazy evaluation seems attractive in its promise to not waste effort, it is problematic when an action that becomes unavoidable takes longer than it would have under precomputing, or worse, longer than there is time available. In particular, when several delayed actions become due at the same time, this might present a serious resource problem. Therefore, an overall more sensible strategy is to distribute work evenly over time. While this might waste some effort on precomputing, it avoids crises that a lazy evaluation strategy might bring on.
Getting Lunch
It’s Wednesday, the day of the week when you and some of your colleagues go out to lunch together. You decide to try out the new Italian restaurant, but when you arrive you learn that their credit card reader is not working and they are accepting only cash today. Before you order, you determine how much cash your lunch party has. When it comes to ordering, you are faced with the problem of selecting a set of items (appetizers, entrées, side dishes, and drinks) that will feed all members of the lunch party and satisfy their preferences as much as possible but that also doesn’t exceed the budget given by the amount of cash available. Of course, if you have enough money, the menu selection doesn’t pose a problem, but that is not necessarily a safe assumption these days, as few people carry cash at all, expecting to be able to pay by debit or credit card or smartphone.
How do you go about selecting the menu items to order? You could start with everybody’s preferred order. If that fits within the budget, then the problem is solved. But what if the total exceeds the available cash amount? In that case, people could offer to order cheaper alternatives or to not order appetizers or drinks until the total is within the limit. This approach depends on your ability to determine the overall value of each lunch order based on all participants’ preferences. Here value means the combined satisfaction of the lunch party with a particular lunch selection.
This is probably not an easy task, but le
t’s assume that we can determine the value of lunch orders. Now the goal is to find a lunch order of maximum satisfaction value whose total cost doesn’t exceed the cash limit. Perhaps a good strategy for this is to gradually trade value for cost. However, this strategy isn’t as simple as it looks. Even if it is better for Bob to refrain from ordering a drink than for Alice to not order an appetizer, it is not clear that going along with this selection yields the better overall value. This is because Alice’s appetizer might be more expensive than Bob’s drink and thus might save enough so that Carol could have her second entrée preference instead of her third. Now Bob’s and Carol’s preferences together might be more valuable to satisfy than Alice’s. When it gets down to the details, it is not so clear which selections to change and in which order.
We encounter this problem of selecting a number of items on a limited budget in many other situations as well, including planning a vacation with more costly or less costly events, choosing different options for traveling, or selecting among the upgrades for a new car or some electronic gadget. It might not seem like it, but this problem is surprisingly difficult in general. All known algorithms to solve these problems have runtimes that are exponential in the number of items to select from. For instance, consider writing down all possible lunch selections on small napkins of size 10 cm by 10 cm that each fit ten selections. Assuming that each person selects between one and four items out of a menu of ten items (which gives 5,860 menu options for each person), the napkins needed for writing down all possibilities for a lunch party of five would be enough to cover the surface of the earth over 13 times.