by Martin Erwig
Finally, binary search trees work only for those kind of elements that can always be ordered, that is, for any two elements where we can say which is larger and which is smaller. Such a comparison is required to decide in which direction to move in a tree to find or store an element. However, for some kinds of data, such comparisons are not possible. Suppose, for example, you are keeping notes on quilting patterns. For each pattern you want to record what fabrics and tools you need, how difficult it is to make, and how much time it takes. To store information about those patterns in a binary search tree, how could you decide whether one pattern is smaller or larger than another? Since patterns differ in the number of pieces they contain, as well as the shapes and colors of the pieces, it’s not obvious how to define an order among those patterns. This is not an impossible task—one could decompose a pattern into its constituting parts and describe it by a list of its features (for example, the number of pieces or their colors and shapes) and then compare patterns by comparing their lists of features, but this requires effort, and it may not be very practical. Thus a binary search tree might not be well suited for storing a quilting pattern dictionary. However, you could still use a list, since all you need to do for a list is to decide whether two patterns are the same, which is easier than ordering them.
To summarize: Binary search trees exploit a strategy employed naturally and effortlessly by people to decompose search problems into smaller ones. In fact, binary search trees have systematized and perfected the idea. As a consequence, binary search trees can be much more efficient than lists in representing dictionaries. However, they require more effort to guarantee the balancing, and they do not work for every kind of data. And this should match your experience with searching your desk, office, kitchen, or garage. If you regularly devote effort to keeping things in order, for example, by placing documents or tools back after their use, you have a much easier time finding objects than by having to search through a pile of unorganized stuff.
Try a Trie
Binary search trees are an efficient alternative to lists when it comes to computing a histogram of letter frequencies in words. But this is only one of the computations that help Indiana Jones solve the tile floor challenge. The other computation is to identify a sequence of tiles that spell a particular word and lead safely across the tile floor.
We have already determined that a grid consisting of six rows, each with eight letters, contains 262,144 different six-tile paths. Since each path corresponds to one word, we could represent the association of words to paths with a dictionary. A list would be an inefficient representation, since the clue word Iehova might be located near the end of the list and thus might take quite some time to locate. A balanced binary search tree would be much better because its height would be 18, ensuring the clue word can be found relatively quickly. But we don’t have such a balanced search tree available, and constructing it is a time-consuming endeavor. As with the search tree containing the letters, we have to insert each element separately, and to do so we have to repeatedly traverse paths in the tree from the root to the leaves. Without analyzing this process in detail, it should be clear that building the tree takes more than linear time.6
Still, we can identify the sequence of tiles quite efficiently (with no more than 42 steps) without any additional data structure. How is this possible? The answer is that the tile floor with the letters on the tiles is itself a data structure that supports particularly well the kind of search Indiana Jones has to perform. It is called a trie,7 a data structure that is somewhat similar to a binary search tree but also different in an important way.
Recall how Indiana Jones with each step on a tile reduces the search space by a factor of eight. This is similar to what happens when by descending into a balanced binary search tree the search space is cut in half: by taking one of two tree branches, half the nodes in the tree are excluded from the search. Similarly, by not taking a tile, the search space is reduced by one-eighth, and by selecting a tile, the search space is reduced to one-eighth of its previous size, because selecting one tile means not selecting seven others and thus reducing the search space by seven-eighth.
But something else happens here as well that is different from how a binary search tree works. In a binary search tree each element is stored in a separate node. By contrast, a trie stores only single letters in each node and represents words as paths connecting different letters. To understand the difference between a binary search tree and a trie, consider the following example. Suppose we want to represent the set of words bat, bag, beg, bet, mag, mat, meg, and met. A balanced binary search tree containing these words looks as follows:
To find the word bag in this tree, we compare it with the root bet, which tells us to continue the search in the left subtree. This comparison takes two steps for comparing the first two characters of both words. Next we compare bag with the root of the left subtree, bat, which tells us again to continue the search in the left subtree. This comparison takes three steps, since we have to compare all three characters of the two words. Finally, we compare bat with the leftmost node in the tree, which results in the successful completion of the search. This last comparison also takes three steps, and the whole search requires 8 comparisons altogether.
We can represent the same collection of words by a 2-by-3 tile floor, where each row contains tiles for the letters that can occur at the corresponding position in any of the words. For example, since each word begins with either b or m, the first row needs two tiles for these two letters. Similarly, the second row needs two tiles for a and e, and the third row needs tiles for g and t.
b
m
a
e
g
t
Finding a word on the tile floor works by systematically traversing the tile floor row-by-row. For each letter of the word the corresponding tile row is traversed from left to right until the letter is found. For example, to find the word bag on this tile floor, we start searching for the first letter, b, in the first row. We find the tile in one step. Next we search for the second letter a in the second row, which again takes one step. Finally, we can complete the search successfully by finding g in the third row, which also takes only one step. Altogether this search takes 3 comparisons.
The search on the tile floor requires fewer steps than the binary search tree because we have to compare each letter only once, whereas the search in the binary tree causes repeated comparisons of initial parts of the word. The word bag represents the best case scenario for the tile floor, since each letter can be found on the first tile. By contrast, the word met requires six steps, since each of its letters is contained on the last tile in a row. But it cannot get worse than this because we have to check each tile at most once. (For comparison, finding met in the binary search tree requires 2 + 3 + 3 + 3 = 11 steps.) The best case for the binary search tree is finding the word bet, which requires only three steps. However, with growing distance from the root, binary search leads to more and more comparisons. Since most words are located toward the leaves of the tree, at a greater distance from the root,8 we generally have to repeatedly perform comparisons of initial parts of the word, which indicates that in most cases search in a trie is faster than in a binary search tree.
The tile floor analogy suggests that a trie can be represented as a table, but this is not always the case. The example works out nicely because all words have the same length and each position in a word can contain the same letters. However, suppose we want to also represent the words bit and big. In this case, the second row needs an additional tile for the letter i, which breaks the rectangular shape. Adding the two words reveals another regularity in the example that does not exist in general. The example words are chosen so that different prefixes of the same length have the same possible suffixes, which allows the sharing of information. For example, ba and be can both be completed by adding either g or t, which means that the possible continuations of both prefixes can be described by a sing
le tile row containing the letters g and t. However, this is not true anymore when we add the words bit and big. While b can be continued with 3 possible letters, namely, a, e, and i, m can be continued only with a and e, which means that we need two different continuations.
Therefore, a trie is typically represented as a special kind of binary tree in which nodes carry single letters, left subtrees represent word continuations, and right subtrees represent letter alternatives (plus their continuations). For example, the trie for the words bat, bag, beg, and bet looks as follows:
Since the root has no right edge, all words start with b. The left edge from b to the subtree with root a leads to the possible continuations, which start with either a or e, the alternative to a pointed to by the right edge from a. The fact that ba and be can be continued by either g or t is represented by a left edge from a to a subtree with root g, which has a right child t. The fact that the left subtrees of a and e are identical means that they can be shared, which is exploited in the tile floor representation by having a single tile row. Because of this sharing, a trie typically requires less storage than a binary search tree.
Whereas in the binary search tree each key is completely contained in a separate node, the keys stored in a trie are distributed over the trie’s nodes. Any sequence of nodes from the root of a trie is a prefix of some key in the trie, and in the case of the tile floor, the selected tiles are a prefix of the final path. This is why a trie is also called a prefix tree. The trie data structure representing the tile floor facing Indiana Jones is illustrated in figure 5.2.
Like binary search trees and lists, tries have their advantages and their problems. For example, they work only for keys that can be decomposed into sequences of items. As with Indiana Jones, who in the end was not able to hold on to the Holy Grail, there is no Holy Grail of data structures. The best choice of a data structure always depends on the details of the application.
Figure 5.2 A trie data structure and how to search it. Left: A trie representation where left subtrees represent possible word continuations of the letter in the parent node (represented by the rounded rectangles) and right subtrees represent alternatives to the parent node. Middle top: The tile floor representation of the trie on the left in which the common subtrees are shared through single tile rows. Middle bottom: Three selected tiles that spell the beginning of the word Iehova. Right: A path through the trie marked by bold edges for the selected tiles. The circled nodes correspond to selected tiles.
When we watch the movie scene, we consider the real challenge to be finding the clue word or jumping precisely onto the target tiles, but hardly the identification of the tiles according to the letters of the clue word. This seems so obvious that we don’t think about it at all, which again demonstrates how natural the execution of efficient algorithms is to us. As with Hansel and Gretel and Sherlock Holmes, Indiana Jones’s adventures and his survival are based on core computing principles.
And finally, if you’re still wondering about the method to never lose your car keys again, it is very simple: When you come home, always place them at one particular location. This location is the key to the keys. But you probably knew that already.
Getting Your Ducks in a Row
If you are a teacher, a significant part of your job is to grade essays or exams for the students in your classes. The task involves more than just reading and grading individual exams. For example, before deciding on the final grades you may want to get a sense of the grade distribution and potentially curve the grades. Moreover, once the grades are fixed, you have to enter them into the class roster, and finally, you have to hand the exams back to the students. In some cases each of these tasks can be performed more efficiently by employing some form of sorting.
Let’s first consider entering grades into a roster. Even for this simple task one can identify three different algorithms with different runtime performances. First, you can go through the pile of graded exams and enter each grade into the roster. Each exam is obtained in constant time from the pile, but it takes logarithmic time to find the student’s name in the roster, assuming it is sorted by names and you use binary search to find it. For entering all grades into the roster this adds up to linearithmic runtime,1 which is much better than quadratic but not quite as good as linear.
Second, you can go through the roster name by name and find for each student her or his exam in the pile. Again, getting to the next name in the roster takes constant time, but finding the exam for a particular student in the list requires, on average, traversing half the list, resulting in quadratic runtime.2 Thus this algorithm should not be used.
Third, you could sort the exams by name and then go through the roster and the sorted pile in parallel. Since the sorted pile and the roster are aligned, entering each grade takes only constant time. Altogether this leads to linear time plus the time it takes to sort the list of exams, which can be done in linearithmic time. The total runtime for this method is dominated by the linearithimc time required for sorting. Therefore, the total runtime of this last algorithm is also linearithmic. But that’s not better than the first approach, so why go to the extra trouble of sorting the exams in the first place? Since there are situations in which we can actually sort faster than linearithmic time, the third method can in these cases improve the performance and be the most efficient approach.
The second task, curving the exam grades, requires creating a point distribution, that is, a map with point scores and the number of students who have obtained that score. This task is similar to Indiana Jones’s task of computing a histogram of letters. A grade distribution is also a histogram. Computing a histogram using a list takes quadratic runtime while using a binary search tree takes linearithmic runtime, which suggests using the latter. You could also compute a histogram by first sorting the pile of exams by points, followed by a simple scan of the sorted pile where you count how often each point score repeats. This works, since in a sorted list all the exams with the same points will follow one another. As in the case of entering the grades into the class roster, sorting does not increase the runtime but can potentially shorten it. However, the situation is different if the point range is small. In that case, updating an array of point scores is probably the fastest method.
Finally, handing back exams in a classroom can become an embarrassingly slow endeavor. Imagine standing in front of a student and trying to find her or his exam in a large pile. Even using binary search this takes too long. Repeating the search for a large number of students makes this an inconveniently slow process (even though it gets faster with each step as the pile of exam gets smaller). An alternative is to sort the exams by position of students in the class. If seats are numbered and you have a seating chart (or just happen to know where students sit), you can sort the exams by seat numbers. Handing back the exams then becomes a linear algorithm where, similarly to entering the grades into the class roster, you can traverse the seats and the pile of exams in parallel and hand back each exam in one step. In this case, even if the sorting takes linearithmic time, it is worth the effort, since it saves precious class time. This is an example of precomputing, where some data needed for an algorithm is computed before the algorithm is executed. It is what the mail carrier does before delivering mail or what you do when arranging your shopping list according to how items are placed in the aisles at the grocery store.
Sorting is a more general activity than one might think. In addition to sorting collections of items, we regularly have to sort tasks based on their dependencies. Consider the algorithm for getting dressed: you have to figure out to put on socks before shoes and underpants before pants. Assembling furniture, repairing or maintaining machines, filling out paperwork—most of these activities require us to perform some actions in the correct order. Even preschoolers are given problems to sort pictures into a coherent story.
6
Sorting out Sorting
Sorting is a major example of a computation. In addition to having many applic
ations, sorting helps to explain some fundamental concepts of computer science. First, different sorting algorithms with different runtime and space requirements illustrate the impact of efficiency in deciding how to solve a problem computationally. Second, sorting is a problem for which a minimal complexity is known. In other words, we know a lower bound on the number of steps that any sorting algorithm must take. Sorting thus illustrates that computer science as a field has identified principal limitations on the speed of computation. Knowledge about limits is empowering because it helps us direct research efforts more productively. Third, the distinction between the complexity of a problem and the complexity of its solutions helps us understand the notion of an optimal solution. Finally, several sorting algorithms are examples of divide-and-conquer algorithms. Such an algorithm splits its input into smaller parts, which are processed recursively and whose solutions are reassembled into a solution for the original problem. The elegance of the divide-and-conquer principle stems from its recursive nature and its relationship to its close cousin, mathematical induction. It is a very effective approach to problem solving and illustrates the power of problem decomposition.
As mentioned in chapter 5, one of the applications of sorting is to support and speed up search. For example, finding an element in an unsorted array or list takes linear time, whereas in a sorted array this can be achieved in logarithmic time through binary search. Thus computation (here, sorting) expended at one time can be preserved (in the form of a sorted array) to be used at a later time to speed up other computations (for example, searching). More generally, computation is a resource that can be saved and reused through data structures. This interplay between data structures and algorithms shows how closely related the two are.