by Martin Erwig
As figure 13.2 demonstrates, the actual construction of the sorted list takes many steps, and the effect of the different executions of the Insert algorithm is partly obscured by the presence of the conditional and the fact that intermediate lists are temporarily represented twice in the alternatives of the conditional. While the trace produced by substitution is precise and shows exactly what the algorithm does, it also requires a lot of attention and focus to slog through all the details and to distinguish the data from the instructions.
Another aspect of the substitution approach that can be confusing is that in many cases different substitutions are possible, and while the choice does not generally affect the result, it can affect the size of the trace and its understandability. For example, figure 13.2 shows that the first substitution yields Isort(W→H, Insert(B, )), to which two substitutions are applicable: we can either use the first equation for Insert or the second equation for Isort.
Chapter 6 showed a data-only trace to illustrate different sorting algorithms: for each element that was moved, only the unsorted and sorted lists were shown (see illustration of insertion sort in figure 6.2). If we apply the same visualization to the example here, we obtain a trace that is much shorter and more concise than the one in figure 13.2:
The data-only trace looks so much simpler because it does not contain any instructions from the algorithm. (This reveals how an interpreter typically works: it keeps the algorithm or program description and the data to be manipulated separate.) Also, the data-only trace shows only the effects of Isort on the two lists, leaving out the details of how Insert moves elements within the second list. Moreover, the program is represented only once and is not changed at all. When an algorithm is interpreted, a data-only trace presents only the data as it evolves.
Whereas the trace in the substitution approach serves the double duty of tracking the evolution of the data and the progress of the computation, an interpreter uses a stack data type for each of these two tasks. In particular, for a recursive algorithm, an interpreter must keep track for each recursive call where it left off to be able to go back and continue the computation after the recursive call is finished. Since each recursive execution of an algorithm has its own argument, the interpreter also has to be able to maintain multiple versions of parameters.
Both these needs can be fulfilled by storing program addresses and parameter values on a stack. Let’s see how this works by executing the algorithm ToDo. To facilitate jumps back from recursive calls, we have to mark positions in the algorithm, for which we use numbers. Since the parameter of the algorithm ToDo is not used in any of the definitions, we can ignore it and only store positions on the stack. The algorithm is reproduced in a slightly modified way, where numbers mark the positions between instructions:
Figure 13.3 Interpretation of ToDo(1985). If the current instruction is a recursive call, the address following it is remembered on the stack to allow the continuation of the computation after it is completed. This happens whenever a return instruction is encountered. After the jump back, the address is removed from the stack.
ToDo(1985) = ① ToDo(1955) ② go camping ③ return
ToDo(1955) = ④ destroy almanac ⑤ ToDo(1885) ⑥ return
ToDo(1885) = ⑦ save Doc ⑧ return
An interpreter executes an algorithm applied to an argument such as ToDo(1985) by performing the instructions one by one and by keeping addresses to jump back to on a stack. Whenever an instruction is a recursive execution of the algorithm, the address following that instruction is pushed onto the top of the stack before the interpreter jumps to the instruction indicated by the recursive call. In the example, the first instruction is such a jump that causes the following address ② to be pushed onto the stack and makes ④ the next instruction.
This is illustrated in the first two lines of figure 13.3, which shows how the current instruction and the stack evolve during the execution of the algorithm. The second column displays the stack with its top on the left and bottom on the right. The figure also shows part of the world as it is changed through the execution of the instructions, for example, the current year or the possession of the sports almanac. This particular fact about the world changes after Marty has destroyed the sports almanac in 1955. Note, however, that the fact that Doc Brown is in danger is not a consequence of the current algorithm execution. However, changing it is part of the next steps of the algorithm. After traveling to 1885, which causes another return address, ⑥, to be pushed onto the stack, the action to save Doc Brown changes the fact about his being in danger. The next instruction in the algorithm is to return to where the algorithm left off when the recursive jump was made. The target for the return jump is the last address pushed onto the stack and can thus be found on top of the stack. Therefore, the execution of the return instruction causes the next instruction to be the instruction at the address ⑥, which is another return that also pops the return address off the stack. This reveals the target address for the next return instruction, which is ②, the point where Marty and Jennifer can finally go camping.
The ToDo example demonstrates how nested calls of an algorithm lead to return addresses being stored on the stack, but it did not require storing parameter values on the stack. To illustrate this aspect of an interpreter, we consider again the algorithm Isort, which does require this. However, we don’t need to store return addresses because each algorithmic step is given by just one expression, not a sequence of several, as in the case of ToDo.
To sort the list of Marty’s items, an interpreter starts evaluating the call Isort(B→W→H, ) with an empty stack. Matching the arguments against the parameter patterns of Isort leads to bindings, which are associations of parameter names used in the patterns and values. These bindings are pushed onto the stack, resulting in the stack shown in figure 13.4A. It may seem strange that while the Isort algorithm takes two inputs, the application to a nonempty first input produces bindings for three parameters on the stack. This is because the first argument is matched against a pattern, x→rest, which contains two parameters that have the purpose of splitting the argument list into two parts, the first element and the tail of the list. The first equation only produces a binding for one parameter, because the first input is known to be the empty list and need not be referred to by a name.
After the pattern matching, which produces parameter bindings on the stack, the algorithm instructs to compute Isort(rest, Insert(x, list)). As in the substitution method, the interpreter now has two possible routes to proceed: either continue with the outer Isort call, or first deal with the nested Insert call. Most programming languages evaluate arguments before executing an algorithm.1 Following this strategy, the interpreter evaluates Insert(x, list) before further evaluating the Isort call. The values for x and list can be retrieved from the stack and lead to the evaluation of Insert(B, ). This evaluation can be carried out with a separate stack and produces the resulting list B, which means that the evaluation of the Isort call becomes Isort(rest, B).
The interpreter evaluates Isort(rest, B) with the stack shown in figure 13.4A. First, the value of rest is retrieved from the stack, which means that the interpreter actually evaluates the call Isort(W→H, B). Then pattern matching produces new bindings for the parameters of Isort to be pushed onto the stack, as shown in figure 13.4B.
Figure 13.4 Snapshots of stack values during the interpretation of Isort(B→W→H, ).
It is striking that the parameters x, rest, and list appear twice on the stack. This is because the recursive call of Isort operates with different arguments for its parameters and thus requires separate storage for them. Once the nested call of Isort has been processed by the interpreter, the bindings for the parameters are removed from (“popped off”) the stack, and the computation of the previous call of Isort can continue, having access to its own arguments.2
However, before it can be completed, the call of Isort leads again to an execution of the second equation for Isort, which prompts the evaluation of I
sort(rest, Insert(x, list)). The bindings are again found on the stack. But since there are multiple bindings available for each parameter name, the question is which of these should be used, and how to find them. It is here where the stack data type comes into play again. Since the parameter values for the latest Isort call have been pushed onto the stack most recently, they are found on top of the stack. The resulting call is thus Isort(H,Insert(W, B)). Since Insert(W, B) results in B→W, the next call of Isort to be evaluated is Isort(H, B→W), which triggers again the second equation for Isort and leads to another call Isort(rest, Insert(x,list)) with the stack shown in figure 13.4C.
Finding the arguments for the parameters on top of the stack, this call leads to evaluating Isort( ,Insert(H,B→W)), which in turn results in Isort( , B→H→W) after the evaluation of Insert(H, B→W). Since now Isort’s first argument is the empty list, the interpreter is given list to evaluate by the first equation for Isort. The evaluation happens in the context of the stack shown in Figure 13.4D.
At this point, the value for list is found on the stack and returned as the final result. Finally, the end of each Isort call is accompanied by removing its parameter bindings from the stack, leaving the stack empty. We can observe how a stack grows with each (recursive) call and shrinks after a call is completed.
The two-list trace for insertion sort shown earlier can be systematically reconstructed from the stacks in figure 13.4. In fact, the stack that represents the complete nesting of all Isort calls (figure 13.4D) is sufficient for that purpose. Specifically, each group of bindings in the stack produces one step of the trace. Recall that each nonempty input list to Isort is split when Isort is called to produce the bindings for the two parameters x and rest. This means that the input list for the first three Isort calls is given by the list x→rest, and the output is given by list. For the final call of Isort, when the input list is empty, the input list is the empty list, which is not represented by a parameter, and the output list is given by list. Thus the bottom entry of the stack yields the pairs of lists B→W→H and the empty list, the second entry yields W→H and B, the third yields H and B→W, and the top entry gives us the empty list (the input list, not bound to a parameter) and B→H→W.
Substitution and interpretation are two methods to understand the execution of algorithms, in particular, of recursive algorithms. Substitution is simpler, since it only works with a trace that is rewritten step-by-step, whereas interpretation employs an auxiliary stack. Substitution mixes code and data, whereas interpretation keeps them cleanly separated, which simplifies the extraction of simple traces. In the case of unbounded recursion, substitution produces something useful, whereas interpretation does not terminate.
Doppelgängers Get More Done
When Marty goes back to 1955 a second time (after returning with Doc from 2015 to the violent 1985), he exists in 1955 twice, because the past he is traveling back to is the past to which he has traveled before (by accident in the first movie). The two Martys don’t interact, and they do different things. The first Marty is trying to get his parents to fall in love, whereas the second Marty tries to take the sports almanac away from Biff. Similarly, when old Biff travels from 2015 to 1955 to give his younger self the sports almanac, he exists twice in 1955. In contrast to the two Martys, the two Biffs do interact. The old Biff gives the young Biff the sports almanac. Fortunately, the space-time paradox that could bring about the destruction of the universe doesn’t occur.3 Moreover, the time machine that Marty uses to go back from 1985 to 1955 is the same that old Biff has used to go from 2015 to 1955. Therefore, at the time when Marty observes that old Biff gives young Biff the sports almanac, two copies of the time machine must exist in 1955. In fact, three time machines must exist at the same time in 1955 because the first Marty has also come to 1955 with the time machine.
This demonstrates that an inevitable consequence of time travel into the past is the duplication of the time-traveling objects and people—at least, when multiple travels happen to that same time in the past. The situation with recursion is very similar. Since the second equation for Count contains only one recursive occurrence of Count, it will create when executed only one instance at any time in the past, because each recursive call will travel one time unit into the past from when it occurred. This form of recursion, in which the defined is referred to only once in its definition, is called linear recursion. Linear recursion in algorithms leads to isolated occurrences of algorithm calls in the past. Linear recursion can be easily transformed into loops, and it generally does not lead to opportunities for concurrent execution.
In contrast, the case when a definition refers to the defined more than once is called nonlinear recursion. Since all the calls happen at the same time, the corresponding executions are also started at the same time in the past and thus occur concurrently. This does not mean that a computer (electronic or human) actually has to execute the calls in parallel. It only means that they could be executed in parallel. And this is a stunning aspect of well-designed divide-and-conquer algorithms. Not only do they divide the problem quickly so that a problem can be solved with few steps, they also support the parallel execution by many computers.
Quicksort and mergesort are two examples of such algorithms (see chapter 6). The definition of quicksort is as follows. The first equation says that an empty list is already sorted, and the second equation says that to sort a nonempty list one should take all elements from the tail of the list (rest) that are smaller than x, sort them and put them in front of x and similarly place the result of sorting all larger or equal elements after x:
Qsort( ) =
Qsort(x→rest) = Qsort(Smaller(rest, x))→x→Qsort(Larger(rest, x))
Notice that the second equation reveals the nonlinear recursion of Qsort. Quicksort performs best when x repeatedly allows the tail to be split into two sublists of roughly equal size. This may not happen in the worst case, when the list is already (almost) sorted, but quicksort performs very well on average.
It’s actually fun to execute quicksort or mergesort with the help of a group of people. To execute quicksort, all people line up in a queue, and the first person starts sorting by applying the second equation and splitting the list of all elements into two sublists depending on whether they are smaller than the first element. She keeps the first element and hands each sublist to a new person from the queue who performs quicksort on the list handed to him and possibly also recruits new people from the queue for the sorting of sublists. A person who is handed an empty list is immediately done and can return that empty list, following the definition of the first equation.4 Once a person has finished sorting her list, she hands the sorted list back to the person who gave her the list. And everyone who receives a sorted sublist as a result creates his own sorted list by placing the list with the smaller elements before x and the list with the larger elements after x. If we stop the recursion when lists have only one element, then we need as many people to sort the list as the list contains elements, because each person hangs on to only one element. This strategy may seem a waste of resources for simply sorting a list, but with ever-decreasing computing costs and increasing computing power, it illustrates the power of divide-and-conquer and shows that many hands make light work.
Further Exploration
A recursive algorithm for solving a problem tries to first solve the same problem for smaller input. The time travel in Back to the Future illustrates how solving a problem in the past is a prerequisite for proceeding in the present. The same situation can be found in the movies Twelve Monkeys and Déjà Vu, where time travel is employed to solve a problem in the present by first solving a related problem in the past. With a similar premise but a changed perspective, in the Terminator movies robots are sent to the present to change reality now to create a different future. The future-to-present time travel also is the basis of the movie Looper.
The problem of time paradoxes is often skirted in time travel stories because reading an inconsistent stor
y is not very satisfying. Attempts to deal with paradoxes, and thus with undefined recursions, can be found in Stephen King’s 11/22/63, in which a teacher travels through a time portal back in time to prevent JFK’s assassination, which causes reality in the present to break down. A similar story line can also be found in Gregory Benford’s Timescape.
The idea of making sense of recursive definitions through fixed points is dramatically illustrated in Robert Heinlein’s All You Zombies, which tells the story of a woman who falls in love with the turned-male version of herself traveling back in time from the future. She gets pregnant, and her baby girl is abducted and brought back in time, where she grows up to be the woman the story starts out with. The interesting aspect is that all the strange events can be reconciled, since the woman is her own mother and father, a fact that amounts to the fixed point of the story’s causal constraints. This story has been made into the movie Predestination. In the movie Los Cronoscrímenes (Timecrimes), the fixed point is obtained by creating doppelgängers of the protagonist.
When recursive definitions are executed, they unfold into nested structures. This is clear for Sierpinski triangle quilts, Matryoshka dolls, or nested TV pictures. But it also happens in stories. Perhaps one of the oldest examples is the collection of stories One Thousand and One Nights, also known as Arabian Nights. A narrator tells the story of Queen Scheherazade, who tells stories in which people tell stories about people who tell stories, and so on. Douglas Hofstadter’s Gödel, Escher, Bach contains a dialogue that tells a nested story, which, among other things, illustrates the stack model of recursion. The book contains a lot of material on recursion, including several drawings by M. C. Escher.