Once Upon an Algorithm
Terms such as wizard, detective, and scientist are, of course, types. They carry with them connotations of properties that are generally thought to be true of any one individual member of such a type. The notions of protagonist and challenge in the Story abstraction are also types, since they evoke a particular imagery that is required for the Story abstraction to convey its meaning. It seems that protagonist is a more general type than, say, wizard or detective because it provides fewer details. The fact that the latter two can be substituted for the former also supports this view. However, this is only part of the story. Take Lord Voldemort, for example. He is a wizard, but he is not a protagonist. Instead, he is a major antagonist in the Harry Potter stories. Since both Harry Potter, a protagonist, and Voldemort, an antagonist, are wizards, it now seems that the type wizard is more general because it ignores details on which the distinction between a protagonist and an antagonist relies. Therefore, neither protagonist nor wizard can be considered more abstract in general, which is not too surprising, since these types come from different domains, namely, stories and magic.
Within one domain, types are often arranged more clearly by placing them in hierarchies. For example, Harry Potter, Draco Malfoy, and Severus Snape are all members of Hogwarts, but only Harry and Draco are students at Hogwarts. If you are a student at Hogwarts, you are clearly also a member of Hogwarts, which means that the type member of Hogwarts is a more general abstraction than student at Hogwarts. Moreover, since Harry, but not Draco, is a member of the house of Gryffindor, the type student at Hogwarts is a more general abstraction than student in the house of Gryffindor. Similarly, magic is more abstract than a spell, which in turn is more abstract than a teleportation spell or a patronus charm.
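For readers who know a little programming, this kind of type hierarchy can be mirrored directly in code. The following is a minimal sketch in Python (the class names are hypothetical, chosen only to match the example); each subclass is a more specific, less abstract type than its parent.

```python
# Hypothetical sketch: a type hierarchy as Python classes, where each
# subclass is a more specific (less abstract) type than its parent.

class HogwartsMember:                      # most general type
    pass

class HogwartsStudent(HogwartsMember):     # more specific
    pass

class GryffindorStudent(HogwartsStudent):  # most specific
    pass

harry = GryffindorStudent()
draco = HogwartsStudent()

# Every Gryffindor student is also a Hogwarts student and a Hogwarts member:
assert isinstance(harry, HogwartsStudent)
assert isinstance(harry, HogwartsMember)
# Draco is a Hogwarts member and student, but not a Gryffindor student:
assert isinstance(draco, HogwartsMember)
assert not isinstance(draco, GryffindorStudent)
```

The `isinstance` checks succeed precisely because membership in a more specific type implies membership in every more general type above it.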
Types found in programming languages are probably the most obvious form of data abstraction. The numbers 2 and 6 are different, but they have many things in common. We can divide them by 2, we can add them to other numbers, and so on. We can therefore ignore their differences and group them together with other numbers in the type Number. This type abstracts from the properties of individual numbers and exposes the commonalities between all its members. In particular, the type can be used to characterize parameters in algorithms, which can then be exploited to check the consistency of the algorithm with a type checker (see chapter 14). Thus data abstraction goes hand-in-hand with functional abstraction: an algorithm abstracts from individual values by using parameters. However, in many cases a parameter cannot be substituted by just anything; it requires as arguments representations that the algorithm can actually manipulate. For example, a parameter that is multiplied by 2 must be a number, which is where types as data abstraction come into play. The type Number can also be regarded as more abstract than many more specialized number types, such as the type of all even numbers.
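A small Python sketch makes this concrete. The annotation on the parameter groups values such as 2 and 6 under a common type, and a type checker can use it to rule out inconsistent calls before the program ever runs. (The function name is hypothetical, chosen only for illustration.)

```python
# Hypothetical sketch: a type annotation groups individual values such as
# 2 and 6 under a common abstraction, here Python's type int.

def double(x: int) -> int:
    # The annotation documents that x must be a number; a type checker
    # (such as mypy) can reject a call like double("hello") before running.
    return x * 2

print(double(2))   # 4
print(double(6))   # 12
```

The algorithm abstracts from individual numbers by using the parameter `x`; the type abstracts from which particular numbers may be supplied.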
Finally, in addition to (plain) types such as Number, data abstraction applies specifically to data types (see chapter 4). A data type is defined solely by the operations it offers and their properties. The details of the representation are ignored, that is, abstracted from, which means that a data type is more abstract than a data structure that implements it. For example, a stack can be implemented by a list or an array, but the details of those structures and thus the differences between them are not visible when they are used to implement stacks.
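The stack example can be sketched in a few lines of Python. Below are two hypothetical implementations, one backed by a Python list and one by linked cells; code that uses only the push and pop operations cannot tell them apart, which is exactly the point of the data type abstraction.

```python
# Hypothetical sketch: a stack is defined by its operations (push, pop).
# Two different representations implement the same data type.

class ListStack:
    def __init__(self):
        self._items = []            # representation: dynamic array
    def push(self, x):
        self._items.append(x)
    def pop(self):
        return self._items.pop()

class LinkedStack:
    def __init__(self):
        self._top = None            # representation: linked cells
    def push(self, x):
        self._top = (x, self._top)
    def pop(self):
        x, self._top = self._top
        return x

def last_in_first_out(stack):
    # Works with either implementation: only the operations matter.
    stack.push(1)
    stack.push(2)
    return stack.pop()

print(last_in_first_out(ListStack()))    # 2
print(last_in_first_out(LinkedStack()))  # 2
```

Because `last_in_first_out` relies only on the data type's interface, swapping one representation for the other changes nothing about its behavior.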
Any well-chosen data abstraction highlights the features of a representation that support computations with it. Moreover, such an abstraction ignores and hides features that could interfere with the computation.
Time to Abstract
As explained in chapter 2, the runtime of an algorithm is not reported in the same way as a fitness tracker reports the time of your most recent six-mile run. Reporting runtimes in seconds (or minutes or hours) would not be very helpful, since such times depend on a particular computer: run the same algorithm on a faster or slower computer, and its runtime changes. Comparing your time for running six miles with that of your friend makes sense, since it provides information about the relative efficiency of two computers, that is, runners. However, the time doesn't say anything meaningful about the efficiency of the running algorithm itself, because both you and your friend are executing the same algorithm.
It is therefore a good idea to abstract from concrete times and to measure the complexity of algorithms in number of steps taken. Such a measure is independent of the speed of a computer and is thus not affected by technological development. Suppose your stride while running is relatively constant. Then you will always take about the same number of steps for your six-mile run regardless of the particular circumstances. Using the number of steps of a fixed stride abstracts from particular characteristics of the runner and thus provides a more stable characterization of the run. In fact, giving the number of steps is just a different way of saying that the run is six miles long. The length of a run is a better measure for its complexity than its duration, since it abstracts from the different speeds of different runners or even the same runner at different times.
However, the number of steps or operations, while more abstract than time, is still too concrete a measure for the complexity of an algorithm because this number varies with the input of the algorithm. For example, the longer a list, the longer it takes to find its minimum or sort it. Similarly, a six-mile run takes more steps to complete than a five-mile run. Remember that the goal is to characterize the complexity of the algorithm in general, not its performance for particular inputs. Thus it is not clear for which input we should report the number of steps. One could imagine creating a table showing the number of steps for several example cases, but it is not clear which examples to pick.
Therefore, time abstraction for algorithms goes further and ignores the actual number of steps taken. Instead it reports how the number of steps grows with larger input. For example, if an algorithm needs twice as many steps when the size of the input doubles, the number of steps grows at the same rate as the input. As explained in chapter 2, such a runtime behavior is called linear. This is what happens with finding the minimum of a list or with running.5 Even if the number of steps is a larger multiple of the input size, the complexity of the algorithm is still considered linear, because the relationship between the size of the input and the number of steps taken is expressed by multiplication with a constant factor. This is the case for the number of steps it takes Hansel and Gretel to find their way home: the number of steps exceeds the number of pebbles, since consecutive pebbles are several steps apart. The runtime category of linear algorithms abstracts from this factor as well, and thus Hansel and Gretel's algorithm is still considered to be linear.
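Counting steps explicitly shows this linear growth. The following Python sketch (the function is hypothetical, written only for this illustration) counts one step per comparison while finding the minimum of a list; doubling the input roughly doubles the count.

```python
# Hypothetical sketch: counting the steps of a find-minimum algorithm
# to show linear growth: doubling the input doubles the step count.

def find_min_steps(items):
    steps = 0
    minimum = items[0]
    for x in items[1:]:
        steps += 1            # one comparison per remaining element
        if x < minimum:
            minimum = x
    return minimum, steps

_, steps_100 = find_min_steps(list(range(100)))
_, steps_200 = find_min_steps(list(range(200)))
print(steps_100, steps_200)   # 99 199
# The constant factor (here about 1 step per element) is abstracted away:
# the algorithm counts as linear no matter what that factor is.
```

Whether each element costs one step or several, the count remains a constant multiple of the input size, which is all that the label "linear" retains.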
The two most important benefits of the runtime abstraction are its ability to tell us which problems are tractable and which algorithm to select for a particular problem. For example, algorithms with exponential runtime work only for very small inputs, and problems for which only algorithms with exponential runtime are known are therefore deemed intractable (see chapter 7). On the other hand, if we have several algorithms available to solve the same problem, we should pick the one with the better runtime complexity. For example, we would generally prefer the linearithmic mergesort over the quadratic insertion sort (see chapter 6). Figure 15.3 summarizes time abstraction.
Figure 15.3 Time abstraction. To abstract from the different speeds of computers, we use the number of steps an algorithm needs during execution as a measure for its runtime. To abstract from the different number of steps needed for different inputs, we measure the runtime of an algorithm in terms of how fast it grows for larger input.
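The preference for algorithms with better runtime complexity can also be made visible by counting steps. The following Python sketch (both step-counting functions are hypothetical, written only for this comparison) counts swaps in insertion sort and comparisons in mergesort on the same worst-case input.

```python
# Hypothetical sketch: comparing step counts of quadratic insertion sort
# and linearithmic mergesort on the same reversed (worst-case) input.

def insertion_sort_steps(items):
    a, steps = list(items), 0
    for i in range(1, len(a)):
        j = i
        while j > 0 and a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]
            steps += 1        # one swap counts as one step
            j -= 1
    return a, steps

def merge_sort_steps(items):
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    left, s1 = merge_sort_steps(items[:mid])
    right, s2 = merge_sort_steps(items[mid:])
    merged, steps = [], s1 + s2
    while left and right:
        steps += 1            # one comparison per merge step
        merged.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
    merged += left + right
    return merged, steps

data = list(range(200, 0, -1))           # worst case for insertion sort
_, ins = insertion_sort_steps(data)
_, mrg = merge_sort_steps(data)
print(ins > mrg)   # True: insertion sort needs far more steps
```

The gap between the two counts widens rapidly as the input grows, which is why the runtime category, not the exact count, is the right basis for choosing between them.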
The Language in the Machine
An algorithm cannot produce a computation by itself. As explained in chapter 2, an algorithm must be executed by a computer that understands the language in which the algorithm is written. Any instruction used in the algorithm must be in the repertoire of instructions the computer can process.
Tying an algorithm to a specific computer through a language is problematic for several reasons. First, independently designed computers will likely understand different languages, which means that an algorithm written in a language that can be understood and executed by one computer might not be understood and executed by another one. For example, if Hansel or Gretel used German to write down the algorithm for finding the way home using pebbles, a child growing up in France or England could not execute it without having been taught German. Second, over time the languages used by computers change. This may not be a problem for people, who can still reliably understand old, outdated language forms, but it is certainly a problem for machines, which may fail to execute an algorithm altogether if even one of its instructions has been slightly changed. The brittle language tie between algorithm and computer seems to make it difficult to share algorithms. Fortunately, software does not have to be rewritten every time a new computer enters the market. Two forms of abstraction can be credited for this: language translation and abstract machines.
To illustrate the idea of an abstract machine, let’s consider the algorithm for driving a car. You probably learned how to drive one specific car, but still you are able to drive a variety of different cars. The driving skills you acquired are not tied to a specific model and make but are more abstract and can be described using concepts such as steering wheel, gas pedal, and brakes. The abstract car model is realized by various actual cars that differ in details but provide access to its functionality using the common general language of driving.
Abstraction applies to all kinds of machines. For example, we can abstract from the details of a coffee maker, a French press, or an espresso machine by saying that a machine for making coffee needs the ability to mix hot water with ground coffee for some time and then to separate the particles from the fluid. The algorithm for making coffee can be described in terms of this coffee-making abstraction, which is still concrete enough to be instantiated with different coffee-making machines. Of course, machine abstraction has its limits. We cannot use a coffee maker to execute a driving algorithm, and we cannot make coffee using a car. Yet abstract machines are an important way to decouple languages from specific computer architectures (see figure 15.4).
The most famous machine abstraction for computing is the Turing machine, named after the British mathematician and pioneer of computer science Alan Turing, who invented it in 1936 and used it to formalize the concepts of computation and algorithm. A Turing machine consists of a tape that is divided into cells, each containing a symbol. The tape is accessed through a read/write head that can also move the tape forward and backward. The machine is always in some particular state and is controlled by a program, given by a set of rules that say which symbol to write on the current cell of the tape, in which direction to move the tape, and which new state to go into, depending on what symbol is currently visible and what the current state is. The Turing machine has been used to prove the unsolvability of the halting problem (see chapter 11). Any program can be translated into a Turing machine program, which means that the Turing machine is an abstraction of all currently existing (electronic) computers. The importance of this insight is that whatever general property we can prove for the Turing machine also holds for any other existing computer.
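The rules-and-tape description above can be captured in a few lines of code. Here is a minimal, hypothetical Python simulator (the rule format and the toy program are my own illustration, not a standard encoding): rules map a state and the symbol under the head to a symbol to write, a direction to move, and a next state.

```python
# Hypothetical sketch: a tiny Turing machine simulator. Rules map
# (state, symbol) to (symbol to write, direction, next state). The toy
# program below flips a string of 0s and 1s and halts on a blank cell.

def run_turing_machine(rules, tape, state="start"):
    tape = dict(enumerate(tape))      # cells indexed by position
    head = 0
    while state != "halt":
        symbol = tape.get(head, " ")  # unwritten cells read as blank
        write, move, state = rules[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
    return "".join(tape[i] for i in sorted(tape)).strip()

flip = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", " "): (" ", "R", "halt"),
}

print(run_turing_machine(flip, "0110"))  # 1001
```

Despite its simplicity, this rule-table format is expressive enough in principle to encode any program, which is what makes the Turing machine an abstraction of all existing computers.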
The other strategy for abstracting from particular computers is to use language translation. For example, we can translate the pebble-tracing algorithm from German to French or English, which eliminates the language barrier and makes the algorithm accessible to a wider range of people. The same works, of course, for computer languages. In fact, almost all programs written today are translated in some way before they are executed by a machine, which means that almost none of the programming languages in use today are directly understood by a computer and that every algorithm has to be translated. A programming language thus abstracts from the details of particular computers and provides uniform access for programming a wide range of them, making the design of algorithms independent of specific machines. If a new computer is produced, all it takes to run existing algorithms on it is to adapt a translator to produce code for that computer. This translation abstraction has made the design of programming languages largely independent of the computers on which they will be executed.
Figure 15.4 An abstract machine is an abstraction from concrete computers. By providing a simpler and more general interface, an abstract machine makes algorithmic languages independent of specific computer architectures and extends the range of computers that can execute them.
There are different ways to define a Translate abstraction. As with any abstraction, the question is which details to abstract from and which to expose as parameters in the interface. The following definition abstracts from the program to be translated, the language it is given in, and the language it should be translated into:
Translate(program, source, target) = “Translate program from source into target”
Note that I use quotes around "Translate …" because translation is a sophisticated algorithm, much too long and complicated to be presented here. For example, the automatic translation of natural languages is still an open problem. In contrast, the translation of computer languages is a well-understood and solved problem. Still, translators are long and complicated algorithms, which is why no details are shown here.
As an example of how to use the Translate abstraction, here is how the instruction to find a pebble can be translated from German into English:
Translate(Finde Kieselstein, German, English)
The instruction Finde Kieselstein is an element of the source language German, and the result of the translation is the instruction Find pebble, which is an element of the target language English.
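The interface of the Translate abstraction can be mimicked by a toy program. The following Python sketch is purely illustrative (the dictionary and function are hypothetical, and a word-for-word lookup is nothing like a real translator); it only shows how the three parameters of Translate fit together.

```python
# Hypothetical sketch of the Translate abstraction: a toy word-for-word
# translator. Real translators are long and complicated algorithms; this
# only illustrates the interface Translate(program, source, target).

DICTIONARIES = {
    ("German", "English"): {"Finde": "Find", "Kieselstein": "pebble"},
}

def translate(program, source, target):
    words = DICTIONARIES[(source, target)]
    return " ".join(words.get(w, w) for w in program.split())

print(translate("Finde Kieselstein", "German", "English"))  # Find pebble
```

The details of how the translation is carried out stay hidden behind the interface; only the program and the two languages are exposed as parameters.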
In Harry Potter the language of spells and charms needs a wizard for execution. Some spells can be translated into a corresponding potion, which can then be executed by muggles as well. For example, the effect of some transforming spells can be captured in the Polyjuice Potion, which allows the drinker to change his or her appearance. While the Polyjuice Potion is apparently very difficult to produce, even for experienced wizards, some other spells have a straightforward translation. For example, the killing curse Avada Kedavra translates into any ordinary lethal poison.
Since every Translate abstraction is itself an algorithm, we can abstract from all translations through a language in which they can be expressed, just as we abstracted from all algorithms using a language. The top left part of figure 15.5 illustrates this case. Since every language corresponds to a computer or abstract machine that can execute its programs, this means that language can abstract from computers, as shown at the top right of figure 15.5.
Figure 15.5 The tower of abstractions. An algorithm is a (functional) abstraction from computations. An algorithm transforms representations, whose (data) abstractions are types. The acceptable inputs and outputs for an algorithm are also expressed as types. Each algorithm is expressed in a language, which is an abstraction from algorithms. A translation algorithm can transform algorithms from one language into another and therefore makes algorithms independent from a particular computer or abstract machine that understands the language in which it is expressed. Language is also an abstraction of computers, since translation can effectively eliminate the differences between computers. Types are also expressed as part of a language. The abstraction hierarchy illustrates that all abstractions in computer science are expressed in some language.
Just as the Turing machine is the ultimate abstraction of any computing machine, lambda calculus is the ultimate abstraction of any programming language. Lambda calculus was invented around the same time as the Turing machine by the American mathematician Alonzo Church. It consists of only three constructs: for defining abstractions, for referencing parameters in definitions, and for creating instances of abstractions by supplying arguments for parameters, very similar to what is shown in figure 15.1. Any program in any algorithmic language can be translated into a lambda calculus program. Now it seems that we have two different ultimate abstractions from computers: lambda calculus and the Turing machine. How can that be? It turns out that these two abstractions are equivalent, which means that any program for a Turing machine can be translated into an equivalent lambda calculus program, and vice versa. Moreover, every known formalism for expressing algorithms6 has been shown to be no more expressive than Turing machines or lambda calculus. Therefore, it seems that any algorithm can be expressed as a Turing machine or a lambda calculus program. This observation is known as the Church-Turing thesis, named after the two pioneers of computer science. The thesis is about the expressiveness and reach of algorithms. Since the definition of an algorithm is based on the idea of an effective instruction, which is an intuitive notion tied to human abilities, the concept of an algorithm cannot be formalized mathematically. The Church-Turing thesis is therefore not a theorem that could be proved but rather an observation about this intuitive concept. It is so important because it implies that everything that can be known about algorithms can be found out by studying Turing machines and lambda calculus, and it is accepted by most computer scientists.
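The three constructs of lambda calculus have direct counterparts in Python: abstraction (`lambda x: ...`), parameter reference (`x`), and application (`f(a)`). The following sketch (the Church-numeral encoding is standard, but the helper names here are my own) hints at the calculus's expressiveness by building numbers out of nothing but functions.

```python
# Hypothetical sketch: lambda calculus's three constructs mirrored in
# Python. Numbers are encoded as functions (Church numerals): the
# numeral n applies a function f to an argument n times.

zero = lambda f: lambda x: x                    # apply f zero times
succ = lambda n: lambda f: lambda x: f(n(f)(x)) # apply f one more time

def to_int(n):
    # Interpret a Church numeral by counting applications of +1.
    return n(lambda k: k + 1)(0)

two = succ(succ(zero))
print(to_int(two))        # 2
print(to_int(succ(two)))  # 3
```

That ordinary data such as numbers can be encoded purely as abstraction, reference, and application is a first glimpse of why these three constructs suffice to express any algorithm.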