Knuth’s concern was far more practical. His goal was to present to his readers what was known, circa 1968 to 1969, about algorithms for “real” computers. Toward this end, cautioning that algorithms were more than procedures or methods or recipes, he laid out the fundamental characteristics of algorithms:
1. Finiteness: An algorithm is “finite”; it always terminates (comes to a halt) after a finite number of steps.
2. Definiteness: Every step of an algorithm must be defined precisely and unambiguously.
3. Input and output: An algorithm must have one or more inputs and one or more outputs.
4. Effectiveness: Each of the operations to be performed as part of an algorithm must be basic enough that, in principle, it can be performed exactly by a human being using pencil and paper.40
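Knuth’s own opening example in Fundamental Algorithms is Euclid’s algorithm for the greatest common divisor. A minimal sketch of it (rendered here in Python rather than in any of Knuth’s notations) shows how the four characteristics play out in practice:

    # Euclid's algorithm, sketched in Python (not Knuth's notation).
    def gcd(m, n):
        """Greatest common divisor of two positive integers m and n (the inputs)."""
        while n != 0:           # finiteness: n strictly decreases, so the loop halts
            m, n = n, m % n     # definiteness: each step is precise and unambiguous
        return m                # output: a single, well-defined result

    # Effectiveness: every step is simple enough to carry out by hand.
    print(gcd(544, 119))        # prints 17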
In fact, strangely enough, even though people had been inventing algorithms for automatic digital computers since Babbage’s time, there was still, in Knuth’s time, uncertainty and ambiguity about what an algorithm was and about its relation to programming. Thus, in a letter to the editor of Communications of the ACM in 1966, Knuth made the point that the word algorithm refers to an abstract method of computing, whereas a program expresses an algorithm in some particular programming language. The same algorithm can therefore be specified as different programs41—hence the distinction, made in this story, between algorithms as abstract artifacts and programs as liminal artifacts.
In The Art of Computer Programming, Knuth used several modes of describing algorithms: as sequences of steps in which the steps themselves were in a hybrid of English and mathematical notation, as flowcharts, and as programs in a “toy” assembly language of his own invention. If Wheeler’s contributions to the EDSAC project highlighted the interplay of material and liminal computational artifacts (see Chapter 8, Section VII), Knuth’s Art of Computer Programming emphasized that the algorithm as an abstract artifact is not an island of its own; it is inevitably conjoined with a corresponding liminal artifact—the program into which the algorithm must be transformed for it to be practically effective.
IX
But Knuth was not only concerned with the design and programming of algorithms. He ushered in a new aspect of the study of algorithms that he called analysis of algorithms (or algorithmic analysis), which was concerned with the efficiency of algorithms, that is, with how well they performed. The idea was to determine the “average behavior” or “average performance” of an algorithm and, if possible, the “optimality” of the algorithm (in some precise sense).42 This would make it possible to compare two or more algorithms for the same problem and decide which was the best. With this in mind, Knuth introduced the “big O” notation, invented in 1892 by German mathematician Paul Bachmann (1837–1920).
Suppose an algorithm has been designed to parse sentences of length n symbols in some language, and suppose it is calculated that the maximum number of steps the parsing algorithm will take is proportional to n². One can then say that the worst-case behavior of the algorithm is O(n²), read as “order n².” If another algorithm for the same problem takes an amount of time proportional to ¼n² + n, this is also an O(n²) algorithm. In other words, because the time required to execute both algorithms is dominated by the n² term, they are both of the same complexity—that is, O(n²) algorithms. However, if a third algorithm is invented that requires time proportional to n log n, this is an O(n log n) algorithm; and because n² “grows” more quickly than n log n as n increases, the O(n log n) algorithm is the more efficient of the two: the time it requires increases at a slower rate with the length n of the sentence.
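A quick numerical sketch (not from the book; the step-count functions below simply mirror the three hypothetical parsing algorithms just described) makes the point concrete: as n grows, the n² term swamps everything else, and the n log n algorithm pulls ahead.

    # Comparing the growth of the three hypothetical step counts.
    import math

    def t1(n): return n * n                 # first algorithm: proportional to n^2
    def t2(n): return 0.25 * n * n + n      # second algorithm: (1/4)n^2 + n, still O(n^2)
    def t3(n): return n * math.log2(n)      # third algorithm: O(n log n)

    for n in (10, 100, 1_000, 10_000):
        # The two quadratic counts differ only by a constant factor and a
        # lower-order term; both dwarf n log n as n increases.
        print(n, t1(n), t2(n), round(t3(n)))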
The term computational complexity was also used to signify these kinds of performance characteristics of algorithms.43 Studies of the computational complexity of abstract computing machines (such as Turing machines) or algorithms that could recognize certain types of formal languages (such as context-free languages) were first published in 1965.44 These studies were largely concerned with the performance limits of abstract machines on certain kinds of computations. It seems fair to say that, with the publication of Knuth’s Fundamental Algorithms, computational complexity entered the domain of algorithms designed for “real” machines. Anyone who designed or invented an algorithm would be obligated to perform an analysis of the algorithm and estimate its complexity using the big O notation.
Knuth’s achievements in writing The Art of Computer Programming were many. Perhaps the most fundamental was his demonstration of the relationship between mathematics and practical programming via algorithm design and analysis. Mathematical fields—abstract algebra, probability theory, statistics, and the theory of numbers—became, in Knuth’s books, essential tools for the computer scientist interested in the design and analysis of algorithms. We are reminded of how, almost 300 years before, Sir Isaac Newton (1642–1727) published his great work in physics (or natural philosophy, as it was then called), Philosophiae Naturalis Principia Mathematica (1687; known to posterity simply as Principia), in which he demonstrated how mathematical analysis becomes the tool for explaining physical phenomena. Knuth’s The Art of Computer Programming stands in the same relation to the design of algorithms (and programs) as Newton’s Principia did to physics.
X
According to the online Oxford English Dictionary, the term software entered the culture of computing in an article in the June 1960 issue of Communications of the ACM. The article referred to “such software as COBOL.” A year later an article in the British Computer Society’s Computer Bulletin, in June 1961, spoke of “The programming expertise or ‘software’ that is at the disposal of the computer user comprising expert advice on all matters of machine code programming.…”45
So although the first article alluded to a language (and, implicitly, its compiler) as software, the second referred to “expertise” that was available to the user. By the mid 1960s a more precise meaning of the term was in place. The British newspaper The Observer of December 13, 1964, referred to software as “the ‘supervisory programme,’ the complex instructions which enable the machine to handle many tasks simultaneously.”46 The New Scientist of August 25, 1966, spoke of “‘software’—the programmes for operating the computer on a wide range of problems.”47
The Oxford English Dictionary gives the modern meaning of software as
1. The programs and procedures required to enable a computer to perform a specific task, as opposed to the physical component of the system.
2. esp. The body of system programs, including compilers and library routines, of a particular computer and often provided by the manufacturer, as opposed to program material provided by a user for a special task.48
The significance of the word itself should not be overlooked: software as opposed to hardware. The physical computer was “hard”—materialistic, subject ultimately to physical laws. A program was “soft”—made up of symbols, indifferent to physical laws, changeable, plastic, with laws all its own. Moreover, this new dichotomy of hardware/software suggested, by the mid 1960s, a new order in computer culture. The physical computer was no longer the epicenter of computing, with all else peripheral. Rather, there were two classes of artifacts, the physical and the symbolic—the material and the liminal—and they were equal partners. Computing entailed a bilateral symbiotic relationship; the totality constituted a computer system.
Moreover, there was the increasingly apparent issue of the largeness, the complexity, of system software—the “body of system programs” mentioned in the dictionary definition. Programming languages and their compilers constituted one major type of system software. And more or less at the same time as that class of software came into existence during the mid 1950s, another class made its appearance. In just over a decade, the size and complexity of this class of software would far exceed those of any other kind of software.
XI
Ever since David Wheeler and the EDSAC group in Cambridge conceived and implemented subroutine libraries and the first assembler, computer systems designers and researchers had sought to protect, as far as possible, the “user” from the gritty realities of the physical computer—thus, high-level programming languages and their compilers. The user was offered not so much an IBM, a UNIVAC, or an English Electric computer, but a FORTRAN, an Algol, a COBOL, or a LISP machine.
The system programmer’s ambition was to create a grand illusion. The tedium of programming the input and output processes of a computation; the intricacies of organizing a program to fit into available physical memory; managing the movement of program segments from “secondary memories” such as tapes, disks, drums, or punched cards (these being the common hardware for storing large files of program data by the end of the 1950s) into the computer’s main memory; the user’s frustration at having to share access to a physical computer over time—the system programmer’s goal was to free the user from all such onerous responsibilities by creating the illusion that none of this mattered or was necessary; or, rather, that the system would take these responsibilities on itself.
However, the system programmer was not driven by altruism alone. A computer was a machine, an expensive resource; a computer center was an information processing factory. Ever since the Industrial Revolution during the 18th century, those who owned, managed, and ran factories were obsessed with maximizing the efficiency of their machines, achieving the greatest possible output with the least possible cost. And this meant, among other things, maximizing the use of machines. Machines that lay idle, machines that ran below their capacity struck a chill in the bones of the factory owner.
This factory ethic would inevitably infiltrate computer culture. Users’ programs that solved systems of differential equations or calculated and produced paychecks came to be called “jobs” that a computer center had to do. And in the best factory tradition, rather than process these jobs piecemeal, one by one, as they “arrived” in the computer center, jobs were collected into batches and processed in one continuous sequence until the entire batch of jobs had been processed: “batch processing.” The “overhead” in preparing the computer installation to process jobs could thus be reduced.
A computer system as a factory machine posed other problems. The two main types of activities were input/output and actual computation. The latter was done by the “central processing unit” and, by the beginning of the 1960s, the former was performed by input/output devices under command of the program. Computation proceeded at electronic switching speeds; input/output, involving moving devices—card readers and punches, printers, magnetic tape units, disks, and drums—required far more time. And so while a program performed its input/output transfers (such as of large files of data) between the computer and the input/output devices, the central processing unit could do nothing but, metaphorically, twiddle its thumbs. Thus, specialized “input/output processors” (called “channels” in IBM terminology49) were developed that could work independently of the central processor. This led to the possibility that while one program was executing its input/output operations via the specialized input/output processors, the central processor might be set to perform actual computation for another program. This would reduce the central processor’s “idle time” and increase its productivity. But this meant that, at any one time, the main memory would have to be occupied by more than one program, and the central processor would be shared between the programs. Program P1 has control of the central processor; when it needs to perform input/output, it issues input/output commands to the specialized processor and releases the central processor, which is then taken over by another program, P2, occupying main memory. When the input/output is complete, P1 may “interrupt” the execution of P2, and control of the central processor is given back to P1; or P2 may continue to execute until it, in turn, needs to perform input/output, at which time it issues the appropriate commands to the input/output processors and relinquishes control of the central processor to P1. At any given time, several programs sharing main memory may thus take turns controlling the central processor. This was the concept of multiprogramming.
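A toy simulation (my own sketch, not any historical system) captures the essential mechanics: each “program” alternates bursts of computation with input/output requests, and whenever a program starts a transfer it releases the central processor to another resident program.

    # A toy model of multiprogramming: a program releases the central processor
    # whenever it starts an input/output transfer on a separate I/O processor.
    import heapq

    def program(name, bursts):
        # bursts alternate: compute for some time, then wait on I/O for some time
        for compute, io in bursts:
            yield ("compute", compute)
            yield ("io", io)

    def run(programs):
        clock = 0
        ready = list(programs)        # programs wanting the central processor
        waiting = []                  # (time the I/O completes, name, program)
        while ready or waiting:
            # programs whose I/O transfer has completed rejoin the ready queue
            while waiting and waiting[0][0] <= clock:
                _, name, prog = heapq.heappop(waiting)
                ready.append((name, prog))
            if not ready:             # processor idle until some I/O finishes
                clock = waiting[0][0]
                continue
            name, prog = ready.pop(0) # the central processor goes to this program
            for kind, duration in prog:
                if kind == "compute":
                    clock += duration # the processor does useful computation
                else:                 # an I/O request: hand it to the I/O processor...
                    heapq.heappush(waiting, (clock + duration, name, prog))
                    break             # ...and release the central processor
            else:
                print(f"{name} finished at time {clock}")

    run([("P1", program("P1", [(3, 10), (2, 5)])),
         ("P2", program("P2", [(4, 8), (1, 6)]))])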
It is not clear which individual or group actually conceived the idea of multiprogramming. It was clearly “in the air” circa 1962 to 1964. IBM implemented a form of multiprogramming on its 7094 machine (1962),50 Manchester University’s Atlas computer system apparently supported multiprogramming by January 1964, and IBM’s System/360 series supported multiprogramming, calling it multitasking.51
At about the same time, a new mode of input/output came into existence. Users were no longer denied direct access to a computer installation—a privilege that had been taken away from them with the advent of batch processing. Rather, they would sit at a “remote terminal”—a teletype reader/printer—possibly far removed physically from where the computer center was located, and interact with the computer on a “real-time” basis. This meant that, at any time, multiple users could have what seemed to be “simultaneous” access to the computer system. This, too, was an illusion. Each user thought that he or she had exclusive access to the machine. The illusion was created by a combination of software and hardware that enabled the system’s resources to be time-shared among multiple users: each user was given a time quantum of actual access to the system before it passed to another user for a time quantum, and so on. Because switching between users happened at electronic speeds, the illusion held. But all this had to happen automatically, with minimal human intervention. The effectiveness of the grand illusion demanded that the illusion be created by the computing system itself.
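The time-sharing half of the illusion can be sketched just as simply (again, my own toy example, not a historical system): a round-robin scheduler gives each user at most one quantum of processor time per turn, cycling so quickly that every user appears to have the machine to himself or herself.

    # A toy round-robin time-sharing scheduler with a fixed time quantum.
    from collections import deque

    QUANTUM = 2   # arbitrary units of processor time per turn

    def time_share(jobs):
        """jobs maps each user to the total processor time that user still needs."""
        queue = deque(jobs.items())
        clock = 0
        while queue:
            user, remaining = queue.popleft()
            used = min(QUANTUM, remaining)
            clock += used                               # this user "has" the machine...
            if remaining > used:
                queue.append((user, remaining - used))  # ...but only for one quantum
            else:
                print(f"{user} done at time {clock}")

    time_share({"alice": 5, "bob": 3, "carol": 4})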
Compilers were, of course, one class of such system programs. The other major class of system software that emerged from the mid 1950s went by various names: “monitor,” “supervisor,” “executive.” Eventually, the term that came to be more or less generally accepted was operating system. An example of a basic form of an operating system was implemented for the IBM 709 circa 1958. The machine came with input/output subroutines and a monitor that managed the execution of FORTRAN jobs.52 However, it was during the 1960s that the operating system properly came of age—an age at which it manifested itself in full, lush, huge, complex glory. And with its emergence came a plethora of research problems and a gathering of new subparadigms.
How did this happen?
XII
The size of a computer’s main memory obsessed computer designers and frustrated computer users. Even though main memory grew spectacularly in capacity from the comically small (from a present-centered perspective) size of the EDSAC memory (1,024 or “1K” locations) to what commercial machines had to offer during the mid 1960s—the IBM general-purpose System/360 series of machines, announced in 1964, had up to 524,288 (512K) bytes (8-bit addressable locations) of main memory53—it never seemed enough; hence the need for a much larger, cheaper (but considerably slower) “secondary memory” (or “backup storage”) to be attached to the computer. A “memory hierarchy” was thus created, involving the relatively faster, smaller main memory and the relatively slower, larger secondary memory. And thus was created the burden of managing this hierarchy so that when parts of a program and/or its data were needed for execution, they would be transferred from secondary to main memory. This came to be called the memory management problem.
In 1962, a group at the University of Manchester led by Tom Kilburn published an article proposing a solution to this problem. They called their idea “one-level store.”54 Their idea was an instance par excellence of the grand illusion. In their case, it was to give users the illusion of (practically) unlimited memory space. This idea came to be called virtual memory.55
Kilburn and colleagues implemented this idea on the Atlas computer, designed and built at the University of Manchester in collaboration with Ferranti, their corporate partner on the Mark I (see Chapter 8, Section XIII). Manchester University had never ceased to build experimental computers since their Mark I endeavor. Atlas was the fourth of the Manchester computers56 (later, two more would emerge). Atlas was commissioned in December 1962 and was considered to be “the most powerful computer in the world” at the time.57 It offered the user a virtual memory of one million 48-bit words, even though the physical main memory comprised only 16,384 (16K) words.58
A virtual memory, then, is a grand illusion. It consists of a “virtual address space” partitioned either into fixed-length subspaces called pages, so that a single program would occupy one or (usually) more pages, or into variable-length subspaces called segments, each occupied by a functionally distinct program module (such as an Algol procedure or block). A program would reside physically at all times, as a collection of pages or segments, in secondary memory; in the case of the Atlas, this was a drum store. However, pages or segments would be transferred automatically from secondary memory and placed in some region of the much smaller main memory as and when required for execution. The Atlas used a paging scheme with a page size of 512 words.
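The mechanics can be sketched in a few lines (my own illustration; the page size is the Atlas’s, but the page table and frame numbers are invented): a virtual address is split into a page number and an offset within the page, and the page table says whether that page is already in a main-memory frame or must first be fetched from the drum.

    # Translating a virtual address under a simple paging scheme.
    PAGE_SIZE = 512                       # words per page, as on the Atlas

    # hypothetical page table: virtual page number -> main-memory frame number,
    # or None if the page currently resides only on the drum (secondary memory)
    page_table = {0: 7, 1: None, 2: 3}

    def fetch_from_drum(page):
        # placeholder: a real system would choose a frame, possibly evicting its
        # current contents, and copy the page in from the drum
        print(f"fetching virtual page {page} from the drum")
        return 5                          # pretend frame 5 was free

    def translate(virtual_address):
        page, offset = divmod(virtual_address, PAGE_SIZE)
        frame = page_table.get(page)
        if frame is None:                 # a "page fault"
            frame = fetch_from_drum(page)
            page_table[page] = frame
        return frame * PAGE_SIZE + offset

    print(translate(1000))                # page 1, offset 488 -> drum transfer, then 3048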
Allocating main memory regions to individual program segments or pages meant that an individual program’s parts in main memory might not be stored entirely in adjacent memory words, but rather might be scattered in different regions of main memory. In a virtual memory system coupled with multiprogramming, the virtual address space would consist, at any time, of several user programs, each located in segments or pages. The parts of an individual program (“up” for execution) would likewise be located in pages or segments scattered all over main memory, so that adjacent blocks of main memory might hold segments or pages of different programs.
The principle of organizing and storing programs in memory in the form of segments was first used in an experimental machine built at Rice University, Houston, Texas, in 1961.59 A similar scheme was used on the Burroughs Corporation’s B5500 machine circa 1961.60 However, neither system involved multiprogramming. The principle of segmentation and its extension to the design of multiprogrammed virtual memory computers were explored by Jack Dennis (1931–) and his associates at MIT during the mid 1960s as part of a project called Multics.61 The Multics operating system (discussed further later), in fact, used a combination of segmentation and paging.62 That is, the virtual address space was occupied by program segments, but only the needed pages of a program’s segment would be transferred to occupy corresponding blocks (“page frames”) in main memory.
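A final sketch (a simplification of my own, not Multics code; the segment names, page size, and frame numbers are invented) shows how the two ideas combine: a virtual address names a segment and an offset within it, each segment carries its own page table, and only the page actually referenced needs a frame in main memory.

    # Combining segmentation with paging: a two-level translation.
    PAGE_SIZE = 1024    # hypothetical page size

    # hypothetical segment table: segment name -> that segment's page table,
    # which maps page numbers to main-memory frames (None = not in memory)
    segments = {
        "editor":  {0: 12, 1: None},
        "library": {0: 4},
    }

    def translate(segment, offset):
        page, within = divmod(offset, PAGE_SIZE)
        frame = segments[segment][page]
        if frame is None:
            raise RuntimeError("page fault: fetch this page of the segment from disk")
        return frame * PAGE_SIZE + within

    print(translate("editor", 100))   # page 0 of "editor" is in frame 12 -> 12388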