It Began with Babbage
Page 29
VI
The FORTRAN (an abbreviation for FORmula TRANslator) project began in early 1954. Backus, a master’s degree holder in mathematics from Columbia University, joined IBM in 1950 as a programmer for one of their state-of-the-art electromechanical calculating machines. When the IBM 701 was built in 1953 (this the company’s first stored-program electronic computer), Backus led the development of an interpretive system for the 701 called Speedcode.27 That is, a program written in Speedcode would be read an order (or statement) at a time, then translated automatically into the 701’s machine code and executed immediately. In other words, the orders (or statements) are interpreted one at a time.
With the development of the IBM 704, IBM more or less came to monopolize the large-scale scientific computing arena.28 Although the IBM 704 was not delivered until 1956, the FORTRAN project began in 1954 and was prompted by the planned advent of the 704. The idea was to design a language that would enable engineers and scientists to write programs themselves for the 704. The language of programming was to be “algebraic” or “formulaic” in form.
This was not the only factor. In 1954, programming accounted for a very large portion—estimated at about three quarters—of the cost of operating a computer installation.29 Automatic programming could help reduce this cost.
The problem was that, in a culture in which programs were habitually written at virtually the level of a machine’s “native language” (that is, assembly language), there was “widespread skepticism” that an automatic programming system could produce machine code as efficient as “hand-coded” programs.30 Backus and his group were, of course, acutely aware of this skepticism. They recognized that, for such a system to be accepted and used widely, they would have to demonstrate the comparative efficiency of programs produced by automatic systems.31
And so even though Backus’s group wanted a language to facilitate the scientist’s and engineer’s task of programming the 704, the design of the language itself became subservient to this imperative. The real problem, as they saw it, was to design a compiler that could produce efficient programs. The actual language design itself happened on the fly, so to speak—driven by the compiler imperative.32
As Backus would remember in 1981, the FORTRAN project seemed to defy the idea that all new ideas and inventions draw on prior ideas, and that something like what Arthur Koestler called bisociation worked to meld prior, possibly unrelated, concepts into something new (see Chapter 13, Section IV). The closest automatic programming system (in broad philosophy) was the one developed for the MIT Whirlwind I by Laning and Zierler (as mentioned earlier). Although this system was, as Backus acknowledged, “the world’s first algebraic compiler”33 and was completed before the first version of FORTRAN, it had apparently no influence on the design of FORTRAN. Nor, apparently, did any of the other related projects elsewhere, many of which Backus and his group were unaware.34
VII
The original goal was to design a language that, although algebraic or formulaic in form, was still tied to a particular computer, the IBM 704—that is, it would be a liminal computational artifact (see Prologue, Section IV). That FORTRAN would evolve into a genuinely machine-independent (or, as the term would emerge, high-level) programming language, an abstract computational artifact—and arguably the first language with this characteristic—lends it a special place in the early history of computer science.
But FORTRAN represented something more. Thus far in this story of programming languages, their design and the design of their translators belonged to the realm of what we may call Little Science. The work was done by one or two persons, reflecting almost entirely the creative spirit and ideas of these individuals—thus it was with Goldstine and von Neumann, with Wheeler, with Zuse and Rutihauser, with Böhm and Glennie, with Laning and Zierler.
FORTRAN changed all of this. Much as the design of computers in the final years of the 1940s became, relatively speaking, Big Science, even in university settings. For example, at the time the Manchester Mark I was commissioned, the group led by Frederic C. Williams had at least eight members35—so also “automatic programming” entered the realm of Big Science with the FORTRAN project.36 A paper presented at a conference in Los Angeles in 1997 titled The FORTRAN Automatic Coding System had 13 coauthors.37
The size of the FORTRAN team was, of course, a symptom of the complexity of the project, and this complexity lay in that FORTRAN was a system comprised of a language, a computer program for automatic translation, a computer, a text, and a social community. Each of these components influenced, and was in turn influenced by, one or more of the other components.
The FORTRAN language was designed, on the one hand, with an anxious eye toward its translatability into efficient IBM 704 machine code; and on the other, as a means of expressing algorithms in an algebraic form that would be “natural” to scientists and engineers, the intended users of the language. Thus, the acceptability of FORTRAN as a tool by the community of its potential users—the community of expected IBM 704 users and members of an IBM computers user group called SHARE, formed in 1955—depended both on the language as a vehicle for expressing algorithms and the efficiency of the executable machine code. The designers of the translator (compiler) bore responsibility of ensuring that FORTRAN programs could be translated automatically into IBM 704 machine code that would compare favorably in efficiency with “hand-coded” programs. If the FORTRAN language was the interface between the user community and an IBM 704 installation, the FORTRAN compiler was the interface between the language and the 704 machine itself.
Strictly speaking, the FORTRAN language was one part of the user community–704 installation interface. The user had to learn the language; a text was needed that would describe the language—a programmer’s guide to FORTRAN. Like any endeavor that created practical artifacts for human use, the FORTRAN project entailed a melding of objective principles of design with subjective perceptions and beliefs—mathematics, logic, design principles fused with psychology.
VIII
The FORTRAN language design began in early 1954. Before the year’s end, a preliminary 29-page report on the language was written38 and distributed to prospective IBM 704 customers.39 Armed more with faith than empirical evidence (as Backus confessed) the authors of the report, in their introduction, assured readers that FORTRAN programs would not only execute as efficiently as “laboriously” hand-coded programs would, but also that FORTRAN would facilitate the production of much more complex programs than could be hand coded.40
Here, “FORTRAN” meant not the language but the translator; “coding” referred not to the act of writing programs in the FORTRAN language, but to the process of producing machine code automatically by the compiler; the “human coder” was not the FORTRAN programmer but the programmer in the 704’s assembly (or machine) language. Vocabulary was being refined.
Soon after the preliminary report was written, Backus and two of his language codesigners gave talks on FORTRAN to groups of IBM customers in various cities—customers who had ordered the IBM 704 (which had been announced around May 1954). By and large, they found their audience skeptical about this new prospect.41
The design and implementation of the FORTRAN compiler (still called translator, at the time)—written in the “native” language of the IBM 701 that would execute the compiler—began in earnest in early 1955.42 By summer 1956, “large parts of the system was working.”43 The complete compiler was available in early 1957.44 Also in summer 1956, David Sayre (1924–2012), a PhD in crystallography from Oxford University, began writing a document describing the language. The outcome was the Programmer’s Reference Manual (1956).45 In early 1957, Backus and his whole group presented a paper on the FORTRAN system at the Western Joint Computer Conference in Los Angeles, California.46 That same year, the Programmer’s Primer for the FORTRAN system was published by IBM.47
IX
The FORTRAN language as it was originally conceived in 1954 to 1956 (Backus called
it FORTRAN I48) would evolve in its features and capabilities over several versions: FORTRAN II, III, and IV; it spawned other “dialects.” It became, among other things, a machine-independent or high-level language. To speak later of FORTRAN was to speak of a language genus, one might say. However, the fundamental and unmistakable style of this genus was established with the original ancestor. We get a sense of this style in the little and trivial program written in the “original” FORTRAN shown in Table 13.1.
This program reads an array of 20 integers into an array data structure A. It then initializes X, Y, Z, and W to zero, then scans in sequence the integers in A and keeps count (in X) of the number of integers that are between –50 and + 50 and tallies the total of these numbers in Y. It also keeps count of the number of integers that fall outside the range in a separate counter (Z) and also adds the total of these numbers in W. When all 20 numbers in A have been accounted for, it prints out the values of the counters and the corresponding totals and stops.
The formulaic nature of most of these statements—by 1956, this word was used to refer to each of the basic meaningful units of action, although they were originally called formulas,49 hence, no doubt, the name of the language—is quite apparent. Yet, algebraic or formulaic though they seemed, they were not mathematical expressions but descriptions of actions to be performed by a computer. X = X + 1 is not an algebraic expression in FORTRAN (indeed, mathematically or logically it would not make sense); rather, it signifies an action. The = is not the mathematical equality symbol; it denotes the assignment operation, synonymous with Zuse’s => in Plankalkül (see Section III, this chapter). The statement means: Read the value of the variable X, increment by one, and write the resulting value into X.
TABLE 13.1 A FORTRAN Program
DIMENSION A(20)
READ A
X=0
Y=0
Z=0
W=0
DO 2 I=1,20
IF ABS(A(I)) > 50 GO TO 1
X=X+1
Y=Y+A(I)
GO TO 2
1
Z=Z+1
W=W+A(I)
2
CONTINUE
PRINT X,Y
PRINT Z,W
STOP
*
Other statements in Table 13.1, such as the DO or IF would seem, however, less transparent to the neophyte FORTRAN user; the former denotes an iteration, the latter a conditional branch. The purpose of the FORTRAN Programmer’s Reference Manual was to explain all aspects of the language for the user’s benefit. Such an explanation took the form of first specifying the legal form of each statement type and the rules governing this form, and an explanation of what the statement meant.
Form and meaning. In the realm of a natural language like English, linguists usually call form the syntax of the language, and the meaning its semantics. The FORTRAN Programmer’s Manual of 1956 marked the emergence of a new consciousness among language designers about the linguistic (rather than merely the notational) nature of programming languages. The terms syntax and semantics (as we will see) would come to be used very soon thereafter, but we find a presence of this consciousness in the writing of the Programmer’s Manual.
The text describing a programming language must describe all and only what the language is about. For the user of that language, the text is all. But, unlike the postmodernist’s “text,” which can be “read” in as many different ways as there are readers, and the writer’s intention is not relevant, the text describing a programming language must provide only one “reading”—there should be only one way of making meaning, one way of interpreting the language.
The FORTRAN group took this obligation seriously. In the Programmer’s Manual each type of statement is described in terms of its form and meaning, syntax and semantics.
“DO n i = m1, m2” or “DO n i = m1, m2, m3” where n is a statement number, I is a … variable, and m1, m2, m3 are each either an unsigned … constant or a nonsubscripted … variable. If m3 is not stated it is taken to be 1.50
Here, then, was the “general form” of the DO statement. Alongside were given examples: DO 30 I = 1, 10, DO 30 I = 1, M, 3. Then followed the meaning of the DO statement:
The DO statement is a command to “DO the statements which follow, to and including the statement with statement number n, repeatedly, the first time with i = m1, and with I increased by m3 for each succeeding time; after they have been done with i equal to the highest of this sequence of values which does not exceed m2 let control reach the statement following the statement with statement number n.…51
Conciseness and elegance—attributes valued by mathematicians, as many of the FORTRAN group were—is in evidence in certain sections of the language description. “Recursion,” the mathematical strategy of defining a concept in terms of itself (such as defining the factorial of a number recursively, N! = N*(N–1)!, and 0!=1) was used to specify the rules for forming algebraic expressions (such as X+Y):
By repeated use of the following rules, all permissible expressions may be derived …
4. If E is an expression the (E) is an expression
5. If E and F are expressions … then E+F, E-F, E*F, E/F are expressions.…52
X
Yet, by Backus’s own admission, language design played a secondary role in the FORTRAN endeavor.53 Rather, as (it was once said) Britain acquired India as part of its empire in a fit of absent-mindedness, so also, it almost seems, reading Backus, that the FORTRAN language was designed absent-mindedly. But Backus and his colleagues, collectively, were too much the engineer–scientists to be absent-minded, even in a metaphorical sense. What Backus really implied, as he would write in 1981, was that—no matter how satisfying the FORTRAN programming language was—as a programming tool, it would never have sufficed. The emphasis had to be on program efficiency. It was this criterion that ruled above all else.54
The burden of responsibility for producing efficient IBM 704 machine code from a FORTRAN program—later, the former came to be called object program or object code and the latter, source program or source code—fell on the compiler or, more precisely, on the compiler writer. By the time the first book-length texts and monographs on compilers began to appear during the mid to late 1960s,55 a great deal of knowledge had accumulated on the theory, design, and implementation of compilers. And the programmers who built the first FORTRAN compiler made no small contribution to this branch of what, in the 1960s, came to be called programming systems.56
A compiler is a computer program of a rather complex and peculiar sort, for it manipulates and processes other computer programs. It takes as input a program written in a machine-independent or high-level programming language (such as FORTRAN, the source program) and produces as output a functionally equivalent program (the object code) that is executable on a specific computer.57 A complier, then, is an automatic translator in the strict linguistic sense; it transforms automatically a text in a programming language L into another functionally equivalent text specified in a machine language M. Both texts are programs. Both texts are liminal. But, the abstract aspect of a program in the programming language (when it describes an algorithm meant only for human–human communication, a piece of “social text”) is more pronounced than in the case of a program in a machine language.
The great difference between a compiler and the “automatic machine translators” that Warren Weaver and others had so desired (see Chapter 11, Section IX), was that automatic machine translation referred to translating texts in one natural language (say, Russian) into another natural language (say, English). The compiler’s task is far more modest. It translates from one artificial language into another. And because artificial languages are invented, they can (at least in principle) be defined precisely and restricted in features, without the richness, vagueness, ambiguity, and variability of natural languages.
Thus, the challenges faced by the compiler writers (as, somewhat curiously, programmers who designed and built compilers would come to be c
alled58) may not have been as overwhelming as the task faced by programmers who built automatic machine translation systems. However, the challenge of compiler writing was, in many ways, more complex than any other kind of programming experienced until then.
A compiler is, in fact, a system of many components performing many different tasks. It is an amalgam of many different algorithms of varying intricacy and complexity. It must first ensure that the program syntax adheres to the rules of the language. It must “make sense” of the overall structure, and this involves recording the flow of information and the flow of control between program components. To manage the complexity of its work, the compiler must decompose the source program into subunits (called basic blocks) that can be analyzed separately. The compiler must gather and record all the information about the program that is necessary for the ultimate task of the complier: generating machine code (look ahead to Figure 15.2).
Hovering like the sword of Damocles over all this is the “efficiency imperative.” The proof of the pudding lies in the eating, and “eating”—in this context—means how efficient the object code is compared with “hand-coded” programs in assembly or machine language.
There is a “cultural” gap between the language of the source program and the language of the object program. The former language is abstract and symbolic. In fact, it is itself an abstract artifact; its syntax and semantics refer only implicitly to some (abstract) computer. The latter language is relatively more concrete and physical; it is liminal because, although abstract, it is very close to some real physical computer. The compiler has to straddle the two cultures and “map” texts from one to the other. The sword of Damocles mandates that the object code must be “optimized” as best as possible for both time to execute the object program and space required to store the object program in computer memory.