Book Read Free

It Began with Babbage

Page 34

by Dasgupta, Subrata


  What makes this situation different from solving numeric problems such as differential equations is that algorithms exist for the solution of the numeric problems. In the case of proofs of theorems and the like, a “systematic algorithm” can be developed, but this would involve a potentially exhaustive search of all chain of inferences until one leads to an expression corresponding to the theorem of interest.33 The situation is exactly the same as the human decision maker’s problem in arriving at a “fully rational” optimal decision. The decision maker is faced with bounded rationality, which for all practical purposes prevents him from optimizing his decision. So also, although in principle an algorithm may be constructed that searches exhaustively the space of all possible sequences of inferences until one is found leading to the theorem of interest, such in-principle possibility is impractical computationally because of the very large numbers of pathways of inferences that might have to be explored in arriving at a valid proof.34

  Like the human decision maker, LT must resort to heuristic reasoning and search. And, like the human decision maker operating under bounded rationality, the theorem-proving computer program must use heuristic clues, rules of thumb, and strategies of various sorts to search for a proof.

  V

  Simon would later commemorate December 15, 1955, as the day of the birth of computer-based heuristic reasoning.35 He had begun to analyze the proof of one of the theorems in the Principia and, by early December, some ideas were emerging about the nature of the heuristics Whitehead and Russell had used.36 On December 15, he simulated successfully by hand the execution of a program to generate a proof of one of the theorems in the Principia. So the basic strategies used in LT were discovered by “actual pencil-and-paper work.”37

  Principia Mathematica began with fundamental (“atomic”) concepts such as variables (denoted by p, q, … and so forth), a set of connectives (denoted by ¬ [not], + [or], —> [implies], and => [entails]) that link variables to form expressions, definitions, a set of five axioms, and three rules of inference. For our purposes here, to illustrate how LT uses heuristic reasoning, it is sufficient just to state one definition, one of the axioms, and two of the rules of inference (Table 14.1).

  Suppose LT is given the expression

  E: p → ¬p => ¬p

  and asked to obtain a proof for it.38 As it happens, a proof for E would involve the following steps:

  TABLE 14.1 A Tiny Axiomatic System

  Definition D: p → q def ¬p v q

  Axiom A: r v r => r

  Inference Rules:

  Rule of Substitution: If A(p) is any true expression containing the variable p and B is any expression, then A(B) is also a true expression.

  Rule of Replacement: Any expression may be replaced by its definition.

  *

  1. By axiom A: p v p => p

  2. Substituting ¬p for p: ¬p v ¬p => ¬p

  3. Replacement of left subexpression: p → ¬p => ¬p, which is the expression E.

  The question is: How can LT search for a proof of expression E without considering all the valid substitutions involving the five axioms?39 The heuristic Newell and Simon used was to “work backward” from the expression to be proved—from the characteristics of the expression, “cues” are extracted about the most promising path to pursue.40 Newell and Simon called this heuristic the method of substitution—later, in the literature of heuristic programming, it would viewed as an instance of what came to be called backward chaining. In the case of LT, it entailed the search for an axiom or theorem similar to the expression to be proved. If a similarity is found, then the program tries to match the expression to the similar axiom or theorem by appropriate transformations of the expression; if successful, the expression is proved and the chain of transformations is the proof. Otherwise, if no similar axiom or theorem is found, the strategy fails.41

  By “similarity” Newell and Simon meant a structural matching between the expression to be proved and one or more of the axioms or theorems already proved. Precise “measures” of similarity were defined. Informally, these measures would identify the expressions

  p → ¬p => ¬p

  p v p => p

  as similar because they have the same structure—pattern of variables and connectives—whereas

  p => q v p

  p v p => p

  would be dissimilar because their structures do not match.

  To illustrate the method of substitution heuristic, we can follow how LT produced a proof of the expression

  E: p → ¬p => ¬p

  Because of their structural similarity, LT finds a match between E and the axiom

  A: r v r => r

  However, on the left hand side of =>, there is a mismatch between the connectives → in E and v in A. To make them match, the v in A must change to →. This can be done by applying the definition D. But, before that can be done, a ¬ must be placed before r in A—that is, by substituting a variable ¬t for r in A according to the rule of substitution. This produces

  A′: ¬t v ¬t => ¬t

  Now definition D can be applied to change ¬t v ¬t to t → ¬t. The new situation is

  E: p → ¬p => ¬p

  A′′: t → ¬t => ¬t

  The only difference now between E and A″ is that there is a variable p in the former and a variable t in the latter. Applying the rule of substitution, p can be substituted for t in A″ resulting in

  A*: p → ¬p => ¬p

  There is now a complete match between E and A*; a proof of E has been obtained. In this process, the method of substitution relied on cues gleaned from the features of the theorem to be proved. Such cues reduced the amount of search but, of course, there was no prior guarantee of success. The cues may have led down a blind alley. Ultimately, the criterion of success for a heuristic is whether it actually works.

  In fact, the method of substitution did not work all the time. Of 67 theorems in the second chapter of the Principia, only 20 could be proved using this heuristic. Other heuristic principles would be needed for the other theorems.42

  VI

  LT established a bridge between human and machine cognition, so much so that Simon would write in October 1956 to Bertrand Russell of this program and what it had achieved. To which Russell replied dryly that had he and Whitehead known of this possibility, they would not have wasted the years “doing it by hand.”43

  LT was a means to an end. Newell and Simon were interested in universalizing the insights they had gained in designing this program to the realm of human problem solving in general. Moreover, the paper on LT was published in a journal on information theory, a mathematical branch of communication engineering (see Chapter 11, Section VI), and was unlikely to be read by psychologists, those who might be most interested professionally in the nature of heuristic thinking.

  Two years later, in 1958, Allen Newell, Cliff Shaw, and Herbert Simon published an article titled “Elements of a Theory of Human Problem Solving” in Psychological Review, a prominent American journal in psychology. Their theory, they stated, “explain[ed] problem-solving behavior in terms of what we shall call information processes.”44 It was an explicitly computational theory to explain the observed or observable behavior of organisms.45

  For Simon and Newell, the terms information processing and symbol processing were synonymous. Over time, they seemed to prefer the latter term. If, in their earliest joint publications, they chose to use information processing, their last joint publication (in 1975) spoke firmly of symbol systems, which are at the core of intelligent behavior.46

  Let us then refer to their theory as a symbol processing model (SPM) of cognition. SPM absorbs and generalizes Simon’s previous model of the “universal decision maker” into a “symbolic problem solver.” SPM was a theory of symbolic problem solving regardless of whether the problem solver was a human being, a computer, or some other kind of organism. SPM represented a certain class of problem-solving mechanisms of which LT was a concrete exemplar. We have seen that
Simon’s theory of the decision maker was entwined with a satisficing activity in which heuristics were used to overcome bounded rationality. So, also, SPM viewed problem solving as a heuristics-driven enterprise. LT did not go about its task using “simple algorithms” that applied “brute force” to search exhaustively for a solution to a given problem.47 This was what “conventional” computers did. The algorithmic approach was not feasible in the kind of problem solving in which LT was engaged. In finding a proof for a theorem, the choices of inferences at each stage were extremely large—an embarrassment of riches—from which the program had to select an inference that seemed most promising for the purpose of arriving at a proof. Thus, the method of proof for LT used heuristics. It also used a kind of learning in the sense that, once LT proved a theorem, this was stored away in its memory as a potential aid in proving other theorems.48

  SPM, in fact, provided significant insight into how the symbolic processing and mental simulation on which Kenneth Craik had speculated in his The Nature of Explanation (1943) could be envisioned more precisely (see Chapter 11, Section II)—namely, in the form of a heuristic computer program executed by a physical machine. Here was a model of how symbol structures might be represented in a material entity and how they might be manipulated, processed, and transformed into other symbol structures. Furthermore, SPM suggested how the computer could serve as an experimental apparatus, a kind of microscope for the inner eye, with which one could “observe” mental phenomena. By programming the computer one could simulate specific higher level cognitive tasks—language understanding, chess playing, scientific concept formation, and so on—a concrete realization of Turing’s thought experiment of 1950 (see Chapter 11, Section XI). If the simulation produced behavior that was consistent with what was predicted, then the computational (or symbol processing) model on which the program was based could become a theory of how humans might think and carry out cognitive tasks.

  VII

  Curiously, in their 1956 paper on LT, Newell and Simon did not mention the term artificial intelligence. This term, in fact, emerged the year before in a document describing a “summer research project on artificial intelligence” to be held in Dartmouth College in Hanover, New Hampshire, in summer 1956. The author of this proposal was John McCarthy (1927–2011).49

  McCarthy was an assistant professor of mathematics at Dartmouth College. Later, he would move to MIT before joining Stanford University, where he founded their artificial intelligence laboratory. McCarthy is generally regarded as the coiner of the term “artificial intelligence”, which, in its original sense, meant a kind of intelligence that contrasted to “natural intelligence.” As the Dartmouth proposal explained, artificial intelligence was a phenomenon wherein any aspect of mental activity deemed a manifestation of intelligent action could, at least in principle, be so precisely specified that a computer could be programmed to simulate it.50

  As we have seen, the possibility and the idea of machine intelligence had been in the minds of many ever since computing machines began to be built. Torres y Quevedo speculated on it in 1920 (see Chapter 3, Section IX); in 1943, Craik dwelled on the idea of human thinking in symbol processing terms (see Chapter 11, Section II). But, with the invention of the electronic digital computer, we find more interesting ideas—Claude Shannon’s thoughts on computer chess in 1949, von Neumann comparing the brain with the computer, and Alan Turing’s speculations in 1950 on what it would take for a computer to “fool” a human into thinking it was engaged with another human (see Chapter 11, Sections IV–VII).

  It seems fair to say that LT was a seminal invention that transposed artificial intelligence from the realm of speculation into the realm of the real. LT was a working liminal artifact, a computer program that simulated the way human logicians went about proving theorems. It was an actual piece of artificial intelligence. The symbol processing model Newell, Shaw, and Simon presented in 1958 was a theory of problem solving.

  It was a theory of human problem solving, in fact, which explained how Craik’s “symbol processing in the head” could happen. As such, the Newell–Shaw–Simon paper of 1958 was seminal in the development of cognitive science, the multidisciplinary study of cognition. If, as some people believe, computational processes are at the core of cognition (in humans and animals)51—although not everyone does52—then the Newell–Shaw–Simon paper was at the very vanguard of this belief. It heralded the beginning of the “information processing paradigm” in cognitive psychology.53 At the same time, LT was seminal to the development of the idea of artificial intelligence, both the phenomenon as envisioned by McCarthy and colleagues in 1955 and as a branch—a subdiscipline—of computer science.

  VIII

  If Newell and Simon were at the forefront of this new discipline, others followed very soon thereafter. Game playing and other kinds of problem solving were the main domains of inquiry. Perhaps the most influential work to follow on the tracks of Newell and Simon was the study of how machines could learn (a topic in artificial intelligence that came to be called machine learning) by Arthur Samuel (1901–1990) using the board game checkers (“draughts” in Britain).

  While a professor of electrical engineering at the University of Illinois, Urbana, in 1947, Samuel wrote his first checkers program. It would go through many modifications. During the 1950s, as a member of the IBM Poughkeepsie laboratory, Samuel programmed the IBM 701 to play checkers. The program, described in a paper published in 1959, “self-learnt” and thereby improved its own performance.54 In that same year, Herbert Gerlenter described (at the same first IFIP conference, where John Backus delivered his paper on the syntax and semantics of Algol [see Chapter 13, Section XVI]) a program that proved theorems in geometry at the level of “good” high school students.55

  Such programs used heuristic reasoning and search. By the end of the 1950s, then, the methodology of computer science was no doubt dominated by the algorithmic approach—that is, computer scientists sought primarily to automate by inventing algorithms to solve computational problems. However, this methodology was now joined by an alternative strategy: heuristic search.

  IX

  Language, thought, and reality are related intimately, as distinguished linguist Benjamin Lee Whorf (1897–1941) famously proposed.56 Whorf was alluding to natural languages, but the relationship is true in the realm of artificial languages. As we have seen, the two cultures of scientific and business computing that emerged during the 1950s gave rise to very different types of programming languages (see Chapter 13, Section XII).

  The kind of computing in which Newell, Shaw, and Simon were engaged while building LT—the kind of computing with which artificial intelligence researchers became involved—as we have seen, entailed not numeric computations, but symbol processing. Expressions in logic were manipulated as symbol structures with no numeric significance; indeed, with no semantic content at all. Computations of the kind LT performed involved extracting subexpressions, matching two (sub)expressions, substituting (sub)expressions for other (sub)expressions, and so on. For the purpose of implementing LT on the RAND Corporation’s JOHNNIAC computer, Newell and Shaw invented a programming language they called IPL (Information Processing Language).57 The version of the language that was actually used to program LT was labeled IPL V, developed circa 1957 and implemented on the IBM 650, IBM 704, and other machines of the time.58

  IPL, however, became a “dead language” within a relatively short time. Its significance is twofold: it was used to program LT and it was the first of a new genus of programming languages called list processing languages, of which the most dominant and influential species was named LISP (LISt Processing) invented by John McCarthy between 1956 and 1958.59

  The basic concept underlying LISP (and other list processing languages) is that the fundamental symbol structures (in computational jargon, “data structures”) on which computations are performed are structures called lists. Each element in a list is a symbol consisting of some piece of information along w
ith information of where to find the next associated symbol. The latter information is called a pointer. Thus, lists are not stored in sequential location in a computer’s memory. The elements of a list may be scattered all over memory, but they are linked by means of pointers.

  A list processing language would contain operations that can extract parts of a list (making them lists of their own), insert elements into or delete elements from a list, find the successor elements of a list element, create new lists, and so on.

  Such kinds of operations (insert, alter, delete) are not specific to list structures. But, LISP as a particular list processing language has properties and characteristics that are very distinct from such languages as FORTRAN or Algol. The latter were procedural languages—that is, programs are composed in the form of sequences of statement constituting procedures, which can be called from other procedures. LISP was a functional language, drawing its computational character from the mathematical notion of functions.

  In mathematics, a function is represented symbolically by the name of the function followed by the arguments of the function, usually in parentheses. Thus, the square root of a number x, represented conventionally as √x, can be given functionally as sqroot(x), and the application of this function to the value of argument x would yield the square root of that number. An arithmetic expression such as (a * b) – (w/x) would be represented functionally as minus (mpy(a,b), div(w,x)). The syntax and semantics of LISP programs—or, rather, LISP functions—follows this general pattern.

  For a flavor of what a LISP-like function looks like, let us look at a small example. Before we do, however, some preliminaries need to be established.

 

‹ Prev