by Rod Ellis
Summing up
In both symbolist and connectionist theories of language, the distinction between implicit and explicit knowledge is central. Furthermore, both paradigms view implicit knowledge as primary; it underlies our ability to use language effectively for communication. However, symbolist and connectionist theories afford different answers to a number of key questions.
Are the two types of knowledge distinct or closely related?
In Anderson’s Adaptive Control of Thought Model, declarative knowledge provides the basis for the development of procedural knowledge. Thus, the relationship between the two types of knowledge is continuous: there is no modularity. However—in both Skehan’s Cognitive Theory and connectionist accounts—the two types of knowledge are seen as disassociated. In these theories, representation involves a dual system.
Are language systems symbolic or associative in nature?
Both information-processing theories and UG view language as symbolic in nature. The categories and rules of linguists have some kind of psychological reality. Information-processing models also acknowledge an important role for exemplar-based representation and chunking, but view these as distinct from the rule-based system. In connectionism, language systems are seen as a conspiracy of associations. Rule-like behaviour becomes evident only when more abstract linguistic schemata emerge out of well-established associations.
Are the cognitive mechanisms that underlie language specialized or general?
Universal Grammar assumes that language involves a cognitive mechanism distinct from that involved in other types of learning. Principles and parameters are innately available to children. However, older L2 learners may have to rely on strategies of general learning. In contrast, in information-processing theories and connectionist theories, there is no distinct language ‘module’ in the brain. Knowledge is widely distributed in the same memory systems that store other forms of knowledge.
Is language an individual or a collective phenomenon?
UG is focused on the competence of the ideal native speaker and thus takes no account of individual variability, which it views as a feature of performance. In connectionist theories, the linguistic systems that individuals build will differ in accordance with their varied experiences of the language they are exposed to. There is no such thing as an ‘ideal native speaker’. However—given that individuals have similar experiences and a need for social identification—they will converge towards the representation of a communal language.
What is the relationship between ‘learning’ and ‘representation’?
UG is a property theory: it aims to specify the innate knowledge that learners must possess for learning a language, given the inadequacies of the input they are exposed to. In information-processing models, learning takes place when declarative knowledge transforms into procedural; the system changes as learning takes place. In connectionist accounts, there is no clear distinction between learning and representation. The network that houses implicit knowledge is constantly and dynamically evolving.
The differences between symbolic and connectionist models of representation make it unlikely that any integrated theory of L2 representation will arise in the near future. Both approaches continue to figure in SLA, but—as time passes—connectionist accounts are clearly assuming greater importance.
Representation of two languages
L2 learners come to the task of learning an L2 with their L1 already firmly established in their minds. As we saw in Chapter 6, the effects of the L1 are substantial and ubiquitous. Here I will consider how two languages are represented in the mind and why it is so difficult for learners to establish the representation of a new language.
The first issue is whether the two languages are stored separately or together. Traditionally, two types of bilingualism have been distinguished. In co-ordinate bilingualism, the two languages are kept separate. The forms and meanings of words in the two languages are stored and accessed separately. In compound bilingualism, the two languages are fused in the brain; there is a single store of word meanings linked to separate word forms. It was claimed that co-ordinate bilingualism results when the two languages are learned separately and compound bilingualism when they are learned together.
Current research on bilingual representation indicates that the learner’s L1 and L2 are not necessarily housed in separate stores. Paradis (2004) noted that both language systems are located in the same gross anatomical areas of the brain, but in distinct microanatomical subsystems within these areas. Dehaene (1999), after reviewing the results of neurolinguistic studies of brain activation patterns, suggested that the extent to which the L1 and L2 systems are linked depends on the L2 learners’ level of fluency. Highly fluent L2 users make use of the same micro-circuitry as native speakers, but less fluent L2 users rely on different neurological circuits. The degree of linkage, therefore, may depend on the learner’s L2 proficiency.
The second and more crucial issue concerns the effect of the L1 on the development of L2 representations. N. Ellis (2006) draws on connectionist models of representation to explain why L2 systems are rarely as fully developed as the L1 system (and, in the opinion of some, never are; see the discussion of the Critical Period Hypothesis in Chapter 2). He uses the key concept of ‘learned selective attention’ to explain how the L1 impedes the development of the learner’s implicit system. Two general processes interfere with our ability to attend to new information. Overshadowing occurs when two cues are associated with an outcome and the more subjectively salient of the two cues overshadows the weaker. Overshadowing leads to blocking, where the learner attends selectively to only the more salient of the two cues. Ellis’ point is that L1 cues overshadow L2 cues and block attention to them. This explains why restructuring of existing categories is especially difficult.
A good example of blocking can be found in Jiang’s (2000) account of L2 lexical representation. Jiang points out that semantic, syntactical, morphological, and phonological/orthographic information are all integrated in an L1 lexical entry and that—as a result—the activation of one aspect of an entry simultaneously activates the other aspects—for example, visual recognition of a word automatically activates its phonological representation. In contrast, an L2 lexical entry is tied to the semantic information of the equivalent L1 entry. Jiang argues that it is very unlikely that a new concept or new semantic information is created. The established L1 semantic system blocks the development of a separate L2 semantic system. He concludes that an L2 lexical entry initially contains only formal specifications—i.e. there is no semantic or syntactic information. The repeated association of the formal L2 lexical entry with the L1 lexical entry’s meaning strengthens this link to a point where the semantic and syntactic information in this entry is copied into the L2 lexical entry. However, Jiang argues that it is rare that a point is reached where a fully integrated L2 lexical entry is achieved.
Blocking prevents learners from attending to cues in the input and results in L2 representations that are firmly tied to L1 representations. However, when no blocking occurs—for example, when the input provides evidence of a word’s meaning for which there is no equivalent L1 representation—new wiring can ensue, and a separate L2 representation can develop. Evidence for this comes from the fact that phonemes that are dissimilar from L1 phonemes are easier to acquire than those that are similar (see Chapter 6). Explicit L2 knowledge can also help to overcome the blocking effect of the L1.
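The overshadowing and blocking processes described above can be illustrated with a small simulation. The sketch below assumes the Rescorla-Wagner learning rule, a standard model of associative learning that the text itself does not name. The cue names, salience values, learning rate, and trial counts are all hypothetical. It is a simplification: blocking arises here from prior learning about a familiar cue, not from a model of attention as such.

```python
def rescorla_wagner(trials, salience, rate=0.3, weights=None):
    """Return cue-outcome association strengths after a series of trials.

    trials   -- list of tuples naming the cues present on each trial
                (the outcome is assumed to occur on every trial)
    salience -- dict mapping each cue to its salience (0 to 1)
    rate     -- learning rate shared by all cues (illustrative value)
    weights  -- optional starting strengths, e.g. from earlier learning
    """
    weights = dict(weights or {})
    for cues in trials:
        for cue in cues:
            weights.setdefault(cue, 0.0)
        # Prediction error: how far the cues present fall short of
        # fully predicting the outcome (maximum strength is 1.0).
        error = 1.0 - sum(weights[cue] for cue in cues)
        for cue in cues:
            weights[cue] += salience[cue] * rate * error
    return weights


# Overshadowing: two cues always occur together, one more salient.
overshadow = rescorla_wagner(
    [("salient_cue", "weak_cue")] * 50,
    {"salient_cue": 0.8, "weak_cue": 0.2},
)

# Blocking: a familiar cue (standing in for an L1-based cue) is learned
# first, then a new cue (standing in for an L2 cue) is added alongside it.
salience = {"familiar_cue": 0.5, "new_cue": 0.5}
pretrained = rescorla_wagner([("familiar_cue",)] * 50, salience)
blocked = rescorla_wagner(
    [("familiar_cue", "new_cue")] * 50, salience, weights=pretrained
)

# Control: the same two cues learned together with no prior learning.
control = rescorla_wagner([("familiar_cue", "new_cue")] * 50, salience)

print("overshadowing:", overshadow)
print("blocked new cue:", blocked["new_cue"])
print("control new cue:", control["new_cue"])
```

In this sketch the more salient cue ends up with most of the associative strength. The new cue acquires almost none when the familiar cue already predicts the outcome, but a substantial share when the two cues are learned from scratch together. This parallels the claim that an established L1 system leaves little for a competing L2 cue to explain.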
Attention
Any discussion of the role played by attention in L2 acquisition must necessarily draw on a model of working memory. As previously explained (see Chapter 3), working memory is the limited-capacity part of the human memory system that serves two different functions: it stores information temporarily in phonological short-term memory (or, in the case of orthographic information, in the visuo-spatial sketch pad), and it also binds this information with information from long-term memory in the central executive component of working memory. I will draw on this model to account for, and also try to reconcile, some of the conflicting views about attention in SLA.
I will refer extensively to the work of Schmidt on attention as this has proved seminal in SLA. Schmidt (2001) claimed that ‘the orthodox position in psychology is that there is little if any learning without attention’ (p. 16). He argued that an understanding of attention is necessary to understand just about every aspect of L2 acquisition, including interlanguage development; variability in learner language; L1 transfer; the role of individual differences; and how interaction facilitates acquisition—in other words, all the topics we have considered in previous chapters. However, not all SLA researchers are in complete agreement with Schmidt’s views about the nature and role of attention.
Key characteristics of attention
Schmidt (2001) identified six key characteristics of attention. These are summarized in Table 8.2. The broad picture is as follows. Attention takes place in working memory, where the learner selects which information to rehearse and suppresses other information. Attention is required to process stimuli obtained from the input together with information accessed from long-term memory, and, since this process of reconciling the old with the new is the very stuff of learning, attention is a necessary condition of learning. Attention, then, involves much more than perception (i.e. the cognitive registration of a stimulus); it also involves establishing links with previously stored information.
Characteristic | Description
Attention is limited | Attention takes place in working memory, which is limited in capacity. That is, only limited amounts of information can be processed at one time.
Attention is selective | This is the corollary of the first characteristic. Because capacity is limited, it is necessary to allocate attention strategically. For example, if the learners’ attention is focused on meaning, it may be difficult for them to simultaneously focus on form (VanPatten 1990).
Attention is subject to voluntary control | Learners can decide what to focus their attention on. Voluntary attention is top-down and directed at outside events. However, there is also involuntary attention which is experience driven; learners can attend to elements of the output without having any intention to do so.
Attention controls access to consciousness | The role of attention is to bring stimuli or thoughts into awareness. The process of focusing attention on specific stimuli or thoughts gives rise to the subjective feeling of awareness (i.e. consciousness).
Attention is essential for the control of action | Novice behaviour requires controlled processing; expert behaviour can make use of automatic processing. Less attention is required for automatic than for controlled processing.
Attention is essential for learning | Attention is the mechanism that makes input available for further processing. However, not everything attended to enters long-term memory. Thus attention is essential for learning but does not guarantee it.
Table 8.2 Six key characteristics of attention (based on Schmidt 2001)
Two SLA theories of attention
There are a number of psychological theories that account for the role of attention in L2 acquisition. Two of these theories have been especially influential.
Schmidt’s Noticing Hypothesis
This was briefly introduced in Chapter 7 as it has informed research on the role of input and interaction. Schmidt’s ideas about the importance of ‘noticing’ originated in his experiences of learning L2 Portuguese in Brazil (Schmidt and Frota 1986). Schmidt kept a diary to establish which features in the input he consciously attended to. His output was then examined to see to what extent the noticed forms turned up in his communicative speech. In nearly every case, the forms that Schmidt produced were those that he had previously noticed in the input. Schmidt also reported that he noticed the differences between his own attempt to produce Portuguese and the native-speaker input he was exposed to—a process he called noticing-the-gap.
The fullest account of the Noticing Hypothesis can be found in Schmidt (2001). He distinguished ‘perception’ and ‘noticing’. He acknowledged that while perception need not involve consciousness, noticing is necessarily conscious. That is, while learners may be able to perceive elements in the input without conscious attention, they will not be able to process this information for storage in long-term memory unless they consciously attend to it. However, attention need not be intentional—i.e. learners may not deliberately set out to attend to some specific stimuli—it can also take place incidentally: for example, when linguistic forms have been noticed while learners are primarily focused on meaning. In either case, however, attention involves consciousness.
In his earlier publications, Schmidt promoted the strong version of the Noticing Hypothesis: learners can only learn what they have consciously attended to. However, in his 2001 publication, he advanced a weaker version of the hypothesis: ‘people learn about the things they attend to and do not learn much about the things they do not attend to’ (p. 30; italics added).
Tomlin and Villa’s Theory of Attention
Tomlin and Villa (1994) distinguished three distinct attentional processes: (1) alertness, which involves a general readiness to deal with incoming stimuli and is closely related to the learner’s affective/motivational state; (2) orientation, which entails the aligning of attention on some specific type or class of sensory information at the expense of others—for example, on form as opposed to meaning—and (3) detection, when the cognitive registration of a sensory stimulus takes place. It is during the last of these processes that specific stimuli are processed in working memory.
Tomlin and Villa went on to make two claims. The first is that detection can take place without alertness and orientation. In other words, learners can register an input feature even when they are not in an ideal state to attend and their attention is not focused on the feature in question. The second claim is that all three attentional processes can occur without awareness. Tomlin and Villa commented ‘awareness requires attention, but attention does not require awareness’ (p. 193).
These two theories are often presented as oppositional: Tomlin and Villa consider that attention need not involve consciousness, while Schmidt argues that consciousness is a necessary condition of attention. However, a close inspection of these theories suggests that the differences are a matter of emphasis rather than absolute. ‘Alertness’ and ‘orientation’ seem to be closely related to the distinction that Schmidt makes between incidental and intentional learning. Learners may be alerted to attend to form and so will consciously orient to specific aspects of the L2; this involves intentional learning. However, even if they are not alerted to attend to form and not oriented towards any specific aspect of the L2, incidental attention to form can occur. Finally, if ‘detection’ is equated with ‘perception’, then the two theories agree that it can occur unconsciously, although Schmidt argues that acquisition generally requires conscious attention.
The main difference between the two theories lies in what happens when detection has taken place. Tomlin and Villa point out that detected information allows for further processing. However, they have very little to say about what learners do with the information they have detected, except to note that when particular exemplars are registered in memory, they can be ‘made accessible to whatever the key processes are for learning, such as hypothesis formation and testing’ (p. 193). Schmidt’s notions of ‘noticing’ and ‘noticing-the-gap’, however, address not just ‘detection’ but also—crucially—what learners do with what they have detected. In this respect, the Noticing Hypothesis can be considered a fuller account of the role of attention in L2 learning.
Detection
The key construct in both theories is detection. Detection is the cognitive registration of information in short-term memory. It can be seen as the first stage in a theory of attention. What is not detected cannot be subjected to further processing.
What do learners pay attention to when they detect elements in the input? Do they just detect linguistic exemplars, or do they also identify the symbolic category that an exemplar represents? For example, when exposed to the sentence ‘I attended a wedding ceremony’, do learners simply take note of the -ed ending on the verb, or do they also recognize that -ed denotes the past tense for referring to a past action? There are, in fact, three possibilities: (1) learners only detect a form (for example, -ed); (2) they detect a form and map it onto the meaning that it conveys (for example, -ed denotes past time reference); and (3) they detect a form and also its metalinguistic category (for example, -ed constitutes a linguistic marker of the past tense).
Schmidt suggests that what learners detect is not the raw data or input, but is still concrete—they detect words and parts of words that serve as examples of categories such as noun, adverb, past tense, etc., but the exemplars they detect do not come with such labels. In other words, detection is local and non-metalinguistic. Learners register that the word ‘gestern’ in the German sentence ‘Gestern regnete es’ (literally, ‘yesterday rained it’) can appear at the beginning of this sentence, but may not register that ‘gestern’ is an adverb or that adverbs can appear as the initial word in a sentence in German. What learners detect in the input is individual tokens and their association with other tokens.