by Martin Erwig
That Doesn’t Sound Right
The language for music notation can be defined by a fairly simple grammar. But even for such a simple language it is not clear which grammar rules to use. One problem that can plague a language is that of ambiguity, which means that one sentence can have more than one meaning. Ambiguity can creep into a sentence in two different ways. First, the basic words or signs of a language can be ambiguous, a phenomenon called lexical ambiguity (see chapter 3). Second, a particular combination of words in a sentence can be ambiguous, even though the individual words themselves are not. This is called grammatical ambiguity. Consider, for example, the sentence “Bob knows more girls than Alice.” It could mean that Bob knows more than one girl, or it could mean that he knows more girls than Alice does.
A grammatical ambiguity occurs when a grammar can generate more than one syntax tree for a given sentence. To continue with the music example, consider the following part of “Over the Rainbow.” Curiously, the score does not contain any bars, which causes this sentence to be ambiguous, since it is not clear which note, the first or the second, should be emphasized when playing it.
If you know the song and try to sing the score, you notice that the emphasis should be on the second note. It will probably not be easy to sing the song by emphasizing the first note. Usually, however, it is the first note in each bar that is emphasized. This means if Judy Garland had been given the previous score instead of the original, she would probably have assumed, incorrectly, that the emphasis should be on the first note. These two different interpretations are reflected in two different abstract syntax trees, shown in figure 9.1. The first interpretation groups the first eight notes into the first measure and the last note into the second measure. The second interpretation groups the first note into the first measure and the remaining eight notes into the second measure.
Both these syntax trees can be derived if we modify the grammar to not include bar symbols. The difference in the two abstract syntax trees results from a different decision on which rule to use to expand the first measure nonterminal. The tree on the left in the figure results when we expand it into eight note nonterminals, whereas the tree on the right results if we expand it into only one note nonterminal. Both trees represent the same sequence of notes, but their internal structures are different. The left-hand tree emphasizes “Some,” whereas the right-hand tree emphasizes “day.” This sequence of notes cannot actually be properly parsed, since the total time of the nine notes adds up to 1⅛, which is more than fits into one measure.
Figure 9.1 Ambiguity in a grammar can cause a sentence to have different abstract syntax trees. These trees present a different hierarchical structure for the same sentence. It is generally impossible to determine the meaning of such a sentence, since its structure affects its meaning.
To correctly express the sentence one has to place a bar somewhere to split the notes into two measures. The correct placement of the bar is after the first note. This will put the emphasis on the second note.
The Wizard of Oz contains another example of an ambiguity, in English, which occurs as the Wicked Witch skywrites “Surrender Dorothy” with her broom when Dorothy and her friends are in Emerald City. This message can mean either a call to the inhabitants of Emerald City to surrender Dorothy or a call to Dorothy to surrender herself. In the latter case, there should be a comma between “Surrender” and “Dorothy.” Just as bars in music notation help to clarify the correct structure and emphasis of melodies, commas and periods serve this purpose in written natural languages. And punctuation symbols and keywords serve the same purpose in programming languages.
Sometimes languages offer constructs that leave some choice to the interpreter of the algorithm as to what steps to perform. For instance, the improvisation notation here provides the musician with a choice for the pitch of the second, third, and fourth notes. An algorithm that employs such constructs is called nondeterministic. In music notation this is sometimes used to provide room for improvisation, that is, to give musicians more freedom in their interpretation and performance of a music piece. But nondeterminism must not be confused with ambiguity. From an algorithmic point of view, a music score provides instructions for a musician to produce a sequence of sounds, each of a particular pitch and duration. In contrast, an ambiguous notation would leave the musician wondering what to do. Thus, while nondeterminism is a feature of a language, ambiguity is a bug in a language’s syntax definition. It may come as a surprise, but algorithms that leave some choices open are quite common. For example, in Hansel and Gretel’s simple way-finding algorithm, the next pebble to be chosen was not uniquely specified.
Ambiguity is a pervasive phenomenon in languages. It reminds us that a language is not just a collection of its well-formed sentences but also maps these sentences to their structure in the form of abstract syntax trees. Ambiguities can be fun; they are sometimes exploited in music to create clever effects, for example, by starting a melody that is interpreted by the listener in some way and then adding a different rhythm that creates a surprise and forces the listener to change the interpretation.
In our example you can simulate this as follows. (This is best done with two people. If you have a keyboard available, this can also be done by one person, but the effect is not as strong.) One person starts by humming or playing (but not singing—this is probably too difficult) the alternating G and E eighth notes with the (wrong) emphasis on the Gs. Then, after some time, the second person starts humming quarter notes (for example, C) that start together with each E hummed by the first person. After a few iterations, you will perceive a shift in the rhythm and the song will suddenly sound like the well-known part from “Over the Rainbow.”
Ambiguities occur in other notations as well. You have probably seen the Necker cube. If you look long enough at the picture, your perception shifts between looking down on a cube from the right to looking up at a cube from the left. Again, there is one visual representation that has two different structural interpretations. This ambiguity matters. Suppose you want to touch the cube at its top right front corner; then the position you reach out to depends on your interpretation of the figure.
Related are figures like the Penrose triangle (inspired by M. C. Escher drawings). However, the issue with these figures is not ambiguity between different possible interpretations but rather the fact that no interpretation exists that is consistent with physical reality as experienced by people (see chapter 12).
Ambiguity is an intriguing concept and a great source for humor in natural languages, as in playing on the difference between “Let’s eat, grandma” and “Let’s eat grandma.” However, ambiguity presents a serious problem for algorithmic languages, since it can prevent the clear representation of a sentence’s intended meaning. There is a subtle interplay between the syntax of a language and its meaning. On the one hand, one needs to think carefully about the syntax of a language to be able to define its meaning. On the other hand, one has to understand the meaning before one can define the syntax properly.
Take on Meaning
The ambiguity example demonstrates that without knowing the structure of a sentence it is impossible to understand its proper meaning. But what really is the meaning of a sentence?
Since natural languages such as English are used to talk about everything, it is very difficult to narrow down the domain of their meaning other than saying “everything that can be talked about.” For music languages this is much easier: The meaning of a sentence, that is, a music piece, is the sound you can hear when the piece is performed. Before diving more deeply into questions about the meaning of a language and how to define it, I want to point out two different senses of the word meaning. On the one hand, we have the meaning of an individual sentence. On the other hand, we have the meaning of a language. To distinguish between the two, I mostly use the word semantics when talking about languages and the word meaning when referring to individual sentences.
An overview of how sentenc
es, languages, and meaning are related is given in figure 9.2. In a nutshell, the semantics of a language is given by the meaning of all its sentences, and the meaning of an individual sentence is given by associating it with a value of the domain the language talks about.
The meaning of one particular sentence in a music language is the music you hear when somebody performs it. However, since different musicians will interpret the song differently—some will sing it, some will play it on an instrument, others will vary harmonization or speed—there doesn’t seem to be one unique sound that can be defined to be the meaning of the song. This problem can be addressed either by postulating the performance with one particular instrument by one particular musician or by taking the performance by a MIDI synthesizer as the standard. Alternatively, we could say that the meaning of “Over the Rainbow” is given by the set of all possible sounds that result from proper performances of the song. Of course, this raises the further question of what counts as a proper performance. To avoid a full discussion of this thorny philosophical issue, I refer to words of the former U.S. Supreme Court Justice Potter Stewart, who, when asked what his criterion for obscenity was, famously replied, “I know it when I see it.” Thus, if you’ve heard the original version of “Over the Rainbow” in The Wizard of Oz, you can tell a proper performance if you hear one. After all, music is art, and we shouldn’t be surprised or concerned that it can’t be captured completely through formal definitions. The important point here is that the meaning of the “Over the Rainbow” score is a sound that somebody who knows the song well would recognize as its performance.
Figure 9.2 The semantics of a language is given by a mapping that associates each sentence’s structure, represented by its abstract syntax tree, with an element of the semantic domain. This view of meaning is called denotational semantics, since it is based on assigning denotations to sentences of a language.
If we now take all the sounds that could result from performing any music score imaginable, we obtain a semantic domain. A semantic domain for a language such as staff notation or tablature is the set of all meanings that any sentence of that language can have. A semantic domain surely is a large set, but there are many things it does not contain, such as cars, animals, thoughts, movies, traffic laws, and so on. Therefore, this set still is useful in characterizing a language. It does that by describing what a user of the language can expect a sentence of that language to denote.
The collection of all individual sentences of a language and their associated meanings constitutes the semantics of a language. If a language is nonambiguous, each of its sentences has only one meaning,1 and so picking out a sentence from the language will always lead to its meaning. In an ambiguous language, the fact that one sentence can have more than one meaning is reflected by its having multiple syntax trees, each of which can have a different meaning. The described view of meaning is referred to as denotational semantics, which is based on the idea of assigning meanings to sentences. There are several other approaches in computer science to defining the semantics of languages, but denotational semantics is probably the closest to our intuition and also easier to understand than some of the alternatives.
The major purpose of a language is the communication of meaning, but invented languages have no a priori obvious semantics. Therefore, for a language to be useful, it needs to be assigned a semantics. Since most languages consist of an infinite number of sentences, we cannot simply list all sentences and their meanings. There must be some other systematic way of defining the semantics of a language. This question ultimately boils down to finding an algorithm for defining the meaning of individual sentences. Such an algorithmic denotational semantics definition for a language consists of two parts. The first part is a mapping of terminal symbols to basic elements of the semantic domain. In this case this means mapping individual notes to sounds of a particular pitch and duration. The second part is given by rules that say for each nonterminal how to construct its meaning from the meanings of its children in the syntax tree. In this example, there are three nonterminals. The rule for note is trivial, since each note nonterminal has exactly one child, which means that its sound should be identical to that of its child. The meaning of measure is obtained by concatenating the sounds of its children in the order they appear. Finally, the meaning of melody is obtained in the same way as for measure, namely, through the concatenation of the sounds obtained as meaning for the melody’s measures.
This example illustrates how the meaning of a sentence is systematically constructed via its abstract syntax tree by combining the meanings of leaves into meanings of their parents, followed by combining their meanings, and so on, until the meaning for the root of the tree is obtained. A semantics definition that has this form is said to be compositional. A compositional definition is appealing because it mirrors the way a grammar defines the syntax of an infinite language through a finite set of rules. A language whose semantics can be defined in a compositional way is also called compositional. The principle of compositionality was identified by the mathematician and philosopher Gottlob Frege during his investigation of languages and ways to formally define language semantics. Some degree of compositionality is principally required to obtain a finite description of the meaning for an infinite number of sentences given by a grammar. If a language lacks compositionality, then it is generally problematic to define its semantics, since the meaning of an arbitrary sentence cannot be obtained by composing the meanings of its parts but must be described separately. Such descriptions are exceptions that override the general rules for determining meaning.
The simplified music language described here is compositional, but many other languages are not, or only partly so. English is one such example (as are most other natural languages). For very simple examples of noncompositionality one has to look no further than compound words. While a firefighter is somebody who fights fires, a hot dog does not refer to a high-temperature canine, and a red herring does not refer to a fish of a specific color. Other examples are idiomatic expressions such as “kicking the bucket” or “spilling the beans,” each of whose meanings is not obtained through the combination of the expression’s words.
Staff notation also contains noncompositional elements. For example, a tie is a symbol that connects two consecutive notes of the same pitch, typically the last note of a measure with the first note of the next measure. This occurs in “Over the Rainbow” at the very end of the song in the phrase “why oh why can’t I?” where “I” is sung as one note that lasts for two whole measures. According to the rule for finding the meaning of a melody, one has to determine the meanings of the individual measures and then concatenate those. This would lead to two one-measure-long sounds instead of one sound that lasts two measures. Therefore, to properly get the meaning of notes tied across two (or more) measures, the rule for finding the meaning of a melody must be overridden by a rule that treats groups of measures whose notes are tied as one.
The rules for determining the meaning of a sentence are remarkably similar to the rules that musicians follow when interpreting music notation. This is not an accident because a denotational semantics for a language is an algorithm to compute the meaning for a given sentence from that language. A computer that understands the language in which the denotational semantics is given can execute the semantics and thus compute the meaning of sentences. A computer or algorithm that can execute semantics is called an interpreter. While it may seem strange to view Hansel and Gretel as interpreters of pebble-following instructions, calling Judy Garland an interpreter of Harold Arlen’s music probably won’t raise any objections. Just as an algorithm makes computation repeatable, a language definition can be used to create computers for executing sentences of the language and thus makes language execution repeatable. In the case of the music notation, anybody who understands the semantics for the music notation can learn how to interpret music notation. In other words, the semantics can be used to teach musicians without a music teacher; it all
ows people to teach themselves.
A question related to interpreters is what happens if we create a recording of the music performance. Do recordings somehow embody the meaning of a music piece? Not really. What happens is that the recording of music produces merely a different representation of the music. The first recordings of “Over the Rainbow” were produced on analog records, which contain grooves that are interpreted by the needle of a record player and transformed into sound waves. Later, sound waves were encoded digitally on a CD as a sequence of bits (zeros and ones) that are read by a laser and then interpreted by a digital-to-analog converter to reproduce the sound waves. Today, most music is represented using software formats, such as MP3, designed to support the streaming of music over the internet. Whatever the representation is, it is still a representation of the music in some specific format, or language, that requires a performer or computer for executing the language instructions to produce the intended sound effect. Thus, playing and recording a music piece is actually a form of translation from one language, say, staff notation, into another, say, bit representation of sound waves.
Further Exploration
The song “Over the Rainbow” was used to illustrate that each sentence of a language has a structure that is important for understanding its meaning. Any other music piece could have been used as well to investigate the structure of songs and the potential ambiguity in notation, provided it could be described by some sufficiently expressive music notation. To understand the significance of notes, bars, and other structural elements of the notation, it is instructive to examine alternative notation systems for music and understand their limitations. In particular, the systematic restriction of existing notations results in new languages. For example, restricting the available pitches of notes leads to percussion notation, and ignoring pitches altogether leads to a language of rhythms. If we further reduce the available note lengths to only two (“short” and “long”), we obtain Morse code. Chord symbols ignore the melody and only show the harmonic progression of a music piece, while Chord charts combine chord symbols and rhythm notation to show the harmonic and rhythmic progression of a music piece.