Gussenhoven Intonation and Interpretation Phonetics and Phonology
and hence ‘questioning’ vs ‘asserting’. In a classic experiment with a number of artificial
intonation contours superimposed on a phrase which could be interpreted as either Swedish
(för Jane) or English (for Jane), Swedish and American English listeners were asked to decide
whether the utterance was meant as a statement or as a question [20]. The contours consisted
of a single rising-falling peak on Jane, varying in peak height and end pitch. Essentially, the
results for both groups of listeners were that the higher peak attracted more ‘Question’
judgements than the lower peak, while there was a clear correlation between end pitch and the
‘Question’ scores. Although the authors failed to point this out, the results also show the
influence of the native language. Listener language appeared to interact with peak height:
Swedish listeners differentiated more sharply between the superhigh peak and the high peak
than the American listeners, showing a greater influence of this variable in their scores. It is
reasonable to explain this result as due to the fact that Swedish does not use final rises as a
cue for questions in the way English does, causing Swedish listeners to rely more strongly on
other cues. Similarly, Japanese listeners are less inclined to hear interrogativity in
high-peaked contours than Russian listeners [21]. Interestingly, Japanese uses a final rise for
questions, while Russian employs a difference in peak height. In [22], Standard Chinese, Dutch
and Hungarian listeners were asked to identify the question in pairs of intonation contours
superimposed on identical segmental structures. These three languages have different ways of
expressing interrogativity prosodically. Chinese raises the pitch register [23], presumably an
effect produced in the phonetic implementation. Dutch uses final rises, phonologically marked
by final H% [19], while Hungarian distinguishes peaks in stressed syllables in declaratives
from phrase-final (i.e. boundary) peaks in interrogatives [24],[25]. The stimuli consisted of
(hypothetical) trisyllabic CVCVCV structures, as pronounced by a speaker of Dutch with the
stress on the penultimate syllable. The contours, which were similar in structure to the ones
used in [20], varied in peak height, peak alignment, and end pitch. Unlike what is usual in
other experiments, the listeners were told, quite untruthfully, that they were going to hear
sentences from a little known language spoken on a South Pacific island. Regardless of
language background, listeners associated higher peaks and higher end pitch with questions, as
in the 1964 experiment (see Figure 1). Moreover, there was also an interaction between
language group and peak height, which showed that Hungarian speakers were more sensitive to
the peak height variable than the other two language groups, parallelling the behaviour of the
Swedes vis-à-vis the Americans.
[Figure 1 (plot not reproduced): y-axis % Questions, 0 to 60; x-axis peak height, H1 to H5; legend entries include Chinese and Hungarian.]
Figure 1. Percentage “Question” judgements as a function of peak height by three groups of
listeners with ordinal interaction between listeners’ language and peak height. From [22].
2.3. Grammaticalisations of the Frequency Code
Grammaticalisation of the informational uses of the Frequency Code is commonplace. As said
above, over 70% of the languages in the world are estimated to have rising intonation
contours, while the use of rising intonation for statements is exceptional [1]. In fact, many
languages have more than one rising pattern. Dutch has four phonologically different contours,
H*L H%, H* H%, L*H H%, and L* H% [26,27]. Malay distinguishes statements from questions by
having an initial boundary %L in the former and %H in the latter (Indirawati Zahid, personal
communication).
Grammaticalisation of peak height is less common. Possibly, this is due to the widespread
communicative use of pitch range in the phonetic implementation. Somewhat roundabout ways of
doing this can be found, however. Bengali has two phonologically different contours, each with
a final peak which in selected contexts can occur on the final syllable, one signalling
contrastive declarative focus and the other signalling the yes-no interrogative.
Phonologically, the two peaks differ in the status of the H-tone, which belongs to the
phonological phrase in the case of the contrastive declarative (Hp) but to the intonational
phrase in the case of the interrogative contour (H%). The point is that the tone of the
intonational phrase is pronounced at considerably higher pitch [28].
‘Unnatural’ form-function relations appear to be quite liberally available in the case of
interrogative intonation, in which case they are falling, and more rarely in the case of
declarative intonations, in which case they are rising. Chickasaw is a striking case: the
interrogative is H* L%, the declarative H* H% [29]. There must be many scenarios leading to
falling interrogative intonation and rising declarative intonation. In [5], I sketched a
probable development of
falling questions from rising questions as a result of the
introduction of a lexical tone in the dialect of Roermond. The
motivation for the fall was the preservation of a lexical tone
contrast under interrogative intonation. In the declarative
context, the tone contrast was phonetically realised as a steep
fall to low (Accent 1) versus a slow fall to mid (Accent 2). In
the interrogative, a falling component was added to the rising
intonation in the case of Accent 1, which later led to a
generalised interrogative intonation contour L*-HL%. (This
contour also occurs in Bengali and Greek [28],[30].)
Arguably, the presence of a high final peak can still be said to be a manifestation of the
Frequency Code, despite the fall to low.
A likely source of rising statements is truncation of delayed peaks. As argued in section 6,
delayed peaks may occur as replacements of high peaks. The resulting rising-falling pitch
accents may be truncated on final syllables, and when such truncated falls are interpreted as
L*H%, generalisation of this form to other contexts may result.
3. The Effort Code
Increases in the effort expended on speech production will lead to greater articulatory
precision, but also a wider excursion of the pitch movement. Speakers exploit this fact by
using pitch range to signal meanings that can be derived from this effect of the expenditure
of effort. A frequent interpretation is that the speaker is being forceful because he believes
the contents of his message are important, an informational meaning. Narrow range may be used
to signal negation, a withdrawal of information. In addition to the more obvious meanings of
‘surprise’ and ‘agitation’, affective meanings include ‘obligingness’: the speaker is here
concerned to help the listener to understand what he is saying.
3.1. The informational interpretation of the Effort Code
The most obvious informational interpretation of the Effort Code is ‘emphasis’: the speaker is
concerned that his message should come across. The overall pitch range of utterances in
British English radio news bulletins correlates with informational salience, as determined
independently of the acoustics [31].
Many perception experiments, beginning with [32], have shown that higher pitch peaks sound
more prominent, everything else being equal. Interestingly, the effect is not simply due to
peak height. Rather, the listener's impression results from an estimate of how wide the pitch
excursion is, that is, of the pitch span in relation to some choice of pitch register. The
most straightforward way in which this can be demonstrated is by having listeners judge the
prominence of peaks in identical pitch contours superimposed on a male and a female voice, as
reported in [33]. In this experiment, the original utterances, which had been recorded by a
woman with a fairly ‘deep’ voice, were provided with artificial spectra by multiplying the
first formants by a factor of less than 1, so as to create a set of stimuli that sounded as if
they were spoken by a man. A second set of stimuli was obtained by multiplying the original
formant values by a factor of more than 1, so as to create a set that sounded as if they were
spoken by a woman whose voice was subjectively more feminine than the original voice.
Listeners rated pitch peaks in the artificial male voice as more prominent than the equivalent
pitch peaks in the artificial female voice, even though the pitch contours were identical.
These results can be explained if we assume that prominence judgements are made relative to
some hypothesised reference line, as represented by the contour's register. Since the
hypothesised register of the ‘female’ speaker was higher than that of the ‘male’ speaker,
perceived prominence of the female stimuli was less than that of the male stimuli. Thus, the
Effort Code is about inferred pitch excursion size, not height of pitch per se (see Figure 2).
In section 6, where pitch register is argued to be usable as a substitute for pitch range,
this point is made in a different way.
An interesting exploitation of the Effort Code is the use of compressed pitch range to express
negativity, the withdrawal of information. This is reported for the Bantu tone language
Engenni, where high tones are lowered and low tones raised in negative VPs [34].
3.2. Affective interpretations of the Effort Code
Affective interpretations of the Effort Code include ‘surprise’ and ‘helpfulness’. As for the
latter meaning, going to some lengths in realising pitch movements may be indicative of an
obliging disposition. Speech addressed to children would frequently appear to have this
suggestion of ‘a little help’ to the listener. The perception of pitch range would appear to
be tied to the distance between L-realisations and H-realisations, not the F0-width of just
any movement. This was shown for the perception of ‘surprise’ in Dutch in [35]. When the
contour’s main pitch rise was a realisation of H* H%, perceived surprise went up with the
raising of the targets of both H* and H%. However, when the rise was a realisation of L*H H%,
perceived surprise went up when the target of L* was lowered, and that of H% raised (see
Fig 2).
Figure 2. Perceived surprise scores as a function of beginning and end of nuclear contour,
separately for H*H% (panel a) and L*HH% (panel b). From [35].
The earliest perception research into intonational meaning found that rising-falling and
falling-rising contours (representing a change of pitch direction and contrasting in the
experiment with stimuli having less pitch excursion) signalled the meanings ‘authoritative’
and ‘pleasant’. This result illustrates, respectively, the informational and the affective
interpretations of the Effort Code [36],[18].
3.3. Grammaticalisation of the Effort Code
Grammaticalisation of the informational interpretation of the Effort Code is commonplace in
the expression of focus. In such cases, the intonational structure will favour a situation
whereby focused information will be characterised by relatively wide pitch excursions.
Germanic languages and, to a lesser extent, Romance languages use pitch accents to mark
focused parts of sentences, removing these in the sentence constituents after the focus.
Because it is mediated through a grammar, the expression of focus through deaccentuation will
be subject to restrictions that vary from language to language. The constituent that allows
focus contrasts to be expressed is at least as small as the word in Dutch, which allows
contrasts like ZWARTE driehoek vs zwarte DRIEHOEK ‘black triangle’ to signal the known
informational status of driehoek and zwarte, respectively. By contrast, Italian does not allow
NP-internal contrasts, and as a result TRIANGOLO NERO ‘black triangle’ is the neutralising
translation of both Dutch expressions [36]. In Basque, the focus constituent requires the
presence of a pitch accent, but oddly, since the presence of pitch accents is largely
lexically determined, not all words are equally focusable [38]. In Japanese, compound words
that consist of a single accentual phrase do not allow the focus constituent to be confined to
a sub-compound constituent [4].
A different type of grammaticalisation occurs in languages that use different pitch accents
for narrow (contrastive) focus and neutral focus, like Bengali [27] and European Portuguese
[39]. In such cases, a one-word utterance with contrastive focus is phonologically different
from a neutral citation pronunciation of the same word. In line with the Effort Code, the
contrastive pitch accent will be realised with greater pitch excursion on the accented
syllable. In European Portuguese, the narrow focus pitch accent has a peak in the accented
syllable (H*+L), while the neutral pitch accent has a fall that ends inside the accented
syllable (H+L*), causing the contrastively accented syllable to have the wider pitch
excursion. The Bengali case is given in section 6. A third way in which pitch excursion has
been grammaticalised is through the suspension of downstep. In Japanese, prosodic phrasing is
sensitive to focus structure, and the most salient consequence of this is that an otherwise
automatic lowering of the pitch range cannot take place in a focused constituent.
A grammaticalisation of the ‘obligingness’ interpretation may have been found by [40]. They
investigated the pragmatic effects of high-pitched and low-pitched realisations of the
utterance-initial unaccented syllables before the first pitch accent in Dutch. High onsets
(%H) before a low-pitched accented syllable (L*) were more positively evaluated than low
onsets (%L) on each of four scales measuring the speaker's disposition towards the hearer,
Non-aloofness, Friendliness, Politeness and Non-aggressiveness. However, low onsets were more
positively evaluated before high-pitched (H*) accented syllables than high onsets. In other
words, movement towards the accented syllable, regardless of direction, was positively
evaluated and absence of movement received negative evaluations. Arguably, choice of onset
represents an ‘obligingness’ morpheme, a grammaticalisation of the affective interpretation of
the Effort Code. This morpheme would consist of an initial unspecified boundary %T, whose
identity (%H or %L) is determined by the identity of the following T*, as summarised in
Table I.
Table I. Positive speaker evaluation of negative polarity of initial boundary tone in Dutch.
After [35]
POSITIVE EVALUATION    %H L*    %L H*
NEGATIVE EVALUATION    %L L*    %H H*
I have no examples of ‘unnatural’ grammatical focus expression. At best, expressions with and
without focus may have equal pitch excursions, in situations in which focus is expressed in
the morpho-syntax, as in Wolof [41].
4. The Production Code
A very different interpretation of the process of energy generation relies on the fact that
speakers appear to spend more effort on the beginning of utterances than on the ends. This
impression originates from a correlation between utterances and breath groups: at the
beginning of the exhalation phase, subglottal air pressure will be higher than towards its
end. A natural consequence of the fall-off in energy is a gradual drop in intensity, and a
weak, gradual lowering of the fundamental frequency [13], known as ‘declination’ [42]. The
communicative exploitation of this effect is the Production Code, which associates high pitch
with the utterance beginning and low pitch with its end.
4.1. Informational interpretations of the Production Code
As far as the Production Code is concerned, the significance of declination does not lie in
its slope. Rather, it is variation at the edges that is interpreted in terms of initiation and
finality. Thus, high beginnings signal new topics, low beginnings continuations of topics. A
reverse relation holds for the utterance end: high endings signal continuation, low endings
finality and end of turn. Grammaticalisation of these relations is commonly found for the
utterance end, where H% signals continuation, but may also be found in the use of initial %H
to signal topic refreshment. The Production Code would appear to have informational meanings
only.
The interrogative and continuative meanings of final rises in languages like Dutch [43],
therefore, have quite different explanations under the present account, since the first is
derived from the Frequency Code and the second from the Production Code. Earlier, these
meanings had been collapsed as ‘open’ in [44],[8]. Various research results suggest that where
both cues exist, the continuation cue is lower than the interrogative cue. This is true for
Dutch, where L*H or H* followed by a level pitch until the intonational phrase boundary is
likely to be interpreted as a continuation cue, while the addition of H%, which is realised as
an additional rise at the boundary, will cause a shift towards question interpretation [43].
Overall slope in Danish, used
concomitantly with variation in end pitch, is similarly linked to interrogativity for the
least steep slopes, with continuation for the medium slopes, and with statements for the
steepest slopes [45]. Arguably, this result follows from the fact that, for the purposes of
the Production Code, the variation at the end of the utterance falls within a lower frequency
band than that at the beginning of the utterance, while the variation for the Frequency Code
is free from this downward bias. Conversely, we would expect that interrogativity marking at
the beginning of the utterance, like %H in Malay, can have lower pitch than that used for the
signalling of a new topic.
The downward slope is commonly grammaticalised, as downstep. In a frequent type, H after L is
pronounced at a categorically lower pitch than a preceding H. Such grammaticalisations may be
purely phonological, i.e. meaningless (except for the information provided by the fact that
the downstep context is confined to some prosodic constituent, which will indirectly reveal
the morpho-syntactic structure). Final Lowering, like the raising of the pitch at the
beginning of phrases, is gradient in English, but it may be phonologised too, as it is in
various African tone languages.
5. Substitute variables in F0 variation
An important aspect of the present conception of intonational meaning is that while the nature
of the meanings is related to the way our speech organs produce pitch variation, there is no
implication that the physical conditions that lie at the basis of these meanings need to be
present in order to create the forms. Speakers and listeners know what these form-function
relations are, and will produce the forms in the way they see fit. To indicate the start of a
new topic, the idea is not that the speaker should breathe in at the beginning of his
utterance, but that he should produce sufficiently high pitch at that point to convince his
listener of his communicative intention. It is in fact possible to use substitute features,
phonetic forms that the listener can associate indirectly with the primary form. Two cases are
discussed. First, peak delay can signal high pitch, and thus all the meanings of high pitch,
and second, high pitch can be used to signal wide pitch span.
5.1. Peak delay as a substitute for peak height
A higher pitch peak will take longer to reach than a lower one, if rate of change is the same.
Therefore, higher peaks will tend to be later than lower peaks, as suggested by Figure 3.
Speakers and listeners have tacit knowledge of this mechanical connection, providing them an
opportunity to bring it under control. Peak delay can therefore be used as an enhancement of,
or even a substitute for, pitch raising.
[Figure 3 (plot not reproduced): F0 (Hz) against time; contour labelled ‘Raised and delayed’.]
[...] by late peaks. First, due to the Effort Code, late peaks sound more prominent than early
peaks. Strictly speaking, this is a two-step inference on the part of the listener: (1) high
peaks can indicate wide pitch span, and (2) late peaks can indicate high peaks. Indeed, both
higher and later peaks elicit more ‘unusual occurrence’ interpretations than ‘everyday
occurrence’ interpretations of one-peak realisations of The aLARM went off, as shown by [46],
suggesting that listeners perceive late peaks as if they were higher. Moreover, in research on
the difference between wide focus and narrow focus in the Hamburg dialect of German, it was
found that narrow focus was realised by later peaks, suggesting again that speakers use peak
delay to signal high pitch [47].
A grammaticalisation of late peak vs early peak occurs in European Portuguese, which has H*+L
for narrow focus and H+L* for neutral focus [39], which latter pitch accent, again, is also
lower, as noted in section 4.1. In these cases, the later peak does not conflict with the
primary variable, pitch span, since the pitch span in the accented syllable will not be
smaller than in the neutral syllable. However, the use of peak delay for emphasis is
constrained by the competition from the primary correlate of the Effort Code, the pitch span.
Since the nuclear syllable is a prime location for the pitch span cue, narrow focus is often
indicated by a pitch accent describing a fall within the stressed syllable, while the pitch
fall in the neutral focus case falls outside it [47]. For instance, prenuclear pitch accents
would appear to be L*+H in Spanish, and nuclear, focal ones H*+L [49].
As for the Frequency Code, there have been reports of languages that use a later peak to mark
question intonation and an earlier one for statement intonation, such as southern varieties of
Italian [50]. The difference is interpreted as categorical by Grice, suggesting that we are
dealing with a grammaticalised form of an informational interpretation of this secondary
effect of the Frequency Code. Recently, it has been found that nuclear peaks in Dutch
questions are 40 ms later than in declaratives [51]. Here, the effect is almost certainly
phonetic. An affective interpretation of the Frequency Code can be found in the fact that
delayed accentual peaks in Japanese are associated with female speech [52]. A demonstration of
the universality of the connection between peak delay and interrogative intonation was
provided in the experiment reported in [22]. In addition to end pitch and peak height, their
stimuli also varied in peak alignment. Regardless of language background, Hungarian, Chinese
and Dutch listeners associated not only higher peaks and higher end pitch with questions, but
also later peaks. These results showed quite unambiguously that humans know both the direct
and indirect manifestations of the Frequency Code (see Fig 4).
Finally, the Production Code: [31],[53] found that first peaks of intonational phrases
containing new topics in British English were later than other first peaks. This finding can
be related to this code, which links high beginnings to new topics. The high beginning is
expressed in the first accentual peak, whose late timing enhances the high pitch.
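The mechanical connection that opens section 5.1 (a higher peak takes longer to reach if the rate of change is the same) can be made concrete with a little arithmetic. The sketch below is an illustration only: it assumes a linear rise at a constant rate, and the function name `time_to_peak`, the onset pitch, and the rate of change are hypothetical values chosen for the example, not figures taken from the experiments cited above.

```python
def time_to_peak(onset_hz: float, peak_hz: float, rate_hz_per_ms: float) -> float:
    """Return the time in ms needed to rise linearly from onset_hz to peak_hz.

    Assumption (not from the paper): the rise is linear, so the delay is
    simply the pitch excursion divided by the rate of change.
    """
    if rate_hz_per_ms <= 0:
        raise ValueError("rate_hz_per_ms must be positive")
    if peak_hz < onset_hz:
        raise ValueError("peak_hz must not be below onset_hz")
    return (peak_hz - onset_hz) / rate_hz_per_ms


# Hypothetical values: same onset and same rate of change, two peak heights.
low_peak_delay = time_to_peak(onset_hz=120.0, peak_hz=160.0, rate_hz_per_ms=0.5)
high_peak_delay = time_to_peak(onset_hz=120.0, peak_hz=200.0, rate_hz_per_ms=0.5)
```

Under these assumed values the higher peak is reached later than the lower one, which is the relation listeners are argued to exploit when they treat peak delay as a cue to peak height.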
So far, a picture has been painted whereby form-function relations are available to all
humans, which language learners will grammaticalise, after which language change may destroy
them, such that grammatical forms may have meanings that are the opposite of what would be
expected. As a broad frame of reference, this picture has served well to make sense of many
well-known form-function relations, and of the fact that intonation is at the same time
structural, discrete, and often has arbitrary form-function relations, while on the other hand
seems overwhelmingly iconic, both to [...]
(1) A: Do you have jello?
    a. B: We have \pie/
    b. #B: We have \pie/, which we know you won’t eat
    c. B: We have \pie, which we know you won’t eat.
The system in [57] is compositional, but to a lesser extent than advocated by [58], who
essentially consider every tone a morpheme. The compositionality of [57] comes in as
‘modifications’, to be expressed as affixes or tone deletions, which meaning components apply
to classes of contours. An example is L*-prefixation, which adds significance to every one of
the three meanings (‘delay’).
6.2. Pitch accent distribution
The same point can be made with respect to pitch accent distribution in English. There are
precise semantic effects of the type illustrated in (2). In (2a), the usual rendering of the
proverb, the presence of accent on spoil is obligatory for the interpretation whereby the many
cooks in the subject are only potential. Without the accent, as in (2b), the proposition
becomes eventive [57, ch 2], such that the speaker commits himself to the belief that there
actually are too many cooks spoiling the broth.
(2) a. TOO many COOKS SPOIL the BROTH (proverb)
    b. TOO many COOKS spoil the BROTH
       (implying e.g. that soups need to be taken off the menu)
6.3. Negotiating shared understanding
The grammatical meanings of intonational morphemes are labels that tell the listener to what
extent the information represents an update of the shared understanding he is negotiating with
the speaker. The first distinction is between status-quo information (background, old
information) and update information (focus, new information). Status-quo information is
deaccented: no pitch accents appear after the focus constituent. (Before the focus, pitch
accents may be added for rhythmical reasons.) The meanings of the pitch
accents(-cum-boundary tones) concern the relation of the focus to the background
[60],[57],[58],[61]. I illustrate the above three meanings in Table X.
[...] it is different after an instantiation of (2b). Non-eventive sentences fall into two
categories, DEFINITIONAL, which update the attendant circumstances, and CONTINGENCY, which do
the same, but have the additional meaning that the speaker claims not to know if the update is
at all relevant (see Fig. 5). The three types have different forms in English. First, eventive
sentences have no accent on the predicate if it is adjacent to an accented argument (subject
or object). Definitional sentences only allow unaccented focused predicates when adjacent to
an accented object. Contingency sentences are distinct from definitional sentences in
requiring accent on the negator in the VP, and in requiring accent on the predicate even when
adjacent to an accented object. The three types are distinct in a negative subject-predicate
sentence, therefore, as shown in (3), (4) and (5).
(3) (A: What’s that scuffle?)
    B: Our CUSTomers aren't admitted! (Eventive)
(4) CUstomers aren't adMITted
    (This is the way it is: Definitional)
(5) Our CUSTomers AREN'T adMITted
    (In case you had forgotten: Contingency)