Speech Prosody 2002
Aix-en-Provence, France
April 11-13, 2002

ISCA Archive
http://www.isca-speech.org/archive

Intonation and Interpretation: Phonetics and Phonology

Carlos Gussenhoven

Centre for Language Studies

University of Nijmegen, The Netherlands

Abstract

Intonational meaning is located in two components of language, the phonetic implementation and the intonational grammar. The phonetic implementation is widely used for the expression of universal meanings that derive from ‘biological codes’, meaning dimensions based on aspects of the production process of pitch variation. Three codes are identified, Ohala’s Frequency Code, the Effort Code and the Production Code. In each case, ‘informational’ meanings (which relate to the message) are identified, while for the first two codes also ‘affective’ meanings (relating to the state of the speaker) are discussed. Speech communities will vary in the extent to which they employ those meanings, and in the choices they make when they conflict. What they will never do, however, is change the natural form-function relations that they embody. By contrast, grammaticalised meanings often mimic the natural meanings, but linguistic change may create quite arbitrary form-meaning relations when forms are phonologised, and the semantics is systematised. English grammaticalised intonational meaning concerns information

1. Introduction

A discussion of intonational meaning typically raises the issue of whether such meaning is universal or language-specific [1,2]. The position defended here is that both the universal and the language-specific perspectives are true, simultaneously, for any language, but that the universal part is exercised in the phonetic implementation, while the language-specific meaning is located in the intonational morphology and phonology. The universal meanings are based on metaphors of biological conditions that influence the speech production process, in this case F0. Three such metaphors, or ‘biological codes’, as I will call them, have been identified. Together, they amount to a theory of paralinguistic meaning in intonation. In each case, we are dealing with a number of different interpretations, each of which can be related to the more general meaning of the code.

Unlike paralinguistic meaning, linguistic meaning is potentially arbitrary, although the form-function relations between tones and meaning frequently mimic the paralinguistic form-function relations employed in phonetic implementation [3]. Grammatical meanings are like paralinguistic meanings, as when final H% indicates non-finality or final H% signals interrogativity. However, this is by no means always the case. Language change may create ‘unnatural’, arbitrary forms [5]. This response to the problem of the partially paralinguistic nature of intonation contrasts with earlier ones in which either an almost exclusively non-linguistic viewpoint was adopted [6], or an exclusively linguistic viewpoint (e.g. [2]), or in which the two aspects are reconciled with each other in a gradient conception of their difference (e.g. [7, p. 128],[8]). Below, I explain the notion of a biological code (section 2.0), and discuss each of the three codes in a separate section.

1.1. Three biological codes

The question arises what the explanation is of the nature of the universal paralinguistic meanings. This tacit knowledge derives from three biologically determined conditions. One is that the organs with which we produce speech, in particular the larynx, vary in size across speakers, causing differences in the fundamental frequency of adult speech and children's speech, and within adults, of male and female speech [9]. The second is that the production of speech requires energy, and that variation in this energy is detectable in the signal. The third is that the energy is parcelled out in chunks that coincide with exhalation phases of the breathing process. Respectively, these codes are the Frequency Code [9], the Effort Code, and the Production Phase Code, or Production Code, for short [5].

A. The Frequency Code. Smaller larynxes contain lighter and smaller vocal cords, with which faster vibration rates are achieved for a given amount of energy. The correlation between larynx size and rate of vocal cord vibration is exploited for the expression of power relations. The many ramifications of this latter connection were dealt with by Ohala [9],[10],[11]. The term for this form-function relation is his, and my labels for the next two relations are by analogy with his term. An alternative term would be ‘Size Code’.

B. The Effort Code. The amount of energy expended on speech production can be varied: putting in more effort will not just lead to more precise articulatory movements, but also to more canonical and more numerous pitch movements. Lavishing more care on the production process means less slurring together of these movements, causing them to be carried out with less undershooting of targets [e.g. 12].

C. The Production Code. The generation of energy is tied to the exhalation phase of the breathing process, and hence becomes available in phases, Lieberman's breath groups [13]. This code associates high pitch with the beginnings of utterances and low pitch with the ends.

Together, the three biological codes explain what is universal about the interpretation of pitch variation. In each case, the general form-function relation acquires a number of more specific interpretations. Broadly, these can be classed as ‘affective’, in which case they signal attributes of the speaker, or ‘informational’, in which case they signal attributes of the
message. All of these concern meanings which are available to all humans. However, the universal meanings deriving from different codes may well be mutually incompatible, and there will be instances in which speakers with different language backgrounds make different choices, or in which listeners draw on a different code than the speaker intended. Moreover, the intonational grammar of a language may bias the exploitation of the universal codes, such that universal meanings which happen to be encoded in the grammar are more readily perceived by speakers of that language than by speakers of languages in which such meanings have not been encoded.

1.2. Grammatical meaning

Typically, the intonational morphemes of a language will mimic the universal form-function relations. But of course, in such grammaticalisations of the universal codes, the function will be morphemic, the form phonological, and since we are now dealing with structural elements, these morphemes are subject to the normal forces of language change. As a result, languages may come to possess form-meaning relations in their grammars which go against the universal, biological codes. In fact, this happens so commonly that Ladd [2] rejected the notion of universal form-meaning relations in intonation, on the grounds that if a universal is only in evidence, say, 70% of the time, there is little explanatory power to be derived from it. Crucially, in the present perspective, such ‘unnatural’ form-function relations must be structural, i.e. discrete. When meanings are ‘natural’, it may not be easy to establish whether the phonetic difference is discrete, i.e. due to a phonological difference, or gradient, i.e. due to meaningful variation in the phonetic implementation [14], [15].

Grammaticalisation not only implies that the form is discretely coded in phonological structure, but also that the meanings are systematised. Intonation is used to route the semantic contents of particular morpho-syntactic constituents to semantic categories of information status. I will briefly work this point out at the end of the paper.

1.3. Divorcing cause and effect

Biological codes are based on the effects of physiological properties of the production process on the signal, but communication by means of the codes does not require that these physiological conditions are actually created. It is enough to create the effects. That is, the effects are not automatic, but have been brought under control. When we say that the meaning ‘emphasis’ as signalled by wide pitch excursions is derived from the Effort Code, on the grounds that greater effort will typically lead to wider pitch excursions, there is no implication that the speaker who signals emphasis by using the Effort Code actually expends greater effort on his speech production. The only thing he needs to do is choose his pitch range such that he will be understood to be exploiting this natural relation between excursion size and articulatory effort. Similarly, when using the Production Code to signal the end of a speaking turn, the speaker need not have his exhalation phase end with the end of his utterance, or even produce a more steeply declining overall contour shape, but need only lower the pitch of the last one or two syllables of his utterance. The indirectness of the relation between actual speaker behaviour and the natural connections between speech production and pitch is underscored by the use of ‘secondary’ features like delayed peaks as a substitute for high peaks [3]. Before this point is made, I will deal with informational and affective interpretations of each code in sections 2, 3 and 4.

2. The Frequency Code

The Frequency Code is essentially Ohala’s extension to human speech of Morton's explanation for the widespread similarities in patterns of avian and mammalian vocalisations in face-to-face competitive encounters [16]. Vocalisations by dominant or aggressive individuals are low-pitched, while those by subordinate or submissive individuals are high-pitched. The explanation of this correlation is that lower pitch suggests the organ producing the vocalisation is larger. In fact, the exploitation of this correlation in nature is not confined to meaningful variation within individuals. In many species, it is hard-wired through dimorphism, the different biological developments of the male and female members of a species. In the front-to-back dimension, the male human larynx is almost twice the size of the female larynx, exactly the dimension which affects the fundamental frequency most. This arises at puberty, the age at which boy becomes man, ready to assume the role of defender or aggressor. To underscore the effect, the male larynx is positioned lower in the throat, causing the vocal tract, the tube leading to the lips, to be some 3.5 cm longer than the female vocal tract. The effect is that formant frequencies are lower in men, suggesting a larger creature. Other aspects of dimorphism in animals and humans point in the same direction: males may have extra feathers to be erected, antlers, thicker manes, or, in the case of humans, peripheral facial hair, all of which serve to make the creature look more imposing. Ohala’s claim was that we associate pitch with this package of evolutionary meanings, for which reason intonation contours have come to have the distributional bias we observe.

2.1. Affective interpretations of the Frequency Code

Affective interpretations of the Frequency Code are rather numerous. Submissiveness, or ‘feminine’ values, and its opposite, dominance, or ‘masculine’ values, constitute one obvious dimension. Meanings that are associated with this dimension are (for higher pitch) ‘friendliness’ and ‘politeness’. A closely related one is ‘vulnerability’ (for higher pitch) versus ‘confidence’, which may play out as ‘protectiveness’, or as its negative counterparts, ‘aggression’ or ‘scathingness’. In the scores for ‘masculinity’ and ‘femininity’ perception in speech, Biemans [17] found a positive correlation between five artificial registers superimposed on a set of spontaneous male and female utterances and the scores on a ‘femininity’ scale, and a negative correlation with the scores on a ‘masculinity’ scale. High pitch commonly leads to high scores on semantic scales for ‘polite’, ‘non-aggressive’ and ‘friendly’ in perception experiments with intonation. As early as 1964, Uldall found that listeners associated high ending rises with both ‘submissiveness’ and ‘pleasantness’ [18]. In a recent experiment, it was found that the scores on four scales measuring affective meanings for eight Dutch intonation contours correlated highly with the mean fundamental frequency of the contours. The strongest correlations were found between these scores and the mean fundamental frequency of the last quarter of the contours, suggesting that in Dutch contour endings are used more for this purpose than earlier portions [19].
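As a rough illustration of the kind of measure reported in [19], the sketch below computes the mean F0 over the last quarter of a contour and a Pearson correlation between such means and scale scores. It is not the procedure used in [19]: the function names, the representation of a contour as a list of equally spaced F0 samples in Hz, and the way the last quarter is delimited are all assumptions made for the example.

```python
import math


def mean_f0_last_quarter(f0_hz):
    """Mean F0 (Hz) over the final quarter of a contour.

    Assumption: the contour is a non-empty list of equally spaced
    F0 samples, so the last quarter is simply the last len // 4 samples
    (at least one sample).
    """
    if not f0_hz:
        raise ValueError("empty contour")
    quarter = max(1, len(f0_hz) // 4)
    tail = f0_hz[-quarter:]
    return sum(tail) / len(tail)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

Given one contour per stimulus and one mean scale score per stimulus, the last-quarter means would be passed as `xs` and the scores as `ys`. A correlation with the mean F0 of the whole contour could be computed in the same way for comparison.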
2.2. Informational interpretations of the Frequency Code

The other class of interpretations reflect on the linguistic message, such as ‘uncertainty’ (for higher pitch) vs ‘certainty’, and hence ‘questioning’ vs ‘asserting’. In a classic experiment with a number of artificial intonation contours superimposed on a phrase which could be interpreted as either Swedish (för Jane) or English (for Jane), Swedish and American English listeners were asked to decide whether the utterance was meant as a statement or as a question [20]. The contours consisted of a single rising-falling peak on Jane, varying in peak height and end pitch. Essentially, the results for both groups of listeners were that the higher peak attracted more ‘Question’ judgements than the lower peak, while there was a clear correlation between end pitch and the ‘Question’ scores. Although the authors failed to point this out, the results also show the influence of the native language. Listener language appeared to interact with peak height: Swedish listeners differentiated more sharply between the superhigh peak and the high peak than the American listeners, showing a greater influence of this variable in their scores. It is reasonable to explain this result as due to the fact that Swedish does not use final rises as a cue for questions in the way English does, causing Swedish listeners to rely more strongly on other cues. Similarly, Japanese listeners are less inclined to hear interrogativity in high-peaked contours than Russian listeners [21]. Interestingly, Japanese uses a final rise for questions, while Russian employs a difference in peak height. In [22], Standard Chinese, Dutch and Hungarian listeners were asked to identify the question in pairs of intonation contours superimposed on identical segmental structures. These three languages have different ways of expressing interrogativity prosodically. Chinese raises the pitch register [23], presumably an effect produced in the phonetic implementation. Dutch uses final rises, phonologically marked by final H% [19], while Hungarian distinguishes peaks in stressed syllables in declaratives from phrase-final (i.e. boundary) peaks in interrogatives [24],[25]. The stimuli consisted of (hypothetical) trisyllabic CVCVCV structures, as pronounced by a speaker of Dutch with the stress on the penultimate syllable. The contours, which were similar in structure to the ones used in [20], varied in peak height, peak alignment, and end pitch. Unlike what is usual in other experiments, the listeners were told, quite untruthfully, that they were going to hear sentences from a little known language spoken on a South Pacific island. Regardless of language background, listeners associated higher peaks and higher end pitch with questions, as in the 1964 experiment (see Figure 1). Moreover, there was also an interaction between language group and peak height, which showed that Hungarian speakers were more sensitive to the peak height variable than the other two language groups, parallelling the behaviour of the Swedes vis-à-vis the Americans.

[Figure 1: line plot; vertical axis ‘% Questions’, horizontal axis peak height (H1 to H5); separate lines for Dutch, Chinese and Hungarian listeners.]

Figure 1. Percentage “Question” judgements as a function of peak height by three groups of listeners with ordinal interaction between listeners’ language and peak height. From [22].

2.3. Grammaticalisations of the Frequency Code

Grammaticalisation of the informational uses of the Frequency Code is commonplace. As said above, over 70% of the languages in the world are estimated to have rising intonation contours for questions, while the use of rising intonation for statements is exceptional [1]. In fact, many languages have more than one rising pattern. Dutch has four phonologically different contours, H*L H%, H* H%, L*H H%, and L* H% [26,27]. Malay distinguishes statements from questions by having an initial boundary %L in the former and %H in the latter (Indirawati Zahid, personal communication).

Grammaticalisation of peak height is less common. Possibly, this is due to the widespread communicative use of pitch range in the phonetic implementation. Somewhat roundabout ways of doing this can be found, however. Bengali has two phonologically different contours, each with a final peak which in selected contexts can occur on the final syllable, one signalling contrastive declarative focus and the other signalling the yes-no interrogative. Phonologically, the two peaks differ in the status of the H-tone, which belongs to the phonological phrase in the case of the contrastive declarative (Hp) but to the intonational phrase in the case of the interrogative contour (H%). The point is that the tone of the intonational phrase is pronounced at considerably higher pitch [28].

‘Unnatural’ form-function relations appear to be quite liberally available in the case of interrogative intonation, in which case they are falling, and more rarely in the case of declarative intonations, in which case they are rising. Chickasaw is a striking case: the interrogative is H* L%, the declarative H* H% [29]. There must be many scenarios leading to falling interrogative intonation and rising declarative intonation. In [5], I sketched a probable development of falling questions from rising questions as a result of the introduction of a lexical tone in the dialect of Roermond. The motivation for the fall was the preservation of a lexical tone contrast under interrogative intonation. In the declarative context, the tone contrast was phonetically realised as a steep fall to low (Accent 1) versus a slow fall to mid (Accent 2). In the interrogative, a falling component was added to the rising intonation in the case of Accent 1, which later led to a generalised interrogative intonation contour L*-HL%. (This contour also occurs in Bengali and Greek [28],[30].) Arguably, the presence of a high final peak can be still said to
be a manifestation of the Frequency Code, despite the fall to low.

A likely source of rising statements is truncation of delayed peaks. As argued in section 6, delayed peaks may occur as replacements of high peaks. The resulting rising-falling pitch accents may be truncated on final syllables, and when such truncated falls are interpreted as L*H%, generalisation of this form to other contexts may result.

3. The Effort Code

Increases in the effort expended on speech production will lead to greater articulatory precision, but also a wider excursion of the pitch movement. Speakers exploit this fact by using pitch range to signal meanings that can be derived from this effect of the expenditure of effort. A frequent interpretation is that the speaker is being forceful because he believes the contents of his message are important, an informational meaning. Narrow range may be used to signal negation, a withdrawal of information. In addition to the more obvious meanings of ‘surprise’ and ‘agitation’, affective meanings include ‘obligingness’: the speaker is here concerned to help the listener to understand what he is saying.

3.1. The informational interpretation of the Effort Code

The most obvious informational interpretation of the Effort Code is ‘emphasis’: the speaker is concerned that his message should come across. The overall pitch range of utterances in British English radio news bulletins correlates with informational salience, as determined independently of the acoustics [31].

Many perception experiments, beginning with [32], have shown that higher pitch peaks sound more prominent, everything else being equal. Interestingly, the effect is not simply due to peak height. Rather, the listener's impression results from an estimate of how wide the pitch excursion is in relation to some choice of pitch register. The most straightforward way in which this can be demonstrated is by having listeners judge the prominence of peaks in identical pitch contours superimposed on a male and a female voice, as reported in [33]. In this experiment, the original utterances, which had been recorded by a woman with a fairly ‘deep’ voice, were provided with artificial spectra by multiplying the first formants with a factor of less than 1, so as to create a set of stimuli that sounded as if they were spoken by a man. A second set of stimuli was obtained by multiplying the original formant values by a factor of more than 1, so as to create a set that sounded as if they were spoken by a woman whose voice was subjectively more feminine than the original voice. Listeners rated pitch peaks in the artificial male voice as more prominent than the equivalent pitch peaks in the artificial female voice, even though the pitch contours were identical. These results can be explained if we assume that prominence judgements are made relative to some hypothesised reference line, as represented by the contour's register. Since the hypothesised register of the ‘female’ speaker was higher than that of the ‘male’ speaker, perceived prominence of the female stimuli was less than that of the male stimuli. Thus, the Effort Code is about inferred pitch excursion size, not height of pitch per se (see Figure 2). In section 6, where pitch register is argued to be usable as a substitute for pitch range, this point is made in a different way.

An interesting exploitation of the Effort Code is the use of compressed pitch range to express negativity, the withdrawal of information. This is reported for the Bantu tone language Engenni, where high tones are lowered and low tones raised in negative VPs [34].

3.2. Affective interpretations of the Effort Code

Affective interpretations of the Effort Code include ‘surprise’ and ‘helpfulness’. As for the latter meaning, going to some lengths in realising pitch movements may be indicative of an obliging disposition. Speech addressed to children would frequently appear to have this suggestion of ‘a little help’ to the listener. The perception of pitch range would appear to be tied to the distance between L-realisations and H-realisations, not the F0-width of just any movement. This was shown for the perception of ‘surprise’ in Dutch in [35]. When the contour’s main pitch rise was a realisation of H* H%, perceived surprise went up with the raising of the targets of both H* and H%. However, when the rise was a realisation of L*H H%, perceived surprise went up when the target of L* was lowered, and that of H% raised (see Fig 2).

Figure 2. Perceived surprise scores as a function of beginning and end of nuclear contour, separately for H*H% (panel a) and L*HH% (panel b). From [35].

The earliest perception research into intonational meaning found that rising-falling and falling-rising contours
(representing a change of pitch direction and contrasting in the experiment with stimuli having less pitch excursion) signalled the meanings ‘authoritative’ and ‘pleasant’. This result illustrates, respectively, the informational and the affective interpretations of the Effort Code [36],[18].

3.3. Grammaticalisation of the Effort Code

Grammaticalisation of the informational interpretation of the Effort Code is commonplace in the expression of focus. In such cases, the intonational structure will favour a situation whereby focused information will be characterised by relatively wide pitch excursions. Germanic languages and, to a lesser extent, Romance languages use pitch accents to mark focused parts of sentences, removing these in the sentence constituents after the focus. Because it is mediated through a grammar, the expression of focus through deaccentuation will be subject to restrictions that vary from language to language. The constituent that allows focus contrasts to be expressed is at least as small as the word in Dutch, which allows contrasts like ZWARTE driehoek vs zwarte DRIEHOEK ‘black triangle’ to signal the known informational status of driehoek and zwarte, respectively. By contrast, Italian does not allow NP-internal contrasts, and as a result TRIANGOLO NERO ‘black triangle’ is the neutralising translation of both Dutch expressions [36]. In Basque, the focus constituent requires the presence of a pitch accent, but oddly, since the presence of pitch accents is largely lexically determined, not all words are equally focusable [38]. In Japanese, compound words that consist of a single accentual phrase do not allow the focus constituent to be confined to a sub-compound constituent [4].

A different type of grammaticalisation occurs in languages that use different pitch accents for narrow (contrastive) focus and neutral focus, like Bengali [27] and European Portuguese [39]. In such cases, a one-word utterance with contrastive focus is phonologically different from a neutral citation pronunciation of the same word. In line with the Effort Code, the contrastive pitch accent will be realised with greater pitch excursion on the accented syllable. In European Portuguese, the narrow focus pitch accent has a peak in the accented syllable (H*+L), while the neutral pitch accent has a fall that ends inside the accented syllable (H+L*), causing the contrastively accented syllable to have the wider pitch excursion. The Bengali case is given in section 6. A third way in which pitch excursion has been grammaticalised is through the suspension of downstep. In Japanese, prosodic phrasing is sensitive to focus structure, and the most salient consequence of this is that an otherwise automatic lowering of the pitch range cannot take place in a focused constituent.

A grammaticalisation of the ‘obligingness’ interpretation may have been found by [40]. They investigated the pragmatic effects of high-pitched and low-pitched realisations of the utterance-initial unaccented syllables before the first pitch accent in Dutch. High onsets (%H) before a low-pitched accented syllable (L*) were more positively evaluated than low onsets (%L) on each of four scales measuring the speaker's disposition towards the hearer, Non-aloofness, Friendliness, Politeness and Non-aggressiveness. However, low onsets were more positively evaluated before high-pitched (H*) accented syllables than high onsets. In other words, movement towards the accented syllable, regardless of direction, was positively evaluated and absence of movement received negative evaluations. Arguably, choice of onset represents an ‘obligingness’ morpheme, a grammaticalisation of the affective interpretation of the Effort Code. This morpheme would consist of an initial unspecified boundary %T, whose identity (%H or %L) is determined by the identity of the following T*, as summarised in Table I.

Table I. Positive speaker evaluation of negative polarity of initial boundary tone in Dutch. After [35]

POSITIVE EVALUATION    %H L*    %L H*
NEGATIVE EVALUATION    %L L*    %H H*

I have no examples of ‘unnatural’ grammatical focus expression. At best, expressions with and without focus may have equal pitch excursions, in situations in which focus is expressed in the morpho-syntax, as in Wolof [41].

4. The Production Code

A very different interpretation of the process of energy generation relies on the fact that speakers appear to spend more effort on the beginning of utterances than on the ends. This impression originates from a correlation between utterances and breath groups: at the beginning of the exhalation phase, subglottal air pressure will be higher than towards its end. A natural consequence of the fall-off in energy is a gradual drop in intensity, and a weak, gradual lowering of the fundamental frequency [13], known as ‘declination’ [42]. The communicative exploitation of this effect is the Production Code, which associates high pitch with the utterance beginning and low pitch with its end.

4.1. Informational interpretations of the Production Code

As far as the Production Code is concerned, the significance of declination does not lie in its slope. Rather, it is variation at the edges that is interpreted in terms of initiation and finality. Thus, high beginnings signal new topics, low beginnings continuations of topics. A reverse relation holds for the utterance end: high endings signal continuation, low endings finality and end of turn. Grammaticalisations of these relations are commonly found for the utterance end, when a H% signals continuation, but may also be found in the use of initial %H to signal topic refreshment. The Production Code would appear to have informational meanings only.

The interrogative and continuative meanings of final rises in languages like Dutch [43], therefore, have quite different explanations under the present account, since the first is derived from the Frequency Code and the second from the Production Code. Earlier, these meanings had been collapsed as ‘open’ in [44],[8]. Various research results suggest that where both cues exist, the continuation cue is lower than the interrogative cue. This is true for Dutch, where L*H or H* followed by a level pitch until the intonational phrase boundary is likely to be interpreted as a continuation cue, while the addition of H%, which is realised as an additional rise at the boundary, will cause a shift towards question
interpretation [43]. Overall slope in Danish, used concomitantly with variation in end pitch, is similarly linked to interrogativity for the least steep slopes, with continuation for the medium slopes, and with statements for the steepest slopes [45]. Arguably, this result follows from the fact that, for the purposes of the Production Code, the variation at the end of the utterance falls within a lower frequency band than that at the beginning of the utterance, while the variation for the Frequency Code is free from this downward bias. Conversely, we would expect that interrogativity marking at the beginning of the utterance, like %H in Malay, can have lower pitch than that used for the signalling of a new topic.

The downward slope is commonly grammaticalised, as downstep. In a frequent type, H after L is pronounced at a categorically lower pitch than a preceding H. Such grammaticalisations may be purely phonological, i.e. meaningless (except for the information provided by the fact that the downstep context is confined to some prosodic constituent, which will indirectly reveal the morpho-syntactic structure). Final Lowering, like the raising of the pitch at the beginning of phrases, is gradient in English, but it may be phonologised too, as it is in various African tone languages.

5. Substitute variables in F0 variation

An important aspect of the present conception of intonational meaning is that while the nature of the meanings is related to the way our speech organs produce pitch variation, there is no implication that the physical conditions that lie at the basis of these meanings need to be present in order to create the forms. Speakers and listeners know what these form-function relations are, and will produce the forms in the way they see fit. To indicate the start of a new topic, the idea is not that the speaker should breathe in at the beginning of his utterance, but that he should produce sufficiently high pitch at that point to convince his listener of his communicative intention. It is in fact possible to use substitute features, phonetic forms that the listener can associate indirectly with the primary form. Two [...] by late peaks. First, due to the Effort Code, late peaks sound more prominent than early peaks. Strictly speaking, this is a two-step inference on the part of the listener: (1) high peaks can indicate wide pitch span, and (2) late peaks can indicate high peaks. Indeed, both higher and later peaks elicit more ‘unusual occurrence’ interpretations than ‘everyday occurrence’ interpretations of one-peak realisations of The aLARM went off, as shown by [46], suggesting that listeners perceive late peaks as if they were higher. Moreover, in research on the difference between wide focus and narrow focus in the Hamburg dialect of German, it was found that narrow focus was realised by later peaks, suggesting again that speakers use it to signal high pitch [47].

A grammaticalisation of late peak vs early peak occurs in European Portuguese, which has H*+L for narrow focus and H+L* for neutral focus [39], which latter pitch accent, again, is also lower, as noted in section 4.1. In these cases, the later peak does not conflict with the primary variable, pitch span, since the pitch span in the accented syllable will not be smaller than in the neutral syllable. However, the use of peak delay for emphasis is constrained by the competition from the primary correlate of the Effort Code, the pitch span. Since the nuclear syllable is a prime location for the pitch span cue, narrow focus is often indicated by a pitch accent describing a fall within the stressed syllable, while the pitch fall in the neutral focus case falls outside it [47]. For instance, prenuclear pitch accents would appear to be L*+H in Spanish, and nuclear, focal ones H*+L [49].

As for the Frequency Code, there have been reports of languages that use a later peak to mark question intonation and an earlier one for statement intonation, such as southern varieties of Italian [50]. The difference is interpreted as categorical by Grice, suggesting that we are dealing with a grammaticalised form of an informational interpretation of this secondary effect of the Frequency Code. Recently, it has been found that nuclear peaks in Dutch questions are 40 ms later than in declaratives [51]. Here, the effect is almost
cases are discussed. First, peak delay can signal high pitch, certainly phonetic. An affective interpretation of the
and thus all the meanings of high pitch, and second, that high Frequency Code can be found in the fact that delayed
pitch can be used to signal wide pitch span. accentual peaks in Japanese are associated with female speech
[52]. A demonstration of the universality of the connection
5.1. Peak delay as a substitute for peak height
between peak delay and interrogative intonation was provided
A higher pitch peak will take longer to reach than a lower one, in the experiment reported in [22]. In addition to end pitch
if rate of change is the same. Therefore, higher peaks will tend and peak height, their stimuli also varied in peak alignment.
to be later than lower peaks, as suggested by Figure 3. Regardless of language background, Hungarian, Chinese and
Speakers and listeners have tacit knowledge of this Dutch listeners associated not only higher peaks and higher
mechanical connection, providing them an opportunity to end pitch with questions, but also later peaks. This results
bring it under control. Peak delay can therefore be used as an showed quite ambiguously that humans know both the direct
enhancement of, or even a substitute for, pitch raising. and indirect manifestations of the Frequency Code (see Fig
Hz Finally, the Production Code: [31],[53] found that first
Raised and
delayed peaks of intonational phrases containing new topics in British
English were later than other first peaks. This finding can be
related to this code, which links high beginnings to new
topics. The high beginning is expressed in the first accentual
peak, whose late timing enhances the high pitch.

Figure 3. Hypothesized relation between high peaks

and late peaks. From [3].

As a result, the meanings derived from the three biological

codes that are associated with high pitch may also be signalled
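The mechanical connection between peak height and peak timing assumed in section 5.1 can be written out as a short worked relation. This is only an illustrative sketch: the symbols f_0 (pitch at the onset of the rise), f_p (peak height), r (a constant rate of pitch change) and t_p (time needed to reach the peak) are notation introduced here, not in the original text, and the numerical values are hypothetical rather than measured.

```latex
% Time needed to reach a peak of height f_p from onset pitch f_0
% at a constant rate of change r (assumed, as in the text, to be the same for both peaks)
t_p = \frac{f_p - f_0}{r}

% For two peaks f_{p,1} < f_{p,2} with the same onset and the same rate,
% the higher peak is necessarily the later one:
t_{p,2} - t_{p,1} = \frac{f_{p,2} - f_{p,1}}{r} > 0

% Hypothetical values: f_0 = 120 Hz, r = 500 Hz/s, f_{p,1} = 170 Hz, f_{p,2} = 200 Hz
t_{p,1} = \frac{170 - 120}{500}\,\mathrm{s} = 0.10\,\mathrm{s}, \qquad
t_{p,2} = \frac{200 - 120}{500}\,\mathrm{s} = 0.16\,\mathrm{s}, \qquad
t_{p,2} - t_{p,1} = 0.06\,\mathrm{s}
```

Under these assumptions, a 30 Hz difference in peak height goes with a 60 ms difference in peak timing. This is the kind of covariation that would allow a listener to treat a later peak as a cue to a higher one.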
[Plot: % Questions (y-axis) by peak condition; legend entries recovered: Chinese, Hungarian]

Figure 4. Percentage “Question” judgements as a function of peak condition by three groups of listeners, with ordinal interaction between Language and Peak Condition. From [22].

5.2. High register as a substitute for pitch span

High register may be used as a substitute for wide pitch span, as demonstrated by the results of [54], to be reported at this conference. They show that, unlike British English listeners, Dutch listeners are prepared to interpret high register as signalling emphasis. An interesting corollary of this appears to be that for Dutch listeners, high register is ‘occupied’ by the Effort Code. In [55], Dutch and English listeners were asked to rate stimuli which varied in overall register for ‘friendliness’, in Dutch and English stimuli, respectively. Dutch listeners were considerably less inclined than British English listeners to perceive the variation in register in terms of ‘friendliness’ variation, as shown in Figure 4.

[Plot: perceived friendliness score (y-axis, 35 to 55) by Pitch Register (x-axis, 1: lowest; 5: highest), with separate lines for Dutch and English]

Figure 4. Interaction of Language and Register for perceived friendliness scores in British English and Dutch. From [55].

6. Grammatical meaning

So far, a picture has been painted whereby form-function relations are available to all humans, which language learners will grammaticalise, after which language change may destroy them, such that grammatical forms may have meanings that are the opposite of what would be expected. As a broad frame of reference, this picture has served well to make sense of many well-known form-function relations, and of the fact that intonation is at the same time structural, discrete, and often has arbitrary form-function relations, while on the other hand seems overwhelmingly iconic, both to monolingual speakers and to researchers interested in intonational typology. There are many details to be explained, such as the impression, if it is correct, that the Frequency Code more easily gives up its iconicity than the Effort Code or the Production Code. There is also the question of how much liberty speech communities have in exploiting these codes: to what extent the exploitation of the phonetic space by their grammars limits them in their use, and to what extent speech communities can decide to use one meaning rather than another where meanings conflict, as in the case of Dutch listeners’ interpretation of high register as ‘emphatic’ rather than ‘friendly’. The experience of the former British Prime Minister Margaret Thatcher is illustrative. She was apparently following the advice of speech consultants when lowering her pitch with an aim to sound authoritative (Frequency Code), but was frequently interrupted by interviewers as a result, because the way she moved to low pitch resembled the way she produced end of turn signals (Production Code) [56]. Conventionalisations must of course remain within the semantic/pragmatic framework operative in the phonetic implementation: they cannot reverse the universal form-function relations.

Another point is that grammaticalisation not only refers to the phonology, but also to the semantics. This is particularly clear in the case of the meanings of pitch accents and of pitch accent distributions in English, which form a system whose complexity goes beyond what seems possible in the phonetic implementation. I give a brief example of each.

6.1. Contours

Within autosegmental approaches to intonation, there have been two proposals for the semantics of intonation contours, [57],[58]. It is hard to evaluate the compatibility of these proposals, but for the sake of the argument, I summarize three elements of [57, ch 6].

a. H*L-type contours label the linguistic constituents for addition to the discourse model: the speaker commits himself to the inclusion of the information in the model;
b. H*L H%-type contours label the linguistic constituents for selection from the discourse model. The speaker acts as if the discourse model already contained it;
c. L*H-type contours label the information as potentially belonging to the discourse model, the hearer being invited to resolve this. This was labelled testing.

Meanings b. and c. in particular seem too specific for them to be directly derivable from the biological codes. Ward and Hirschberg [58] give (1b) to show that the speaker cannot appeal to the listener to consider pies to be part of the set of likeable things to which jello belongs, as he knows this to be untrue. In (1a), the implication goes through. With a fall, as in (1c), the implication, and with it the contradiction, disappears.

(1) A: Do you have jello?
    a. B: We have \pie/
    b. #B: We have \pie/, which we know you won’t eat
    c. B: We have \pie, which we know you won’t eat.

The system in [57] is compositional, but to a lesser extent than advocated by [58], who essentially consider every tone a morpheme. The compositionality of [57] comes in as ‘modifications’, to be expressed as affixes or tone deletions,
which meaning components apply to classes of contours. An example is L*-prefixation, which adds significance to every one of the three meanings (‘delay’).

6.2. Pitch accent distribution

The same point can be made with respect to pitch accent distribution in English. There are precise semantic effects of the type illustrated in (2). In (2a), the usual rendering of the proverb, the presence of accent on spoil is obligatory for the interpretation whereby the many cooks in the subject are only potential. Without the accent, as in (2b), the proposition becomes eventive [57, ch 2], such that the speaker commits himself to the belief that there actually are too many cooks spoiling the broth.

(2) a. TOO many COOKS SPOIL the BROTH (proverb)
    b. TOO many COOKS spoil the BROTH
       (implying e.g. that soups need to be taken off the menu)

6.3. Negotiating shared understanding

The grammatical meanings of intonational morphemes are labels that tell the listener to what extent the information represents an update of the shared understanding he is negotiating with the speaker. The first distinction is between status-quo information (background, old information) and update information (focus, new information). Status-quo information is deaccented: no pitch accents appear after the focus constituent. (Before the focus, pitch accents may be added for rhythmical reasons.) The meanings of the pitch accents(-cum-boundary tones) concern the relation of the focus to the background [60],[57],[58],[61]. I illustrate the above three meanings in Table II.

Table II. Three meanings acquiring different interpretations depending on whether the speaker’s or the hearer’s conception of the shared understanding is being modified.

Expression Speaker-serving Hearer-serving

A falling contour can be an inference, when the addition is to the speaker’s own conception of the shared knowledge, but supplies information to the hearer when the latter’s conception of the model gets updated. A falling-rising contour can be a puzzled realisation when made for the speaker’s own benefit, but a reminder when made for the hearer’s benefit. Finally, a rising contour for the speaker’s benefit represents a question, a request for information, but is a challenge when performed for the hearer’s benefit (‘Are you really sure this is part of our background?’).

Accent distribution is used for distinguishing updates of the historical record, in which case the hearer will know the world is a different place from what he believed it to be before processing the speaker’s utterance, from updates of the attendant circumstances, where the update concerns his knowledge of things that already were that way before he processed the utterance. The former type was labelled EVENTIVE in [56]. The proverb (2a) is non-eventive: the world is the same before and after an instantiation of (2a), but it is different after an instantiation of (2b). Non-eventive sentences fall into two categories, DEFINITIONAL, which update the attendant circumstances, and CONTINGENCY, which do the same, but have the additional meaning that the speaker claims not to know if the update is at all relevant (see Fig. 5). The three types have different forms in English. First, eventive sentences have no accent on the predicate if it is adjacent to an accented argument (subject or object). Definitional sentences only allow unaccented focused predicates when adjacent to an accented object. Contingency sentences are distinct from definitional sentences in requiring accent on the negator in the VP, and in requiring accent on the predicate even when adjacent to an accented object. The three types are distinct in a negative subject-predicate sentence, therefore, as shown in (3), (4) and (5).

(3) (A: What’s that scuffle?)
    B: Our CUSTomers aren't admitted! (Eventive)
(4) CUstomers aren't adMITted
    (This is the way it is: Definitional)
(5) Our CUSTomers AREN'T adMITted
    (In case you had forgotten: Contingency)

Figure 5. Graphical representation of three meanings of intonational contours, and three meanings of pitch accent distribution. The shaded area represents the focus constituent, the larger area the shared understanding.

7. Summary and Conclusion

Universal meaning in intonation derives from three biological codes, the Frequency Code, the Effort Code and the Production Code. The codes are biological in the sense that they represent aspects of the speech production mechanism that affect rate of vocal cord vibration. Speakers have brought these effects of the ‘hardware’ under control. The fact that speakers take charge of these aspects of speech production fits into a larger picture of speaker control [62]. Speakers control the phonetic implementation of linguistic expression for a wide variety of reasons, among which are social positioning, maximisation of the discriminability of phonological contrasts, and the recruitment of iconic uses of the voice to aid the
expression of the meaning of their linguistic expression. The exploitation of the biological codes in intonation is similarly controlled during phonetic implementation.

It was stressed that in order to express these meanings, speakers need not create the physiological conditions which are associated with them through any of the three codes. In at least one case, this would be physically impossible: we cannot reduce or enlarge the size of our larynx to manipulate pitch for the purposes of the Frequency Code. Similarly, speakers do not have to take in more air to produce higher utterance beginnings signalling new topics (Production Code), or speak slovenly so as to have low pitch excursions signalling a lack of interest (Effort Code) (even though in these latter cases they might).

A number of interpretations of the Effort Code were identified. An informational interpretation is emphasis, which is due to the interpretation of effort as the speaker's intention to underscore the importance of the message. Affective interpretations include surprise and obligingness. The latter meaning is due to the interpretation of effort as the speaker's intention to appear clear and unambiguous. The Production Code is due to the effect of energy dissipation in the course of the utterance. Its interpretations are informational only: high beginnings signal newness of topic, low beginnings the opposite, and high endings signal continuation, low endings its opposite. The Frequency Code is widely used for the expression of affective meanings. These include masculinity, authoritativeness/assertiveness, and protectiveness (low pitch) and femininity, submissiveness/friendliness, and vulnerability (high pitch). The informational interpretation is ‘certainness’, leading to distinctions in ‘sentence mode’, the difference between statements and questions.

Grammaticalisations of the paralinguistic meanings are common in the case of the informational interpretations. In fact, the only case of an ‘affective’ morpheme was presented for Dutch, which arguably has a polar %T signalling ‘obligingness’. Informational grammaticalisations concern the significance of (parts of) the message (Effort Code), continuation vs. end of turn (Production Code), and question vs. statement (Frequency Code).

Pitch height in peaks can in part be enhanced or taken over by peak delay, due to the mechanical connection between high peaks and late peaks, which explains why later peaks sound more prominent (Effort Code), are more likely to signal questions or femininity (Frequency Code) and are more likely to signal new topics (Production Code). Similarly, wide pitch span may be signalled by high pitch register.

The exploitation of these universal meanings will to some extent be conventionalised within speech communities. For instance, mean F0 of German speakers was found to correlate positively with ratings for such personality traits as lack of autonomy, dependability and likeability, while in the case of American males, mean F0 correlated positively with dominance, authority and competence [63]. Evidently, the German speakers were understood to be signalling the feminine meanings of the Frequency Code, while the American speakers were understood to be signalling the significance meanings of the Effort Code. This difference in interpretation may just be culturally determined, in which case the phonetic parameters might well have been the same, or else the German speakers showed smaller pitch excursions than did the American speakers (information which is lost when data are represented by the mean F0 over utterances, as in [63]). Also, due to the way their phonologies use the available phonetic space differently, languages will vary in the scope they allow for the expression of universal meanings. This may be the explanation of the fact that the wide-span L*HH% contour sounds more aggressive in answers to questions in Dutch than in British English: in order to signal the TESTING meaning assumed to be responsible for the negative effect (‘challenge’) the speaker must go beyond the usual kind of pitch span that signals friendliness. Since Dutch uses a narrower pitch span than British English, a difference in interpretation could result [64].

When the universal form-function relations become grammaticalised, and thus are encoded in the discrete prosodic structures of the language, there is no longer a guarantee that they are maintained, since they are subject to the forces of phonological change. Loss of iconicity seems common in the case of the informational interpretation of the Frequency Code, i.e., in the case of question and statement intonation. Of these, statement intonation is less commonly non-falling than question intonation is non-rising [1]. This may have to do with the fact that high pitch for questions need not be located at the end of the utterance.

Grammaticalisation will also affect the semantics of tonal forms. There would appear to be a systematisation of meaning for the expression of information structure which goes beyond what would be expected of a direct form-function relation of the type found in animal communication and paralinguistic meaning. Meanings like SELECTION and CONTINGENCY were given as examples.

The account of the position of intonation in language presented here presupposes a principled distinction between phonetics and phonology, and to the extent that it is convincing, amounts to a further argument for making it: without it, we lose the basis on which we distinguish the universal, non-linguistic (in the sense of non-structural) system of communication employed in phonetic implementation from the linguistic system, which is embedded in the grammar and for that reason potentially invested with arbitrary (i.e., non-iconic) meanings.

Acknowledgement. I thank Aoju Chen for useful comments on an earlier version of this text.

