9th International Conference on Speech Prosody 2018
13-16 June 2018, Poznań, Poland
The role of pragmatics and politeness in explaining prosodic variability
Stella Gryllia¹, Mary Baltazani², Amalia Arvaniti³
¹University of Leiden, Netherlands
²University of Oxford, UK
³University of Kent, UK
s.gryllia@hum.leidenuniv.nl, mary.baltazani@phon.ox.ac.uk, a.arvaniti@kent.ac.uk
mentioned, questions with this tune are used by speakers and
interpreted by listeners as straightforward questions, since they
are formally questions; however, the interpretation of implicit
statement is robust, considering that the only cue for this
interpretation is the tune: the two interpretations can apply even
when questions are string identical [3].
This brief description of the phonology and pragmatics of
the two tunes is based on large samples of perceptual data [3],
but the production studies on the realization of the tunes are
based on small numbers of participants; e.g. [2] and [3] were
both based on four speakers each. Here we use a large sample
of questions elicited from 20 Greek speakers to examine in
more detail the phonetics of the two tunes in relation to the
possible contexts in which they appear; we also consider the
role of politeness in the realization of the tunes. Specifically we
examine if the differentiation between the two tunes applies in
this large sample in order to determine whether (a) the two tunes
are indeed distinct and used consistently in response to different
pragmatic contexts; and (b) politeness affects the
choice of tune, as the results of [3] indicate, or leads instead to
gradient changes in realization, as discussed in the literature of
politeness and prosody (cf. [4], [5]).
Abstract
Twenty speakers (10F, 10M) took part in a discourse
completion task (DCT) to examine effects of politeness and
context on tunes used with wh-questions in Greek: they heard
and saw on screen short scenarios ending in a wh-question.
DCTs were controlled for power, solidarity, and context (with
scenarios leading to the wh-questions being used either to
request information or to make a statement). The results
confirmed the role of context: the two context types led to the
elicitation of distinct tunes, L*+H L-!H% for information-seeking questions, and L+H* L-L% for implicit statements,
with lower scaling and later alignment of the accentual H in the
former, and differences in final F0 consistent with a !H% and
L% boundary tone respectively. In addition, questions after
information contexts were shorter, but with a significantly
longer final vowel. Politeness also affected duration, with
conditions requiring a greater degree of politeness (the
addressee being non-solidary and of different social status than
the speaker) leading to lower speaking rate. The results indicate
that tunes are associated with different durational profiles,
which are also influenced by politeness. These results support
recent studies showing that the study of intonation must include
parameters beyond F0.
Index Terms: intonation, politeness, Greek, wh-questions
2. Methods
2.1. Participants
Twenty native speakers of Standard Athenian Greek (10
females, and 10 males) were recorded. They were between 18
and 24 years old (mean 21, s.d. 2). At the time of the recording
they were all students at the University of Athens, the
University of Patras or the TEI of Western Greece (also located
in Patras). They had all been brought up in Athens and had lived
there most of their lives.
1. Introduction
Greek wh-questions are typically produced with one of two
tunes, which in autosegmental terms are represented as L*+H
L-!H% and L+H* L-L% [1], [2], [3]. Previous research has
shown that straightforward questions used to request
information are typically, though not exclusively, produced with
the former tune. This tune is also rated more polite by native
speakers of Greek, though only when produced by female
talkers (while the use of either tune is considered equally polite
for male talkers) [3].
The L+H* L-L% tune can also be used for straightforward
questions, but it is also used by speakers to make a statement or
assert an opinion, typically a negative one; we refer to questions
used in this way as implicit statements. As an illustration, in the
example shown in (1), the speaker uses a question in order to
elicit information from the addressee; in (2), however, when the
speaker finds out there is no milk, she uses the question What
will I make him cappuccino with now? to assert that making
cappuccino is impossible and thus to express displeasure at the
addressee’s lack of foresight, which led to their not buying a
sufficient quantity of milk. We note that the fact that this
assertion takes the form of a question does not make it less
confrontational than an overt statement, i.e. using a question
here with the L+H* L-L% tune is not a politeness device. As
2.2. Materials
The materials were a set of scenarios that ended with a wh-question. The scenarios were created so as to take into account
two elements, pragmatic context and politeness, with the latter
being operationalized as power and solidarity (ideally, degree
of imposition should also be included in the exploration of
politeness, but the use of wh-questions precluded use of this
politeness-related factor). Two types of contexts were used,
following [3]: context A provided participants with a situation
which naturally ended with asking a question in order to receive
information (see 1 below); context B provided participants with
a situation in which the wh-question was unlikely to serve its
primary purpose of seeking information and was used instead
as an implicit statement (see 2 below). In addition, the contexts
were created so as to manipulate power and solidarity:
situations were either between solidary or between non-solidary
10.21437/SpeechProsody.2018-32
3. Scaling of the final boundary tone (FB), i.e. the F0
value in ERB at the end of the contour;
4. Speaking rate over the entire question, defined as the
duration of the question divided by the number of
syllables (as defined by each question’s phonological
representation);
5. Pitch range in ERB, defined as the difference between
the maximum and minimum F0 of each question;
6. The duration of the accentual vowel and the last vowel
in the question.
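The measurements listed in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' Praat procedure: the Hz-to-ERB conversion shown is the formula implemented in Praat, which we assume was used since the paper does not state which ERB formula was applied, and the function names and token values are hypothetical.

```python
import math


def hz_to_erb(f0_hz: float) -> float:
    """Convert F0 from Hz to ERB-rate.

    Assumption: this is the conversion implemented in Praat;
    the paper does not say which ERB formula was used.
    """
    return 11.17 * math.log((f0_hz + 312.0) / (f0_hz + 14680.0)) + 43.0


def ah_alignment_ms(peak_time_s: float, vowel_onset_s: float) -> float:
    """Distance of the accentual H from the onset of the stressed vowel, in ms."""
    return (peak_time_s - vowel_onset_s) * 1000.0


def speaking_rate_ms_per_syllable(duration_s: float, n_syllables: int) -> float:
    """Question duration divided by its number of syllables (ms/syllable)."""
    return duration_s * 1000.0 / n_syllables


def pitch_range_erb(f0_max_hz: float, f0_min_hz: float) -> float:
    """Difference between the maximum and minimum F0 of a question, in ERB."""
    return hz_to_erb(f0_max_hz) - hz_to_erb(f0_min_hz)


# Hypothetical token; all values are invented for illustration only.
token = {"duration_s": 1.65, "n_syllables": 11,
         "peak_time_s": 0.244, "vowel_onset_s": 0.200,
         "f0_max_hz": 250.0, "f0_min_hz": 150.0, "f0_final_hz": 170.0}

measures = {
    "AH_scaling_erb": hz_to_erb(token["f0_max_hz"]),
    "AH_alignment_ms": ah_alignment_ms(token["peak_time_s"], token["vowel_onset_s"]),
    "FB_scaling_erb": hz_to_erb(token["f0_final_hz"]),
    "rate_ms_per_syll": speaking_rate_ms_per_syllable(token["duration_s"], token["n_syllables"]),
    "pitch_range_erb": pitch_range_erb(token["f0_max_hz"], token["f0_min_hz"]),
}
```

Note that speaking rate as defined here is a duration per syllable, so larger values correspond to slower speech.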
As we were interested in systematic differences between
tunes, rather than individual differences among speakers, we
analyzed the resulting data using linear mixed effects models
with CONTEXT, POWER, and SOLIDARITY as fixed factors, and
speakers and items as random intercepts [9].
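One way to write out a model of this kind is given below. It is a sketch under stated assumptions rather than the exact specification fitted: we assume main effects only, treatment coding, and "inferior" as the reference level of POWER (as the reporting of "equal compared to inferior" and "superior compared to inferior" suggests). The symbols are ours: $y_{si}$ is the measurement for speaker $s$ on item $i$, $u_s$ and $w_i$ are the by-speaker and by-item random intercepts.

```latex
y_{si} = \beta_0
       + \beta_1\,\mathrm{Context}_{i}
       + \beta_2\,\mathrm{Solidarity}_{i}
       + \beta_3\,\mathrm{PowerEqual}_{i}
       + \beta_4\,\mathrm{PowerSuperior}_{i}
       + u_s + w_i + \varepsilon_{si},
\qquad
u_s \sim \mathcal{N}(0, \sigma_u^2),\quad
w_i \sim \mathcal{N}(0, \sigma_w^2),\quad
\varepsilon_{si} \sim \mathcal{N}(0, \sigma^2)
```

Under this reading, each bracketed estimate reported in the Results corresponds to one of the $\beta$ coefficients for the measurement in question.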
Based on previous results [3], we hypothesized that there
would be systematic differences between questions uttered after
context A and context B. Specifically, we expected that context
A would favor the use of the L*+H L-!H% tune and context B
would favor the use of the L+H* L-L% tune. Consequently we
expected that after context A questions would show later AH
alignment, and higher FB scaling; observation suggested that
the accentual and final vowel would also be longer than after
context B. We further expected that politeness would affect
production in a gradient manner: specifically we expected that
when the addressee was non-solidary or non-equal in power to
the speaker, questions would be produced with lower speaking
rate and extended pitch range.
speakers; within each set, speakers were equal, or unequal with
the speaker being either superior or inferior to the addressee.
There were 10 scenarios per combination of context, power and
solidarity for a total of 120 wh-questions (2 contexts * 3 power
* 2 solidarity * 10 scenarios).
(1) Context A
You return home from school hungry. You want to eat
but only if dinner will be ready within half an hour, as
you are going to the pool for swim practice and you
don’t want to feel full. You see your mother boiling
pasta and you ask:
[ˈpote θa ˈine etimi i makaɾoˈnaða]
When will the pasta be ready?
(2) Context B
When your dad does the grocery shopping, he gets
stingy and does not buy enough of anything. One day
you have many friends at home and you run out of milk
although you had warned him about it. When a friend
asks you for a double cappuccino, you tell your dad:
[meˈti na tu ˈftçakso ˈtoɾa toŋ gapuˈtsino]
What will I make him cappuccino with now?
The questions varied with each scenario but had similar
structure. They varied in number of syllables from 9 to 16
(mean = 11.5, mode = 11, s.d. 1.53). The wh-words or
expressions varied so that half began with a stressed syllable
(e.g. [ˈti] “what”, [ˈpote] “when”), while the other half began
with an unstressed syllable (e.g. [me ˈti] “with what”). In
addition, in all questions (with a few exceptions), the stressed
syllable of the wh-word and the next stressed syllable were
separated by two unstressed syllables, which is optimal in
Greek [6]. All questions ended in words with penultimate stress
(the default in Greek).
In total, 2400 utterances were recorded. The data of one
male participant (M3) were discarded as it turned out that he
had failed to respond to many of the stimuli. This yielded a total
of 2280 usable tokens from 19 speakers (10 females and 9
males).
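The item and token counts follow directly from the design, as the short sketch below shows. The factor levels are taken from the description of the materials (two contexts, solidary vs. non-solidary, and three power relations); the variable names are ours.

```python
from itertools import product

# Factor levels as described in the Materials section.
contexts = ["A", "B"]
solidarity = ["solidary", "non-solidary"]
power = ["equal", "superior", "inferior"]
scenarios_per_cell = 10

# Every combination of context, solidarity and power is one design cell.
cells = list(product(contexts, solidarity, power))

n_questions = len(cells) * scenarios_per_cell   # wh-questions per speaker
n_recorded = n_questions * 20                   # all 20 recorded speakers
n_usable = n_questions * 19                     # after discarding speaker M3

print(len(cells), n_questions, n_recorded, n_usable)
```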
2.3. Procedures
The recordings took place at the University of Patras. The
participants were recorded in a quiet room, using a laptop and a
Yeti microphone set to cardioid. The task was an amended
version of the Discourse Completion Task [7], in that the
participants were given both the description of a situation (the
scenario) and the text with which to respond to it (the wh-question). Each scenario and question combination was
presented on a PowerPoint slide, one combination at a time. The
participants saw the scenarios on screen and heard a
prerecorded version. They saw the questions only in writing and
were asked to utter them in a way appropriate for the situation
described in the scenario. The slides were presented in random
order, with each speaker being presented with a different
randomization.
Figure 1. Illustration of measurements. Top panel:
[ˈpote na peˈɾaso na to ˈpaɾo] “when should I come
pick it up?” in response to a context of type A; bottom
panel: [ʝaˈti na ti ˈblino ti bʝaˈtela] “why should I wash
the platter?” in response to a context of type B.
2.4. Measurements and statistical analysis
The questions were annotated using the facilities of Praat [8].
Here we report on the following measurements, which are
illustrated in Figure 1:
1. Accentual H (AH) scaling, i.e. the F0 value in ERB at the
highest point in the contour; this was always located
in the vicinity of the stressed vowel of the wh-word;
2. Accentual H alignment, i.e. the distance of AH from
the onset of the stressed vowel of the wh-word in ms;
3. Results
3.1. Accentual High
We found a significant effect of CONTEXT on the scaling of the
accentual H, AH [est. = −0.138, SE = 0.030, t = −4.558]: in
context B (question as implicit statement), AH was significantly
higher (x̄ = 6.83 ERB) than in context A (x̄ = 6.70 ERB); see
Figure 2. There was no effect of SOLIDARITY [est. = −0.009, SE
= 0.030, t = −0.288], or POWER [for equal compared to inferior,
est. = 0.001, SE = 0.037, t = 0.035; for superior compared to
inferior, est. = −0.052, SE = 0.037, t = −1.414].
POWER for the comparison between inferior and superior
[est. = 0.031, SE = 0.031, t = 1.005]; see Figure 4.
3.3. Speaking rate
CONTEXT had no effect on speaking rate [est. = −0.869, SE
= 0.711, t = −1.22]. SOLIDARITY, on the other hand, affected
speaking rate [est. = −5.919, SE = 0.711, t = −8.32], such
that speaking rate was significantly slower when the
speakers were non-solidary (x̄ = 147.06 ms/syllable) than
when they were solidary (x̄ = 141.15 ms/syllable). We also
found an effect of POWER; in scenarios where the speaker
was inferior to the addressee, the speaking rate was
significantly slower (x̄ = 146.27 ms/syllable) than when the
two were equal (x̄ = 141.44 ms/syllable) [est. = −4.907, SE
= 0.869, t = −5.65]. There was no effect of POWER in the
comparison of inferior with superior [est. = −1.701, SE =
0.871, t = −1.95]; see Figure 5.
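The bracketed statistics can be read as follows: each t value is the estimate divided by its standard error. The sketch below recomputes t for the speaking-rate effects reported above. The |t| > 2 cut-off is our assumption; it is a common convention for mixed models, but the paper does not state the criterion it used.

```python
# (estimate, standard error, reported t) for the speaking-rate model,
# copied from the text above.
reported = {
    "CONTEXT": (-0.869, 0.711, -1.22),
    "SOLIDARITY": (-5.919, 0.711, -8.32),
    "POWER equal vs. inferior": (-4.907, 0.869, -5.65),
    "POWER superior vs. inferior": (-1.701, 0.871, -1.95),
}

T_THRESHOLD = 2.0  # assumption: conventional cut-off, not stated in the paper


def t_value(estimate: float, std_error: float) -> float:
    """t statistic of a fixed effect: estimate divided by its standard error."""
    return estimate / std_error


# For each effect: (recomputed t, whether |t| exceeds the assumed threshold).
summary = {
    name: (t_value(est, se), abs(t_value(est, se)) > T_THRESHOLD)
    for name, (est, se, _) in reported.items()
}

for name, (t, significant) in summary.items():
    print(f"{name}: t = {t:.2f}, |t| > {T_THRESHOLD}: {significant}")
```

Under this assumed cut-off, the recomputed values agree with the pattern described in the text: effects of SOLIDARITY and of POWER (equal vs. inferior), but not of CONTEXT or of POWER (superior vs. inferior).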
Figure 2. Effect of CONTEXT on the scaling of AH.
There was also a significant effect of CONTEXT on the alignment
of AH [est. = 0.057, SE = 0.004, t = 14.234], with AH aligning
significantly earlier in context B (x̄ = 38 ms) than in context A
(x̄ = 44 ms). SOLIDARITY also had a significant effect on the
alignment of AH [est. = −0.009, SE = 0.004, t = −2.232]: AH
was aligned significantly earlier when the speakers were
solidary (x̄ = 41 ms) than non-solidary (x̄ = 42 ms), though the
difference was minimal. There was a similarly small effect of
POWER on AH alignment for the contrast inferior to superior
[est. = 0.011, SE = 0.005, t = 2.158], such that AH was aligned
earlier (x̄ = 41 ms) when the speaker was inferior to the
addressee than when the speaker was superior (x̄ = 42 ms). On
the other hand, there was no effect of power for the comparison
between inferior and equal [est. = −0.005, SE = 0.005, t =
−1.000]; see Figure 3.
Figure 4. Effects of CONTEXT, SOLIDARITY and POWER
on the scaling of FB.
Figure 3. Effects of CONTEXT, SOLIDARITY and POWER
on the alignment of AH.
3.2. Final Boundary Tone
CONTEXT had a significant effect on the scaling of the final
boundary tone, FB [est. = 0.379, SE = 0.026, t = 14.834]; in
context B, FB was significantly lower (x̄ = 4.54 ERB) than
in context A (x̄ = 4.92 ERB). SOLIDARITY also had a
significant effect on the scaling of FB [est. = −0.070, SE =
0.026, t = −2.745]; FB was higher when the speakers were
non-solidary (x̄ = 4.77 ERB) than when they were solidary
(x̄ = 4.69 ERB). POWER partially affected the scaling of
FB; when the speaker was equal to the addressee, FB was
significantly lower (x̄ = 4.63 ERB) than when the speaker
was inferior to the addressee (x̄ = 4.76 ERB) [est. =
−0.139, SE = 0.031, t = −4.439]. There was no effect of
Figure 5. Effects of CONTEXT, SOLIDARITY and POWER on
speaking rate.
3.4. Pitch range
CONTEXT had a significant effect on pitch range [est. = −0.106,
SE = 0.0419, t = −2.530], which was larger in context B (x̄ =
2.45 ERB) than in context A (x̄ = 2.35 ERB); see Figure 6. We
found no effect of SOLIDARITY [est. = −0.038, SE = 0.042, t =
−0.914] or POWER [for equal compared to inferior, est. = 0.036,
SE = 0.051, t = 0.702; for superior compared to inferior, est. =
−0.030, SE = 0.051, t = −0.586].
was primarily used in response to context B. In addition, the
difference in the scaling of the final boundary tone (FB) is also
consistent with earlier descriptions that prototypical questions
in Greek are more likely to end in a !H% boundary tone, while
those used as implicit statements are more likely to end in L%.
In addition, the data show effects of the tune on segmentals,
particularly the longer duration of the final vowel in context A.
Although this difference could be dismissed as simply
associated with the final rise (prevalent in context A), our
results indicate that this explanation is insufficient, in that final
rises were of small excursion and not always present.
In addition, the results indicate that intonation data show
substantial variation when a large number of participants and
utterances are examined, as in the present study. This is
reflected in the fact that in many instances the differences found
between the two tunes are not as large as reported in previous
studies (e.g. [3]), an effect due to this variability. This is
illustrated in Figure 1, which shows that the accentual peak is
after the accented vowel of the wh-word in both questions,
although the expectation from previous studies, such as [3], was
that late peak alignment would be present only in the question
ending in !H% (top panel). Examining the role and limits of
such variability is critical for understanding how to connect
phonological representations of intonation with their realization
and is part of our planned research.
With respect to politeness, we find small but relatively
consistent effects. First, our results do not show an overall
increase in pitch range as an indication of politeness, as often
expected (e.g. [4]). This is, however, consistent with other
empirical studies, such as [5] on politeness in Catalan which
also showed no changes in pitch range related to politeness. On
the other hand, we did find that the FB was scaled higher when
the speakers were non-solidary and when there was a power
difference between them (whether the speaker was inferior or
superior to the addressee). Solidarity also affected speaking rate
so that questions were produced with a slower rate when
speakers were non-solidary. These results indicate that although
some effects of politeness are global, such as the effect of
solidarity on speaking rate, others are local; e.g. we do not see
an overall increase of pitch range, but a local effect on the FB
only. Moreover, the results on politeness are in line with
previous research on politeness in Greek and on the use of
indirect devices [11].
Overall the results uncover an intricate interplay between
categorical and gradient effects both on F0 and on duration,
relating to both politeness and the context-dependent choice of
tune, which clearly affects segmental timing in addition to the
scaling and alignment of tonal targets. This indicates, on the one
hand, the need to pay close attention to pragmatics in the
study of intonation (and prosody more generally), and on the
other, the impossibility of incorporating this variability into
representations of intonation in some systematic way. In turn,
these results argue in favor of streamlined phonological
representations of intonation coupled with rich phonetics.
Figure 6. Effect of CONTEXT on pitch range.
3.5. Duration of the accented vowel and the last vowel
CONTEXT had a significant effect on the duration of the accented
vowel [est. = −2.713, SE = 0.719, t = −3.775], which was
longer in context B (x̄ = 54.16 ms) than in context A (x̄ = 51.43
ms), though the difference was small; see Figure 7. We found
no effect of SOLIDARITY [est. = 0.465, SE = 0.719, t = 0.647] or
POWER [for equal compared to inferior, est. = −1.591, SE =
0.878, t = −1.812; for superior compared to inferior, est. =
0.653, SE = 0.881, t = 0.741].
Figure 7. Effect of CONTEXT on the duration of the
accented vowel.
There was also an effect of CONTEXT on the duration of the
last vowel [est. = 12.905, SE = 1.102, t = 11.709], which was
significantly longer in context A (x̄ = 107.4 ms) than in context
B (x̄ = 94.52 ms); see Figure 8. There was no effect of
SOLIDARITY [est. = 0.850, SE = 1.102, t = 0.771], or POWER on
the duration of the last vowel [for equal compared to inferior,
est. = −2.434, SE = 1.347, t = −1.808; for superior compared to
inferior, est. = 0.357, SE = 1.351, t = 0.264].
Figure 8. Effect of CONTEXT on the duration of the last
vowel.
4. Discussion and Conclusions
The results confirmed previous descriptions of the two tunes
used with wh-questions in Greek, reported in [1], [2], [3]: they
showed that speakers respond differently to the two types of
contexts used here, context A, in which questions were used in
their prototypical function, namely to request information, and
context B, in which questions were used as implicit statements.
In response to context A, speakers produced accents with lower
scaling and later alignment than in context B; this difference is
consistent with what is known about the pitch accents of Greek
represented as L*+H and L+H* [1], [10], and it indicates that
L*+H was primarily used in response to context A, while L+H*
5. Acknowledgements
We thank the study participants, Maria Giakoumelou for the
collection and annotation of the data, and the Laboratory of
Modern Greek Dialects, University of Patras for facilitating
data collection. The financial support of the British Academy
through grant SG160538 is hereby gratefully acknowledged.
6. References
[1] A. Arvaniti, and M. Baltazani, “Intonation analysis and prosodic
annotation of Greek spoken corpora,” in S. Jun (Ed.), Prosodic
Typology: The Phonology of Intonation and Phrasing, pp. 84–
117, Oxford: Oxford University Press, 2005.
[2] A. Arvaniti, and D. R. Ladd, “Greek wh-questions and the
phonology of intonation,” Phonology, vol. 26, pp. 43–74, 2009.
[3] A. Arvaniti, M. Baltazani, and S. Gryllia, “The pragmatic
interpretation of intonation in Greek wh-questions,” Proceedings
of Speech Prosody 7, 20-23 May 2014, Dublin, retrieved
04/01/2018 from http://www.speechprosody2014.org/, 2014.
[4] C. Gussenhoven, The Phonology of Tone and Intonation,
Cambridge: Cambridge University Press, 2004.
[5] M. Nadeu, and P. Prieto, “Pitch range, gestural information, and
perceived politeness in Catalan,” Journal of Pragmatics, vol. 43,
issue 3, pp. 841–854, 2011.
[6] A. Arvaniti, D. R. Ladd, and I. Mennen, “Stability of tonal
alignment: the case of Greek prenuclear accents,” Journal of
Phonetics, vol. 26, pp. 3–25, 1998.
[7] S. Blum-Kulka, J. House, and G. Kasper, “Investigating cross-cultural pragmatics: An introductory overview,” in S. Blum-Kulka, J. House, and G. Kasper (Eds.), Cross-cultural
Pragmatics: Requests and Apologies, pp. 1–34, Norwood, NJ:
Ablex, 1989.
[8] P. Boersma and D. Weenink, Praat: doing phonetics by computer
[Computer program]. Version 6.0.36, retrieved 11 November
2017 from http://www.praat.org/
[9] R. H. Baayen, D. J. Davidson and D. M. Bates, “Mixed-effects
modelling with crossed random effects for subjects and items,”
Journal of Memory and Language, vol. 59, pp. 390–412, 2008.
[10] A. Arvaniti, D. R. Ladd, and I. Mennen, “Tonal association and
tonal alignment: Evidence from Greek polar questions and
emphatic statements,” Language and Speech, vol. 49, pp. 421–
450, 2006.
[11] M. Sifianou, “Politeness and off-record indirectness,”
International Journal of the Sociology of Language, vol. 126, pp. 163–179, 1997.