Role of Formant Transitions in the Voiced-Voiceless Distinction for Stops
Kenneth N. Stevens and Dennis H. Klatt
Subject Classification: 70.30.
J. Acoust. Soc. Am., Vol. 55, No. 3, March 1974. Copyright © 1974 by the Acoustical Society of America.
Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 131.156.157.31 On: Tue, 25 Nov 2014 19:06:37
A. Experiment 1
boundary within the range of voice-onset times for each trajectory. VOT values for the stimuli ranged from 15 to 55 msec.

FIG. 4. A synthetic utterance consists of a frication burst and a variable interval of aspiration, followed by onset of voicing, as shown in part (i). The lowest two formant frequencies follow one of four possible trajectories that are labeled (a) through (d) in part (ii) of the figure. Zero on the time scale corresponds to the time when the formant transitions are completed.

The frication burst spectrum was produced by exciting a single resonator set to the formant frequency and bandwidth values of the fifth formant, as is to be expected from analysis of natural speech (Stevens, 1968). Frication source amplitude was set to produce a fifth-formant level in the burst about 6 dB greater than the fifth-formant level during the following steady vowel.

The amplitude control for the aspiration source was set to a constant level during the aspiration interval and was off otherwise. The amplitude was set so that the excitation in the region of F3 and F4 remained at the same level in going from aspiration to voicing (Stevens, 1971), i.e., there was no discontinuity in level in this frequency region in passing from noise burst to buzz excitation.
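The level relations stated for the synthesizer sources can be made concrete with a small worked example. The sketch below only converts the decibel figures given in the text into linear amplitude ratios; the function and constant names are illustrative assumptions and are not part of the original synthesizer.

```python
import math


def db_to_amplitude_ratio(level_db: float) -> float:
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10.0 ** (level_db / 20.0)


def amplitude_ratio_to_db(ratio: float) -> float:
    """Convert a linear amplitude ratio back to a level difference in dB."""
    return 20.0 * math.log10(ratio)


# From the text: the fifth-formant level in the burst is about 6 dB above
# the fifth-formant level in the following steady vowel.
BURST_OVER_VOWEL_DB = 6.0
burst_ratio = db_to_amplitude_ratio(BURST_OVER_VOWEL_DB)

# From the text: no level discontinuity in the F3-F4 region between
# aspiration and voicing, i.e. a 0 dB difference.
continuity_ratio = db_to_amplitude_ratio(0.0)

print(f"burst/vowel amplitude ratio: {burst_ratio:.3f}")
print(f"aspiration/voicing amplitude ratio in F3-F4 region: {continuity_ratio:.3f}")
```

A 6 dB difference corresponds to roughly a doubling of amplitude, while the continuity condition for aspiration corresponds to a ratio of one.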
agreement between these two threshold values of transition duration is not unreasonable, if account is taken of the fact that the burst and aspiration in the plosive-vowel stimuli may inhibit detection of the transition at the onset of voicing.

FIG. 5. Spectrograms of two of the synthetic utterances. The stimulus at the left had a VOT of 15 msec and was unanimously identified as /da/. The stimulus at the right had a VOT of 55 msec and was unanimously identified as /ta/.

II. DISCUSSION
FIG. 7. The identification data of five subjects are summarized by plotting the average VOT at the phoneme boundary (see text) for each rate of formant motion. The VOT in this case is the time from consonantal release to onset of voicing. The letters on the abscissa are identified with different rates of transition, as indicated in the inset. A horizontal line could be fitted to the data if the phoneme boundaries were based only on the delay of voicing with respect to plosive release. A sloping line could be fitted to the data if the phoneme boundaries were based on the presence or absence of a detectable formant transition (or of a fixed duration of residual transition) during the initial portion of the voicing. The sloping line in this figure corresponds to a time of voicing onset indicated by the arrow in the inset.

25 msec are somehow integrated and processed as a unit. If two components with different acoustic characteristics have onset times more than 20-25 msec apart, they are perceived as being successive events.

The findings of Experiment 2 are consistent with the data of Liberman, Delattre, and Cooper (1958), who examined the responses of listeners to a series of consonant-vowel stimuli which differed only in the amount of cutback of the first formant relative to the onset time. In other words, the stimuli were produced by starting with a sound like that of /da/ in Fig. 1, and by "blanking out" the initial part of the F1 transition in varying amounts. Their data show that the stimuli with larger F1 cutback were identified as beginning with /t/. The boundary between /d/ and /t/ responses occurred when the cutback was at a point where most of the F1 transition was completed. In a recently reported experiment, Summerfield and Haggard (1972) have also noted that the presence of an F1 transition after voicing onset is a cue for voicing.

For a VOT that exceeds 20-odd msec, the initial onset and the onset of voicing are perceived as successive events. If the rapid change in spectrum resulting from the formant transitions is essentially completed before the onset of voicing, the consonant is identified as voiceless. If, however, there are substantial formant transitions following the onset of voicing, then there are two conflicting cues. The long VOT tends to indicate a voiceless consonant, whereas the rapid spectrum change at the onset of voicing is characteristic of a voiced consonant. There are, in fact, two successive onsets that contain rapid spectrum changes. Some listeners (e.g., VZ) assign more weight to the absolute VOT, whereas others (e.g., AWFH) will not accept a stimulus with an appreciable transition at the onset of voicing as a voiceless consonant regardless of the VOT. The most acceptable voiceless consonant is, of course, one for which both criteria are satisfied: the VOT exceeds 20 msec and the formant transitions are essentially completed before voicing onset occurs. These were in fact judged by listeners to be the most "natural" voiceless consonants.

Indirect evidence supporting the importance of the presence or absence of rapid spectrum change at voicing onset is found in data on segmental durations in prestressed plosive-sonorant consonant clusters (Klatt, 1973b). In a word such as "breed," VOT is typically about 10 msec and the [r] segment is rather short. For a word beginning with [pr], the VOT is increased to about 60 msec. Measurements indicate that in this cluster the consonant [r] is increased in duration by about 30 msec. If the sonorant had not been lengthened, voicing onset would have occurred during the rapid formant motions of the sonorant-vowel transition, leading to a false cue for voicing. The increased sonorant duration results in an onset of voicing that is well before the sonorant-vowel transition, thus preserving the cue for voicelessness.
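The two criteria discussed above for prestressed stops (a VOT exceeding about 20 msec, and formant transitions essentially completed before voicing onset) can be written down as a simple decision rule. The following is a minimal sketch under stated assumptions, not the authors' model: the 20 msec figure comes from the text, whereas the tolerance for a "negligible" residual transition, the treatment of short-VOT stimuli as voiced, and all names are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# From the text: onsets more than about 20 msec apart are heard as successive.
SUCCESSIVE_ONSET_MS = 20.0
# Hypothetical tolerance: how much transition may remain after voicing onset
# and still count as "essentially completed". Not a value from the paper.
RESIDUAL_TOLERANCE_MS = 5.0


@dataclass
class Stimulus:
    vot_ms: float             # time from consonantal release to voicing onset
    transition_end_ms: float  # time from release at which transitions are completed


def residual_transition_ms(s: Stimulus) -> float:
    """Duration of formant transition remaining after voicing onset."""
    return max(0.0, s.transition_end_ms - s.vot_ms)


def classify(s: Stimulus) -> str:
    """Return 'voiceless', 'voiced', or 'conflicting' for a prestressed stop."""
    long_vot = s.vot_ms > SUCCESSIVE_ONSET_MS
    transition_done = residual_transition_ms(s) <= RESIDUAL_TOLERANCE_MS
    if long_vot and transition_done:
        return "voiceless"    # both criteria satisfied
    if long_vot:
        return "conflicting"  # long VOT, but rapid spectrum change at voicing onset
    # Assumption: a short VOT is treated as voiced, as with the 15-msec stimulus.
    return "voiced"


# Hypothetical stimuli; the transition end times are illustrative values only.
for stim in (Stimulus(55.0, 40.0), Stimulus(40.0, 60.0), Stimulus(15.0, 40.0)):
    print(stim, "->", classify(stim))
```

The "conflicting" outcome is left unresolved on purpose, since the text reports that listeners differ in how they weight absolute VOT against the transition at voicing onset.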
that are qualitatively different. The particular VOT's for the stimulus pair that was discriminated by the children in the experiment of Eimas et al. were 20 and 40 msec. The stimulus with a VOT of 20 msec has a rapid spectrum change after onset of voicing whereas the stimulus with a VOT of 40 msec has negligible formant transitions after voicing onset. Stimuli with voice-onset times up to 20 msec have the common attribute that there is a rapid spectrum change at the onset of voicing, and these stimuli are judged to be similar by infants. On the other hand, stimuli with voice-onset times greater than 40 msec have the common property that the formant transitions are completed before onset of voicing, and thus are not discriminated from one another by infants.

If there is a requirement that the formant transitions be essentially completed before onset of voicing, then a simple explanation exists for the longer VOT's that are observed for velars than for dentals, and for dentals than for labials in prestressed position (Lisker and Abramson, 1964). It is known that the duration of the movement of the articulator that forms the closure is greatest for the tongue body, less for the tongue tip, and least for the lips. Measurements of rates of formant transitions, and speech synthesis experiments with various rates of formant transitions also show the slowest rates for velars. When the rate of transition is slower, the delay in the onset of voicing must be increased if the transition is to be essentially completed before voicing begins.

This change in VOT for voiceless aspirated stops from one place of articulation to another raises a question as to what strategy is used by a speaker to actualize these different VOT's. The mechanism suggested by Klatt (1973a), in which glottal closure is initiated reflexly by the rapid pressure drop that occurs in the mouth cavity (and in the region of the glottis), could account for the differences not only for stressed consonant-vowel syllables but also for the increased VOT in initial voiceless stop consonants in consonant clusters. Klatt observes that the time from completion of the initial burst of frication noise (when the mouth pressure presumably drops rapidly) to voicing onset is roughly independent of consonant place of articulation or of the identity of the following segment. The frication noise burst is longer for the velars, for which the rate of opening of the constriction is slower than for dentals and labials.

The strategy proposed here for discriminating voiceless from voiced stop consonants applies specifically to single consonants preceding stressed vowels and possibly for initial consonants in clusters. When the consonants occur in other environments, different strategies may have to be invoked in order to distinguish voiced from voiceless stops. Thus, for example, if the consonant follows a stressed vowel and is either in final position or preceding an unstressed vowel, the length of the vowel is often enough to signal the voicing feature: the vowel is shortened preceding a voiceless consonant (Denes, 1955; House, 1961). In the intervocalic environment, the length of the stop gap may also provide a cue, the gap being shorter for the voiced consonant (Lisker, 1957). In intervocalic pre-unstressed position, the voice-onset time for voiceless stops is usually shorter than in the prestressed position, and in this phonetic environment VOT measurements alone do not reliably separate voiced from voiceless stops (Lisker and Abramson, 1967). All of these comments indicate that the voiced-voiceless distinction for a stop is not triggered by the same acoustic property (or properties) independent of the phonetic environment in which it appears. The properties considered in this study apply only to consonants in prestressed position. It is, of course, a common observation that the acoustic manifestation of a particular phonetic feature depends on the environment of other features in which the segment occurs.

ACKNOWLEDGMENT

This research was supported in part by the National Institutes of Health (Grant No. NS-04332).

1 The term voiceless as applied to stop consonants in this paper refers to voiceless aspirated stops of the type that occurs in English. Voiceless unaspirated stop consonants also occur in many languages.
2 These numbers apply to labials and apicals; they are somewhat greater for velars.
3 A possible exception is the finding that the temporal order of the onsets of two different acoustic stimuli with appropriate characteristics can be judged only if the time between onsets is 20 msec or more (Hirsh, 1959; Hirsh and Sherrick, 1961). The relevance of these results to the present problem is discussed later.
4 Note that neither the /h/ nor the glottal stop /ʔ/ has such a rapid spectrum change at the onset of voicing, where there is an abrupt intensity increase (at low frequencies in the case of /h/). Linguists characterize these segments as nonconsonantal.
5 The experiments of Eimas et al. were performed with stimuli on a continuum between /pa/ and /ba/ rather than /ta/ and /da/, but the acoustic cues for the voiced-voiceless distinction are substantially the same in the two cases.

Denes, P. (1955). "Effect of Duration on the Perception of Voicing," J. Acoust. Soc. Am. 27, 761-764.
Eimas, P. D., Siqueland, E. R., Jusczyk, P., and Vigorito, J. (1971). "Speech Perception in Infants," Science 171, 303-306.
Fant, C. G. M. (1956). "On the Predictability of Formant Levels and Spectrum Envelope from Formant Frequencies," in For Roman Jakobson, edited by M. Halle et al. (Mouton, The Hague), pp. 109-120.
Hirsh, I. J. (1959). "Auditory Perception of Temporal Order," J. Acoust. Soc. Am. 31, 759-767.
Hirsh, I. J. and Sherrick, C. E., Jr. (1961). "Perceived Order in Different Sense Modalities," J. Exp. Psychol. 62, 423-432.
House, A. S. (1961). "On Vowel Duration in English," J. Acoust. Soc. Am. 33, 1174-1178.
Jakobson, R., Fant, C. G. M., and Halle, M. (1963). Preliminaries to Speech Analysis (MIT Press, Cambridge, Mass.).
Klatt, D. H. (1972). "Acoustic Theory of Terminal Analog Speech Synthesis," Proc. 1972 Int. Conf. on Speech Communication and Processing, No. 72 CHO 596-7 AE (IEEE, New York), pp. 131-135.
Klatt, D. H. (1973a). "Voice Onset Time, Frication and Aspiration in Word-Initial Consonant Clusters," Res. Lab. Electron. Quart. Progr. Rept. No. 109, M. I. T., pp. 124-136.
Klatt, D. H. (1973b). "Durational Characteristics of Prestressed Word-Initial Consonant Clusters in English," Res. Lab. Electron. Quart. Progr. Rept. No. 108, M. I. T., pp.