Role of formant transitions in the voiced-voiceless distinction for stops
Kenneth N. Stevens and Dennis H. Klatt
Research Laboratory of Electronics and Department of Electrical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts
(Received 2 August 1971; revised 1 November 1973)
Previous research on acoustic cues responsible for the voiced-voiceless distinction in prestressed English plosives has emphasized the importance of voicing onset time with respect to plosive release (VOT). Voiced plosives in English normally have a short VOT (less than 20-30 msec), and a significant formant transition is present following voice onset. Voiceless plosives in prestressed position, on the other hand, have relatively long VOT's (greater than about 50 msec), and the formant transitions are essentially completed prior to voice onset. Our experiments with synthetic speech compare the role of VOT and the presence or absence of a significant formant transition following voicing onset as cues for the voiced-voiceless distinction. The data indicate that there is a significant trading relationship between these two cues. The presence or absence of a rapid spectral change following voice onset produces up to a 15-msec change in the location of the perceived phoneme boundary as measured in terms of absolute VOT. One can speculate that the auditory system may be predisposed to detect the presence or absence of a rapid spectrum change as a general property of acoustic inputs. If this is the case, then the acquisition of the voiced-voiceless distinction in infants may be conditioned initially by the presence or absence of this property at the onset of voicing rather than by absolute VOT.
Subject Classification: 70.30.
J. Acoust. Soc. Am., Vol. 55, No. 3, March 1974. Copyright © 1974 by the Acoustical Society of America.

INTRODUCTION

The acoustic cues for the distinction between voiced and voiceless¹ stop consonants in initial prestressed position in English have been examined in detail in two ways. These include measurements on spoken consonant-vowel syllables (Lisker and Abramson, 1964) and experiments in which the acoustic parameters of synthetic stimuli are manipulated (Liberman, Delattre, and Cooper, 1958; Lisker and Abramson, 1970). The data from the studies with synthetic speech have suggested that the simplest and most direct acoustic indication of whether a stop consonant is voiced or voiceless is the time from the release of the closure to the onset of vocal-cord vibration. This interval is called the voice-onset time (VOT). If this time is greater than about 25-40 msec (depending on consonantal place of articulation), the consonant is voiceless; for smaller voice-onset times, the consonant is identified as voiced. The data from natural utterances show that the voice-onset times for the two classes of consonants in prestressed position are clustered around 0-20 msec and about 50 msec or more,² and intermediate voice-onset times are rarely found. Furthermore, identification and discrimination judgments for synthetic stimuli indicate that there is a sharp perceptual boundary between voiced and voiceless responses. Discrimination of pairs of stimuli within a given class is poor, while discrimination for stimuli near the phoneme boundary is good (Liberman, Harris, Kinney, and Lane, 1961).

Recent experiments on the discrimination of synthetic syllables by infants (Eimas et al., 1971) have drawn further attention to the importance of the voice-onset time as a property that the auditory mechanism uses to categorize speech sounds into classes. The Eimas experiments suggest that infants cannot discriminate changes in VOT within a region that adults would classify as one phoneme, but can readily discriminate changes of similar magnitude that span a phoneme boundary.

A conclusion that one might draw from all of these findings is that the auditory system is, as it were, "wired" to place a boundary at some fixed absolute noise duration. Study of a variety of phonetic features indicates, however, that the acoustic differences between minimal pairs of phonemes in a given phonetic environment appear to be characterized by distinctively different properties. They do not appear to be differences in the magnitude of a single acoustic parameter that takes on a range of values (Jakobson, Fant, and Halle, 1963; Stevens, 1972). As far as we know, there is no evidence to suggest that a 20- to 40-msec noise duration preceding a vowel-like sound represents a natural perceptual boundary.³ Further experiments are needed to resolve this question.

Examples of spectrograms of the syllables /da/ and /ta/ are shown in Fig. 1. The difference in VOT is clearly seen in the spectrograms. The sounds also differ in another way. The voiced stop has a well-defined transition in the first formant (as well as the second formant) after the onset of voicing. The F1 transition for the voiceless stop, by contrast, is essentially nonexistent after the onset of voicing. For the aspirated consonants, the lack of formant transitions after voicing onset indicates that the rapid movements of the supraglottal articulators are essentially complete before the vocal cords reach a configuration appropriate for the onset of voicing. In the case of Fig. 1, the articulator is the tongue tip. Based on synthesis experiments (Liberman, Delattre, Gerstman, and Cooper, 1956), it is known, in fact, that the duration of these transitions is of the order of 40 msec or less.

These examples suggest that one of the possible cues
for the voiced-voiceless distinction for stops in prestressed position is the presence or absence of a significant and rapid spectrum change at the onset of voicing. According to this hypothesis, absence of such a spectrum change would be a requirement for perception of a well-formed voiceless stop. It is known, of course, that a rapid spectrum change always occurs at the release of a stop consonant. We are postulating here that, in the case of a voiceless stop consonant, another rapid spectrum change at the onset of voicing is a negative cue for voicelessness.

FIG. 1. Spectrograms of the syllables /da/ and /ta/ as spoken in isolation by an English talker. [Figure: panels "NATURAL /da/" and "NATURAL /ta/"; 100-ms time marker.]

I. DESCRIPTION OF STIMULI AND EXPERIMENTS

In order to examine these questions, we have performed two experiments with synthetic consonant-vowel stimuli. In these stimuli, voice-onset times and transition durations were manipulated, and we have obtained listener responses to them. The stimuli were produced on a digital terminal-analog speech synthesizer (Klatt, 1972). The synthesizer contains sources for frication, aspiration, and voicing. The spectra of frication and aspiration are flat to 5 kHz. The spectral envelope of the periodic voicing source falls off at -12 dB/octave above 300 Hz. The synthesizer configuration includes five digital resonators connected in cascade to simulate the vocal-tract transfer function for vowels. It also includes another set of digital resonators for the fricative transfer function, and a radiation characteristic. The synthesizer produces 5-kHz bandwidth speech or speechlike sounds.

A. Experiment 1

This experiment was designed to determine whether the boundary between unaspirated and aspirated stop consonants at a VOT in the range 20-40 msec represents a characteristic of the auditory processing of acoustic stimuli. Such a characteristic would hold independently of whether the stimuli are speech or nonspeech.

Stimuli for this experiment are schematized in Fig. 2. Each stimulus consisted of a brief 5-msec burst of broad-band noise followed by an interval of silence, and this was followed in turn by the onset of a synthetic vowel with fixed formants. The formants corresponded roughly to values appropriate for the vowel [c]. The fundamental frequency of the vowel had a falling inflection from 135 to 100 Hz over its 265-msec duration. The level of the noise burst was adjusted to be a few decibels above the level of the vowel in the frequency region above 2000 Hz.

FIG. 2. Schematized representation of nonspeech stimuli used in Experiment 1. The initial vertical bar is a noise burst, and the horizontal bars represent formants in a vowellike sound. The silent interval Tc is varied in 5-msec steps from 0 to 40 msec. [Figure: time axis in ms, with ticks at 40, 80, 260, and 300.]

There were ten different experimental stimuli, corresponding to silent intervals Tc in 5-msec steps from 0 to 40 msec. None of the stimuli could be readily interpreted as speech events.

Several replications of the stimuli were arranged in random order and were presented to a group of listeners. The stimuli were described to listeners as consisting of a brief burst of noise and a buzz-like sound. The task of the listeners was to judge whether or not there was a silent interval between the burst of noise and the onset of the buzz.

Average data from four subjects are shown in Fig. 3. The abscissa represents the time from onset of noise burst to onset of buzz. (Since the duration of the noise burst is 5 msec, this time is 5 msec longer than the silent interval.) We shall call this the voice-onset time, even though the stimuli were not interpreted as speech by the listeners.

The data indicate that up to a VOT of about 15 msec, no silent interval could be heard. The listeners apparently regarded the onset of the noise burst and the onset of the buzz as being essentially simultaneous, although this question was not specifically asked of the listeners. For a VOT greater than about 25 msec, listeners usually reported the presence of a silent interval. The VOT at the 50% point is about 20 msec.
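The cascade synthesizer described in Sec. I can be illustrated with a minimal Python sketch. It assumes the second-order resonator difference equation y[n] = A·x[n] + B·y[n-1] + C·y[n-2], which is the form Klatt later documented for his cascade/parallel synthesizer. It also assumes a 10-kHz sampling rate. That rate is consistent with the 5-kHz output bandwidth but is not stated in the text. The helper names, the plain impulse-train source, and the formant frequencies are illustrative assumptions. Only the five bandwidths are taken from Experiment 2.

```python
import math

# Assumption: 10-kHz sampling, i.e., a 5-kHz bandwidth as stated in the text.
SAMPLE_RATE_HZ = 10_000


def resonator_coefficients(freq_hz, bw_hz, fs=SAMPLE_RATE_HZ):
    """Coefficients (A, B, C) of a second-order digital resonator
    y[n] = A*x[n] + B*y[n-1] + C*y[n-2], normalized to unit gain at 0 Hz."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(signal, freq_hz, bw_hz, fs=SAMPLE_RATE_HZ):
    """Filter a list of samples through one resonator with fixed frequency and bandwidth."""
    a, b, c = resonator_coefficients(freq_hz, bw_hz, fs)
    y1 = y2 = 0.0  # the two previous output samples
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade(signal, formants, fs=SAMPLE_RATE_HZ):
    """Resonators in cascade: the output of each one is the input to the next.
    `formants` is a list of (frequency_hz, bandwidth_hz) pairs."""
    for freq_hz, bw_hz in formants:
        signal = resonate(signal, freq_hz, bw_hz, fs)
    return signal


def impulse_train(f0_hz, duration_s, fs=SAMPLE_RATE_HZ):
    """Crude voicing source: one unit impulse per pitch period.
    It omits the -12 dB/octave spectral shaping described in the text."""
    n_samples = int(round(duration_s * fs))
    period = int(round(fs / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


# Bandwidths as given for Experiment 2; the frequencies are illustrative only.
FORMANTS = [(750, 50), (1200, 80), (2500, 120), (3500, 180), (4500, 300)]

# 100 ms of a 130-Hz buzz passed through the five-resonator cascade.
buzz = cascade(impulse_train(130, 0.1), FORMANTS)
```

Each resonator has unit gain at 0 Hz. A cascade of them therefore leaves the overall level to the source amplitude, which is convenient when amplitudes are set per source, as in the experiments.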
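The 50% point reported for Experiment 1 is a boundary read off a response curve such as the one in Fig. 3. The text does not say how the point was located. One simple assumption is linear interpolation between the two adjacent stimuli whose response proportions straddle 50%, as sketched below. The conversion from silent interval to "VOT" simply adds the 5-msec burst, as explained above. The function names are hypothetical.

```python
BURST_MS = 5  # duration of the noise burst in Experiment 1


def vot_from_silent_interval_ms(silent_interval_ms):
    """Burst onset to buzz onset equals the silent interval plus the 5-msec burst."""
    return silent_interval_ms + BURST_MS


def crossover(xs, proportions, criterion=0.5):
    """Return the x at which `proportions` first reaches `criterion`,
    interpolating linearly between adjacent points; None if it never does."""
    points = list(zip(xs, proportions))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 == criterion:
            return float(x0)
        # A sign change (or an exact hit at x1) means the criterion lies in [x0, x1].
        if (p0 - criterion) * (p1 - criterion) < 0 or p1 == criterion:
            return x0 + (criterion - p0) * (x1 - x0) / (p1 - p0)
    return None
```

The same function applies to any identification curve, for example a single listener's /da/-/ta/ responses in Experiment 2, provided the responses are expressed as proportions.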
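The hypothesis stated in the Introduction holds that a rapid spectrum change at voicing onset is a negative cue for voicelessness. It can be contrasted with a pure VOT criterion by writing both as decision rules. In this minimal sketch, the 20-msec onset-simultaneity limit is taken from the Experiment 1 result. The 23-msec residual-transition limit is taken from the sloping line of Fig. 7. The assumption that the transition begins at the consonantal release is illustrative, not the authors' model, and so are the function and rule names.

```python
def residual_transition_ms(vot_ms, transition_ms):
    """Portion of the formant transition still in progress after voicing onset.
    Assumption: the transition starts at the consonantal release (t = 0)."""
    return max(0.0, transition_ms - vot_ms)


def classify(vot_ms, transition_ms, rule, vot_boundary_ms=20.0, max_residual_ms=23.0):
    """Return 't' (voiceless) or 'd' (voiced) under one of three hypothetical rules."""
    # Experiment 1: onsets closer than about 20 msec are heard as simultaneous.
    heard_as_successive = vot_ms > vot_boundary_ms
    # Hypothesis: little or no transition may remain once voicing has begun.
    little_transition_left = residual_transition_ms(vot_ms, transition_ms) < max_residual_ms
    if rule == "absolute_vot":
        voiceless = heard_as_successive
    elif rule == "residual_transition":
        voiceless = little_transition_left
    elif rule == "both":
        voiceless = heard_as_successive and little_transition_left
    else:
        raise ValueError(f"unknown rule: {rule!r}")
    return "t" if voiceless else "d"


def boundary_vot_ms(transition_ms, rule, vot_boundary_ms=20.0, max_residual_ms=23.0):
    """VOT above which each rule switches from 'd' to 't' for a given transition duration."""
    residual_boundary = transition_ms - max_residual_ms
    if rule == "absolute_vot":
        return vot_boundary_ms
    if rule == "residual_transition":
        return max(0.0, residual_boundary)
    if rule == "both":
        return max(vot_boundary_ms, residual_boundary)
    raise ValueError(f"unknown rule: {rule!r}")
```

Under "absolute_vot", the predicted boundary does not move as the transition lengthens. Under "residual_transition", it moves one-for-one with transition duration, so a transition 30 msec longer shifts it by 30 msec. The averages reported for Experiment 2 show a shift from 26 to 39 msec for a 30-msec change. That result falls between these two extremes, which is the trading relation the paper describes.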
FIG. 3. Average percentage of times a silent interval was heard as a function of the time from burst onset to buzz onset for the stimulus ensemble shown in Fig. 2. [Figure: ordinate, percentage of times a silent interval was heard (0-100); abscissa, onset of burst to onset of buzz (ms), 0-40.]

B. Experiment 2

A series of synthetic consonant-vowel stimuli was generated in which onset times and transition durations were manipulated independently, and listener responses to these stimuli were obtained. Sixteen stimuli similar to either /ta/ or /da/ were produced by selecting a number of combinations of voice-onset time (VOT) and rate of formant motion. Each stimulus consisted of a brief 5-msec burst of frication, a variable interval of aspiration, and then voicing for the remainder of the utterance, as shown at the top of Fig. 4. Formant frequencies follow the time course shown at the bottom of Fig. 4. Formant bandwidths were set at constant values of 50, 80, 120, 180, and 300 Hz for the first through fifth formants, respectively.

Four formant trajectories are defined in Fig. 4, corresponding to moderate (a) through rapid (d) consonant-vowel transition times. Four values of voice-onset time were selected for each trajectory. All the transition times are within the range that yields a reasonable /da/ stimulus for the shortest VOT's. Preliminary tests were performed in order to place the /d-t/ phoneme boundary within the range of voice-onset times for each trajectory. VOT values for the stimuli ranged from 15 to 55 msec.

FIG. 4. A synthetic utterance consists of a frication burst and a variable interval of aspiration, followed by onset of voicing, as shown in part (i). The lowest two formant frequencies follow one of four possible trajectories that are labeled (a) through (d) in part (ii) of the figure. Zero on the time scale corresponds to the time when the formant transitions are completed. [Figure: part (i), source timeline of frication burst, aspiration, and voicing; part (ii), formant frequency (Hz) versus time (ms), with first- and second-formant trajectories (a)-(d) and fixed higher formants (third formant = 2500 Hz).]

Control parameters were updated at discrete 5-msec intervals. The fundamental frequency was held fixed at 130 Hz from voicing onset to 80 msec and fell linearly to 100 Hz from 80 msec to the end of the stimulus at 350 msec. The first voicing pulse in an utterance occurred precisely at the desired onset of voicing. The voicing amplitude was set to a constant value during the voicing interval and was off otherwise.

The frication burst spectrum was produced by exciting a single resonator set to the formant frequency and bandwidth values of the fifth formant, as is to be expected from analysis of natural speech (Stevens, 1968). Frication source amplitude was set to produce a fifth-formant level in the burst about 6 dB greater than the fifth-formant level during the following steady vowel.

The amplitude control for the aspiration source was set to a constant level during the aspiration interval and was off otherwise. The amplitude was set so that the excitation in the region of F3 and F4 remained at the same level in going from aspiration to voicing (Stevens, 1971). In other words, there was no discontinuity in level in this frequency region in passing from noise excitation to buzz excitation.

Tape recordings of a number of replications of these 16 stimuli in random order were prepared. They were presented over monaural headphones to five listeners, who were asked to identify each stimulus as /da/ or /ta/. Spectrograms of stimuli unanimously judged to be /da/ and /ta/ are shown in Fig. 5.

Each listener made 12 responses to each of the stimuli. Results for two of the listeners are shown in Fig. 6. These data represent, in a sense, two extremes of performance of the listeners. For one listener (AWFH), the boundary between /da/ and /ta/ responses occurred at a fixed time relative to completion of the formant transitions. For other listeners (e.g., VZ), the boundary tended to be at a nearly fixed time relative to plosive release. However, the latter listener showed at least some tendency for the absolute VOT at the phoneme boundary to be longer for stimuli with longer transitions. When expressed in terms of voice-onset time,
the phoneme boundary moved about 25 msec for a 30-msec change in the duration of the formant transition in the data of listener AWFH, but only about 5 msec for VZ.

FIG. 5. Spectrograms of two of the synthetic utterances. The stimulus at the left had a VOT of 15 msec and was unanimously identified as /da/. The stimulus at the right had a VOT of 55 msec and was unanimously identified as /ta/. [Figure: panels "SYNTHETIC /da/" and "SYNTHETIC /ta/"; 100-ms time marker.]

FIG. 6. Identification data for two subjects. The number of /da/ responses is plotted as a function of time of onset of voicing, where zero on the scale corresponds to the completion of the formant transition shown in Fig. 4. A family of four curves is drawn to indicate how the responses change as a function of rate of formant motion; the labels a, b, c, d refer to the transition types shown in Fig. 4. [Figure: one panel per subject (AWFH, VZ); ordinate, number of responses (0-12); abscissa, voice onset time relative to end of formant transition (ms).]

Average data for the five listeners are shown in Fig. 7. Each point represents the average VOT at the 50% response point for the five listeners. In this figure, the voice-onset time (ordinate) is measured relative to the consonantal release rather than to the end of the transition. The average VOT at the phoneme boundary moved from 26 to 39 msec for a 30-msec change in formant transition duration.

The horizontal line in Fig. 7 indicates a constant voice-onset time. The data points would lie along this line if the absolute VOT provided the only cue for the voiced-voiceless distinction. The line sloping up to the left represents the locus of constant duration of the transition from voicing onset to termination of the transition. Data points would lie along a line with this slope if the cue for the voiced-voiceless distinction were the presence or absence of substantial formant transitions after voicing onset. The line in Fig. 7 is in fact drawn for a formant transition duration during voicing of 23 msec, as indicated by the arrow in the inset.

The rising transition of the first formant during this 23-msec interval produces relatively little spectrum change compared with the entire transition from 250 to 750 Hz. The average rise in intensity of the higher formants due to the F1 transition is about 5 dB in the former case and about 20 dB in the latter (Fant, 1956; Stevens and House, 1961).

FIG. 7. The identification data of five subjects are summarized by plotting the average VOT at the phoneme boundary (see text) for each rate of formant motion. The VOT in this case is the time from consonantal release to onset of voicing. The letters on the abscissa are identified with different rates of transition, as indicated in the inset. A horizontal line could be fitted to the data if the phoneme boundaries were based only on the delay of voicing with respect to plosive release. A sloping line could be fitted to the data if the phoneme boundaries were based on the presence or absence of a detectable formant transition during the initial portion of the voicing (or of a fixed duration of residual transition). The sloping line in this figure corresponds to a time of voicing onset indicated by the arrow in the inset. [Figure: ordinate, VOT at the phoneme boundary (ms); abscissa, formant transition rate, a through d.]

An additional experiment was performed to determine the minimal first-formant transition that can be detected at the onset of the synthetic vowel /a/. Five stimuli were prepared with formant transition durations of (a) 0, (b) 5, (c) 10, (d) 15, and (e) 20 msec. The rate of the formant transition was held constant in all cases at 500 Hz/60 msec. This rate is typical of a first-formant transition in a stop release, and it is the same as the slowest rate of transition used in the previous experiments.

Three subjects participated in ABX tests comparing the stimulus without a formant transition with each of the four stimuli containing various amounts of formant transition. The results were plotted on a graph of percent correct versus transition duration, and a smooth curve was fitted to the data points. This curve crossed the 75%-correct detection threshold at a transition duration of about 13 msec.

This just-detectable transition duration at the stimulus onset is about 10 msec less than the maximum transition duration that triggers a /t/ response in Fig. 7. The agreement between these two threshold values of transition duration is not unreasonable once one considers that the burst and aspiration in the plosive-vowel stimuli may inhibit detection of the transition at the onset of voicing.

II. DISCUSSION

The results of Experiment 1 are in reasonable agreement with the experimental data of Hirsh (1959) and of Hirsh and Sherrick (1961). Those authors carried out a series of experiments to determine the difference in onset time that a listener needs in order to judge which of two stimuli with diverse characteristics came first. These investigators found a time difference of about 20 msec for a variety of stimulus conditions, including cross-modal presentation of the stimuli. Various components of a complex stimulus that are packaged in the first 20-25 msec are somehow integrated and processed as a unit. If two components with different acoustic characteristics have onset times more than 20-25 msec apart, they are perceived as being successive events.

The findings of Experiment 2 are consistent with the data of Liberman, Delattre, and Cooper (1958). They examined the responses of listeners to a series of consonant-vowel stimuli that differed only in the amount of cutback of the first formant relative to the onset time. In other words, the stimuli were produced by starting with a sound like that of /da/ in Fig. 1 and by "blanking out" the initial part of the F1 transition in varying amounts. Their data show that the stimuli with larger F1 cutback were identified as beginning with /t/. The boundary between /d/ and /t/ responses occurred when the cutback was at a point where most of the F1 transition was completed. In a recently reported experiment, Summerfield and Haggard (1972) have also noted that the presence of an F1 transition after voicing onset is a cue for voicing.

Based on the results of Experiments 1 and 2, we can suggest a possible strategy that a listener uses to discriminate between voiced and voiceless stop consonants in prestressed position. We postulate first that the cue for the presence of a consonantal segment is a rapid change in the acoustic spectrum. This change occurs at a point where there is an abrupt or discontinuous increase in intensity in some frequency range (Stevens, 1971).⁴ This rapid change in the acoustic spectrum is in the frequency range above about 1000 Hz, and occurs over a brief time interval of 20-30 msec. The characteristics of this transient spectrum shift provide some of the cues for place of articulation for the consonant.

This hypothesis can be applied to the stimuli used in Experiment 2, which have an abrupt initial intensity increase coupled with a rapid change in the spectrum in the next few tens of msec. When the onset of voicing is delayed relative to the beginning of the stimulus, there is a second discontinuity due to an abrupt increase in energy in the first-formant range. If the VOT is less than 20 msec, the two onsets are perceived as simultaneous (from Experiment 1 and from Hirsh's results), and the stimulus is identified as a voiced consonant. For a VOT that exceeds 20-odd msec, the initial onset and the onset of voicing are perceived as successive events. If the rapid change in spectrum resulting from the formant transitions is essentially completed before the onset of voicing, the consonant is identified as voiceless. If, however, there are substantial formant transitions following the onset of voicing, then there are two conflicting cues. The long VOT tends to indicate a voiceless consonant, whereas the rapid spectrum change at the onset of voicing is characteristic of a voiced consonant. There are, in fact, two successive onsets that contain rapid spectrum changes.

Listeners resolve this conflict in different ways. Some listeners (e.g., VZ) assign more weight to the absolute VOT. Others (e.g., AWFH) will not accept a stimulus with an appreciable transition at the onset of voicing as a voiceless consonant, regardless of the VOT. The most acceptable voiceless consonant is, of course, one for which both criteria are satisfied: the VOT exceeds 20 msec, and the formant transitions are essentially completed before voicing onset occurs. Listeners did in fact judge these stimuli to be the most "natural" voiceless consonants.

Indirect evidence supporting the importance of the presence or absence of rapid spectrum change at voicing onset comes from data on segmental durations in prestressed plosive-sonorant consonant clusters (Klatt, 1973b). In a word such as "breed," VOT is typically about 10 msec and the [r] segment is rather short. For a word beginning with [pr], the VOT is increased to about 60 msec. Measurements indicate that in this cluster the consonant [r] is increased in duration by about 30 msec. If the sonorant had not been lengthened, voicing onset would have occurred during the rapid formant motions of the sonorant-vowel transition, leading to a false cue for voicing. The increased sonorant duration results in an onset of voicing that is well before the sonorant-vowel transition, thus preserving the cue for voicelessness.

The proposed mechanism for discriminating prestressed voiced from voiceless consonants suggests a way of explaining the results of Eimas et al. (1971) on the response of infants to the voiced-voiceless distinction for stop consonants.⁵ It is postulated that the auditory system provides one kind of response when a significant transition in F1 after onset of voicing creates a rapid spectrum change. It provides another kind of response when there is no such rapid spectrum change. Stimuli which differ in this property yield auditory responses
that are qualitatively different. The particular VOT's for the stimulus pair that was discriminated by the children in the experiment of Eimas et al. were 20 and 40 msec. The stimulus with a VOT of 20 msec has a rapid spectrum change after onset of voicing, whereas the stimulus with a VOT of 40 msec has negligible formant transitions after voicing onset. Stimuli with voice-onset times up to 20 msec have the common attribute that there is a rapid spectrum change at the onset of voicing, and infants judge these stimuli to be similar. On the other hand, stimuli with voice-onset times greater than 40 msec have the common property that the formant transitions are completed before onset of voicing, and thus infants do not discriminate them from one another.

Suppose there is a requirement that the formant transitions be essentially completed before onset of voicing. Then a simple explanation exists for why velars show longer VOT's than dentals, and dentals longer VOT's than labials, in prestressed position (Lisker and Abramson, 1964). It is known that the duration of the movement of the articulator that forms the closure is greatest for the tongue body, less for the tongue tip, and least for the lips. Measurements of rates of formant transitions, and speech synthesis experiments with various rates of formant transitions, also show the slowest rates for velars. When the rate of transition is slower, the onset of voicing must be delayed if the transition is to be essentially completed before voicing begins.

This change in VOT for voiceless aspirated stops from one place of articulation to another raises a question: what strategy does a speaker use to actualize these different VOT's? Klatt (1973a) suggested a mechanism in which glottal closure is initiated reflexly by the rapid pressure drop that occurs in the mouth cavity (and in the region of the glottis). This mechanism could account for the differences not only in stressed consonant-vowel syllables but also for the increased VOT in initial voiceless stop consonants in consonant clusters. Klatt observes that the time from completion of the initial burst of frication noise (when the mouth pressure presumably drops rapidly) to voicing onset is roughly independent of consonant place of articulation and of the identity of the following segment. The frication noise burst is longer for the velars, for which the rate of opening of the constriction is slower than for dentals and labials.

The strategy proposed here for discriminating voiceless from voiced stop consonants applies specifically to single consonants preceding stressed vowels, and possibly to initial consonants in clusters. When the consonants occur in other environments, different strategies may have to be invoked in order to distinguish voiced from voiceless stops. Thus, for example, suppose the consonant follows a stressed vowel and is either in final position or precedes an unstressed vowel. In that case the length of the vowel is often enough to signal the voicing feature: the vowel is shortened preceding a voiceless consonant (Denes, 1955; House, 1961). In the intervocalic environment, the length of the stop gap may also provide a cue, the gap being shorter for the voiced consonant (Lisker, 1957). In intervocalic pre-unstressed position, the voice-onset time for voiceless stops is usually shorter than in the prestressed position. In this phonetic environment, VOT measurements alone do not reliably separate voiced from voiceless stops (Lisker and Abramson, 1967). All of these comments indicate that the voiced-voiceless distinction for a stop is not triggered by the same acoustic property (or properties) independent of the phonetic environment in which it appears. The properties considered in this study apply only to consonants in prestressed position. It is, of course, a common observation that the acoustic manifestation of a particular phonetic feature depends on the environment of other features in which the segment occurs.

ACKNOWLEDGMENT

This research was supported in part by the National Institutes of Health (Grant No. NS-04332).

¹ The term voiceless as applied to stop consonants in this paper refers to voiceless aspirated stops of the type that occurs in English. Voiceless unaspirated stop consonants also occur in many languages.
² These numbers apply to labials and apicals; they are somewhat greater for velars.
³ A possible exception is the finding that the temporal order of the onsets of two different acoustic stimuli with appropriate characteristics can be judged only if the time between onsets is 20 msec or more (Hirsh, 1959; Hirsh and Sherrick, 1961). The relevance of these results to the present problem is discussed later.
⁴ Note that neither the /h/ nor the glottal stop /ʔ/ has such a rapid spectrum change at the onset of voicing, where there is an abrupt intensity increase (at low frequencies in the case of /h/). Linguists characterize these segments as nonconsonantal.
⁵ The experiments of Eimas et al. were performed with stimuli on a continuum between /pa/ and /ba/ rather than /ta/ and /da/, but the acoustic cues for the voiced-voiceless distinction are substantially the same in the two cases.

Denes, P. (1955). "Effect of Duration on the Perception of Voicing," J. Acoust. Soc. Am. 27, 761-764.
Eimas, P. D., Siqueland, E. R., Jusczyk, P., and Vigorito, J. (1971). "Speech Perception in Infants," Science 171, 303-306.
Fant, C. G. M. (1956). "On the Predictability of Formant Levels and Spectrum Envelope from Formant Frequencies," in For Roman Jakobson, edited by M. Halle et al. (Mouton, The Hague), pp. 109-120.
Hirsh, I. J. (1959). "Auditory Perception of Temporal Order," J. Acoust. Soc. Am. 31, 759-767.
Hirsh, I. J., and Sherrick, C. E., Jr. (1961). "Perceived Order in Different Sense Modalities," J. Exp. Psychol. 62, 423-432.
House, A. S. (1961). "On Vowel Duration in English," J. Acoust. Soc. Am. 33, 1174-1178.
Jakobson, R., Fant, C. G. M., and Halle, M. (1963). Preliminaries to Speech Analysis (MIT Press, Cambridge, Mass.).
Klatt, D. H. (1972). "Acoustic Theory of Terminal Analog Speech Synthesis," Proc. 1972 Int. Conf. on Speech Communication and Processing, No. 72 CHO 596-7 AE (IEEE, New York), pp. 131-135.
Klatt, D. H. (1973a). "Voice Onset Time, Frication and Aspiration in Word-Initial Consonant Clusters," Res. Lab. Electron. Quart. Progr. Rept. No. 109, M.I.T., pp. 124-136.
Klatt, D. H. (1973b). "Durational Characteristics of Prestressed Word-Initial Consonant Clusters in English," Res. Lab. Electron. Quart. Progr. Rept. No. 108, M.I.T., pp. 253-260.
Liberman, A. M., Delattre, P. C., and Cooper, F. S. (1958). "Some Cues for the Distinction Between Voiced and Voiceless Stops in Initial Position," Language and Speech 1, 153-167.
Liberman, A. M., Delattre, P. C., Gerstman, L., and Cooper, F. S. (1956). "Tempo of Frequency Change as a Cue for Distinguishing Classes of Speech Sounds," J. Exp. Psychol. 52, 127-137.
Liberman, A. M., Harris, K. S., Kinney, J. A., and Lane, H. (1961). "The Discrimination of Relative Onset-Time of the Components of Certain Speech and Non-Speech Patterns," J. Exp. Psychol. 61, 379-388.
Lisker, L. (1957). "Closure Duration and the Intervocalic Voiced-Voiceless Distinction in English," Language 33, 42-49.
Lisker, L., and Abramson, A. S. (1964). "A Cross-Language Study of Voicing in Initial Stops: Acoustical Measurements," Word 20, 384-422.
Lisker, L., and Abramson, A. S. (1967). "Some Effects of Context on Voice Onset Time in English Stops," Language and Speech 10, 1-28.
Lisker, L., and Abramson, A. S. (1970). "The Voicing Dimension: Some Experiments in Comparative Phonetics," in Proc. 6th Int. Cong. Phonetic Sciences, Prague 1967 (Academia Publ. House of Czechoslovak Acad. of Sci., Prague), pp. 563-567.
Stevens, K. N. (1968). "Acoustic Correlates of Place of Articulation for Stop and Fricative Consonants," Res. Lab. Electron. Quart. Progr. Rept. No. 89, M.I.T., pp. 199-205.
Stevens, K. N. (1971). "The Role of Rapid Spectrum Changes in the Production and Perception of Speech," in Form and Substance (Festschrift for Eli Fischer-Jørgensen) (Akademisk Forlag, Copenhagen), pp. 95-101.
Stevens, K. N. (1972). "The Quantal Nature of Speech: Evidence from Articulatory-Acoustic Data," in Human Communication: A Unified View, edited by E. E. David Jr. and P. B. Denes (McGraw-Hill, New York), pp. 51-66.
Stevens, K. N., and House, A. S. (1961). "An Acoustical Theory of Vowel Production and Some of its Implications," J. Speech and Hearing Res. 4, 303-320.
Summerfield, A., and Haggard, M. (1972). "Perception of Stop-Voicing: A Rate-Specific Transition Detector?" Speech Perception: Report of Research in Progress, Series 2 (No. 1), Dept. Psychol., Queen's Univ. of Belfast, pp. 1-14.
