Journal of the Acoustical Society of America, Oct 1, 1991
The influence of higher level linguistic information on production of duration and pitch patterns at syntactic boundaries
Journal of the Acoustical Society of America, Aug 1, 2019
To understand how cochlear implant processing affects emotional prosody recognition in tonal languages, this study compares how normal-hearing (NH) and cochlear-implanted (CI) adults identify four emotions ("angry," "happy," "sad," and "neutral") in short, semantically neutral Mandarin sentences. Depending on hearing status (CI, NH), adults heard natural speech and/or noise-vocoded speech conditions (4, 8, and 16 spectral channels). Results suggest that Mandarin-speaking adults with CIs recognize emotions with accuracy similar to that of NH listeners attending to spectrally degraded (4-channel) vocoded speech. The accuracy noted for Mandarin appears to be lower than that described in previous studies of English.
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Modeling cross-lingual speech emotion recognition (SER) has become more prevalent because of its diverse applications. Existing studies have mostly focused on technical approaches that adapt the feature, domain, or label across languages, without considering in detail the similarities between the languages. This study focuses on domain adaptation in cross-lingual scenarios using phonetic constraints. The work is twofold. First, we analyze emotion-specific phonetic commonality across languages by identifying common vowels that are useful for SER modeling. Second, we leverage these common vowels as an anchoring mechanism to facilitate cross-lingual SER. We consider American English and Taiwanese Mandarin as a case study to demonstrate the potential of our approach. This work uses two in-the-wild natural emotional speech corpora: MSP-Podcast (American English) and BIIC-Podcast (Taiwanese Mandarin). The proposed unsupervised cross-lingual SER model using these phonetic anchors outperforms the baselines, with an unweighted average recall (UAR) of 58.64%.
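For readers unfamiliar with the metric, the following is a minimal sketch of how unweighted average recall (UAR) is commonly computed: the recall of each class is calculated separately and the per-class recalls are averaged with equal weight. The function name and the labels are hypothetical and chosen for illustration; they are not taken from the paper or its corpora.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally regardless of size."""
    total = defaultdict(int)    # number of true samples per class
    correct = defaultdict(int)  # number of correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced labels (illustration only, not data from the paper).
y_true = ["angry"] * 6 + ["happy"] * 2
y_pred = ["angry"] * 8  # a classifier that always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
uar = unweighted_average_recall(y_true, y_pred)
print(accuracy, uar)
```

In this toy case plain accuracy is 0.75 while UAR is 0.5, which illustrates why UAR is often preferred for emotion corpora with uneven class sizes.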
Electromagnetic Articulography (EMA) was used to provide augmented visual feedback in the learning of non-native speech sounds. Eight adult native speakers of English were randomly assigned to one of two training conditions: (1) conventional L2 speech production training or (2) conventional L2 speech production training with EMA-based kinematic feedback. The participants' speech was perceptually judged by six native speakers of Japanese. The results indicate that kinematic feedback with EMA yields superior acquisition and maintenance of the Japanese flap consonant. The findings suggest augmented visual feedback may play an important role in adults' L2 learning.
To better understand audiovisual speech processing, we investigated the effects of viewing time-synchronized videos of a 3D tongue avatar on vowel production by healthy individuals. A group of 15 American English-speaking subjects heard pink noise over headphones and produced the word "head" under four viewing conditions: first, while viewing repetitions of the same vowel, /ɛ/ (baseline phase), then during a series of "morphed" videos shifting gradually from /ɛ/ to /æ/ (ramp phase), followed by repetitions of /æ/ (maximum hold phase), and finally repetitions of /ɛ/ (after-effects phase). Results of a formant frequency (F1) analysis indicated that the visual mismatch phases (ramp and maximum hold) caused all subjects to align their productions to the visually presented vowel, /æ/. No subjects reported being aware that their vowel quality had changed. We conclude that the visual moving tongue stimuli produced entrainment to the viewed vowel category, rather than adaptation in the opposite direction of the perturbation. Further experimentation is needed to determine whether these effects are due to inherent imitation behaviors or subjects' lack of agency with the tongue avatar.
A previous report suggested that visual augmented feedback provided by electromagnetic articulography (EMA) may help patients recover speech motor control in apraxia of speech (AOS) following stroke (Katz, Bharadwaj, & Carstens, 1999). The study used frequent (100%) feedback, a condition thought to increase the rate of skill acquisition but diminish long-term maintenance and generalization. The present study used a multiple-baseline design in the short-term treatment of consonants produced by an individual with aphasia and AOS. Frequent (100%) and infrequent (50%) feedback conditions were included to determine whether properties of feedback scheduling reported in the limb motor literature also apply to the treatment of speech motor control. Methods: Participant. The participant (AOS2) was a 51-year-old, male monolingual speaker of American English who sustained a left-hemisphere CVA three years before treatment. He was diagnosed with Broca's aphasia based on clinical examination and results of the Short Form BDAE-3 (Goodglass, Kaplan, & Barresi, 2000). Behavioral testing indicated moderate-to-severe AOS and moderate oral apraxia. Participant behaviors were consistent with the unique speech characteristics of AOS as described by McNeil, Robin, & Schmidt (1997) and additionally showed decreased initiation, highly variable speech errors, distortions, groping behaviors, and occasional word perseverations.
An experiment investigated the role of phonological activation in Japanese adults' reading of ideograms (Kanji) and syllabic characters (Kana), using the Stroop effect. A group of 21 native speakers of Japanese completed color-naming (Stroop) and word-reading (reverse-Stroop) tasks with Kanji and Kana scripts. A series of analyses contrasted the reaction time required for different script types, including Kanji color words, Kanji homophones, and Kana. On the hypothesis that a word's pronunciation plays an important role in its semantic activation process, it was predicted that color-naming/word-reading interference and facilitation would be demonstrated for both the Kanji color words and Kanji homophones, with Kanji homophones showing somewhat reduced effects. The results showed robust color-naming (Stroop) patterns for the Kanji color words, significant effects for Kana, and no significant Stroop effects for the Kanji homophones. A word-reading (reverse-Stroop) task revealed uniform effects of interference with incongruent stimuli across the three script types. Taken together, the data suggest different processing routes may be accessed in color-naming and word-reading tasks.
Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP), 2010
Apraxia of speech (AOS) is a motor speech disorder characterized by disturbed spatial and temporal parameters of movement. Research on motor learning suggests that augmented feedback may provide a beneficial effect for training movement. This study examined the effects of the presence and frequency of online augmented visual kinematic feedback (AVKF) and clinician-provided perceptual feedback on speech accuracy in 2 adults with acquired AOS. Within a single-subject multiple-baseline design, AVKF was provided using electromagnetic midsagittal articulography (EMA) in 2 feedback conditions (50 or 100%). Articulator placement was specified for speech motor targets (SMTs). Treated and baselined SMTs were in the initial or final position of single-syllable words, in varying consonant-vowel or vowel-consonant contexts. SMTs were selected based on each participant's pre-assessed erred productions. Productions were digitally recorded and online perceptual judgments of accuracy (including...
Perspectives on Neurophysiology and Neurogenic Speech and Language Disorders, 2010
Electromagnetic articulography (EMA) is a method originally designed for the laboratory measurement of speech articulatory motion (Schönle et al., 1987). We describe a novel use of this technology applied to the remediation of apraxia of speech (AOS). In this experimental technique, individuals with AOS are provided with real-time, visual information concerning the movement of the tongue during speech. From information sent via EMA sensors mounted on the tongue, patients are guided into hitting “targets” displayed on a computer monitor, designed to guide correct articulatory placement. The results of several studies suggest that augmented feedback-based treatment is efficacious and that this treatment follows principles of motor learning described in the limb motor literature. Potential challenges facing this type of approach, as well as some new directions, are discussed.
Foreign accent syndrome (FAS) is a rare disorder characterized by the emergence of a perceived foreign accent following brain damage. Despite decades of study, little is known about the neural substrates involved in this disorder. In this case study, MRI images of the brain were obtained during a speech task for an American English-speaking monolingual female who presented with FAS of unknown etiology and was thought to sound "Swedish" or "Eastern European." On the basis of MR structural imaging, the patient was noted to have frontal lobe atrophy. An fMRI picture-naming task designed to broadly engage the speech motor network revealed predominantly left-hemisphere involvement, including activation of (1) left superior temporal and medial frontal structures, (2) bilateral subcortical structures and thalamus, and (3) left cerebellum. The results suggest an instance of substantial brain reorganization for speech motor control.
The Journal of the Acoustical Society of America, 1985
The reader will readily note that much of Repp’s criticism [J. Acoust. Soc. Am. 78, 1114–1116 (1985)] reflects his contention that there has been a procedural transgression on the part of the authors and the Society that establishes a “dangerous precedent.” Since Repp provides a partial account, we feel that it is first necessary to address this issue.
The Journal of the Acoustical Society of America, 1996
Using synthetic speech, word duration and fundamental frequency (F0) contours were parametrically manipulated to examine processes of phrasal interpretation by adult and child (5 and 7 years old) listeners. From an adult male voice, versions of the phrase “pink and green and white” were resynthesized to produce stimuli suggesting two possible interpretations: [(pink and green) and white] and [pink and (green and white)]. For each stimulus, listeners pointed to a picture to indicate which interpretation was intended. All subjects used duration and (to a lesser extent) intonation as perceptually salient cues for phrasal interpretation. The manner in which subjects processed this information was evaluated by comparing subjects’ performance with the predictions of three different information processing models: a nonindependent cue-evaluation model, and two independent cue-evaluation models (an additive model, and the multiplicative, fuzzy logical model). Performance was best described...
The Journal of the Acoustical Society of America, 1994
There has been much recent interest in the manner in which children develop mature speech production capabilities. Investigations of both normal speech acquisition and developmental speech production impairments have contributed to our current knowledge base. A consistent finding is that young children’s speech is highly variable, both in its temporal and spectral attributes, and that this variability diminishes gradually with age. However, the exact manner in which articulatory precision emerges (or fails to emerge in speech disorders) continues to be the subject of debate. For example, some research on anticipatory coarticulation suggests a developmental progression from syllable- to segment-based timing strategies while other research suggests the opposite. Similarly, some studies of compensatory articulation show comparable degrees of motor equivalence in young children and adults, while others show that this aspect of motor control emerges gradually with maturation. This presen...
The Journal of the Acoustical Society of America, 2000
Recent studies have shown that time-varying changes in formant pattern contribute to the phonetic specification of vowels. This variation could be especially important in children’s vowels, because children have higher fundamental frequencies (f0’s) than adults, and formant-frequency estimation is generally less reliable when f0 is high. To investigate the contribution of time-varying changes in formant pattern to the identification of children’s vowels, three experiments were carried out with natural and synthesized versions of 12 American English vowels spoken by children (ages 7, 5, and 3 years) as well as adult males and females. Experiment 1 showed that (i) vowels generated with a cascade formant synthesizer (with hand-tracked formants) were less accurately identified than natural versions; and (ii) vowels synthesized with steady-state formant frequencies were harder to identify than those which preserved the natural variation in formant pattern over time. The decline in intell...
The Journal of the Acoustical Society of America, 1985
The visual abstraction procedure used in previous studies of declination was tested using 12 subjects who each fit the F0 contours of 19 spoken short simple sentences with baselines. These baselines were found to be poorly replicated by the fitters. An objective all-points least-squares best-fit procedure was tested on this corpus and on a set of sentences that had been produced in both spontaneous and read speech by six speakers. The all-points linear regression line was a better descriptor of the F0 contours than either baselines or toplines. Declination did not always occur in these simple declarative sentences; there was more variation present in the F0 contours of sentences that had been uttered during spontaneous speech; 35% of the spontaneous sentences did not show declination; 45% of these sentences better fit the breath-group model. Their F0 contours could be described by a level all-points linear regression line followed by a falling terminal segment.
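The all-points least-squares procedure mentioned above can be sketched as an ordinary linear regression through every F0 sample of an utterance, with a negative slope read as declination. This is a minimal illustration under stated assumptions: the function name, the sample contour, and the use of a simple slope sign as the declination criterion are hypothetical and are not taken from the study.

```python
def fit_all_points_line(times, f0_values):
    """Ordinary least-squares line through every (time, F0) sample."""
    n = len(times)
    mean_t = sum(times) / n
    mean_f = sum(f0_values) / n
    # Sum of squared deviations in time, and time/F0 cross-deviations.
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(times, f0_values))
    slope = sxy / sxx                      # Hz per second
    intercept = mean_f - slope * mean_t    # fitted F0 at time zero, in Hz
    return slope, intercept


# Hypothetical contour (seconds, Hz); not data from the study.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
f0 = [130.0, 124.0, 121.0, 114.0, 111.0]

slope, intercept = fit_all_points_line(times, f0)
shows_declination = slope < 0  # assumed criterion: a falling regression line
print(slope, intercept, shows_declination)
```

Unlike a hand-drawn baseline or topline, the fitted line depends only on the samples, so two analysts given the same contour obtain the same result.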
The Journal of the Acoustical Society of America, 1995
To examine developmental patterns in the production and perception of American English vowels, recordings were made of 12 /hVd/ words from 10 men, 10 women, and 30 children (ages 3, 5, 7). Fundamental frequency (F0) and formant center frequencies (F1–F4) were estimated and a subset of the measurements served as input to a cascade formant synthesizer. Natural and synthesized vowels were presented to adult listeners for identification. Overall, natural tokens were identified more accurately than synthesized versions. Performance was significantly lower when time-varying changes in either F1 or F2 were replaced by constant values drawn from the vowel nucleus. A further drop in accuracy resulted when all formants (F1–F4) and F0 were “flattened,” consistent with findings of Hillenbrand [J. Acoust. Soc. Am. 97, 3245(A) (1995)]. These findings highlight the perceptual importance of time-varying changes in vowel spectra. It has been suggested that time-varying changes in the formants can ...
The Journal of the Acoustical Society of America, 1991
This study investigates how children use syllable duration and fundamental frequency cues to identify phrasal units, and whether children’s cue trading relationships resemble those of adults. Adults and 7-year-old children were asked to complete a picture-pointing discrimination task based upon auditorily presented stimuli. Auditory stimuli were computer-edited tokens of the ambiguous phrase “pink and green and white.” From a recording by a male speaker, vocoded stimuli were created with five steps of syllable duration and three steps of fundamental frequency. Both continua ranged between patterns suggesting the interpretation [(pink and green) and white] and [(pink) and green and white]. Results indicate that, for adults, both intonation and duration influenced identification of phrasal units, and cue trading was evident. The children were also sensitive to the prosodic manipulations. However, duration had a strong influence on interpretation, pitch had only a slight influence,...
The Journal of the Acoustical Society of America, 1988
A study was undertaken to explore the effects of fixing the mandible with a bite block on the formant frequencies of the vowels [i a u] produced by two groups of children, aged 4 to 5 years and 7 to 8 years. Vowels produced in both normal and bite-block conditions were submitted to LPC analysis with windows placed over the first glottal pulse and at the vowel midpoint. For both groups of children, no differences were found in the frequencies of either the first or second formant between the normal and bite-block conditions. Results are discussed in relation to theories of the acquisition of speech motor control.
Journal of the Acoustical Society of America, Oct 1, 1991
The influence of higher level linguistic information on production of duration and pitch patterns... more The influence of higher level linguistic information on production of duration and pitch patterns at syntactic boundaries
Journal of the Acoustical Society of America, Aug 1, 2019
To understand how cochlear implant processing affects emotional prosody recognition in tonal lang... more To understand how cochlear implant processing affects emotional prosody recognition in tonal languages, how normal-hearing (NH) and cochlear-implanted (CI) adults identify four emotions ("angry," "happy," "sad," and "neutral") in short, semantically neutral, Mandarin sentences are compared. Depending on hearing status (CI, NH), adults heard natural speech and/or noise-vocoded speech conditions (4-, 8-, and 16-spectral channels). Results suggest that Mandarin-speaking adults with CIs recognize emotions with similar accuracy as NH listeners attending to spectrally degraded (4-channel) vocoded speech. The accuracy noted for Mandarin appears to be lower than that described in previous studies of English.
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Modeling cross-lingual speech emotion recognition (SER) has become more prevalent because of its ... more Modeling cross-lingual speech emotion recognition (SER) has become more prevalent because of its diverse applications. Existing studies have mostly focused on technical approaches that adapt the feature, domain, or label across languages, without considering in detail the similarities between the languages. This study focuses on domain adaptation in cross-lingual scenarios using phonetic constraints. This work is framed in a twofold manner. First, we analyze emotion-specific phonetic commonality across languages by identifying common vowels that are useful for SER modeling. Second, we leverage these common vowels as an anchoring mechanism to facilitate cross-lingual SER. We consider American English and Taiwanese Mandarin as a case study to demonstrate the potential of our approach. This work uses two in-the-wild natural emotional speech corpora: MSP-Podcast (American English), and BIIC-Podcast (Taiwanese Mandarin). The proposed unsupervised cross-lingual SER model using these phonetical anchors outperforms the baselines with a 58.64% of unweighted average recall (UAR).
Electromagnetic Articulography (EMA) was used to provide augmented visual feedback in the learnin... more Electromagnetic Articulography (EMA) was used to provide augmented visual feedback in the learning of non-native speech sounds. Eight adult native speakers of English were randomly assigned to one of the two training conditions: (1) conventional L2 speech production training or (2) conventional L2 speech production training with EMA-based kinematic feedback. The participants' speech was perceptually judged by six native speakers of Japanese. The results indicate that kinematic feedback with EMA facilitates the acquisition and maintenance of the Japanese flap consonant, providing superior acquisition and maintenance. The findings suggest augmented visual feedback may play an important role in adults' L2 learning.
To better understand audiovisual speech processing, we investigated the effects of viewing time-s... more To better understand audiovisual speech processing, we investigated the effects of viewing time-synchronized videos of a 3D tongue avatar on vowel production by healthy individuals. A group of 15 American English-speaking subjects heard pink noise over headphones and produced the word head under four viewing conditions: First, while viewing repetitions of the same vowel, /ɛ/ (baseline phase), then during a series of "morphed" videos shifting gradually from /ɛ/ to /ae/ (ramp phase), followed by repetitions of /ae/ (maximum hold phase), and finally repetitions of /ɛ/ (after effects phase). Results of a formant frequency (F1) analysis indicated that the visual mismatch phases (ramp and maximum hold) caused all subjects to align their productions to the visually-presented vowel, /ae/. No subjects reported being aware that their vowel quality had changed. We conclude that the visual moving tongue stimuli produced entrainment to the viewed vowel category, rather than adaptation in the opposite direction of the perturbation. Further experimentation is needed to determine whether these effects are due to inherent imitation behaviors or subjects' lack of agency with the tongue avatar.
A previous report suggested that visual augmented feedback provided by electromagnetic articulogr... more A previous report suggested that visual augmented feedback provided by electromagnetic articulography (EMA) may help patients recover speech motor control in apraxia of speech (AOS) following stroke (Katz, Bharadwaj, & Carstens, 1999). The study used frequent (100%) feedback, a condition thought to increase the rate of skill acquisition but diminish long-term maintenance and generalization. The present study used a multiple-baseline design in the short-term treatment of consonants produced by an individual with aphasia and AOS. Frequent (100%) and infrequent feedback (50%) conditions were included to determine whether properties of feedback scheduling reported in the limb motor literature also apply to the treatment of speech motor control. Methods Participant The participant (AOS2) was a 51-year-old, male monolingual speaker of American English who sustained a left-hemisphere CVA three years before treatment. He was diagnosed with Broca's aphasia based on clinical examination and results of the Short Form BDAE-3 (Goodglass, Kaplan, & Barresi, 2000). Behavioral testing indicated moderate-to-severe AOS and moderate oral apraxia. Participant behaviors were consistent with the unique speech characteristics of AOS as described by McNeil, Robin, & Schmidt (1997) and additionally showed decreased initiation, highly variable speech errors, distortions, groping behaviors, and occasional word perseverations.
An experiment investigated the role of phonological activation in Japanese adults' reading of ide... more An experiment investigated the role of phonological activation in Japanese adults' reading of ideograms (Kanji) and syllabic characters (Kana), using the Stroop effect. A group of 21 native speakers of Japanese completed color-naming (Stroop) and word-reading (reverse-Stroop) tasks with Kanji and Kana scripts. A series of analyses contrasted the reaction time required for different script types; including Kanji color words, Kanji homophones, and Kana. On the hypothesis that a word's pronunciation plays an important role in its semantic activation process, it was predicted that color-naming/word-reading interference and facilitation would be demonstrated for both the Kanji color words and Kanji homophones, with Kanji homophones showing somewhat reduced effects. The results showed robust color-naming (Stroop) patterns for the Kanji color words, significant effects for Kana, and no significant Stroop effects for the Kanji homophones. A word-reading (reverse-Stroop) task revealed uniform effects of interference with incongruent stimuli across the three script types. Taken together, the data suggest different processing routes may be accessed in colornaming and word-reading tasks.
Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP), 2010
Apraxia of speech (AOS) is a motor speech disorder characterized by disturbed spatial and tempora... more Apraxia of speech (AOS) is a motor speech disorder characterized by disturbed spatial and temporal parameters of movement. Research on motor learning suggests that augmented feedback may provide a beneficial effect for training movement. This study examined the effects of the presence and frequency of online augmented visual kinematic feedback (AVKF) and clinician-provided perceptual feedback on speech accuracy in 2 adults with acquired AOS. Within a single-subject multiple-baseline design, AVKF was provided using electromagnetic midsagittal articulography (EMA) in 2 feedback conditions (50 or 100%). Articulator placement was specified for speech motor targets (SMTs). Treated and baselined SMTs were in the initial or final position of single-syllable words, in varying consonant-vowel or vowel-consonant contexts. SMTs were selected based on each participant's pre-assessed erred productions. Productions were digitally recorded and online perceptual judgments of accuracy (including...
Perspectives on Neurophysiology and Neurogenic Speech and Language Disorders, 2010
Electromagnetic articulography (EMA) is a method originally designed for the laboratory measureme... more Electromagnetic articulography (EMA) is a method originally designed for the laboratory measurement of speech articulatory motion (Schönle et al., 1987). We describe a novel use of this technology applied to the remediation of apraxia of speech (AOS). In this experimental technique, individuals with AOS are provided with real-time, visual information concerning the movement of the tongue during speech. From information sent via EMA sensors mounted on the tongue, patients are guided into hitting “targets” displayed on a computer monitor, designed to guide correct articulatory placement. The results of several studies suggest that augmented feedback-based treatment is efficacious and that this treatment follows principles of motor learning described in the limb motor literature. Potential challenges facing this type of approach, as well as some new directions, are discussed.
Foreign accent syndrome (FAS) is a rare disorder characterized by the emergence of a perceived fo... more Foreign accent syndrome (FAS) is a rare disorder characterized by the emergence of a perceived foreign accent following brain damage. Despite decades of study, little is known about the neural substrates involved in this disorder. In this case study, MRI images of the brain were obtained during a speech task for an American English-speaking monolingual female who presented with FAS of unknown etiology and was thought to sound "Swedish" or "Eastern European." On the basis of MR structural imaging, the patient was noted to have frontal lobe atrophy. An fMRI picture-naming task designed to broadly engage the speech motor network revealed predominantly left-hemisphere involvement, including activation of (1) left superior temporal and medial frontal structures, (2) bilateral subcortical structures and thalamus, and (3) left cerebellum. The results suggest an instance of substantial brain reorganization for speech motor control.
The Journal of the Acoustical Society of America, 1985
The reader will readily note that much of Repp’s criticism [J. Acoust. Soc. Am. 78, 1114–1116 (19... more The reader will readily note that much of Repp’s criticism [J. Acoust. Soc. Am. 78, 1114–1116 (1985)] reflects his contention that there has been a procedural transgression on the part of the authors and the Society that establishes a ‘‘dangerous precedent.’’ Since Repp provides a partial account we feel that it is first necessary to address this issue.
The Journal of the Acoustical Society of America, 1996
Using synthetic speech, word duration and fundamental frequency (F0) contours were parametrically... more Using synthetic speech, word duration and fundamental frequency (F0) contours were parametrically manipulated to examine processes of phrasal interpretation by adult and child (5 and 7 years old) listeners. From an adult male voice, versions of the phrase ‘‘pink and green and white’’ were resynthesized to produce stimuli suggesting two possible interpretations: [(pink and green) and white] and [pink and (green and white)]. For each stimulus, listeners pointed to a picture to indicate which interpretation was intended. All subjects used duration and (to a lesser extent) intonation as perceptually salient cues for phrasal interpretation. The manner in which subjects processed this information was evaluated by comparing subjects’ performance with the predictions of three different information processing models: a nonindependent cue-evaluation model, and two independent cue-evaluation models (an additive model, and the multiplicative, fuzzy logical model). Performance was best described...
The Journal of the Acoustical Society of America, 1994
There has been much recent interest in the manner in which children develop mature speech production capabilities. Investigations of both normal speech acquisition and developmental speech production impairments have contributed to our current knowledge base. A consistent finding is that young children’s speech is highly variable, both in its temporal and spectral attributes, and that this variability diminishes gradually with age. However, the exact manner in which articulatory precision emerges (or fails to emerge in speech disorders) continues to be the subject of debate. For example, some research on anticipatory coarticulation suggests a developmental progression from syllable- to segment-based timing strategies while other research suggests the opposite. Similarly, some studies of compensatory articulation show comparable degrees of motor equivalence in young children and adults, while others show that this aspect of motor control emerges gradually with maturation. This presen...
The Journal of the Acoustical Society of America, 2000
Recent studies have shown that time-varying changes in formant pattern contribute to the phonetic specification of vowels. This variation could be especially important in children’s vowels, because children have higher fundamental frequencies (f0’s) than adults, and formant-frequency estimation is generally less reliable when f0 is high. To investigate the contribution of time-varying changes in formant pattern to the identification of children’s vowels, three experiments were carried out with natural and synthesized versions of 12 American English vowels spoken by children (ages 7, 5, and 3 years) as well as adult males and females. Experiment 1 showed that (i) vowels generated with a cascade formant synthesizer (with hand-tracked formants) were less accurately identified than natural versions; and (ii) vowels synthesized with steady-state formant frequencies were harder to identify than those which preserved the natural variation in formant pattern over time. The decline in intell...
The Journal of the Acoustical Society of America, 1985
The visual abstraction procedure used in previous studies of declination was tested using 12 subjects who each fit the F0 contours of 19 spoken short simple sentences with baselines. These baselines were found to be poorly replicated by the fitters. An objective all-points least-squares best-fit procedure was tested on this corpus and on a set of sentences that had been produced in both spontaneous and read speech by six speakers. The all-points linear regression line was a better descriptor of the F0 contours than either baselines or toplines. Declination did not always occur in these simple declarative sentences; there was more variation present in the F0 contours of sentences that had been uttered during spontaneous speech; 35% of the spontaneous sentences did not show declination; 45% of these sentences better fit the breath-group model. Their F0 contours could be described by a level all-points linear regression line followed by a falling terminal segment.
The Journal of the Acoustical Society of America, 1995
To examine developmental patterns in the production and perception of American English vowels, recordings were made of 12 /hVd/ words from 10 men, 10 women, and 30 children (ages 3, 5, 7). Fundamental frequency (F0) and formant center frequencies (F1–F4) were estimated and a subset of the measurements served as input to a cascade formant synthesizer. Natural and synthesized vowels were presented to adult listeners for identification. Overall, natural tokens were identified more accurately than synthesized versions. Performance was significantly lower when time-varying changes in either F1 or F2 were replaced by constant values drawn from the vowel nucleus. A further drop in accuracy resulted when all formants (F1–F4) and F0 were “flattened,” consistent with findings of Hillenbrand [J. Acoust. Soc. Am. 97, 3245(A) (1995)]. These findings highlight the perceptual importance of time-varying changes in vowel spectra. It has been suggested that time-varying changes in the formants can ...
The Journal of the Acoustical Society of America, 1991
This study investigates how children use syllable duration and fundamental frequency cues to identify phrasal units, and whether children’s cue trading relationships resemble those of adults. Adults and 7-year-old children were asked to complete a picture-pointing, discrimination task based upon auditorally presented stimuli. Auditory stimuli were computer-edited tokens of the ambiguous phrase “pink and green and white.” From a recording by a male speaker, vocoded stimuli were created with five steps of syllable duration and three steps of fundamental frequency. Both continua ranged between patterns suggesting the interpretation [(pink and green) and white] and [(pink) and green and white]. Results indicate that, for adults, both intonation and duration influenced identification of phrasal units, and cue trading was evident. The children were also sensitive to the prosodic manipulations. However, duration had a strong influence on interpretation, pitch had only a slight influence,...
The Journal of the Acoustical Society of America, 1988
A study was undertaken to explore the effects of fixing the mandible with a bite block on the formant frequencies of the vowels [i a u] produced by two groups of children aged 4 and 5 and 7 and 8 years. Vowels produced in both normal and bite-block conditions were submitted to LPC analysis with windows placed over the first glottal pulse and at the vowel midpoint. For both groups of children, no differences were found in the frequencies of either the first or second formant between the normal and bite-block conditions. Results are discussed in relation to theories of the acquisition of speech motor control.
Papers by William Katz