University Professor in English Linguistics with specialisations in Phonetics/Phonology/Morphology at the Department of English and American Studies, English Language and Linguistics, Heinrich Heine University Düsseldorf.
Research interests: computational linguistics, laboratory phonology, psycholinguistics, human health (linguistic biomarkers, precision health & speech pathology).
The Journal of the Acoustical Society of America, 2024
Predictions of gradient degree of lenition of voiceless and voiced stops in a corpus of Argentine Spanish are evaluated using three acoustic measures (minimum and maximum intensity velocity and duration) and two recurrent neural network (Phonet) measures (posterior probabilities of sonorant and continuant phonological features). While mixed and inconsistent predictions were obtained across the acoustic metrics, sonorant and continuant probability values were consistently in the direction predicted by known factors of a stop's lenition with respect to its voicing, place of articulation, and surrounding contexts. The results suggest the effectiveness of Phonet as an additional or alternative method of lenition measurement. Furthermore, this study has enhanced the accessibility of Phonet by releasing the trained Spanish Phonet model used in this study and a pipeline with step-by-step instructions for training new models and running inference.
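To make the Phonet-based measure concrete, here is a minimal, hypothetical sketch of how frame-level posterior probabilities of the [sonorant] and [continuant] features could be reduced to a single gradient lenition score for a stop interval. The averaging scheme and all numbers are illustrative assumptions, not the paper's actual procedure or the Phonet API.

```python
def lenition_score(sonorant_probs, continuant_probs):
    """Mean of the two feature posteriors over a stop's frames.

    Higher values = more lenited (more sonorant- and continuant-like);
    a canonically occluded voiceless stop should score near 0.
    """
    if len(sonorant_probs) != len(continuant_probs) or not sonorant_probs:
        raise ValueError("need equal-length, non-empty frame sequences")
    mean_son = sum(sonorant_probs) / len(sonorant_probs)
    mean_cont = sum(continuant_probs) / len(continuant_probs)
    return (mean_son + mean_cont) / 2

# A fully occluded stop-like token: both posteriors stay low.
stop_like = lenition_score([0.05, 0.04, 0.06], [0.10, 0.08, 0.09])
# An approximant-like token: both posteriors stay high.
approximant_like = lenition_score([0.9, 0.92, 0.95], [0.85, 0.9, 0.88])
assert stop_like < approximant_like
```

In this toy form the score increases monotonically with lenition degree, which is what lets it be regressed against known lenition factors such as stress and place of articulation.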
We review and elaborate an account of consonantal strength that is founded on the model of speech as a modulated carrier signal. The stronger the consonant, the greater the modulation. Unlike approaches based on sonority or articulatory aperture, the account offers a uniform definition of the phonetic effect lenition has on consonants: All types of lenition (such as debuccalisation, spirantisation and vocalisation) reduce the extent to which a consonant modulates the carrier. To demonstrate the quantifiability of this account, we present an analysis of Ibibio, in which we investigate the effects of lenition on the amplitude, periodicity and temporal properties of consonants. We propose a method for integrating these different acoustic dimensions within an overall measure of modulation size. Not only does the modulated carrier account cover all the classically recognised lenition types, but it also encompasses loss of plosive release in final stops – which, although not traditionally classed as lenition, is clearly related to processes that are.
Objective: This study investigated the degrees of lenition, or consonantal weakening, in the production of Spanish stop consonants by native English speakers during a study abroad (SA) program. Lenition is a key phonological process in Spanish, where voiced stops (/b/, /d/, /ɡ/) typically weaken to fricatives or approximants in specific phonetic environments. For L2 learners, mastering this subtle process is essential for achieving native-like pronunciation. Methods: To assess the learners’ progress in acquiring lenition, we employed Phonet, a deep learning model. Unlike traditional quantitative acoustic methods that focus on measuring the physical properties of speech sounds, Phonet utilizes recurrent neural networks to predict the posterior probabilities of phonological features, particularly sonorant and continuant characteristics, which are central to the lenition process. Results: The results indicated that while learners showed progress in producing the fricative-like variants of lenition during the SA program and understood how to produce lenition in appropriate contexts, the retention of these phonological gains was not sustained after their return. Additionally, unlike native speakers, the learners never fully achieved the approximant-like realization of lenition. Conclusions: These findings underscore the need for sustained exposure and practice beyond the SA experience to ensure the long-term retention of L2 phonological patterns. While SA programs offer valuable opportunities for enhancing L2 pronunciation, they should be supplemented with ongoing support to consolidate and extend the gains achieved during the immersive experience.
Linguistic alignment, the tendency of speakers to share common linguistic features during conversations, has emerged as a key area of research in computer-supported collaborative learning. While previous studies have shown that linguistic alignment can have a significant impact on collaborative outcomes, there is limited research exploring its role in K-12 learning contexts. This study investigates syntactic and lexical linguistic alignment in a collaborative computer science learning corpus from 24 pairs (48 individuals) of middle school students (aged 11-13). The results show stronger effects of self-alignment than partner alignment on both the syntactic and lexical levels, with students often diverging from their partners on task-relevant words. Furthermore, student self-alignment on the syntactic level is negatively correlated with partner satisfaction ratings, while self-alignment on the lexical level is positively correlated with their partner's satisfaction.
Introduction: Automatic recognition of stutters (ARS) from speech recordings can facilitate objective assessment and intervention for people who stutter. However, the performance of ARS systems may depend on how the speech data are segmented and labelled for training and testing. This study compared two segmentation methods: event-based, which delimits speech segments by their fluency status, and interval-based, which uses fixed-length segments regardless of fluency.
Methods: Machine learning models were trained and evaluated on interval-based and event-based stuttered speech corpora. The models used acoustic and linguistic features extracted from the speech signal and the transcriptions generated by a state-of-the-art automatic speech recognition system.
Results: The results showed that event-based segmentation led to better ARS performance than interval-based segmentation, as measured by the area under the curve (AUC) of the receiver operating characteristic. This suggests differences in the quality and quantity of the training data arising from the segmentation method. The inclusion of linguistic features improved the detection of whole-word repetitions, but not of other types of stutters.
Discussion: The findings suggest that event-based segmentation is more suitable for ARS than interval-based segmentation, as it preserves the exact boundaries and types of stutters. The linguistic features provide useful information for separating supra-lexical disfluencies from fluent speech but may not capture the acoustic characteristics of stutters. Future work should explore more robust and diverse features, as well as larger and more representative datasets, for developing effective ARS systems.
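The AUC used to compare the two segmentation schemes can be computed without constructing the ROC curve explicitly, via the rank identity AUC = P(score(stutter) > score(fluent)), counting ties as 1/2. The sketch below is illustrative and is not the study's code; the labels and scores are made-up values.

```python
def roc_auc(labels, scores):
    """AUC for a binary detector. labels: 1 = stutter, 0 = fluent."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly outranks negative
            elif p == n:
                wins += 0.5   # ties count half
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.8, 0.2]
print(roc_auc(labels, scores))  # 1.0: every stutter outranks every fluent segment
```

Because AUC is threshold-free, it lets event-based and interval-based systems be compared even when their score distributions differ.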
Degeneration of the nerve cells that control cognitive, speech, and language processes leads to linguistic impairments at various levels, from verbal utterances to individual speech sounds, and such impairments can indicate neurological, cognitive and psychiatric disorders such as Alzheimer’s disease (AD), Parkinson’s disease (PD), amyotrophic lateral sclerosis (ALS), dementias, depression, autism spectrum disorder, and schizophrenia. Currently, these disorders are diagnosed using specific clinical diagnostic criteria and neuropsychological examinations. However, speech-based biomarkers could potentially offer many advantages over current clinical standards. In addition to being objective and naturalistic, they can also be collected remotely with minimal instruction and time requirements. Furthermore, machine learning algorithms developed to build automated diagnostic models using linguistic features extracted from speech could aid in distinguishing patients with probable disease from the normal population.
To ensure that speech-based biomarkers are providing accurate measurement and can serve as effective clinical tools for detecting and monitoring disease, speech features extracted and analyzed must be systematically and rigorously evaluated. Different machine learning architectures trained to classify different types of disordered speech must also be rigorously tested and systematically compared.
For speech measures, three categories of evaluation have been proposed: verification, analytical validation, and clinical validation. Verification includes assessing and comparing the quality of speech recordings across hardware and recording conditions. Analytical validation entails checking the accuracy and reliability of data processing and computed measures to ensure that they are accurately measuring the intended phenomena. Clinical validation involves verifying the correspondence of a measure to clinical diagnosis, disease severity/progression, and/or response to treatment outcomes.
For machine learning algorithms, analytical and clinical validation apply. For example, the accuracy of different algorithms can be compared in different clinical groups for different outcome measures.
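One elementary analytical-validation check is agreement between an automatically computed speech measure and reference values derived by hand, for example via Pearson correlation. This is a toy sketch with invented numbers, offered only to make the validation step concrete.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical articulation rates: automatic pipeline vs. manual reference.
auto = [4.1, 3.8, 5.0, 4.4, 3.2]
manual = [4.0, 3.9, 5.1, 4.3, 3.3]
r = pearson_r(auto, manual)
assert r > 0.95  # high agreement would support analytical validity
```

In practice, analytical validation would also examine systematic bias and test-retest reliability, not correlation alone.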
This Research Topic aims to bring together research on the effectiveness of speech-based biomarkers for clinical diagnosis or the evaluation of disease severity and prognosis from related disciplines including cognitive neuroscience, computer science, engineering, linguistics, and speech and communication sciences. We welcome original research or systematic reviews on any of the three categories of evaluation of speech measures (verification, analytical validation, and clinical validation), as well as on NLP tools used to model clinical detection, classification, and evaluation of disease severity/progression and/or response to treatment outcomes.
Topics may include, but are not limited to: • Automatic analysis of dysarthric speech (e.g., typical and atypical Parkinsonism, Huntington's disease, Multiple Sclerosis, Amyotrophic Lateral Sclerosis); • Early detection and classification of neurodevelopmental disorders (e.g., Autism Spectrum Disorder (ASD), Specific Language Impairment (SLI), Attention Deficit Hyperactivity Disorder (ADHD)); • Cognitive assessment and clinical phenotyping of neurocognitive disorders (e.g., Alzheimer's disease, cognitive impairment including substance-induced cognitive impairment, dementia); • Mental illness screening and diagnosis (e.g., Post-traumatic Stress Disorder (PTSD), Depressive Disorder, anxiety disorder, Bipolar Disorder, Schizophrenia); • Novel methods and tools used to collect speech samples for the assessment of neurological, cognitive and psychiatric disorders.
Belonging to a university shapes wellbeing and academic outcomes for first-year students, yet this belongingness is harder to achieve for those from lower socioeconomic backgrounds. This study delved into the flexible construct of status, that is, the individual's perceived position within the university's social hierarchy and the strategy they adopt to achieve that position, and its impact on their belongingness. The objective was to identify key psychological contributors that could impact first-year Psychology students' expected social status and thereby their belongingness. A cross-sectional study tested first-year Psychology students entering university in 2021 and 2022. The first-year students completed a battery of questionnaires to ascertain their status, belongingness to the university, mental state, and personalities. Structural equation modelling (SEM) was employed to evaluate a social ecological model focusing on belongingness. This analysis investigated the mediating role of peer status (popularity among peers) in the relationship between mental state and belongingness, and the moderating influence of personality traits on the connections between mental state and peer status. Both the mediation and moderation effects were statistically significant after adjusting for gender and ethnicity. The findings offer insights into how university administrations can effectively support students, particularly those from lower socioeconomic backgrounds, in enhancing their social status among peers and fostering a stronger belongingness, thereby promoting their overall mental wellbeing and success in their academic pursuits.
African American Language (AAL) is a marginalized variety of American English that has been understudied due to a lack of accessible data. This lack of data has made it difficult to research language in African American communities and has been shown to cause emerging technologies such as Automatic Speech Recognition (ASR) to perform worse for African American speakers. To address this gap, the Joel Buchanan Archive of African American Oral History (JBA) at the University of Florida is being compiled into a time-aligned and linguistically annotated corpus. Through Natural Language Processing (NLP) techniques, this project will automatically time-align spoken data with transcripts and automatically tag AAL features. Transcription and time-alignment challenges have arisen as we ensure accuracy in depicting AAL morphosyntactic and phonetic structure. Two linguistic studies illustrate how the African American Corpus from Oral Histories betters our understanding of this lesser-studied variety.
The relative weighting of f0 and vowel reduction in English spoken word recognition at the sentence level was investigated in a two-alternative forced-choice word identification experiment. An H* pitch-accented or a deaccented word fragment (e.g., AR- in the word archive) was presented at the end of a carrier sentence for identification. The results revealed differences in the cue weighting of English lexical stress perception between native and non-native listeners. For native English listeners, vowel quality was a more prominent cue than f0, while native Mandarin Chinese listeners employed both vowel quality and f0 in a comparable fashion. These results suggest that (a) vowel reduction is superior to f0 in signaling word-initial stress and (b) f0 facilitates the recognition of word-initial stress in a manner modulated by the first language.
International Journal of Language \& Communication Disorders, 2023
Background
Non-word repetition (NWR) tests are an important way speech and language therapists (SaLTs) assess language development. NWR tests are often scored whilst participants make their responses (i.e., in real time) in clinical and research reports (documented here via a secondary analysis of a published systematic review).
Aims
The main aim was to determine the extent to which real-time coding of NWR stimuli at the whole-item level (as correct/incorrect) was predicted by models that had varying levels of detail provided from phonemic transcriptions, using several linear mixed models (LMMs).
Methods & Procedures
Live scores and recordings of responses on the universal non-word repetition (UNWR) test were available for 146 children aged between 3 and 6 years; the sample included all children starting at five UK schools in one year or in two consecutive years. Transcriptions were made of responses to two-syllable NWR stimuli for all children, and these were checked for reliability within and between transcribers. Signal detection analysis showed that consonants were missed when judgments were made live. Statistical comparisons of the discrepancies between target stimuli and transcriptions of children's responses were then made, and these were regressed against live score accuracy. Six LMM models (three normalized: 1a, 2a, 3a; and three non-normalized: 1b, 2b, 3b) were examined to identify which model(s) best captured the data variance. Errors on consonants for live scores were determined by comparison with the transcriptions in the following ways (the dependent variables for each pair of models): (1) consonants alone; (2) substitutions, deletions and insertions of consonants identified after automatic alignment of live and transcribed materials; and (3) as with (2) but where substitutions were coded further as place, manner and voicing errors.
Outcomes & Results
The model that coded consonants in non-words as ‘incorrect’ at the level of substitutions, deletions and insertions (2b) provided the best fit to the real-time coding responses in terms of marginal R2, Akaike's information criterion (AIC) and Bayesian information criterion (BIC) statistics.
Conclusions & Implications
Errors that occur on consonants when non-word stimuli are scored in real time are characterized solely by the substitution, deletion and insertion measure. It is important to know that such errors arise when real-time judgments are made because NWR tasks are used to assess and diagnose several cognitive–linguistic impairments. One broader implication of the results is that future work could automate the analysis procedures to provide the required information objectively and quickly without having to transcribe data.
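The AIC and BIC statistics used above to compare the candidate models are simple functions of a fitted model's maximized log-likelihood. A quick sketch with invented fit values (not the paper's actual numbers) shows how a richer model must improve the log-likelihood enough to offset its extra parameters:

```python
import math

def aic(log_lik, k):
    """Akaike's information criterion: 2k - 2*logL (lower is better)."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion: k*ln(n) - 2*logL (lower is better)."""
    return k * math.log(n) - 2 * log_lik

# Hypothetical comparison: a simple model vs. a model with more predictors.
simple = {"logL": -512.0, "k": 4}
richer = {"logL": -498.0, "k": 9}
n = 146  # sample size, as in the study's cohort of children

print(aic(simple["logL"], simple["k"]), aic(richer["logL"], richer["k"]))
print(bic(simple["logL"], simple["k"], n), bic(richer["logL"], richer["k"], n))
```

BIC penalizes parameters more heavily than AIC for n > 7 or so, which is why reporting both, as the study does, guards against over-fitting in model selection.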
WHAT THIS PAPER ADDS
What is already known on this subject
Children and patients with a wide range of cognitive and language difficulties are less accurate relative to controls when they attempt to repeat non-words. Responses to non-words are often scored as correct or incorrect at the time the test is conducted. Limited assessments of this scoring procedure have been conducted to date.
What this study adds to the existing knowledge
Live NWR scores for the responses of 146 children were available, and the accuracy of these judgements was assessed here against ones based on phonemic transcriptions. Signal detection analyses showed that live scoring missed consonant errors in children's responses. Further analyses, using linear mixed effect models, showed that the errors missed in live judgments were consonant substitutions, deletions and insertions.
What are the practical and clinical implications of this work?
Improved and practicable NWR scoring procedures are required to provide SaLTs with better indications about children's language development (typical and atypical) and for clinical assessments of older people. The procedures currently used miss substitutions, deletions and insertions. Hence, procedures are required that provide the information currently only available when materials are transcribed manually. The possibility of training automatic speech recognizers to provide this level of detail is raised.
Intoxication and pitch control in tonal and non-tonal language speakers, 2022
Alcohol intoxication is known to affect pitch variability in non-tonal languages. In this study, intoxication’s effects on pitch were examined in tonal and non-tonal language speakers, in both their native language (L1; German, Korean, Mandarin) and non-native language (L2; English). Intoxication significantly increased pitch variability in the German group (in L1 and L2), but not in the Korean or Mandarin groups (in L1 or L2), although there were individual differences. These results support the view that pitch control is related to the functional load of pitch and is an aspect of speech production that can be advantageously transferred across languages, overriding the expected effects of alcohol.
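Pitch variability of the kind examined here is often quantified as the standard deviation of f0 after converting to semitones relative to the speaker's own reference, which makes speakers with different pitch ranges comparable. The following is a minimal sketch with invented f0 contours, not the study's analysis pipeline.

```python
import math

def semitone_sd(f0_hz):
    """SD of f0 in semitones relative to the speaker's geometric-mean f0."""
    voiced = [f for f in f0_hz if f and f > 0]  # drop unvoiced (0 Hz) frames
    ref = math.exp(sum(math.log(f) for f in voiced) / len(voiced))
    st = [12 * math.log2(f / ref) for f in voiced]
    mean_st = sum(st) / len(st)
    return math.sqrt(sum((s - mean_st) ** 2 for s in st) / len(st))

sober = [200, 205, 198, 202, 201, 0, 199]        # relatively flat contour
intoxicated = [180, 230, 160, 250, 210, 0, 140]  # wider pitch excursions
assert semitone_sd(intoxicated) > semitone_sd(sober)
```

The semitone scale is logarithmic, so equal semitone deviations correspond to equal perceived pitch intervals regardless of a speaker's baseline f0.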
Laboratory Phonology: Journal of the Association for Laboratory Phonology, 2023
Artificial language learning research has become a popular tool to investigate universal mechanisms in language learning. However, it is often unclear whether the effects found are due to learning or to artefacts of the native language or the artificial language, and whether findings in only one language will generalise to speakers of other languages. The present study offers a new approach to model the influence of both the L1 and the target artificial language on language learning. The idea is to control for linguistic factors of the artificial and the native language by incorporating measures of wordlikeness into the statistical analysis as covariates. To demonstrate the approach, we extend Linzen and Gallagher's (2017) study on a consonant identity pattern to evaluate whether speakers of German and Mandarin rapidly learn the pattern when influences of the L1 and the artificial language are accounted for by incorporating measures assessed by analogical and discriminative learning models over the L1 and artificial lexicon. Results show that nonwords are more likely to be accepted as grammatical if they are more similar to the trained artificial lexicon and more different from the L1 and, crucially, the identity effect is still present. The proposed approach is helpful for designing cross-linguistic studies.
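To illustrate what a wordlikeness covariate looks like as a number, here is a deliberately simple stand-in: a nonword's mean string similarity to a lexicon. The study itself used analogical and discriminative learning models, which are far richer than this; the function name, lexicon, and items below are hypothetical.

```python
from difflib import SequenceMatcher

def wordlikeness(nonword, lexicon):
    """Mean string similarity (0..1) of `nonword` to every lexicon entry."""
    return sum(SequenceMatcher(None, nonword, w).ratio()
               for w in lexicon) / len(lexicon)

artificial_lexicon = ["kiki", "koko", "kaka"]  # toy trained items
print(wordlikeness("kiko", artificial_lexicon))   # similar to trained items
print(wordlikeness("bantu", artificial_lexicon))  # dissimilar
```

Entering one such score per nonword for the artificial lexicon and another for the L1 lexicon as regression covariates is what lets learning effects be separated from similarity artefacts.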
Spanish voiced stops /b, d, ɡ/ surface as fricatives [β, ð, ɣ] in intervocalic position due to a phonological process known as spirantization or, more broadly, lenition. However, conditioned by various factors such as stress, place of articulation, flanking vowel quality, and speaking rate, phonetic studies reveal a great deal of variation and gradience in these surface forms, ranging from fricative-like to approximant-like [β̞, ð̞, ɣ̞]. Several acoustic measurements have been used to quantify the degree of lenition, but none is standard. In this study, posterior probabilities of the sonorant and continuant phonological features in a corpus of Argentinian Spanish, estimated by a deep learning Phonet model as measures of lenition, were compared to traditional acoustic measurements of intensity, duration, and periodicity. When evaluated against known lenition factors (stress, place of articulation, surrounding vowel quality, word status, and speaking rate), the results show that sonorant and continuant posterior probabilities predict lenition patterns that are similar to those predicted by relative acoustic intensity measures and are in the direction expected by the effort-based view of lenition and previous findings. These results suggest that Phonet is a reliable alternative or additional approach to investigate the degree of lenition.
The Journal of the Acoustical Society of America, 2023
A deep learning Phonet model was evaluated as a method to measure lenition. Unlike quantitative acoustic methods, the model's recurrent networks were trained to recognize the posterior probabilities of sonorant and continuant phonological features in a corpus of Argentinian Spanish. When applied to intervocalic and post-nasal voiced and voiceless stops, the approach yielded lenition patterns similar to those previously reported, and additional patterns also emerged. The results suggest the validity of the approach as an alternative or addition to quantitative acoustic measures of lenition.
The Journal of the Acoustical Society of America, 2018
Probability is one of many factors that influence phonetic variation. Contextual probability, which describes how predictable a linguistic unit is in some local environment, has been consistently shown to modulate the phonetic salience of words and other linguistic units in speech production (the probabilistic reduction effect). This paper explores whether the probabilistic reduction effect, as previously observed for majority languages like English, is also found in a language with relatively rich morphology (Kaqchikel Mayan). Specifically, it examines whether the contextual predictability of words and morphemes influences their phonetic duration in Kaqchikel. The contextual predictability of a word is found to have a significant effect on its duration, and the effect is manifested differently for lexical words and function words. The contextual predictability of certain prefixes in Kaqchikel also affects their duration, showing ...
Procedures were designed to test for the effects of working-memory training on children at risk of fluency difficulty that apply to English and to many of the languages spoken by children with English as an Additional Language (EAL) in UK schools. Working-memory training should: (1) improve speech fluency in high-risk children; (2) enhance non-word repetition (NWR) (phonological) skills for all children; and (3) not affect word-finding abilities. Children starting general education (N = 232) were screened to identify those at risk of fluency difficulty. Children were selected who were at high risk (12) or low risk (27) of fluency difficulty. Of the low-risk children, 10 received the working-memory training and 17 did not. All children in the treatment groups received working-memory training over a 2-week period. For the high-risk group, fluency improved, and the improvement lasted for at least a week after the end of the study. Phonological skills improved in this group and in the low-risk group who received the training, and the improvements continued for at least a week. The low-risk group who did not receive working-memory training showed no improvements, and no group improved in word-finding ability.
The average predictability (aka informativity) of a word in context has been shown to condition word duration (Seyfarth, 2014). All else being equal, words that tend to occur in more predictable environments are shorter than words that tend to occur in less predictable environments. One account of the informativity effect on duration is that the acoustic details of probabilistic reduction are stored as part of a word's mental representation. Other research has argued that predictability effects are tied to prosodic structure in integral ways. With the aim of assessing a potential prosodic basis for informativity effects in speech production, this study extends past work in two directions: it investigated informativity effects in another large language, Mandarin Chinese, and broadened the study beyond word duration to additional acoustic dimensions, pitch and intensity, known to index prosodic prominence. The acoustic information of content words was extracted from a large telephone conversation speech corpus with over 400,000 tokens and 6,000 word types spoken by 1,655 individuals and analyzed for the effect of informativity using frequency statistics estimated from a 431-million-word subtitle corpus. Results indicated that words with low informativity have shorter durations, replicating the effect found in English. In addition, informativity had significant effects on maximum pitch and intensity, two phonetic dimensions related to prosodic prominence. Extending this interpretation, these results suggest that predictability is closely linked to prosodic prominence, and that the lexical representation of a word includes phonetic details associated with its average prosodic prominence in discourse. In other words, the lexicon absorbs prosodic influences on speech production.
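Informativity in this line of work is typically defined as a word's average surprisal, -log2 P(word | context), weighted by how often each context occurs with the word. The sketch below uses toy bigram counts (all values invented) to show how the quantity is computed from corpus statistics.

```python
import math
from collections import Counter

def informativity(word, bigram_counts, context_counts):
    """Weighted average surprisal of `word` across its preceding contexts.

    bigram_counts: Counter of (context, word) pairs;
    context_counts: Counter of how often each context occurs overall.
    """
    pairs = [(ctx, n) for (ctx, w), n in bigram_counts.items() if w == word]
    total = sum(n for _, n in pairs)
    return sum((n / total) * -math.log2(n / context_counts[ctx])
               for ctx, n in pairs)

bigrams = Counter({("the", "time"): 80, ("on", "time"): 20,
                   ("the", "thyme"): 1, ("fresh", "thyme"): 1})
contexts = Counter({"the": 1000, "on": 200, "fresh": 10})

# "thyme" occurs in less predictable environments, so it is more informative
assert informativity("thyme", bigrams, contexts) > informativity("time", bigrams, contexts)
```

By the probabilistic-reduction account, the higher-informativity word would be expected to have longer durations (and, per this study, higher prosodic prominence) on average.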
Autonomous technology has the potential to greatly benefit personal transportation, last-mile delivery, logistics, and many other mobility applications. In many of these applications, the mobility infrastructure is a shared resource in which all the players must cooperate. In fact, the driving task has been described as a “tango” where we—as humans—cooperate to enable a robust transportation system. Can autonomous systems participate in this tango? Does that even make sense? This report will examine the current interaction points between humans and autonomous systems, the shortcomings of the current state of these systems with a particular focus on advanced driver assistance systems, the requirements for human machine interfaces as imposed by human perception, and finally, the progress being made to close the gap.
Proceedings of the Royal Society B: Biological Sciences, 2020
Classic linguistic theory ascribes language change and diversity to population migrations, conquests, and geographical isolation, with the assumption that human populations have equivalent language processing abilities. We hypothesize that spectral and temporal characteristics make some consonant manners vulnerable to differences in temporal precision associated with specific population allele frequencies. To test this hypothesis, we modelled the association between RU1-1 alleles of DCDC2 and manner of articulation in 51 populations spanning five continents, adjusting for geographical proximity and for genetic and linguistic relatedness. RU1-1 alleles, acting through increased expression of DCDC2, appear to increase auditory processing precision that enhances stop-consonant discrimination, favouring retention in some populations and loss by others. These findings enhance classical linguistic theories by adding a genetic dimension, which, until recently, has not been considered to be a significant catalyst for language change.
The Journal of the Acoustical Society of America, 2024
Predictions of gradient degree of lenition of voiceless and voiced stops in a corpus of Argentine... more Predictions of gradient degree of lenition of voiceless and voiced stops in a corpus of Argentine Spanish are evaluated using three acoustic measures (minimum and maximum intensity velocity and duration) and two recurrent neural network (Phonet) measures (posterior probabilities of sonorant and continuant phonological features). While mixed and inconsistent predictions were obtained across the acoustic metrics, sonorant and continuant probability values were consistently in the direction predicted by known factors of a stop's lenition with respect to its voicing, place of articulation, and surrounding contexts. The results suggest the effectiveness of Phonet as an additional or alternative method of lenition measurement. Furthermore, this study has enhanced the accessibility of Phonet by releasing the trained Spanish Phonet model used in this study and a pipeline with step-by-step instructions for training and inferencing new models. V
Objective: This study investigated the degrees of lenition, or consonantal weakening, in the production of Spanish stop consonants by native English speakers during a study abroad (SA) program. Lenition is a key phonological process in Spanish, where voiced stops (/b/, /d/, /ɡ/) typically weaken to fricatives or approximants in specific phonetic environments. For L2 learners, mastering this subtle process is essential for achieving native-like pronunciation.
Methods: To assess the learners' progress in acquiring lenition, we employed Phonet, a deep learning model. Unlike traditional quantitative acoustic methods that measure the physical properties of speech sounds, Phonet uses recurrent neural networks to predict the posterior probabilities of phonological features, particularly the sonorant and continuant features central to the lenition process.
Results: The results indicated that while learners made progress in producing the fricative-like variants of lenition during the SA program and understood the contexts in which lenition applies, these phonological gains were not retained after their return. Additionally, unlike native speakers, the learners never fully achieved the approximant-like realization of lenition.
Conclusions: These findings underscore the need for sustained exposure and practice beyond the SA experience to ensure the long-term retention of L2 phonological patterns. While SA programs offer valuable opportunities for enhancing L2 pronunciation, they should be supplemented with ongoing support to consolidate and extend the gains achieved during the immersive experience.
Linguistic alignment, the tendency of speakers to share common linguistic features during conversations, has emerged as a key area of research in computer-supported collaborative learning. While previous studies have shown that linguistic alignment can have a significant impact on collaborative outcomes, there is limited research exploring its role in K-12 learning contexts. This study investigates syntactic and lexical linguistic alignment in a collaborative computer science learning corpus from 24 pairs (48 individuals) of middle school students (aged 11-13). The results show stronger effects of self-alignment than of partner alignment on both the syntactic and lexical levels, with students often diverging from their partners on task-relevant words. Furthermore, student self-alignment on the syntactic level is negatively correlated with partner satisfaction ratings, while self-alignment on the lexical level is positively correlated with their partner's satisfaction.
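As an illustration of what a lexical alignment measure can look like, here is a minimal overlap-based sketch. This is an assumption for exposition only; published alignment measures additionally control for baseline word frequency and chance repetition.

```python
def lexical_alignment(prime_turn, target_turn):
    """Proportion of word types in the target speaker's turn that
    repeat word types from the partner's preceding (prime) turn.
    A crude overlap measure: 1.0 = full lexical repetition,
    0.0 = no shared vocabulary."""
    prime = set(prime_turn.lower().split())
    target = set(target_turn.lower().split())
    if not target:
        return 0.0
    return len(prime & target) / len(target)
```

Self-alignment can be sketched with the same function by passing a speaker's own previous turn as the prime.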
Introduction: Automatic recognition of stutters (ARS) from speech recordings can facilitate objective assessment and intervention for people who stutter. However, the performance of ARS systems may depend on how the speech data are segmented and labelled for training and testing. This study compared two segmentation methods: event-based, which delimits speech segments by their fluency status, and interval-based, which uses fixed-length segments regardless of fluency.
Methods: Machine learning models were trained and evaluated on interval-based and event-based stuttered speech corpora. The models used acoustic and linguistic features extracted from the speech signal and the transcriptions generated by a state-of-the-art automatic speech recognition system.
Results: The results showed that event-based segmentation led to better ARS performance than interval-based segmentation, as measured by the area under the curve (AUC) of the receiver operating characteristic. This suggests that the segmentation method affects the quality and quantity of the training data. The inclusion of linguistic features improved the detection of whole-word repetitions, but not of other types of stutters.
Discussion: The findings suggest that event-based segmentation is more suitable for ARS than interval-based segmentation, as it preserves the exact boundaries and types of stutters. The linguistic features provide useful information for separating supra-lexical disfluencies from fluent speech but may not capture the acoustic characteristics of stutters. Future work should explore more robust and diverse features, as well as larger and more representative datasets, for developing effective ARS systems.
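The AUC metric used above to compare the two segmentation schemes can be computed directly from classifier scores via the rank-sum identity. A minimal sketch (not the study's evaluation code):

```python
def roc_auc(scores_pos, scores_neg):
    """ROC AUC via the rank-sum (Mann-Whitney) identity: the
    probability that a randomly chosen positive (stuttered) segment
    receives a higher score than a randomly chosen negative (fluent)
    segment, counting ties as half."""
    wins = ties = 0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1
            elif sp == sn:
                ties += 1
    return (wins + 0.5 * ties) / (len(scores_pos) * len(scores_neg))
```

Perfectly separated scores give 1.0; a chance-level classifier hovers near 0.5, which is why AUC is a convenient segmentation-independent comparison metric.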
Degeneration of the nerve cells that control cognitive, speech, and language processes, leading to linguistic impairments at various levels, from verbal utterances to individual speech sounds, can indicate signs of neurological, cognitive, and psychiatric disorders such as Alzheimer's disease (AD), Parkinson's disease (PD), amyotrophic lateral sclerosis (ALS), dementias, depression, autism spectrum disorder, and schizophrenia. Currently, these disorders are diagnosed using specific clinical diagnostic criteria and neuropsychological examinations. However, speech-based biomarkers could potentially offer many advantages over current clinical standards. In addition to being objective and naturalistic, they can be collected remotely with minimal instruction and time requirements. Furthermore, machine learning algorithms that build automated diagnostic models from linguistic features extracted from speech could aid in distinguishing patients with probable disease from the healthy population.
To ensure that speech-based biomarkers provide accurate measurements and can serve as effective clinical tools for detecting and monitoring disease, the speech features extracted and analyzed must be systematically and rigorously evaluated. The machine learning architectures trained to classify different types of disordered speech must likewise be rigorously tested and systematically compared.
For speech measures, three categories of evaluation have been proposed: verification, analytical validation, and clinical validation. Verification includes assessing and comparing the quality of speech recordings across hardware and recording conditions. Analytical validation entails checking the accuracy and reliability of data processing and computed measures to ensure that they accurately measure the intended phenomena. Clinical validation involves verifying the correspondence of a measure to clinical diagnosis, disease severity/progression, and/or response to treatment outcomes.
For machine learning algorithms, analytical and clinical validation apply. For example, the accuracy of different algorithms can be compared in different clinical groups for different outcome measures.
This Research Topic aims to bring together research on the effectiveness of speech-based biomarkers for clinical diagnosis or for the evaluation of disease severity and prognosis, drawing on related disciplines including cognitive neuroscience, computer science, engineering, linguistics, speech, and communication sciences. We welcome original research or systematic reviews on any of the three categories of evaluation of speech measures (verification, analytical validation, and clinical validation), as well as on NLP tools used to model clinical detection and classification and the evaluation of disease severity/progression and/or response to treatment outcomes.
Topics may include, but are not limited to:
• Automatic analysis of dysarthric speech (e.g., typical and atypical Parkinsonism, Huntington's disease, Multiple Sclerosis, Amyotrophic Lateral Sclerosis);
• Early detection and classification of neurodevelopmental disorders (e.g., Autism Spectrum Disorder (ASD), Speech Language Impairment (SLI), Attention Deficit and Hyperactivity Disorder (ADHD));
• Cognitive assessment and clinical phenotyping (e.g., Alzheimer's disease, cognitive impairment including substance-induced cognitive impairment, dementia);
• Mental illness screening and diagnosis (e.g., Post-traumatic Stress Disorder (PTSD), Depressive Disorder, anxiety disorder, Bipolar Disorder, Schizophrenia);
• Novel methods and tools used to collect speech samples for the assessment of neurological, cognitive, and psychiatric disorders.
Belonging to a university shapes wellbeing and academic outcomes for first-year students, yet this belongingness is harder to achieve for those from lower socioeconomic backgrounds. This study delved into the flexible construct of status (the individual's perceived position within the university's social hierarchy and the strategy they adopt to achieve that position) and its impact on belongingness. The objective was to identify key psychological contributors that could affect first-year Psychology students' expected social status and thereby their belongingness. A cross-sectional study tested first-year Psychology students entering university in 2021 and 2022. The students completed a battery of questionnaires to ascertain their status, belongingness to the university, mental state, and personalities. Structural equation modelling (SEM) was employed to evaluate a social ecological model focusing on belongingness. This analysis investigated the mediating role of peer status (popularity among peers) in the relationship between mental state and belongingness, and the moderating influence of personality traits on the connections between mental state and peer status. Both the mediation and moderation effects were statistically significant after adjusting for gender and ethnicity. The findings offer insights into how university administrations can effectively support students, particularly those from lower socioeconomic backgrounds, in enhancing their social status among peers and fostering a stronger sense of belonging, thereby promoting their overall mental wellbeing and academic success.
African American Language (AAL) is a marginalized variety of American English that has been understudied due to a lack of accessible data. This lack of data has made it difficult to research language in African American communities and has been shown to cause emerging technologies such as Automatic Speech Recognition (ASR) to perform worse for African American speakers. To address this gap, the Joel Buchanan Archive of African American Oral History (JBA) at the University of Florida is being compiled into a time-aligned and linguistically annotated corpus. Through Natural Language Processing (NLP) techniques, this project will automatically time-align spoken data with transcripts and automatically tag AAL features. Transcription and time-alignment challenges have arisen as we ensure accuracy in depicting AAL morphosyntactic and phonetic structure. Two linguistic studies illustrate how the African American Corpus from Oral Histories improves our understanding of this lesser-studied variety.
The relative weighting of f0 and vowel reduction in English spoken word recognition at the sentence level was investigated in a two-alternative forced-choice word identification experiment. An H* pitch-accented or a deaccented word fragment (e.g., AR- in the word archive) was presented at the end of a carrier sentence for identification. The results revealed differences in the cue weighting of English lexical stress perception between native and non-native listeners. For native English listeners, vowel quality was a more prominent cue than f0, while native Mandarin Chinese listeners employed both vowel quality and f0 in a comparable fashion. These results suggest that (a) vowel reduction is a stronger cue than f0 in signaling word-initial stress, and (b) f0 facilitates the recognition of word-initial stress in a way that is modulated by the listener's first language.
International Journal of Language & Communication Disorders, 2023
Background
Non-word repetition (NWR) tests are an important way speech and language therapists (SaLTs) assess language development. NWR tests are often scored whilst participants make their responses (i.e., in real time) in clinical and research reports (documented here via a secondary analysis of a published systematic review).
Aims
The main aim was to determine the extent to which real-time coding of NWR stimuli at the whole-item level (as correct/incorrect) was predicted by models that had varying levels of detail from phonemic transcriptions, using several linear mixed models (LMMs).
Methods & Procedures
Live scores and recordings of responses on the universal non-word repetition (UNWR) test were available for 146 children aged between 3 and 6 years; the sample included all children starting at five UK schools in one year or in two consecutive years. Responses to two-syllable NWR stimuli were transcribed for all children, and the transcriptions were checked for reliability within and between transcribers. Signal detection analysis showed that consonants were missed when judgements were made live. Statistical comparisons of the discrepancies between target stimuli and transcriptions of children's responses were then made and regressed against live-score accuracy. Six LMMs (three normalized: 1a, 2a, 3a; three non-normalized: 1b, 2b, 3b) were examined to identify which model(s) best captured the variance in the data. Errors on consonants in live scores were determined by comparison with the transcriptions in three ways (the dependent variables for each pair of models): (1) consonants alone; (2) substitutions, deletions, and insertions of consonants identified after automatic alignment of live and transcribed materials; and (3) as in (2), but with substitutions coded further as place, manner, and voicing errors.
Outcomes & Results
The model that coded consonants in non-words as 'incorrect' at the level of substitutions, deletions, and insertions (2b) provided the best fit to the real-time coding responses in terms of marginal R², Akaike information criterion (AIC), and Bayesian information criterion (BIC) statistics.
Conclusions & Implications
Errors that occur on consonants when non-word stimuli are scored in real time are characterized solely by the substitution, deletion, and insertion measure. It is important to know that such errors arise when real-time judgements are made, because NWR tasks are used to assess and diagnose several cognitive-linguistic impairments. One broader implication of the results is that future work could automate the analysis procedures to provide the required information objectively and quickly, without having to transcribe the data.
WHAT THIS PAPER ADDS
What is already known on this subject
Children and patients with a wide range of cognitive and language difficulties are less accurate relative to controls when they attempt to repeat non-words. Responses to non-words are often scored as correct or incorrect at the time the test is conducted. Limited assessments of this scoring procedure have been conducted to date.
What this study adds to the existing knowledge
Live NWR scores were available for 146 children, and the accuracy of these judgements was assessed here against scores based on phonemic transcriptions. Signal detection analyses showed that live scoring missed consonant errors in children's responses. Further analyses, using linear mixed-effects models, showed that the errors in live judgements involved consonant substitutions, deletions, and insertions.
What are the practical and clinical implications of this work?
Improved and practicable NWR scoring procedures are required to provide SaLTs with better indications about children's language development (typical and atypical) and for clinical assessments of older people. The procedures currently used miss substitutions, deletions and insertions. Hence, procedures are required that provide the information currently only available when materials are transcribed manually. The possibility of training automatic speech recognizers to provide this level of detail is raised.
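The automatic alignment of live and transcribed materials into substitutions, deletions, and insertions is essentially a phone-level edit-distance alignment. The following is an illustrative sketch, not the paper's implementation:

```python
def align_ops(target, response):
    """Wagner-Fischer edit distance with backtrace, returning counts of
    substitutions, deletions, and insertions needed to turn the target
    phone sequence into the response (e.g., a child's NWR attempt)."""
    n, m = len(target), len(response)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if target[i - 1] == response[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # target phone deleted
                          d[i][j - 1] + 1,        # extra phone inserted
                          d[i - 1][j - 1] + cost)  # match or substitution
    # Backtrace to classify each non-matching step.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                d[i][j] == d[i - 1][j - 1]
                + (0 if target[i - 1] == response[j - 1] else 1)):
            if target[i - 1] != response[j - 1]:
                subs += 1
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return {"sub": subs, "del": dels, "ins": ins}
```

Substitutions found this way could then be sub-classified by place, manner, and voicing, as in the third dependent variable described above.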
Intoxication and pitch control in tonal and non-tonal language speakers, 2022
Alcohol intoxication is known to affect pitch variability in non-tonal languages. In this study, intoxication's effects on pitch were examined in tonal and non-tonal language speakers, in both their native language (L1; German, Korean, Mandarin) and non-native language (L2; English). Intoxication significantly increased pitch variability in the German group (in L1 and L2), but not in the Korean or Mandarin groups (in L1 or L2), although there were individual differences. These results support the view that pitch control is related to the functional load of pitch and is an aspect of speech production that can be advantageously transferred across languages, overriding the expected effects of alcohol.
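Pitch variability of the kind examined here is commonly operationalized as the standard deviation of F0, often on a semitone scale to normalize across speakers with different pitch ranges. A minimal sketch under that assumption (the study's exact measure may differ):

```python
import math
import statistics

def pitch_variability(f0_hz):
    """Standard deviation of F0 on a semitone scale relative to the
    speaker's mean F0; one common operationalization of pitch
    variability. Input: a list of F0 samples in Hz."""
    mean_f0 = statistics.fmean(f0_hz)
    semitones = [12 * math.log2(f / mean_f0) for f in f0_hz]
    return statistics.stdev(semitones)
```

The semitone transform makes the measure comparable between, say, a low-pitched and a high-pitched speaker, which matters when comparing groups across languages.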
Laboratory Phonology: Journal of the Association for Laboratory Phonology, 2023
Artificial language learning research has become a popular tool for investigating universal mechanisms in language learning. However, it is often unclear whether the effects found are due to learning or to artefacts of the native or the artificial language, and whether findings obtained in only one language will generalise to speakers of other languages. The present study offers a new approach to modelling the influence of both the L1 and the target artificial language on language learning. The idea is to control for linguistic factors of the artificial and the native language by incorporating measures of wordlikeness into the statistical analysis as covariates. To demonstrate the approach, we extend Linzen and Gallagher's (2017) study on consonant identity patterns to evaluate whether speakers of German and Mandarin rapidly learn the pattern when influences of the L1 and the artificial language are accounted for by incorporating measures assessed by analogical and discriminative learning models over the L1 and artificial lexicons. Results show that nonwords are more likely to be accepted as grammatical if they are more similar to the trained artificial lexicon and more different from the L1 and, crucially, that the identity effect is still present. The proposed approach is helpful for designing cross-linguistic studies.
Spanish voiced stops /b, d, ɡ/ surface as fricatives [β, ð, ɣ] in intervocalic position due to a phonological process known as spirantization or, more broadly, lenition. However, conditioned by various factors such as stress, place of articulation, flanking vowel quality, and speaking rate, phonetic studies reveal a great deal of variation and gradience in these surface forms, ranging from fricative-like to approximant-like [β̞, ð̞, ɣ̞]. Several acoustic measurements have been used to quantify the degree of lenition, but none is standard. In this study, the posterior probabilities of the sonorant and continuant phonological features in a corpus of Argentinian Spanish, estimated by a deep learning Phonet model as measures of lenition, were compared to traditional acoustic measurements of intensity, duration, and periodicity. When evaluated against known lenition factors (stress, place of articulation, surrounding vowel quality, word status, and speaking rate), the results show that sonorant and continuant posterior probabilities predict lenition patterns similar to those predicted by relative acoustic intensity measures, and in the direction expected by the effort-based view of lenition and previous findings. These results suggest that Phonet is a reliable alternative or additional approach for investigating degree of lenition.
We review and elaborate an account of consonantal strength that is founded on the model of speech as a modulated carrier signal. The stronger the consonant, the greater the modulation. Unlike approaches based on sonority or articulatory aperture, the account offers a uniform definition of the phonetic effect lenition has on consonants: all types of lenition (such as debuccalisation, spirantisation, and vocalisation) reduce the extent to which a consonant modulates the carrier. To demonstrate the quantifiability of this account, we present an analysis of Ibibio, in which we investigate the effects of lenition on the amplitude, periodicity, and temporal properties of consonants. We propose a method for integrating these different acoustic dimensions within an overall measure of modulation size. Not only does the modulated-carrier account cover all the classically recognised lenition types, but it also encompasses loss of plosive release in final stops – which, although not traditionally classed as lenition, is clearly related to processes that are.
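One simple way to integrate several acoustic dimensions into a single modulation magnitude is a Euclidean norm over normalized per-dimension changes. This is an illustrative assumption for exposition, not necessarily the integration method proposed in the paper:

```python
import math

def modulation_size(d_amplitude, d_periodicity, d_duration):
    """Combine normalized changes along three acoustic dimensions
    (amplitude, periodicity, temporal structure) into one modulation
    magnitude via a Euclidean norm. On the modulated-carrier view, a
    weaker (more lenited) consonant modulates the carrier less, so
    lenition should shrink this value."""
    return math.sqrt(d_amplitude ** 2 + d_periodicity ** 2 + d_duration ** 2)
```

The inputs are assumed to be dimensionless (e.g., z-scored) so that no single acoustic dimension dominates the combined measure.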
The Journal of the Acoustical Society of America, 2023
A deep learning Phonet model was evaluated as a method to measure lenition. Unlike quantitative acoustic methods, recurrent networks were trained to estimate the posterior probabilities of the sonorant and continuant phonological features in a corpus of Argentinian Spanish. When applied to intervocalic and post-nasal voiced and voiceless stops, the approach yielded lenition patterns similar to those previously reported, and additional patterns also emerged. The results suggest the validity of the approach as an alternative or addition to quantitative acoustic measures of lenition.
The Journal of the Acoustical Society of America, 2018
Probability is one of the many factors which influence phonetic variation. Contextual probability, which describes how predictable a linguistic unit is in some local environment, has been consistently shown to modulate the phonetic salience of words and other linguistic units in speech production (the probabilistic reduction effect). This paper explores whether the probabilistic reduction effect, as previously observed for majority languages like English, is also found in Kaqchikel Mayan, a language with relatively rich morphology. Specifically, it examines whether the contextual predictability of words and morphemes influences their phonetic duration in Kaqchikel. It is found that the contextual predictability of a word has a significant effect on its duration, and that the effect is manifested differently for lexical words and function words. It is also found that the contextual predictability of certain prefixes in Kaqchikel affects their duration, showing ...
Procedures were designed to test for the effects of working-memory training on children at risk of fluency difficulty; the procedures apply to English and to many of the languages spoken by children with English as an Additional Language (EAL) in UK schools. Working-memory training should: (1) improve speech fluency in high-risk children; (2) enhance non-word repetition (NWR) (phonological) skills for all children; and (3) not affect word-finding abilities. Children starting general education (N = 232) were screened to identify those at risk of fluency difficulty, and children at high risk (12) or low risk (27) of fluency difficulty were selected. Of the low-risk children, 10 received, and 17 did not receive, the working-memory training. All children in the treatment groups received working-memory training over a 2-week period. In the high-risk group, fluency improved, and the improvement lasted for at least a week after the end of the study. Phonological skills improved both in this group and in the low-risk children who received the training, and these improvements also continued for at least a week. The low-risk children who did not receive working-memory training showed no improvements, and no group improved in word-finding ability.
The average predictability (aka informativity) of a word in context has been shown to condition word duration (Seyfarth, 2014). All else being equal, words that tend to occur in more predictable environments are shorter than words that tend to occur in less predictable environments. One account of the informativity effect on duration is that the acoustic details of probabilistic reduction are stored as part of a word's mental representation. Other research has argued that predictability effects are tied to prosodic structure in integral ways. With the aim of assessing a potential prosodic basis for informativity effects in speech production, this study extends past work in two directions: it investigates informativity effects in another large language, Mandarin Chinese, and broadens the study beyond word duration to additional acoustic dimensions, pitch and intensity, known to index prosodic prominence. The acoustic information of content words was extracted from a large telephone conversation speech corpus with over 400,000 tokens and 6,000 word types spoken by 1,655 individuals, and analyzed for the effect of informativity using frequency statistics estimated from a 431-million-word subtitle corpus. Results indicated that words with low informativity have shorter durations, replicating the effect found in English. In addition, informativity had significant effects on maximum pitch and intensity, two phonetic dimensions related to prosodic prominence. Extending this interpretation, these results suggest that predictability is closely linked to prosodic prominence, and that the lexical representation of a word includes phonetic details associated with its average prosodic prominence in discourse. In other words, the lexicon absorbs prosodic influences on speech production.
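Informativity is standardly computed as a word's average surprisal over the contexts it occurs in. A toy bigram-based sketch (real studies estimate the probabilities from much larger corpora, with n-gram or neural language models):

```python
import math
from collections import Counter

def informativity(tokens, word):
    """Average surprisal, -log2 P(word | previous word), over all
    tokens of `word`, with probabilities estimated from bigram counts
    in `tokens`. Assumes `word` occurs at least once after position 0."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])
    surprisals = []
    for prev, cur in zip(tokens, tokens[1:]):
        if cur == word:
            p = bigrams[(prev, cur)] / contexts[prev]
            surprisals.append(-math.log2(p))
    return sum(surprisals) / len(surprisals)
```

A word that is highly predictable in its typical contexts gets a low informativity score and, on the account summarized above, a shorter stored duration.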
Autonomous technology has the potential to greatly benefit personal transportation, last-mile delivery, logistics, and many other mobility applications. In many of these applications, the mobility infrastructure is a shared resource in which all the players must cooperate. In fact, the driving task has been described as a "tango" in which we, as humans, cooperate to enable a robust transportation system. Can autonomous systems participate in this tango? Does that even make sense? This report examines the current interaction points between humans and autonomous systems, the shortcomings of the current state of these systems (with a particular focus on advanced driver assistance systems), the requirements for human-machine interfaces imposed by human perception, and, finally, the progress being made to close the gap.
Proceedings of the Royal Society B: Biological Sciences, 2020
Classic linguistic theory ascribes language change and diversity to population migrations, conquests, and geographical isolation, with the assumption that human populations have equivalent language processing abilities. We hypothesize that spectral and temporal characteristics make some consonant manners vulnerable to differences in temporal precision associated with specific population allele frequencies. To test this hypothesis, we modelled the association between RU1-1 alleles of DCDC2 and manner of articulation in 51 populations spanning five continents, adjusting for geographical proximity and for genetic and linguistic relatedness. RU1-1 alleles, acting through increased expression of DCDC2, appear to increase the auditory processing precision that enhances stop-consonant discrimination, favouring retention in some populations and loss in others. These findings enhance classical linguistic theories by adding a genetic dimension which, until recently, has not been considered a significant catalyst for language change.
Proceedings of the 20th International Congress of Phonetic Sciences, Prague 2023, 2023
Alcohol intoxication facilitates inhibition of one's first language (L1) ego, which may lead to reduced individual differences among second language (L2) speakers under intoxication. This study examined whether, compared to speaking while sober, speaking while intoxicated would reduce individual differences in the acoustic compactness of vowel categories in sequential bilinguals exemplifying diverse L1-L2 pairs (German-English, Korean-English). Vowel compactness in F1 × F2 space varied by language (German, Korean, English) and by vowel, and was generally lower in intoxicated compared to sober speech, both across languages and throughout a bilingual's language repertoire. Crucially, however, there was still a wide range in compactness under intoxication; furthermore, individuals with more compact vowels while sober also produced more compact vowels while intoxicated, in both L1 and L2. Taken together, these findings show patterned variability of vowel compactness, suggesting that articulatory precision is an individual-difference dimension that persists across speaking conditions and throughout the repertoire.
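Compactness in F1 × F2 space can be operationalized as the mean Euclidean distance of a vowel category's tokens from their centroid (lower = more compact). A minimal sketch under that assumption (the study's exact metric may differ):

```python
import math

def compactness(formant_points):
    """Mean Euclidean distance of a vowel category's (F1, F2) tokens
    from the category centroid. Smaller values indicate a tighter,
    more compact vowel cluster; formant values are assumed to be in
    comparable units (e.g., Hz or a normalized scale)."""
    n = len(formant_points)
    cf1 = sum(p[0] for p in formant_points) / n
    cf2 = sum(p[1] for p in formant_points) / n
    return sum(math.hypot(p[0] - cf1, p[1] - cf2) for p in formant_points) / n
```

Comparing this value for the same speaker's sober versus intoxicated tokens of each vowel gives the kind of within-individual compactness contrast discussed above.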
Proceedings of the 20th International Congress of Phonetic Sciences, Prague 2023, 2023
Alcohol is known to impair fine articulatory control and movements. In drunken speech, incomplete closure of the vocal tract can result in deaffrication of the English affricate sounds /tʃ/ and /ʤ/, spirantization (fricative-like production) of the stop consonants, and palatalization (retraction of place of articulation) of the alveolar fricative /s/ (produced as /ʃ/). Such categorical segmental errors have been well-reported. This study employs a phonologically-informed neural network approach to estimate degrees of deaffrication of /tʃ/ and /ʤ/, spirantization of /t/ and /d/, and place retraction for /s/ in a corpus of intoxicated English speech. Recurrent neural networks were trained to recognize the relevant phonological features [anterior], [continuant] and [strident] in a control speech corpus. Their posterior probabilities were computed over the segments produced under intoxication. The results obtained revealed both categorical and gradient errors and, thus, suggested that this new approach could reliably quantify fine-grained errors in intoxicated speech.
Proceedings of the 20th International Congress of Phonetic Sciences, Prague 2023, 2023
Naturally-occurring misperception can help establish the ecological validity of laboratory findings of speech perception and generate new hypotheses. In this study, we report on a corpus of misheard German sung speech which contains instances of misperception reported by individuals. We validated the corpus by examining segmental confusions and word mis-segmentation. Approximately 1,000 segment confusions were found. Our naturalistic segment confusions were significantly correlated with acoustic distances (r = 0.559) and with speech-in-noise-induced confusions in an experimental study (vowel: r = 0.364; consonant: r = 0.210). Our mis-segmentation patterns only partially confirmed the rhythmic segmentation hypothesis and findings from previous studies. While boundaries inserted before strong syllables created content words following the preferred rhythmic properties of German, we found an unexpected amount of boundary deletion before strong syllables, resulting in nonce percepts which might reflect listeners' expectation of neologisms in lyrics.
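The reported correlations are plain Pearson coefficients between per-pair confusion measures and acoustic (or experimentally induced) measures. A minimal sketch, with invented numbers rather than the corpus data:

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical: confusion counts per segment pair vs. acoustic distance
# (acoustically closer sounds tend to be confused more often)
confusions = [12, 9, 7, 4, 2]
distances = [0.1, 0.3, 0.4, 0.7, 0.9]
r = pearson_r(confusions, distances)  # strongly negative for these numbers
```

In the study itself the correlation is positive because similarity rather than distance is compared against confusion frequency; the computation is the same either way.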
Alcohol, a progressive central nervous system depressant, has been found to negatively affect not only cognitive functions but also the production of speech—a complex motor activity requiring a high degree of coordination. In this study, we estimate the degrees of deaffrication, spirantization, and retracted place of articulation for /t/, /d/, /s/, /ʃ/, /tʃ/, and /ʤ/ in a corpus of speech affected by alcohol. These estimations are based on posterior probabilities calculated by recurrent neural networks known as Phonet, which are trained to recognize anterior, continuant, and strident phonological features. The results obtained revealed both categorical and gradient errors in intoxicated speech, indicating the reliability of Phonet in quantifying fine-grained errors.
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion, 2022
Recent research has highlighted that natural language processing (NLP) systems exhibit a bias against African American speakers. The bias errors are often caused by poor representation of linguistic features unique to African American English (AAE), due to the relatively low probability of occurrence of many such features in training data. We present a workflow to overcome such bias in the case of habitual "be". Habitual "be" is isomorphic, and therefore ambiguous, with other forms of "be" found in both AAE and other varieties of English. This creates a clear challenge for bias in NLP technologies. To overcome this scarcity, we employ a combination of rule-based filters and data augmentation that generate a corpus balanced between habitual and non-habitual instances. With this balanced corpus, we train unbiased machine learning classifiers, as demonstrated on a corpus of AAE transcribed texts, achieving an F1 score of .65 in disambiguating habitual "be".
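The two ingredients of such a pipeline can be sketched in miniature (the regex heuristic and all numbers below are invented for illustration; the paper's actual filters and classifier features differ): a rule-based filter flags invariant "be" directly before a V-ing form as a habitual candidate, and the resulting classifier is scored with the standard F1 measure.

```python
import re

def habitual_be_candidate(sentence):
    """Toy rule: invariant 'be' immediately followed by a V-ing form
    (e.g. 'She be working late') is flagged as a habitual-'be' candidate."""
    return re.search(r"\bbe\s+\w+ing\b", sentence.lower()) is not None

def f1_score(gold, pred):
    """F1 = harmonic mean of precision and recall over binary labels."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Data augmentation then balances the corpus by generating additional habitual instances, so the classifier does not simply learn the majority (non-habitual) class.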
A phonologically informed neural network approach, Phonet, was compared to acoustic measurements of intensity, duration and harmonicity in estimating lenition degree of voiced and voiceless stops in a corpus of Argentine Spanish. Recurrent neural networks were trained to recognize the phonological features [sonorant] and [continuant]. Their posterior probabilities were computed over the target segments. Relative to most acoustic metrics, posterior probabilities of the two features are more consistent, and in the direction predicted by known factors of lenition: stress, voicing, place of articulation, surrounding vowel height, and speaking rate. The results suggest that Phonet can quantify gradient lenition more reliably than some acoustic metrics.
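Once frame-level posteriors exist, the measurement itself is simple: Phonet yields, per frame, the probability that a feature such as [continuant] or [sonorant] is present, and a stop token's lenition degree can be summarized as the mean posterior over its frames. A minimal sketch with invented posterior values (a trained Phonet model would supply them):

```python
def lenition_score(cont_posteriors, son_posteriors):
    """Summarize gradient lenition of one stop token as the mean
    frame-wise posterior probability of [continuant] and [sonorant].
    A canonical stop should score near 0 on both features; an
    approximant-like (heavily lenited) realization scores near 1."""
    mean = lambda xs: sum(xs) / len(xs)
    return {"continuant": mean(cont_posteriors),
            "sonorant": mean(son_posteriors)}

# Hypothetical frame posteriors for two tokens of intervocalic /d/
canonical = lenition_score([0.1, 0.05, 0.1], [0.2, 0.1, 0.15])
lenited = lenition_score([0.8, 0.9, 0.85], [0.7, 0.8, 0.75])
```

Because the score lives on a continuous 0–1 scale, it captures gradient weakening that a categorical spirantized/not-spirantized label would miss.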
Proceedings of the Annual Meetings on Phonology, 2023
Recent work has shown that lexical items come to take on the phonetic characteristics of the prosodic environments in which they are typically produced, a phenomenon referred to as “leaky prosody”. Focusing on pitch patterns in Mandarin, we show that leaky prosody can be derived from a flat (i.e., non-transformational, non-optimizing) model of speech production. Formalized using Dynamic Field Theory, in our model, lexical, phonological, and prosodic inputs each exert forces on a Dynamic Neural Field representing pitch. Notably, the forces exerted by these inputs reflect surface distributions in a large corpus of spontaneous speech. Our simulations showed that the flat model derives the short timescale effect of prosodic prominence on pitch production as well as the longer timescale effect of leaky prosody. By updating lexical items based on surface phonetic form, words that are consistently produced in high/low prosodic prominence positions take on the phonetic characteristics of those environments.
Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, 2022
This paper presents the submission by the HeiMorph team to the SIGMORPHON 2022 task 2 of Morphological Acquisition Trajectories. Across all experimental conditions, we have found no evidence for the so-called U-shaped development trajectory. Our submitted systems achieve average test accuracies of 55.5% on Arabic, 67% on German and 73.38% on English. We found that bigram hallucination provides better inferences only for English and Arabic, and only when the number of hallucinations remains low.
Proceedings of the 20th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, 2023
This paper presents our submission to the SIGMORPHON 2023 task 2 of Cognitively Plausible Morphophonological Generalization in Korean. We implemented both Linear Discriminative Learning and Transformer models and found that the Linear Discriminative Learning model trained on a combination of corpus and experimental data showed the best performance, with an overall accuracy of around 83%. We found that the best model must be trained on both the corpus data and the experimental data of one particular participant. Our examination of speaker variability and speaker-specific information did not explain why this particular participant combined well with the corpus data. We recommend Linear Discriminative Learning models as a future non-neural baseline system, owing to their training speed, accuracy, model interpretability and cognitive plausibility. To improve model performance, we suggest using larger datasets, performing data augmentation, and incorporating speaker- and item-specific information.
Recent research has highlighted that state-of-the-art automatic speech recognition (ASR) systems exhibit a bias against African American speakers. In this research, we investigate the underlying causes of this racially based disparity in performance, focusing on a unique morpho-syntactic feature of African American English (AAE), namely habitual "be", an invariant form of "be" that encodes the habitual aspect. By looking at over 100 hours of spoken AAE, we evaluated two ASR systems, DeepSpeech and Google Cloud Speech, to examine how well habitual "be" and its surrounding contexts are inferred. While controlling for local language and acoustic factors such as the amount of context, noise, and speech rate, we found that habitual "be" and its surrounding words were more error-prone than non-habitual "be" and its surrounding words. These findings hold both when the utterance containing "be" is processed in isolation and in conjunction with surrounding utterances within the speaker's turn. Our research highlights the need for equitable ASR systems to take into account dialectal differences beyond acoustic modeling.
The First Workshop on Speech Technologies for Code-Switching in Multilingual Communities, 2020
Forced alignment methods have recently seen great progress in acoustic-phonetic studies of low-resource languages. Code-mixed speech, however, presents complex challenges to forced-alignment techniques because of the larger phonemic inventory of bilingual speakers, the nature of accented speech, and the confounding interaction of two languages at the frame level. In this paper, we use the Montreal Forced Aligner to annotate the Phonetically Balanced Code-Mixed read-speech corpus (7.4 hours; 113 speakers) in 3 different training environments (code-mixed, Hindi and English). Additionally, we present an analysis of alignment errors using phonological and data-driven features with Random Forest and linear mixed-effects models. We find that the contextual influence of neighbouring phonemes affects alignment error most significantly, compared with any other feature. Many of the alignment errors by phonological feature can be explained by acoustic distinctiveness. Additionally, the amount of training data per phone type also contributed to lowering the respective error rates.
Forced alignment, a technique for aligning segment-level annotations with audio recordings, is a valuable tool for phonetic analysis. While forced alignment has great promise for phonetic fieldwork and language documentation, training a functional, custom forced alignment model requires at least several hours of accurately transcribed audio in the target language—something which is not always available in language documentation contexts. We explore a technique for model training which sidesteps this limitation by pooling smaller quantities of data from genetically-related languages to train a forced aligner. Using data from two Mayan languages, we show that this technique produces an effective forced alignment system even with relatively small amounts of data. We also discuss factors which affect the accuracy of training on mixed data sets of this type, and provide some recommendations about how to balance data from pooled languages.
Proceedings of Disfluencies in Spontaneous Speech, 8th workshop, 2017
Children who have word-finding difficulty can be identified by the pattern of disfluencies in their spontaneous speech; in particular, whole-word repetition of prior words often occurs when they cannot retrieve the subsequent word. Work is reviewed that shows whole-word repetitions can be used to identify children from diverse language backgrounds who have word-finding difficulty. The symptom-based identification procedure was validated using a non-word repetition task. Children who were identified as having word-finding difficulty were given phonological training that taught them features of English that they lacked (this depended on their language background).
This work documents the motivation and development of a subtitle-based corpus for Brazilian Portuguese, SUBTLEX-PT-BR, available at http://crr.ugent.be/subtlex-pt-br/. While the target language was Brazilian Portuguese, the methodology can be extended to any other language with subtitles. A preliminary corpus comparison with a large conversational and written corpus was conducted to evaluate the validity of the corpus, and suggested that the subtitle corpus is more similar to conversational than to written language. Future work on the methodology and the corpus itself is outlined. Its diverse use as a resource for linguistic research is discussed.
University College London Working Papers in Linguistics (UCLWPL), Dec 2013
In this paper, we address the unproductivity of irregular verbal "L"-patterns in Portuguese, Italian and Spanish diachronically in a corpus linguistic study. Using openly available corpora, we answer two questions systematically: firstly, whether the size of a speaker's or community's active lexicon remains constant, and secondly, whether the productivity of the regular verbal forms in the first conjugation -ar(e) increases over time and is a function of verb vocabulary size.
By running random sampling simulations on both large and small corpora from different sources for each language, we found a consistent increase, especially after 1750, in both verb vocabulary size and productivity of the regular verbal form -ar(e). The results suggested that productivity of the regular verbal form is likely to be caused by the increase in verb vocabulary size, and as more new verbs come into a language, they will most likely fall into the first conjugation. This increase in the ratio of new verbs being assigned to the first conjugation caused the irregular forms in the second and third conjugations -er(e) and -ir(e) to become less productive over time. Finally, we speculate that the 1750 shift across all corpora is possibly caused by the Industrial Revolution which started around 1760.
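The sampling logic can be sketched as follows (a deliberately simplified, hypothetical version: the real study involves per-period corpora, lemmatization, and much larger samples). Repeatedly draw fixed-size token samples from a period's corpus, count the distinct verb types, and measure what share of the types new to that period belongs to the first conjugation:

```python
import random

def sample_productivity(tokens, known_types, sample_size, runs=100, seed=0):
    """Estimate mean verb-type vocabulary size and the first-conjugation
    share of *new* types (not in `known_types`) via repeated sampling."""
    rng = random.Random(seed)
    vocab_sizes, ar_shares = [], []
    for _ in range(runs):
        sample = rng.sample(tokens, sample_size)
        types = set(sample)
        new = types - known_types
        vocab_sizes.append(len(types))
        if new:
            ar_shares.append(sum(t.endswith("ar") for t in new) / len(new))
    mean = lambda xs: sum(xs) / len(xs)
    return mean(vocab_sizes), mean(ar_shares)

# Invented 'post-1750' mini-corpus in which all new verb types are -ar verbs
corpus = ["falar", "amar", "poder", "chutar", "ligar", "fazer"] * 50
known = {"falar", "poder", "fazer"}  # types already attested in earlier periods
vocab, ar_share = sample_productivity(corpus, known, sample_size=60)
```

Tracking these two quantities over successive time slices is what reveals the joint rise in vocabulary size and first-conjugation productivity described above.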
Rhythms of Speech and Language: Culture, Cognition, and the Brain, 2024
Book chapter in press: Ratree Wayland, Kevin Tang & Rahul Sengupta. Acquisition of similar versus different speech rhythmic class. In Lars Meyer & Antje Strauss (eds.), Rhythms of Speech and Language: Culture, Cognition, and the Brain, Chapter 39. Cambridge: Cambridge University Press.
Does shared rhythmic class in L1 (English and German, vs French) facilitate L2 speech learning? The rhythmic patterns of native and German-accented English and French, and native, English- and French-accented German utterances from the Bonne Tempo corpus were analyzed using three rhythm metrics based on duration variability, amplitude envelope modulation frequency, and intensity variability. Results of stepwise discriminant function analyses revealed that the intensity-based approach yielded the highest classification accuracy, followed by the frequency-based and the duration-based approach, respectively. More importantly, German-accented English utterances were more frequently classified as native English than German-accented French utterances were classified as native French. In addition, a higher percentage of English-accented German utterances were classified as native German than French-accented German by the duration- and the frequency-based metrics, but not by the intensity-based metrics. Overall, the results suggested facilitative effects of shared rhythmic class in L2 speech learning.
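For concreteness, one widely used duration-variability metric of this family is the normalized Pairwise Variability Index (nPVI) over successive interval durations — shown here purely as an illustration, without implying it is the exact metric used in the chapter:

```python
def npvi(durations):
    """Normalized Pairwise Variability Index over successive interval
    durations: the mean of 100 * |d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2).
    Higher values indicate stronger durational alternation, as in
    stereotypically stress-timed languages."""
    pairs = list(zip(durations, durations[1:]))
    terms = [100 * abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return sum(terms) / len(terms)

# Hypothetical vowel durations (ms): alternating vs. near-uniform rhythm
alternating = [50, 150, 60, 160, 55]   # stress-timed-like pattern
uniform = [100, 105, 95, 100, 102]     # syllable-timed-like pattern
assert npvi(alternating) > npvi(uniform)
```

Feeding such per-utterance metric values into a discriminant function analysis is what yields the classification accuracies reported above.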
Methods: Machine learning models were trained and evaluated on interval-based and event-based stuttered speech corpora. The models used acoustic and linguistic features extracted from the speech signal and the transcriptions generated by a state-of-the-art automatic speech recognition system.
Results: The results showed that event-based segmentation led to better ARS performance than interval-based segmentation, as measured by the area under the curve (AUC) of the receiver operating characteristic. The results suggest that the segmentation method affects the quality and quantity of the training data. The inclusion of linguistic features improved the detection of whole-word repetitions, but not other types of stutters.
Discussion: The findings suggest that event-based segmentation is more suitable for ARS than interval-based segmentation, as it preserves the exact boundaries and types of stutters. The linguistic features provide useful information for separating supra-lexical disfluencies from fluent speech but may not capture the acoustic characteristics of stutters. Future work should explore more robust and diverse features, as well as larger and more representative datasets, for developing effective ARS systems.
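The AUC metric used in this evaluation can be computed directly from its rank interpretation — the probability that a randomly chosen positive (stutter) example outscores a randomly chosen negative (fluent) one. A generic sketch with invented scores, not the study's evaluation code:

```python
def roc_auc(labels, scores):
    """AUC via the Mann-Whitney U statistic: the probability that a
    positive example outscores a negative one (ties count as 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical detector scores: label 1 = stutter event, 0 = fluent
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)  # 8 of 9 positive/negative pairs ranked correctly
```

Because AUC is threshold-free, it lets the interval-based and event-based systems be compared without committing to a particular detection cutoff.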
To ensure that speech-based biomarkers are providing accurate measurement and can serve as effective clinical tools for detecting and monitoring disease, speech features extracted and analyzed must be systematically and rigorously evaluated. Different machine learning architectures trained to classify different types of disordered speech must also be rigorously tested and systematically compared.
For speech measures, three categories of evaluation have been proposed: verification, analytical validation, and clinical validation. Verification includes assessing and comparing the quality of speech recordings across hardware and recording conditions. Analytical validation entails checking the accuracy and reliability of data processing and computed measures to ensure that they are accurately measuring the intended phenomena. Clinical validity involves verifying the correspondence of a measure to clinical diagnosis, disease severity/progression, and/or response to treatment outcomes.
For machine learning algorithms, analytical and clinical validation apply. For example, the accuracy of different algorithms can be compared in different clinical groups for different outcome measures.
This Research Topic aims to bring together research on the effectiveness of speech-based biomarkers for clinical diagnosis or the evaluation of disease severity and prognosis from related disciplines including cognitive neuroscience, computer science, engineering, linguistics, and speech and communication sciences. We welcome original research or systematic reviews on any of the three categories of evaluation of speech measures — verification, analytical validation and clinical validation — as well as NLP tools used to model clinical detection, classification and evaluation of disease severity/progression and/or response to treatment outcomes.
Topics may include, but are not limited to:
• Automatic analysis of dysarthric speech (e.g. typical and atypical Parkinsonism, Huntington's disease, Multiple Sclerosis, Amyotrophic Lateral Sclerosis);
• Early detection and classification of Neurodevelopmental Disorders (e.g. Autism Spectrum Disorder (ASD), Specific Language Impairment (SLI), Attention Deficit and Hyperactivity Disorder (ADHD));
• Cognitive assessment and clinical phenotyping (e.g., Alzheimer's disease, cognitive impairment including substance-induced cognitive impairment, dementia);
• Mental illness screening and diagnosis (e.g., Post-traumatic Stress Disorder (PTSD), Depressive Disorder, anxiety disorder, Bipolar Disorder, Schizophrenia);
• Novel methods and tools used to collect speech samples for the assessment of neurological, cognitive and psychiatric disorders.
speakers. To address this gap, the Joel Buchanan Archive of African American Oral History (JBA) at the University of Florida is being compiled into a time-aligned and linguistically annotated corpus. Through Natural Language Processing (NLP) techniques, this project will automatically time-align spoken data with transcripts and automatically tag AAL features. Transcription and time-alignment challenges have arisen as we ensure accuracy in depicting AAL morphosyntactic and phonetic structure. Two linguistic studies illustrate how the African American Corpus from Oral Histories betters our understanding of this lesser-studied variety.
Non-word repetition (NWR) tests are an important way speech and language therapists (SaLTs) assess language development. NWR tests are often scored whilst participants make their responses (i.e., in real time) in clinical and research reports (documented here via a secondary analysis of a published systematic review).
Aims
The main aim was to determine the extent to which real-time coding of NWR stimuli at the whole-item level (as correct/incorrect) was predicted by models that had varying levels of detail provided from phonemic transcriptions using several linear mixed method (LMM) models.
Methods & Procedures
Live scores and recordings of responses on the universal non-word repetition (UNWR) test were available for 146 children aged between 3 and 6 years where the sample included all children starting in five UK schools in one year or two consecutive years. Transcriptions were made of responses to two-syllable NWR stimuli for all children and these were checked for reliability within and between transcribers. Signal detection analysis showed that consonants were missed when judgments were made live. Statistical comparisons of the discrepancies between target stimuli and transcriptions of children's responses were then made and these were regressed against live score accuracy. Six LMM models (three normalized: 1a, 2a, 3a; and three non-normalized: 1b, 2b, 3b) were examined to identify which model(s) best captured the data variance. Errors on consonants for live scores were determined by comparison with the transcriptions in the following ways (the dependent variables for each pair of models): (1) consonants alone; (2) substitutions, deletions and insertions of consonants identified after automatic alignment of live and transcribed materials; and (3) as with (2) but where substitutions were coded further as place, manner and voicing errors.
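The substitution/deletion/insertion coding in (2) is in essence a Levenshtein alignment between the target and response phone sequences. A generic sketch of counting the three edit-operation types (not the study's implementation):

```python
def edit_ops(target, response):
    """Count substitutions, deletions and insertions needed to turn the
    target phone sequence into the response, via Levenshtein DP."""
    m, n = len(target), len(response)
    # cost[i][j] = minimum edits between target[:i] and response[:j]
    cost = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        cost[i][0] = i
    for j in range(n + 1):
        cost[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = cost[i - 1][j - 1] + (target[i - 1] != response[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Backtrace to classify each edit operation
    subs = dels = ins = 0
    i, j = m, n
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (target[i - 1] != response[j - 1])):
            subs += target[i - 1] != response[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins

# Target non-word /b l ɪ k/ repeated as /b ɪ k s/: one deletion, one insertion
print(edit_ops(list("blɪk"), list("bɪks")))  # → (0, 1, 1)
```

Regressing live-score accuracy against counts like these is what distinguishes models (2) and (3) from the coarser consonants-only coding in (1).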
Outcomes & Results
The normalized model that coded consonants in non-words as ‘incorrect’ at the level of substitutions, deletions and insertions (2b) provided the best fit to the real-time coding responses in terms of marginal R2, Akaike's information criterion (AIC) and Bayesian information criterion (BIC) statistics.
Conclusions & Implications
Errors that occur on consonants when non-word stimuli are scored in real time are characterized solely by the substitution, deletion and insertion measure. It is important to know that such errors arise when real-time judgments are made because NWR tasks are used to assess and diagnose several cognitive–linguistic impairments. One broader implication of the results is that future work could automate the analysis procedures to provide the required information objectively and quickly without having to transcribe data.
WHAT THIS PAPER ADDS
What is already known on this subject
Children and patients with a wide range of cognitive and language difficulties are less accurate relative to controls when they attempt to repeat non-words. Responses to non-words are often scored as correct or incorrect at the time the test is conducted. Limited assessments of this scoring procedure have been conducted to date.
What this study adds to the existing knowledge
Live NWR scores made by 146 children were available and the accuracy of these judgements was assessed here against ones based on phonemic transcriptions. Signal detection analyses showed that live scoring missed consonant errors in children's responses. Further analyses, using linear mixed effect models, showed that live judgments led to consonant substitution, deletion and insertion errors.
What are the practical and clinical implications of this work?
Improved and practicable NWR scoring procedures are required to provide SaLTs with better indications about children's language development (typical and atypical) and for clinical assessments of older people. The procedures currently used miss substitutions, deletions and insertions. Hence, procedures are required that provide the information currently only available when materials are transcribed manually. The possibility of training automatic speech recognizers to provide this level of detail is raised.
Methods: Machine learning models were trained and evaluated on interval-based and event-based stuttered speech corpora. The models used acoustic and linguistic features extracted from the speech signal and the transcriptions generated by a state-of-the-art automatic speech recognition system.
Results: The results showed that event-based segmentation led to better ARS performance than interval-based segmentation, as measured by the area under the curve (AUC) of the receiver operating characteristic. The results suggest differences in the quality and quantity of the data because of segmentation method. The inclusion of linguistic features improved the detection of whole-word repetitions, but not other types of stutters.
Discussion: The findings suggest that event-based segmentation is more suitable for ARS than interval-based segmentation, as it preserves the exact boundaries and types of stutters. The linguistic features provide useful information for separating supra-lexical disfluencies from fluent speech but may not capture the acoustic characteristics of stutters. Future work should explore more robust and diverse features, as well as larger and more representative datasets, for developing effective ARS systems.
To ensure that speech-based biomarkers are providing accurate measurement and can serve as effective clinical tools for detecting and monitoring disease, speech features extracted and analyzed must be systematically and rigorously evaluated. Different machine learning architectures trained to classify different types of disordered speech must also be rigorously tested and systematically compared.
For speech measures, three categories of evaluation have been proposed: verification, analytical validation, and clinical validation. Verification includes assessing and comparing the quality of speech recordings across hardware and recording conditions. Analytical validation entails checking the accuracy and reliability of data processing and computed measures to ensure that they are accurately measuring the intended phenomena. Clinical validity involves verifying the correspondence of a measure to clinical diagnosis, disease severity/progression, and/or response to treatment outcomes.
For machine learning algorithms, analytical and clinical validation apply. For example, the accuracy of different algorithms can be compared in different clinical groups for different outcome measures.
This Research Topic aims at bringing together research on the effectiveness of speech-based as biomarkers for the clinical diagnosis or the evaluation of disease severity and prognosis from related disciplines including cognitive neurosciences, computer sciences, engineering, linguistics, speech, communication sciences, etc. We welcome original research or systematic reviews on any of the three categories of evaluation of the speech measures: verification, analytical validation, clinical validation as well as NLP tools used to model clinical detection, classification and evaluation of disease severity/progression and/or response to treatment outcomes.
Topics may include, but are not limited to:
• Automatic analysis of dysarthric speech (e.g. typical and atypical Parkinsonism, Huntington's disease, Multiple Sclerosis, Amyotrophic Lateral Sclerosis);
• Early detection and classification of Neurodevelopmental Disorders (e.g. Autism Spectrum Disorder (ASD), Speech Language Impairment (SLI), Attention Deficit and Hyperactivity Disorder (ADHD));
• Cognitive assessment and clinical phenotypization of (e.g., Alzheimer's disease, cognitive Impairment including substance-induced cognitive impairment, dementia);
• Mental illness screening and diagnosis (e.g., Post-traumatic Stress Disorder (PTSD), Depressive Disorder, anxiety disorder, Bipolar Disorder, Schizophrenia);
• Novel methods and tools used to collect speech samples for the assessment of neurological, cognitive and psychiatric disorders.
speakers. To address this gap, the Joel Buchanan Archive of African American Oral History (JBA) at the University of Florida is being compiled into a time-aligned and linguistically annotated corpus. Through Natural Language Processing (NLP) techniques, this project will automatically time-align spoken data with transcripts and automatically tag AAL features. Transcription and time-alignment challenges have arisen as we ensure accuracy in depicting AAL morphosyntactic and phonetic structure. Two linguistic studies illustrate how the African American Corpus from Oral Histories betters our understanding of this lesser-studied variety.
Non-word repetition (NWR) tests are an important way speech and language therapists (SaLTs) assess language development. NWR tests are often scored whilst participants make their responses (i.e., in real time) in clinical and research reports (documented here via a secondary analysis of a published systematic review).
Aims
The main aim was to determine the extent to which real-time coding of NWR stimuli at the whole-item level (as correct/incorrect) was predicted by models that had varying levels of detail provided from phonemic transcriptions using several linear mixed method (LMM) models.
Methods & Procedures
Live scores and recordings of responses on the universal non-word repetition (UNWR) test were available for 146 children aged between 3 and 6 years where the sample included all children starting in five UK schools in one year or two consecutive years. Transcriptions were made of responses to two-syllable NWR stimuli for all children and these were checked for reliability within and between transcribers. Signal detection analysis showed that consonants were missed when judgments were made live. Statistical comparisons of the discrepancies between target stimuli and transcriptions of children's responses were then made and these were regressed against live score accuracy. Six LMM models (three normalized: 1a, 2a, 3a; and three non-normalized: 1b, 2b, 3b) were examined to identify which model(s) best captured the data variance. Errors on consonants for live scores were determined by comparison with the transcriptions in the following ways (the dependent variables for each pair of models): (1) consonants alone; (2) substitutions, deletions and insertions of consonants identified after automatic alignment of live and transcribed materials; and (3) as with (2) but where substitutions were coded further as place, manner and voicing errors.
Outcomes & Results
The normalized model that coded consonants in non-words as ‘incorrect’ at the level of substitutions, deletions and insertions (2b) provided the best fit to the real-time coding responses in terms of marginal R2, Akaike's information criterion (AIC) and Bayesian information criterion (BIC) statistics.
Conclusions & Implications
Errors that occur on consonants when non-word stimuli are scored in real time are characterized solely by the substitution, deletion and insertion measure. It is important to know that such errors arise when real-time judgments are made because NWR tasks are used to assess and diagnose several cognitive–linguistic impairments. One broader implication of the results is that future work could automate the analysis procedures to provide the required information objectively and quickly without having to transcribe data.
WHAT THIS PAPER ADDS
What is already known on this subject
Children and patients with a wide range of cognitive and language difficulties repeat non-words less accurately than controls. Responses to non-words are often scored as correct or incorrect at the time the test is conducted, but this scoring procedure has received only limited assessment to date.
What this study adds to the existing knowledge
Live NWR scores for 146 children were available, and the accuracy of these judgements was assessed here against judgements based on phonemic transcriptions. Signal detection analyses showed that live scoring missed consonant errors in children's responses. Further analyses, using linear mixed-effects models, showed that the errors missed in live judgments were consonant substitutions, deletions and insertions.
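The signal-detection step can be illustrated with a standard d' computation from hit and false-alarm rates when live judgments are compared against transcription-based ground truth. The counts below are invented, and the log-linear correction is one common choice, not necessarily the study's:

```python
# Minimal sketch of a signal-detection sensitivity measure for live scoring.
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), with a log-linear
    correction to avoid rates of exactly 0 or 1."""
    hr = (hits + 0.5) / (hits + misses + 1)
    far = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hr) - z(far)

# Invented counts: live scorer flags an error ("hit") vs. misses one,
# relative to errors identified in the phonemic transcriptions.
print(d_prime(hits=62, misses=38, false_alarms=9, correct_rejections=91))
```

A d' near zero would mean live scoring barely discriminates erroneous from correct consonants; the missed-consonant finding corresponds to a high miss count pulling the hit rate down.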
What are the practical and clinical implications of this work?
Improved and practicable NWR scoring procedures are required to provide SaLTs with better indications about children's language development (typical and atypical) and for clinical assessments of older people. The procedures currently used miss substitutions, deletions and insertions. Hence, procedures are required that provide the information currently only available when materials are transcribed manually. The possibility of training automatic speech recognizers to provide this level of detail is raised.
By running random sampling simulations on both large and small corpora from different sources for each language, we found a consistent increase, especially after 1750, in both verb vocabulary size and the productivity of the regular verbal form -ar(e). The results suggest that the rising productivity of the regular form is likely driven by the growth in verb vocabulary size: as new verbs enter a language, they most often fall into the first conjugation. This increase in the share of new verbs assigned to the first conjugation caused the irregular second- and third-conjugation forms -er(e) and -ir(e) to become less productive over time. Finally, we speculate that the shift around 1750 observed across all corpora may reflect the Industrial Revolution, which began around 1760.
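The random-sampling approach can be sketched as repeatedly drawing fixed-size samples of verb types and estimating the share assigned to the first conjugation; controlling sample size this way makes corpora of different sizes comparable. The toy lexica below are invented for illustration and bear no relation to the study's corpora:

```python
# Illustrative sketch of the random-sampling estimate of first-conjugation share.
import random

def first_conj_share(verbs, sample_size, trials=1000, seed=1):
    """Mean proportion of -are verbs across random samples of verb types."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(verbs, sample_size)
        total += sum(v.endswith("are") for v in sample) / sample_size
    return total / trials

# Toy "pre-1750" and "post-1750" verb inventories (invented): the later
# inventory adds new verbs, most of which take the -are conjugation.
pre_1750 = ["amare", "vedere", "dormire", "cantare", "temere", "partire"]
post_1750 = pre_1750 + ["telefonare", "fotografare", "industrializzare", "stampare"]
print(first_conj_share(pre_1750, 4), first_conj_share(post_1750, 4))
```

With equal sample sizes, the later inventory yields a higher estimated -are share, mirroring the claimed mechanism: vocabulary growth feeds the regular conjugation.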
Does a shared rhythmic class in the L1 (English and German, vs French) facilitate L2 speech learning? The rhythmic patterns of native and German-accented English and French utterances, and of native, English- and French-accented German utterances, from the Bonne Tempo corpus were analyzed using three rhythm metrics based on duration variability, amplitude envelope modulation frequency, and intensity variability. Stepwise discriminant function analyses revealed that the intensity-based approach yielded the highest classification accuracy, followed by the frequency-based and duration-based approaches, respectively. More importantly, German-accented English utterances were classified as native English more often than German-accented French utterances were classified as native French. In addition, a higher percentage of English-accented German than of French-accented German utterances were classified as native German by the duration- and frequency-based metrics, but not by the intensity-based metric. Overall, the results suggest facilitative effects of a shared rhythmic class on L2 speech learning.
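One widely used duration-variability metric of the kind referred to here is the normalized Pairwise Variability Index (nPVI); whether this exact formulation was used is an assumption, and the interval durations below are invented:

```python
# Sketch of a duration-based rhythm metric: the normalized Pairwise
# Variability Index over successive interval durations (in seconds).

def npvi(durations):
    """nPVI = 100 * mean of |d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2)."""
    pairs = list(zip(durations, durations[1:]))
    terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return 100 * sum(terms) / len(terms)

# Alternating long/short intervals (stress-timed-like) vs. near-even
# intervals (syllable-timed-like); invented values.
print(npvi([0.06, 0.14, 0.05, 0.16]))  # high variability
print(npvi([0.10, 0.11, 0.09, 0.10]))  # low variability
```

Stress-timed languages such as English and German tend toward higher nPVI than syllable-timed French, which is what lets a duration-based metric separate the rhythmic classes at all.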