Highlights
• NUVA automatically assesses online word naming attempts in aphasia therapy.
• Significantly more accurate and faster than leading commercial speech recognition.
• Accuracies between 83.6% and 93.6% validate use in clinical research.
A signal processing approach combining beamforming with mask-informed speech enhancement was assessed by measuring sentence recognition in listeners with mild-to-moderate hearing impairment in adverse listening conditions that simulated the output of behind-the-ear hearing aids in a noisy classroom. Two types of beamforming were compared: binaural, with the two microphones of each aid treated as a single array, and bilateral, where independent left and right beamformers were derived. Binaural beamforming produces a narrower beam, maximising improvement in signal-to-noise ratio (SNR), but eliminates the spatial diversity that is preserved in bilateral beamforming. Each beamformer type was optimised for the true target position and implemented with and without additional speech enhancement in which spectral features extracted from the beamformer output were passed to a deep neural network trained to identify time-frequency regions dominated by target speech. Additional conditions comprising binaural beamforming combined with speech enhancement implemented using Wiener filtering or modulation-domain Kalman filtering were tested in normally-hearing (NH) listeners. Both beamformer types gave substantial improvements relative to no processing, with significantly greater benefit for binaural beamforming. Performance with additional mask-informed enhancement was poorer than with beamforming alone, for both beamformer types and both listener groups. In NH listeners the addition of mask-informed enhancement produced significantly poorer performance than both other forms of enhancement, neither of which differed from the beamformer alone. In summary, the additional improvement in SNR provided by binaural beamforming appeared to outweigh loss of spatial information, while speech understanding was not further improved by the mask-informed enhancement method implemented here.
Investigating the Syntactic Characteristics of English Tone Units. Alex Chengyu Fang, Jill House, and Mark Huckvale (alex,jill,mark@phon.ucl.ac.uk), Department of Phonetics and Linguistics, University College London, Gower Street WC1E 6BT, London, England ...
This paper outlines ProSynth, an approach to speech synthesis which takes a rich linguistic structure as central to the generation of natural-sounding speech. We start from the assumption that the speech signal is informationally rich, and that this acoustic richness reflects linguistic structural richness and underlies the percept of naturalness. Naturalness achieved by structural richness produces a perceptually robust signal
From the Publisher: Current work in speech synthesis is in an interesting double position. Increasingly natural-sounding speech synthesis systems are being implemented for many of the world's languages on the basis of existing, increasingly well understood concatenative technology. At the same time, this technology entails some inherent limitations, and research on further improvements is still proceeding rapidly. This volume is an accumulation of studies emanating from COST 258, a European Action concerned with the issues in the improvement of speech synthesis.
* Addresses the issues involved in improving the output quality when producing speech synthesis for world languages
* Written by leading European researchers throughout academia and industry
* Presents research results of COST 258
* Areas covered include Improving Signal Generation, Improving Prosody and Tonal Quality, and Improving and Supporting the Modelling Process
Researchers and engineers in speech synthesis, telecommunications and computer science will find the thorough coverage of this leading-edge topic to be a valuable reference resource, and this text will also be of interest to academics and postgraduate students for its presentation of the most up-to-date research in this field.
This article describes investigations into the use of phonologically-constrained morphological analysis (PCMA) in language modelling for continuous speech recognition. PCMA provides a means for modelling text as a sequence of morphemes in a way that retains compatibility with the linear concatenative model of pronunciation used in conventional decoders. Experiments were performed in English exploiting the 100-million-word British National Corpus as
To improve the quality of noisy speech recordings, sound engineers have at their disposal a variety of signal processing techniques. These techniques often have a wide range of parameters which need to be adjusted to obtain optimal processing results. This paper investigates the difficulty of finding the best parameter settings for a commercial noise-reduction system. In a first experiment, operators adjusted the settings of a particular system while attempting to maximise the intelligibility of speech corrupted with babble noise at different signal-to-noise ratios. Their preferences were then evaluated in a listening experiment showing that their chosen settings actually reduced intelligibility compared to the original signal. In another experiment a range of parameter settings for the same system were evaluated using both listeners and an intelligibility model based on a speech envelope distortion measure. Although the measure is imperfect, it is still able to predict optimal parameter settings better than the human operators.
ProSynth uses a hierarchical prosodic structure (implemented in XML) as its core linguistic representation. To model intonation we map template representations of F0 contours onto this structure. The template for a particular pitch pattern is derived from analysis of a labelled speech database. For a falling nuclear pitch accent this template has three turning points: two define the F0 peak and one marks the end of the F0 fall. Statistical analysis confirmed that the alignment and shape of the template are sensitive to the properties of the structure and also provided quantitative values for F0 synthesis. Our results suggest that phonetic interpretation of the nuclear pitch accent is best related to the accented Foot rather than to the accented syllable. In determining parameter values for synthesis, we conclude that F0 information should be integrated with temporal and segmental information.
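The three-point template for a falling nuclear accent can be illustrated with a minimal sketch. It assumes that the two peak-defining turning points mark the start and end of the F0 peak and that F0 is linearly interpolated between turning points; the function name `f0_contour`, the timing values and the F0 values are all hypothetical and are not taken from ProSynth.

```python
from bisect import bisect_right


def f0_contour(turning_points, times):
    """Piecewise-linear F0 (Hz) through (time_s, f0_hz) turning points.

    Linear interpolation is an assumption made for illustration only;
    outside the template the nearest turning-point value is held.
    """
    pts = sorted(turning_points)
    ts = [t for t, _ in pts]
    contour = []
    for t in times:
        if t <= ts[0]:
            contour.append(pts[0][1])
        elif t >= ts[-1]:
            contour.append(pts[-1][1])
        else:
            i = bisect_right(ts, t) - 1
            (t0, f0), (t1, f1) = pts[i], pts[i + 1]
            contour.append(f0 + (f1 - f0) * (t - t0) / (t1 - t0))
    return contour


# Hypothetical template: peak start, peak end, end of the F0 fall.
template = [(0.10, 180.0), (0.16, 180.0), (0.40, 100.0)]
contour = f0_contour(template, [0.0, 0.10, 0.16, 0.28, 0.40, 0.50])
```

In a fuller model the turning-point times would be expressed relative to the accented Foot rather than as absolute times, in line with the finding reported above.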
Background AVATAR therapy is a novel intervention targeting distressing auditory verbal hallucinations (henceforth ‘voices’). A digital simulation (avatar) of the voice is created and used in a three-way dialogue between participant, avatar and therapist. To date, therapy has been delivered over 6 sessions, comprising an initial phase, focusing on standing up to a hostile avatar, and a second phase in which the avatar concedes and focus shifts to individualised treatment targets, including beliefs about voices. The first fully powered randomised trial found AVATAR therapy resulted in a rapid and substantial fall in voice frequency and associated distress that was superior to supportive counselling at 12 weeks. The main objective of this AVATAR2 trial is to test the efficacy of two forms of AVATAR therapy in reducing voice-related distress: AVATAR-brief (standardised focus on exposure, assertiveness and self-esteem) and AVATAR-extended (phase 1 mirroring AVATAR-brief augmented by a f...
In moderate levels of noise, listeners report that noise reduction (NR) processing can improve the perceived quality of a speech signal as measured on a typical MOS rating scale. Most quantitative experiments of intelligibility, however, show that NR reduces the intelligibility of noisy speech signals, and so should be expected to increase the cognitive effort required to process utterances. To study cognitive effort we look at how NR affects reaction times to speech in noise, using material that is still highly intelligible. We show that adding noise increases reaction times and that NR does not restore reaction times back to the quiet condition. The implication is that NR does not make speech "easier" to process, at least as far as this task is concerned.
Background AVATAR therapy is an innovative therapy designed to support people with distressing voices. Voice hearers co-create a digital representation of their voice and engage in dialogue with it. Although it has been successfully tested in a powered randomised controlled trial (ISRCTN65314790), the participants’ experience of this therapy has not yet been evaluated. We aimed to explore enablers and barriers to engagement with the therapy and potential for real-world impact on distressing voices. Methods Thirty per cent of those who completed AVATAR therapy (15 people in total) and 5 who dropped out from therapy within the main AVATAR RCT were invited to participate in a semi-structured interview, which was audio-recorded and subsequently transcribed. Results Fourteen therapy completers (28% of the full sample) and one person who dropped out of therapy after 1 active session were interviewed. Thematic analysis was used to explore the interviews. A total of 1276 references were co...
This paper presents the development of a compact vocabulary for describing the audible characteristics of degraded speech. An experiment was conducted with 51 English-speaking subjects who were tasked with assigning one of a list of given text descriptors to 220 degradation conditions. Exploratory data analysis using hierarchical clustering resulted in a compact vocabulary of 10 classes, which was further validated by a bootstrap cluster analysis.
Speech recordings obtained in the context of law enforcement are often degraded in terms of quality and intelligibility. Several techniques for assessing the impact of speech enhancement algorithms on quality are available, both intrusive and nonintrusive, but the assessment ...
We propose a data driven, non-intrusive method for speech intelligibility estimation. We begin with a large set of speech signal specific features and use a dimensionality reduction approach based on correlation and principal component analysis to find the most relevant features for intelligibility prediction. These are then used to train a Gaussian mixture model from which the intelligibility of unseen data is inferred. Experimental results show that our method gives a correlation with subjective intelligibility of 0.92 and a correlation of 0.96 with the ANSI standard Speech Intelligibility Index.
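The correlation-based part of the feature selection described above can be sketched as follows. This is only an illustration of ranking candidate features by their Pearson correlation with intelligibility scores; the PCA and Gaussian mixture model stages are omitted, and the feature names and all numbers are made-up toy data, not results from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_features(features, target):
    """Return feature names ordered by |correlation| with the target."""
    return sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )


# Toy data (hypothetical): intelligibility scores and three candidate features.
intelligibility = [0.2, 0.4, 0.6, 0.8]
features = {
    "feat_a": [1.0, 2.0, 3.0, 4.0],
    "feat_b": [4.0, 1.0, 3.0, 2.0],
    "feat_c": [1.0, 3.0, 2.0, 4.0],
}
ranking = rank_features(features, intelligibility)
```

The same `pearson` function is the kind of measure used to report agreement between predicted and subjective intelligibility.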
Purpose In this study, the authors investigated how well experts can adjust the settings of a commercial noise-reduction system to optimize the intelligibility for naive normal-hearing listeners. Method In Experiment 1, 5 experts adjusted parameters for a noise-reduction system while aiming to optimize intelligibility. The stimuli consisted of speech presented in car-cabin noise or babble at 5 different signal-to-noise ratios (SNRs). In Experiment 2, the effects of processing with these settings were measured with 10 listeners undertaking an intelligibility test. In Experiment 3, the intelligibility of a broad range of settings was investigated with another 10 listeners to determine whether the experts' chosen settings could have been improved. Results Low Cronbach's alphas indicated that parameter settings varied considerably within and across experts. For very low SNRs, mean proposed settings differed from those for higher SNRs. The different settings had no significant ef...
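For readers unfamiliar with the consistency measure mentioned in the Results, Cronbach's alpha can be computed as sketched below, using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of row totals). The layout of the data (rows as test conditions, columns as experts or repeated adjustments) and the numbers are assumptions for illustration, not the study's data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a table with one row per case and one column per item.

    Uses sample variances; requires at least two rows and two columns.
    """
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical settings: three conditions rated identically by three experts.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
# Hypothetical settings: two experts who disagree across three conditions.
inconsistent = [[1, 3], [2, 1], [3, 2]]

alpha_high = cronbach_alpha(consistent)
alpha_low = cronbach_alpha(inconsistent)
```

Values near 1 indicate that the columns vary together; low or negative values, as in the second table, indicate inconsistent settings.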
