
    Thomas Hain

    ABSTRACT
    Vocal Tract Length Normalisation (VTLN) is a commonly used technique to normalise for inter-speaker variability. It is based on the speaker-specific warping of the frequency axis, parameterised by a scalar warp factor. This factor is typically estimated using maximum likelihood. We discuss how VTLN may be applied to multiparty conversations, reporting a substantial decrease in word error rate in experiments using the ICSI meetings corpus. We investigate the behaviour of the VTLN warping factor and show that a stable estimate is not obtained. Instead it appears to be influenced by the context of the meeting, in particular the current conversational partner. These results are consistent with predictions made by the psycholinguistic interactive alignment account of dialogue, when applied at the acoustic and phonological levels.
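    The speaker-specific warping described above can be pictured as a piecewise-linear map of the frequency axis controlled by one scalar factor. The sketch below is illustrative only: the warping function, cut-off point, and parameter names are assumptions, not the paper's specification.

    ```python
    import numpy as np

    def warp_frequency(f, alpha, f_max=8000.0, f_cut=0.85):
        """Piecewise-linear VTLN-style frequency warp (illustrative sketch).

        `alpha` is the scalar warp factor; below the knee the axis is
        scaled linearly, above it a second segment keeps f_max fixed so
        the warped axis still spans [0, f_max].
        """
        f = np.asarray(f, dtype=float)
        knee = f_cut * f_max
        return np.where(
            f <= knee,
            alpha * f,
            alpha * knee + (f_max - alpha * knee) / (f_max - knee) * (f - knee),
        )
    ```

    In practice the warp factor is not computed in closed form: as the abstract notes, it is typically chosen by maximum likelihood, e.g. decoding each speaker's data under a small grid of candidate factors (around 0.88 to 1.12) and keeping the one with the highest likelihood.
    
    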
    ABSTRACT This paper presents a new approach for rapid adaptation in the presence of highly diverse scenarios that takes advantage of information describing the input signals. We introduce a new method for joint factorisation of the background and the speaker in an eigenspace MLLR framework: Joint Factor Eigenspace MLLR (JFEMLLR). We further propose to use contextual information describing the speaker and background, such as tags or more complex metadata, to provide an immediate estimation of the best MLLR transformation for the utterance. This provides instant adaptation, since it does not require any transcription from a previous decoding stage. Evaluation in a highly diverse Automatic Speech Recognition (ASR) task, a modified version of WSJCAM0, yields an improvement of 26.9% over the baseline, which is an extra 1.2% reduction over two-pass MLLR adaptation.
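    In outline, an eigenspace MLLR transform is a weighted combination of basis transforms, and a joint factorisation adds a second basis for the background. The function below is a deliberately simplified sketch under that assumption; the name and the plain additive combination are illustrative, not the JFEMLLR formulation.

    ```python
    import numpy as np

    def jfemllr_transform(spk_weights, spk_basis, bg_weights, bg_basis):
        """Sketch: compose an utterance-level MLLR transform from a
        speaker eigen-basis plus a background eigen-basis.

        Each basis is a list of transform matrices; the weights could be
        estimated per utterance, or, as the paper proposes, predicted
        directly from contextual metadata (tags), avoiding a first
        decoding pass entirely.
        """
        W = sum(w * B for w, B in zip(spk_weights, spk_basis))
        W = W + sum(w * B for w, B in zip(bg_weights, bg_basis))
        return W
    ```
    
    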
    ABSTRACT Large corpora of transcribed speech are rare and expensive to acquire, but valuable for ASR systems. Of current research interest are corpora of natural speech, i.e. far-field recordings of multiple speakers in noisy environments. In the big-data era many speech transcriptions are collected for purposes other than ASR, and these omit features required by typical ASR systems, such as timing information. If we could recover training data from such 'found' corpora, this would open up large new resources for ASR research. We present a case study of this type of data recovery, becoming known as 'lightly supervised learning', for a highly damaged corpus called Family Life. We use a novel comparison of a parallel decode and forced audio alignment to iteratively select and grow good data. Family Life also has unusual data-mislabelling problems, which can be addressed by an integrated tf-idf approach. These methods reduce WER on the corpus from 83.0% to 57.2%. We also discuss a probabilistic loose string alignment approach which removes untranscribed 'icebreaker' speech.
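    A tf-idf approach to mislabelling can, in outline, score a decode hypothesis against each candidate transcript by cosine similarity of tf-idf vectors, reassigning a recording to the transcript it matches best. This toy implementation illustrates the general technique only; it is not the paper's integrated method.

    ```python
    import math
    from collections import Counter

    def tfidf_vectors(docs):
        """Bag-of-words tf-idf vectors (as dicts) for a list of strings."""
        df = Counter()
        for d in docs:
            df.update(set(d.split()))          # document frequency per word
        n = len(docs)
        vecs = []
        for d in docs:
            tf = Counter(d.split())
            vecs.append({w: c * math.log((1 + n) / (1 + df[w]))
                         for w, c in tf.items()})
        return vecs

    def cosine(u, v):
        """Cosine similarity between two sparse dict vectors."""
        dot = sum(x * v.get(w, 0.0) for w, x in u.items())
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0
    ```

    A mislabelled recording would then show a markedly higher similarity to some other transcript than to the one it is filed under.
    
    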
    ABSTRACT This paper was presented at the First Workshop on Speech, Language and Audio in Multimedia, August 22-23, 2013; Marseille. It was published in CEUR Workshop Proceedings at http://ceur-ws.org/Vol-1012/.
    The Minimum Bayes Risk (MBR) framework has been a successful strategy for the training of hidden Markov models for large vocabulary speech recognition. Practical implementations of MBR must select an appropriate hypothesis space and loss function. The set of word sequences and a word-based Levenshtein distance may be assumed to be the optimal choice but use of phoneme-based criteria appears… more
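    The word-based Levenshtein distance mentioned as the natural MBR loss is the standard edit distance over word sequences; a minimal single-row dynamic-programming implementation:

    ```python
    def word_levenshtein(ref, hyp):
        """Word-level Levenshtein distance between two strings:
        the minimum number of word substitutions, insertions and
        deletions turning `ref` into `hyp` (the usual basis of WER)."""
        r, h = ref.split(), hyp.split()
        d = list(range(len(h) + 1))            # row for the empty ref prefix
        for i, rw in enumerate(r, 1):
            prev, d[0] = d[0], i               # prev holds d[i-1][j-1]
            for j, hw in enumerate(h, 1):
                cur = d[j]
                d[j] = min(d[j] + 1,           # deletion
                           d[j - 1] + 1,       # insertion
                           prev + (rw != hw))  # substitution / match
                prev = cur
        return d[-1]
    ```

    For example, `word_levenshtein("the cat sat", "the mat")` is 2: one substitution and one deletion.
    
    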

    And 82 more