Two important components of a speech archiving system are the compression scheme and the search facility. We investigate two ways of providing these components. The first is to run the recogniser directly from the compressed speech: we show that even with a 2.4 kbit/sec codec it is possible to produce good recognition results, but the search is slow. The second is to preprocess the speech and store the extra data in a compressed form along with the speech. In the case of an RNN-HMM hybrid system, the posterior probabilities provide a suitable intermediate data format. Vector quantizing these at just 625 bits/sec enables the search to run at many times real time while still maintaining good recognition accuracy.
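As a rough illustration of the second approach, the sketch below vector-quantizes per-frame posterior vectors against a k-means codebook so that only a short codeword index needs to be stored per frame. Frame rate, codebook size and all function names are assumptions for illustration, not the paper's actual configuration.

```python
import numpy as np

def train_codebook(posteriors, k=256, iters=20, seed=0):
    """posteriors: (n_frames, n_phones) array of per-frame phone posteriors."""
    rng = np.random.default_rng(seed)
    codebook = posteriors[rng.choice(len(posteriors), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each frame to its nearest codeword (squared Euclidean distance).
        dists = ((posteriors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = posteriors[assign == j]
            if len(members):
                codebook[j] = members.mean(axis=0)  # move codeword to cluster mean
    return codebook

def quantize(posteriors, codebook):
    """Return one codeword index per frame; the decoder looks the vector back up."""
    dists = ((posteriors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)
```

An 8-bit codebook at 100 frames per second would cost 800 bits/sec, so the paper's 625 bits/sec figure implies a lower frame rate or codeword size than assumed in this sketch.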
This paper investigates the scaling properties of Recurrent Neural Network Language Models (RNNLMs). We discuss how to train very large RNNs on GPUs and address the questions of how RNNLMs scale with respect to model size, training-set size, computational cost and memory. Our analysis shows that, despite being more costly to train, RNNLMs obtain much lower perplexities on standard benchmarks than n-gram models. We train the largest known RNNs and present relative word error rate gains of 18% on an ASR task. We also present the lowest perplexities to date on the recently released billion-word language modelling benchmark, a 1 BLEU point gain on machine translation and a 17% relative hit-rate gain in word prediction.
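For orientation, a minimal Elman-style RNN language model and its perplexity computation are sketched below. Vocabulary size, hidden size and parameter names are invented for illustration; the paper's GPU training machinery is not shown.

```python
import numpy as np

V, H = 10000, 512                    # assumed vocabulary and hidden sizes
rng = np.random.default_rng(0)
E = rng.normal(0, 0.1, (V, H))       # input word embeddings
W = rng.normal(0, 0.1, (H, H))       # recurrent weights
U = rng.normal(0, 0.1, (H, V))       # hidden-to-output weights

def perplexity(word_ids):
    """Per-word perplexity of a sequence under the (here untrained) model."""
    h = np.zeros(H)
    log_prob = 0.0
    for prev, cur in zip(word_ids[:-1], word_ids[1:]):
        h = np.tanh(E[prev] + W @ h)       # recurrent state update
        logits = h @ U
        p = np.exp(logits - logits.max())  # numerically stable softmax
        p /= p.sum()
        log_prob += np.log(p[cur])
    return float(np.exp(-log_prob / (len(word_ids) - 1)))
```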
This chapter contains sections titled: Objectives of the operator; Hydraulic ground-crop sprayers; Rotary atomizers; Granule applicators; Band sprayers; Knapsack sprayers; Sprayer faults; Errors in applying herbicides; Herbicide drift; Decontamination of sprayers and disposal of waste material; Storage of herbicides; References and further reading.
... DOLE [SIL] WHO ANNOUNCED HE IS RESIGNING FROM THE SENATE TO DEVOTE FULL TIME TO HIS QUEST FOR THE WHITE HOUSE [SIL] ... Gethin Williams and Steve Renals at the University of Sheffield report using acoustic-based confidence measures derived from the ...
ABBOT is the hybrid connectionist-hidden Markov model large-vocabulary speech recognition system developed at Cambridge University. In this system, a recurrent network maps each acoustic vector to an estimate of the posterior probabilities of the phone classes. The maximum likelihood word string is then extracted using Markov models. As in traditional hidden Markov models, the Markov process is used to model the lexical and language model constraints. This paper describes the system which participated in the ...
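The key step in any such hybrid, converting network posteriors into quantities an HMM decoder can use, can be sketched as follows. This is the standard posterior-to-scaled-likelihood conversion; the function and argument names are illustrative, not ABBOT's actual interface.

```python
import numpy as np

def scaled_log_likelihoods(posteriors, phone_priors, floor=1e-8):
    """posteriors: (n_frames, n_phones) network outputs p(phone | x);
    phone_priors: (n_phones,) class priors estimated from training data.
    Since p(x | phone) is proportional to p(phone | x) / p(phone), and the
    constant p(x) cancels in Viterbi search, the decoder can use these
    scaled log likelihoods directly as HMM observation scores."""
    return np.log(np.maximum(posteriors, floor)) - np.log(phone_priors)
```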
ABBOT is the hybrid connectionist-hidden Markov model (HMM) large-vocabulary continuous speech recognition (CSR) system developed at Cambridge University. This system uses a recurrent network to estimate the acoustic observation probabilities within an HMM framework. A major advantage of this approach is that good performance is achieved using context-independent acoustic models while requiring many fewer parameters than comparable HMM systems. This paper presents substantial performance improvements ...
This paper describes the THISL system that participated in the TREC-7 evaluation, Spoken Document Retrieval (SDR) Track, and presents the results obtained, together with some analysis. The THISL system is based on the ABBOT speech recognition system and the thislIR text retrieval system. In this evaluation we were concerned with investigating the suitability for SDR of a recognizer running at less than ten times real time, the use of multiple transcriptions and word graphs, and the effect of simple query expansion ...
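A generic bag-of-words ranker of the kind applied to recognizer transcripts in SDR is sketched below for orientation; thislIR's actual retrieval model is not reproduced here, and all names are assumptions.

```python
import math
from collections import Counter

def tf_idf_rank(query, transcripts):
    """query: list of query terms; transcripts: {doc_id: list of recognized
    words}. Returns (doc_id, score) pairs, best match first."""
    n_docs = len(transcripts)
    df = Counter()                      # document frequency of each term
    for words in transcripts.values():
        df.update(set(words))
    scores = {}
    for doc_id, words in transcripts.items():
        tf = Counter(words)
        score = 0.0
        for term in query:
            if term in tf:
                idf = math.log(n_docs / df[term])
                score += (1 + math.log(tf[term])) * idf  # TF-IDF weight
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: -kv[1])
```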
Automatic summarisation of spoken audio is a fairly new research pursuit, in large part due to the relative novelty of technology for accurately decoding audio into text. Techniques that account for the peculiarities and potential ambiguities of decoded audio (high error rates, lack of syntactic boundaries) appear promising for culling summary information from audio for content-based browsing and skimming. This paper combines acoustic confidence measures with simple information retrieval and extraction techniques in order to obtain accurate, readable summaries of broadcast news programs. It also demonstrates how extracted summaries, full-text speech recogniser output and audio files can be usefully linked together through an audio-visual interface. The results suggest that information extraction based on statistical information can produce viable summaries of decoded audio.
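The combination the paper describes can be caricatured as follows: weight each recognized word's retrieval-style term weight by its acoustic confidence, then keep the highest-scoring sentences. The exact scoring scheme below is an assumption for illustration, not the paper's.

```python
def summarize(sentences, confidences, idf, n_keep=3):
    """sentences: list of word lists from the recognizer; confidences:
    matching lists of per-word confidence scores in [0, 1]; idf: {word:
    idf weight}. Returns the n_keep highest-scoring sentences."""
    scored = []
    for words, confs in zip(sentences, confidences):
        # Confidence-weighted term importance, length-normalized so that
        # long sentences are not favoured merely for containing more words.
        score = sum(c * idf.get(w, 0.0) for w, c in zip(words, confs))
        scored.append((score / max(len(words), 1), words))
    scored.sort(key=lambda kv: -kv[0])
    return [" ".join(words) for _, words in scored[:n_keep]]
```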
The speech waveform can be modelled as a piecewise-stationary linear stochastic state-space system, and its parameters can be estimated using an expectation-maximisation (EM) algorithm. One problem is the initialisation of the EM algorithm: standard initialisation schemes can lead to poor formant trajectories, yet these trajectories are important for vowel intelligibility. The aim of ...
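The model class in question is the linear Gaussian state-space system x_{t+1} = A x_t + w_t, y_t = C x_t + v_t, with Gaussian state and observation noise. The Kalman filter below computes the filtered state estimates on which the EM E-step is built; dimensions and parameter values are illustrative only.

```python
import numpy as np

def kalman_filter(y, A, C, Q, R, x0, P0):
    """Return filtered state means for an observation sequence y.
    A, C: system matrices; Q, R: state/observation noise covariances;
    x0, P0: initial state mean and covariance."""
    x, P = x0, P0
    means = []
    for yt in y:
        # Predict the next state and its covariance.
        x = A @ x
        P = A @ P @ A.T + Q
        # Correct using the innovation yt - C @ x.
        S = C @ P @ C.T + R
        K = P @ C.T @ np.linalg.inv(S)   # Kalman gain
        x = x + K @ (yt - C @ x)
        P = P - K @ C @ P
        means.append(x)
    return np.array(means)
```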

This thesis extends the error propagation network to deal with time-varying or dynamic patterns.  Examples are given of supervised, reinforcement-driven and unsupervised learning.

Chapter 1 presents an overview of connectionist models.

Chapter 2 introduces the error propagation algorithm for general node types.
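As a sketch of the core operation (with assumed names, not the thesis's notation), error propagation through one layer of general differentiable nodes looks like this:

```python
import numpy as np

def backprop_layer(x, W, delta_out, f_prime_z):
    """x: layer input; W: weights (n_out, n_in) with forward pass
    z = W @ x, output = f(z); delta_out: error at this layer's outputs;
    f_prime_z: the node's activation derivative evaluated at z.
    Returns (weight gradient, error propagated to the layer's inputs)."""
    delta = delta_out * f_prime_z   # push error through the node nonlinearity
    grad_W = np.outer(delta, x)     # dE/dW for this layer
    delta_in = W.T @ delta          # error signal for the layer below
    return grad_W, delta_in
```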

Chapter 3 discusses the issue of data representation in connectionist models.

Chapter 4 describes the use of several types of networks applied to the problem of the recognition of steady state vowels from multiple speakers.

Chapter 5 extends the error propagation algorithm to deal with time varying input.  Three possible architectures are explored which deal with learning sequences of known length and sequences of unknown and possibly indefinite length.  Several simple examples are given.
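One standard way of extending error propagation to sequences, unrolling the net and propagating error back through time, is sketched below. It shows the core idea only; the thesis's three architectures are not reproduced, and all parameter names are illustrative.

```python
import numpy as np

def bptt(xs, targets, W_in, W_rec, W_out):
    """xs, targets: sequences of input/target vectors. W_in: (H, n_in),
    W_rec: (H, H), W_out: (n_out, H). Returns squared-error gradients."""
    H = W_rec.shape[0]
    hs, zs = [np.zeros(H)], []
    # Forward pass: unroll the net over the whole sequence.
    for x in xs:
        z = W_in @ x + W_rec @ hs[-1]
        zs.append(z)
        hs.append(np.tanh(z))
    # Backward pass: accumulate gradients back through time.
    gW_in, gW_rec = np.zeros_like(W_in), np.zeros_like(W_rec)
    gW_out = np.zeros_like(W_out)
    delta_h = np.zeros(H)
    for t in reversed(range(len(xs))):
        err = W_out @ hs[t + 1] - targets[t]           # output error at time t
        gW_out += np.outer(err, hs[t + 1])
        delta = (W_out.T @ err + delta_h) * (1 - np.tanh(zs[t]) ** 2)
        gW_in += np.outer(delta, xs[t])
        gW_rec += np.outer(delta, hs[t])
        delta_h = W_rec.T @ delta                      # pass error back in time
    return gW_in, gW_rec, gW_out
```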

Chapter 6 describes the use of two dynamic nets to form a speech coder.  The popular method of Differential Pulse Code Modulation for speech coding employs two linear filters to encode and decode speech.  By generalising these to non-linear filters, implemented as dynamic nets, a reduction in the noise imposed by a limited bandwidth channel is achieved.
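The classical DPCM loop that the chapter generalises is sketched below; replacing the `predict` function with a trained dynamic net yields the non-linear coder. The uniform quantizer step size is an assumption.

```python
def dpcm_encode(samples, predict, step=0.05):
    """predict(history) -> next-sample prediction. Returns integer codes;
    history holds reconstructed samples so encoder and decoder stay in sync."""
    history, codes = [], []
    for s in samples:
        pred = predict(history)
        code = round((s - pred) / step)       # quantize the prediction error
        codes.append(code)
        history.append(pred + code * step)    # decoder-matched reconstruction
    return codes

def dpcm_decode(codes, predict, step=0.05):
    history = []
    for code in codes:
        history.append(predict(history) + code * step)
    return history
```

A first-order predictor such as `predict = lambda h: h[-1] if h else 0.0` recovers ordinary DPCM; the chapter's contribution is to make this predictor a learned non-linear filter.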

Chapter 7 describes the application of a dynamic net to the recognition of a large subset of the phonemes of English from continuous speech.  The dynamic net is found to give a higher recognition rate than both a fixed-window net and the established k-nearest-neighbour technique.

Chapter 8 describes a further development of dynamic nets which allows them to be trained by a reinforcement signal which expresses the correctness of the output of the net.  Two possible architectures are given and an example of learning to play the game of noughts and crosses is presented.
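A generic score-function (REINFORCE-style) update conveys the flavour of training from a correctness signal: nudge the weights so that rewarded outputs become more likely. This is a plain policy-gradient sketch, not the thesis's architectures, and every name in it is illustrative.

```python
import numpy as np

def play_and_update(W, board, reward_fn, lr=0.01, rng=None):
    """board: feature vector for a noughts-and-crosses position;
    W: (n_moves, n_features) policy weights; reward_fn(move) -> scalar
    reward expressing how correct the chosen move was."""
    rng = rng or np.random.default_rng()
    logits = W @ board
    p = np.exp(logits - logits.max())
    p /= p.sum()                               # softmax move probabilities
    move = rng.choice(len(p), p=p)             # sample a move from the policy
    reward = reward_fn(move)                   # scalar correctness signal
    # Gradient of log p(move) with respect to W, scaled by the reward.
    grad = -np.outer(p, board)
    grad[move] += board
    return W + lr * reward * grad, move
```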