F. Beaufays

    Google, Search, Department Member
    We show that signal flow graph theory provides a simple way to relate two popular algorithms for adapting dynamic neural networks: real-time backpropagation and backpropagation-through-time. Starting with the flow graph for real-time backpropagation, we apply a simple transposition to produce a second graph. The new graph is shown to be interreciprocal with the original and to correspond to the backpropagation-through-time algorithm. Interreciprocity provides a theoretical argument that both flow graphs implement the same overall weight update.
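    To make the equivalence concrete, here is a minimal sketch for a hypothetical one-neuron recurrent network x[t] = tanh(w * x[t-1] + u[t]) with squared-error loss. `grad_rtrl` carries the sensitivity dx[t]/dw forward in time (real-time backpropagation), while `grad_bptt` stores the trajectory and sweeps errors backward (backpropagation-through-time). The two should agree on the total weight gradient up to rounding. The network, loss, and function names are illustrative assumptions, not taken from the paper.

```python
import math
import random

def forward(w, u, x0=0.0):
    """Run x[t] = tanh(w * x[t-1] + u[t]); xs[0] is the initial state."""
    xs = [x0]
    for ut in u:
        xs.append(math.tanh(w * xs[-1] + ut))
    return xs

def loss(w, u, d, x0=0.0):
    """Squared error 0.5 * sum_t (x[t] - d[t])^2 over the whole sequence."""
    xs = forward(w, u, x0)
    return 0.5 * sum((x - dt) ** 2 for x, dt in zip(xs[1:], d))

def grad_rtrl(w, u, d, x0=0.0):
    """Real-time (forward) gradient: carry s[t] = dx[t]/dw along with the state."""
    x_prev, s_prev, grad = x0, 0.0, 0.0
    for ut, dt in zip(u, d):
        x = math.tanh(w * x_prev + ut)
        s = (1.0 - x * x) * (x_prev + w * s_prev)  # chain rule through x[t-1]
        grad += (x - dt) * s
        x_prev, s_prev = x, s
    return grad

def grad_bptt(w, u, d, x0=0.0):
    """Backpropagation-through-time: store the trajectory, then sweep backward."""
    xs = forward(w, u, x0)
    grad, g_next = 0.0, 0.0  # g_next = dL/da[t+1], zero beyond the last step
    for t in range(len(u), 0, -1):
        dL_dx = (xs[t] - d[t - 1]) + w * g_next  # direct error + error via x[t+1]
        g = (1.0 - xs[t] ** 2) * dL_dx           # dL/da[t], a[t] = pre-activation
        grad += g * xs[t - 1]
        g_next = g
    return grad

random.seed(0)
u = [random.uniform(-1.0, 1.0) for _ in range(20)]
d = [random.uniform(-1.0, 1.0) for _ in range(20)]
print(grad_rtrl(0.5, u, d), grad_bptt(0.5, u, d))
```

    The forward method needs no stored history, while the backward method must keep the whole trajectory; for a full network the backward method is, in exchange, much cheaper per time step.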
    We present an approach to cluster the training data for automatic speech recognition (ASR). A relative-entropy-based distance metric between training data clusters is defined. This metric is used to hierarchically cluster the training data. The metric can also be used to select the closest training data clusters given a small amount of data from the test speaker. The selected clusters are then used to estimate a…
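    The abstract does not give the exact form of the metric, so this sketch makes explicit assumptions: each cluster is summarized by a single diagonal-covariance Gaussian, the distance is the symmetric Kullback-Leibler divergence (a relative-entropy measure), and merging two clusters pools their sufficient statistics. `Cluster`, `agglomerate`, and `closest` are hypothetical names. Greedy bottom-up merging yields the hierarchy, and `closest` ranks clusters against a small amount of test-speaker data.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class Cluster:
    n: int             # number of frames
    s: List[float]     # per-dimension sum of frames
    ss: List[float]    # per-dimension sum of squared frames
    members: List[int] # ids of the original data sets in this cluster

    @staticmethod
    def from_frames(frames, idx):
        dim = len(frames[0])
        s = [sum(f[k] for f in frames) for k in range(dim)]
        ss = [sum(f[k] ** 2 for f in frames) for k in range(dim)]
        return Cluster(len(frames), s, ss, [idx])

    def gaussian(self, floor=1e-3):
        """Mean and (floored) diagonal variance of the pooled frames."""
        mu = [v / self.n for v in self.s]
        var = [max(q / self.n - m * m, floor) for q, m in zip(self.ss, mu)]
        return mu, var

    def merge(self, other):
        """Pool sufficient statistics of two clusters."""
        return Cluster(self.n + other.n,
                       [a + b for a, b in zip(self.s, other.s)],
                       [a + b for a, b in zip(self.ss, other.ss)],
                       self.members + other.members)

def kl_diag(mu0, var0, mu1, var1):
    """KL(N0 || N1) for diagonal-covariance Gaussians."""
    return 0.5 * sum(v0 / v1 + (m1 - m0) ** 2 / v1 - 1.0 + math.log(v1 / v0)
                     for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1))

def distance(a, b):
    """Symmetric relative entropy between the clusters' Gaussians."""
    ma, va = a.gaussian()
    mb, vb = b.gaussian()
    return kl_diag(ma, va, mb, vb) + kl_diag(mb, vb, ma, va)

def agglomerate(clusters, target):
    """Repeatedly merge the closest pair until `target` clusters remain."""
    clusters = list(clusters)
    while len(clusters) > target:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dij = distance(clusters[i], clusters[j])
                if best is None or dij < best[0]:
                    best = (dij, i, j)
        _, i, j = best
        merged = clusters[i].merge(clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

def closest(clusters, test_cluster, k):
    """The k clusters nearest to a (small) test-speaker cluster."""
    return sorted(clusters, key=lambda c: distance(c, test_cluster))[:k]
```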
    Deriving backpropagation algorithms for time-dependent neural network structures typically requires numerous chain rule expansions, diligent bookkeeping, and careful manipulation of terms. In this paper, we present a unified approach to derive such algorithms via a set of simple block diagram manipulation rules.
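    As a small instance of this kind of diagram manipulation, consider a tapped-delay-line (FIR) block. In the standard adjoint construction, the error signal flows through the transposed diagram, where each unit delay becomes a unit advance, so the backward pass filters the error with the same taps run backward in time. This sketch, with hypothetical function names, checks the result with the dot-product test <c, Hx> = <H^T c, x>. It illustrates the general idea and is not a reproduction of the paper's rule set.

```python
import random

def fir_forward(h, x):
    """y[n] = sum_k h[k] * x[n-k], with x[m] = 0 for m < 0 (causal delays)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def fir_adjoint(h, c):
    """Adjoint of fir_forward: each delay z^-1 becomes an advance z^+1.
    Given c = dL/dy, returns dL/dx[m] = sum_k h[k] * c[m+k]."""
    N = len(c)
    return [sum(h[k] * c[m + k] for k in range(len(h)) if m + k < N)
            for m in range(N)]

# Dot-product test: <c, H x> should equal <H^T c, x> up to rounding.
random.seed(0)
h = [random.gauss(0.0, 1.0) for _ in range(4)]
x = [random.gauss(0.0, 1.0) for _ in range(10)]
c = [random.gauss(0.0, 1.0) for _ in range(10)]
lhs = sum(ci * yi for ci, yi in zip(c, fir_forward(h, x)))
rhs = sum(ai * xi for ai, xi in zip(fir_adjoint(h, c), x))
print(lhs, rhs)
```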
    We propose a technique to port channel characteristics from one language to another. This allows us to build acoustic models in a target language that are robust to an environment for which we have no data in that language. The approach consists of training broad phonetic class maximum likelihood linear regression (MLLR) transformations on a source language and applying them in the target language. These transforms capture the acoustic properties of the environment without capturing language-specific characteristics that are difficult to port across languages. As a case study, we consider the problem of building in-car GSM models for UK English, assuming that we have no GSM data and no car data in UK English, but that we do have such data in German. We show that this technique can greatly reduce the error rate of the recognition system on English GSM car data.
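    As a hedged illustration of a per-class transform, this NumPy sketch estimates one MLLR mean transform W = [b A] for a broad phonetic class and applies it to target-language Gaussian means. To stay short it assumes identity covariances, under which the maximum-likelihood estimate reduces to occupancy-weighted least squares, W = K G^-1 with K = sum(gamma * o * xi^T) and G = sum(gamma * xi * xi^T), where xi = [1, mu] is the extended mean. Full MLLR weights these statistics by the Gaussian covariances. Function names and class labels such as "vowel" are hypothetical.

```python
import numpy as np

def estimate_mllr_mean_transform(frames, gammas, means):
    """Estimate W (D x (D+1)) so that adapted mean = W @ [1, mu].

    frames: (T, D) observations from the source-language environment data
    gammas: (T, M) occupation probabilities of the class's M Gaussians
    means:  (M, D) means of those Gaussians
    Assumes identity covariances, so ML estimation is weighted least squares.
    """
    xi = np.hstack([np.ones((means.shape[0], 1)), means])  # extended means (M, D+1)
    G = np.einsum("tm,mi,mj->ij", gammas, xi, xi)          # sum gamma * xi xi^T
    K = np.einsum("tm,td,mj->dj", gammas, frames, xi)      # sum gamma * o xi^T
    return np.linalg.solve(G, K.T).T                       # W = K G^{-1} (G symmetric)

def apply_transforms(target_means, target_classes, transforms):
    """Apply each broad class's transform to the target-language means in that class."""
    adapted = np.empty_like(target_means)
    for i, (mu, cls) in enumerate(zip(target_means, target_classes)):
        adapted[i] = transforms[cls] @ np.concatenate(([1.0], mu))
    return adapted
```

    In the porting scenario above, one W per broad class would be estimated on the German GSM car data and then applied to the UK English Gaussians of the same class.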
    Basic studies in denotational mathematics and mathematical engineering have led to the theory of abstract intelligence (aI), a set of mathematical models of natural and computational intelligence in cognitive informatics (CI) and cognitive computing (CC). Abstract intelligence has triggered recent breakthroughs in cognitive systems such as cognitive computers, cognitive robots, cognitive neural networks, and cognitive learning. This paper reports a set of position statements presented at the plenary panel (Part II) of IEEE ICCI*CC'16 on Cognitive Informatics and Cognitive Computing at Stanford University. The summary is contributed by invited panelists who are among the world's renowned scholars in the transdisciplinary field of CI and CC.
    This paper describes a new approach to acoustic modeling for large vocabulary continuous speech recognition (LVCSR) systems. Each phone is modeled with a large Gaussian mixture model (GMM) whose context-dependent mixture weights are estimated with a sentence-level discriminative training criterion. The estimation problem is cast in a neural network framework, which enables the incorporation of the appropriate constraints on the mixture weight vectors and allows a straightforward training procedure based on steepest descent. Experiments conducted on the Callhome-English and Switchboard databases show a significant improvement in acoustic model performance, and a smaller improvement when the acoustic and language models are combined.
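    One common way to impose mixture-weight constraints (nonnegative, summing to one) inside a neural-network framework is a softmax over unconstrained parameters, trained by steepest descent. This sketch assumes that parameterization and, for brevity, uses a plain frame-level log-likelihood with fixed component likelihoods rather than the paper's sentence-level discriminative criterion. It trains a single weight vector, whereas the described model has one per context. `train_weights` and its arguments are hypothetical.

```python
import math

def softmax(z):
    """Map unconstrained logits to weights that are >= 0 and sum to 1."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def avg_loglik(w, comp_lik):
    """Average frame log-likelihood of a mixture with weights w."""
    return sum(math.log(sum(wk * pk for wk, pk in zip(w, lik)))
               for lik in comp_lik) / len(comp_lik)

def train_weights(comp_lik, steps=200, lr=0.5):
    """comp_lik[t][k] = p(o_t | component k) for fixed Gaussians.
    Steepest descent on the negative average log-likelihood w.r.t. logits z."""
    K = len(comp_lik[0])
    z = [0.0] * K
    for _ in range(steps):
        w = softmax(z)
        grad = [0.0] * K  # gradient of the average log-likelihood
        for lik in comp_lik:
            mix = sum(wk * pk for wk, pk in zip(w, lik))
            for k in range(K):
                # d/dz_k log sum_j w_j p_j = posterior_k - w_k
                grad[k] += (w[k] * lik[k] / mix - w[k]) / len(comp_lik)
        # Moving along +grad of the log-likelihood is steepest descent on its negative.
        z = [zk + lr * gk for zk, gk in zip(z, grad)]
    return softmax(z)
```

    Because the constraint is built into the parameterization, every update yields a valid probability vector and no projection step is needed.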
    In the context of automatic speaker recognition, we propose a model transformation technique that makes speaker models more robust to acoustic mismatch and data scarcity by appropriately increasing their variances. We use a stereo database containing speech recorded simultaneously under different acoustic conditions to derive a synthetic variance distribution. This distribution is then used to modify the variances of speaker models built from other telephone databases. The technique is illustrated with experiments on a locally collected database and on the NIST'95 and '96 subsets of the Switchboard corpus.
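    The abstract does not define the synthetic variance distribution, so this sketch adopts a deliberately simple stand-in: from stereo recordings, estimate one variance inflation factor per feature dimension as var(mismatched) / var(reference), floor it at 1 so variances only increase, and multiply another speaker model's diagonal variances by those factors. The function names and the flooring rule are assumptions for illustration, not the paper's method.

```python
import statistics

def inflation_factors(reference_frames, mismatched_frames):
    """Per-dimension variance ratio from stereo (simultaneously recorded) data,
    floored at 1.0 so that the transformation only ever increases variances."""
    dim = len(reference_frames[0])
    factors = []
    for k in range(dim):
        v_ref = statistics.pvariance([f[k] for f in reference_frames])
        v_mis = statistics.pvariance([f[k] for f in mismatched_frames])
        factors.append(max(v_mis / v_ref, 1.0) if v_ref > 0 else 1.0)
    return factors

def inflate_variances(model_variances, factors):
    """Scale a speaker model's diagonal variances by the inflation factors."""
    return [v * f for v, f in zip(model_variances, factors)]
```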
    We describe our early experience building and optimizing GOOG-411, a fully automated, voice-enabled business finder. We show how taking an iterative approach to system development allows us to optimize the various components of the system, thereby progressively improving user-facing metrics. We show the contributions of different data sources to recognition accuracy. For business listing language models, we see a nearly…
