Volume E89-D, Issue 3, March 2006
article
What HMMs Can Do

Since their inception almost fifty years ago, hidden Markov models (HMMs) have become the predominant methodology for automatic speech recognition (ASR) systems; today, most state-of-the-art speech systems are HMM-based. There have been a number ...

article
Training Augmented Models Using SVMs

There has been significant interest in developing new forms of acoustic model, in particular models that can represent dependencies beyond those contained in a standard hidden Markov model (HMM). This paper discusses one such ...

article
Minimum Bayes Risk Estimation and Decoding in Large Vocabulary Continuous Speech Recognition

Minimum Bayes risk estimation and decoding strategies based on lattice segmentation techniques can be used to refine large vocabulary continuous speech recognition systems through the estimation of the parameters of the underlying hidden Markov models ...

article
Acoustic Model Adaptation Using First-Order Linear Prediction for Reverberant Speech

This paper describes a hands-free speech recognition technique based on acoustic model adaptation to reverberant speech. In hands-free speech recognition, the recognition accuracy is degraded by reverberation, since each segment of speech is affected by ...

article
Robust Speech Recognition by Using Compensated Acoustic Scores

This paper proposes a new method for compensating acoustic scores in the Viterbi search for robust speech recognition. This method introduces noise models to represent a wide variety of noises and realizes robust decoding together with conventional ...

article
A Non-stationary Noise Suppression Method Based on Particle Filtering and Polyak Averaging

This paper addresses a speech recognition problem in non-stationary noise environments: the estimation of noise sequences. To solve this problem, we present a particle filter-based sequential noise estimation method for front-end processing of speech ...

article
Verification of Speech Recognition Results Incorporating In-domain Confidence and Discourse Coherence Measures

Conventional confidence measures for assessing the reliability of ASR (automatic speech recognition) output are typically derived from "low-level" information which is obtained during speech recognition decoding. In contrast to these approaches, we ...

article
Genetic Algorithm Based Optimization of Partly-Hidden Markov Model Structure Using Discriminative Criterion

Discriminative modeling is applied to optimize the structure of a Partly-Hidden Markov Model (PHMM). PHMM was proposed in our previous work to deal with the complicated temporal changes of acoustic features. It can represent observation dependent ...

article
Improving Acoustic Model Precision by Incorporating a Wide Phonetic Context Based on a Bayesian Framework

Over the last decade, the Bayesian approach has increased in popularity in many application areas. It uses a probabilistic framework which encodes our beliefs or actions in situations of uncertainty. Information from several models can also be combined ...

article
A Hybrid HMM/BN Acoustic Model Utilizing Pentaphone-Context Dependency

The most widely used acoustic unit in current automatic speech recognition systems is the triphone, which includes the immediately preceding and following phonetic contexts. Although triphones have proved to be an efficient choice, it is believed that ...

article
Utterance-Based Selective Training for the Automatic Creation of Task-Dependent Acoustic Models

To obtain a robust acoustic model for a certain speech recognition task, a large amount of speech data is necessary. However, the preparation of speech data including recording and transcription is very costly and time-consuming. Although there are ...

article
Speech Recognition Based on Student's t-Distribution Derived from Total Bayesian Framework

We introduce a robust classification method based on the Bayesian predictive distribution (Bayesian Predictive Classification, referred to as BPC) for speech recognition. We and others have recently proposed a total Bayesian framework named Variational ...

article
Using Hybrid HMM/BN Acoustic Models: Design and Implementation Issues

In recent years, the number of studies investigating new directions in speech modeling that go beyond the conventional HMM has increased considerably. One promising approach is to use Bayesian Networks (BN) as speech models. Full recognition systems ...

article
ATR Parallel Decoding Based Speech Recognition System Robust to Noise and Speaking Styles

In this paper, we describe a parallel decoding-based ASR system developed at ATR that is robust to noise type, SNR, and speaking style. It is difficult to recognize speech affected by various factors, especially when an ASR system contains only a single ...

article
Improving Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics in Noisy Environments Using Multi-Template Models

This paper describes a method that uses multi-template unsupervised speaker adaptation based on HMM-Sufficient Statistics to improve adaptation performance while keeping adaptation time within a few seconds, using just one arbitrary utterance. This ...

article
Production-Oriented Models for Speech Recognition

Acoustic modeling in speech recognition uses very little knowledge of the speech production process. At many levels our models continue to model speech as a surface phenomenon. Typically, hidden Markov model (HMM) parameters operate primarily in the ...

article
PS-ZCPA Based Feature Extraction with Auditory Masking, Modulation Enhancement and Noise Reduction for Robust ASR

A pitch-synchronous (PS) auditory feature extraction method based on ZCPA (Zero-Crossings Peak-Amplitudes) was proposed previously and was shown to be more robust than conventional ZCPA- and MFCC-based features. In this paper, first, a non-linear adaptive ...

article
Trigger-Based Language Model Adaptation for Automatic Transcription of Panel Discussions

We present a novel trigger-based language model adaptation method oriented to the transcription of meetings. In meetings, the topic is focused and consistent throughout the whole session; therefore, keywords can be correlated over long distances. The ...

article
Single-Channel Multiple Regression for In-Car Speech Enhancement

We address issues for improving hands-free speech enhancement and speech recognition performance in different car environments using a single distant microphone. This paper describes a new single-channel in-car speech enhancement method that estimates ...

article
Gamma Modeling of Speech Power and Its On-Line Estimation for Statistical Speech Enhancement

This study shows the effectiveness of using a gamma distribution in the speech power domain as a more general prior distribution for model-based speech enhancement approaches. This model is a superset of the conventional Gaussian model of the complex ...

article
Robust Talker Direction Estimation Based on Weighted CSP Analysis and Maximum Likelihood Estimation

This paper describes a new talker direction estimation method for front-end processing to capture distant-talking speech by using a microphone array. The proposed method consists of two algorithms: One is a TDOA (Time Delay Of Arrival) estimation ...

article
Text-Independent/Text-Prompted Speaker Recognition by Combining Speaker-Specific GMM with Speaker Adapted Syllable-Based HMM

We present a new text-independent/text-prompted speaker recognition method that combines a speaker-specific Gaussian Mixture Model (GMM) with a syllable-based HMM adapted by MLLR or MAP. The robustness of this speaker recognition method for speaking style'...

article
Comparative Study of Speaker Identification Methods: dPLRM, SVM and GMM

The performance of three text-independent speaker identification methods, based on the dual Penalized Logistic Regression Machine (dPLRM), the Support Vector Machine (SVM), and the Gaussian Mixture Model (GMM), is compared in experiments with 10 male ...

article
Nonparametric Speaker Recognition Method Using Earth Mover's Distance

In this paper, we propose a distributed speaker recognition method using a nonparametric speaker model and Earth Mover's Distance (EMD). In distributed speaker recognition, the quantized feature vectors are sent to a server. The Gaussian mixture model (...

article
Context-Dependent Boundary Model for Refining Boundaries Segmentation of TTS Units

For producing high quality synthesis, a concatenation-based Text-to-Speech (TTS) system usually requires a large number of segmental units to cover various acoustic-phonetic contexts. However, careful manual labeling and segmentation by human experts, ...

article
A Style Adaptation Technique for Speech Synthesis Using HSMM and Suprasegmental Features

This paper proposes a technique for synthesizing speech with a desired speaking style and/or emotional expression, based on model adaptation in an HMM-based speech synthesis framework. Speaking styles and emotional expressions are characterized by many ...

article
Generating F0 Contours by Statistical Manipulation of Natural F0 Shapes

This paper describes a method of generating F0 contours from natural F0 segmental shapes for speech synthesis. The extracted shapes of the F0 units are basically held invariant by eliminating any averaging operations in the analysis phase and by ...

article
Speech Analysis Based on Modeling the Effective Voice Source

A new system-identification-based method has been proposed for accurate estimation of vocal tract parameters. A frequently encountered problem with conventional linear prediction analysis is caused by the harmonic structure of the excitation source of ...

article
Implementation and Evaluation of an HMM-Based Korean Speech Synthesis System

The development and evaluation of a hidden Markov model (HMM)-based Korean speech synthesis system are described. Statistical HMM models for Korean speech units are trained on a hand-labeled speech database including contextual information about ...
