TROIS: Vol E89-D, No 3

Volume E89-D, Issue 3

March 2006

Publisher:

Oxford University Press, Inc.
198 Madison Ave. New York, NY
United States

ISSN: 0916-8532

EISSN: 1745-1361

article

Special Section on Statistical Modeling for Speech Processing

Nakamura Satoshi

Pages 867–868 https://doi.org/10.1093/ietisy/e89-d.3.867

article

What HMMs Can Do

Jeff A. Bilmes

Pages 869–891 https://doi.org/10.1093/ietisy/e89-d.3.869

Since their inception almost fifty years ago, hidden Markov models (HMMs) have become the predominant methodology for automatic speech recognition (ASR) systems; today, most state-of-the-art speech systems are HMM-based. There have been a number ...

article

Training Augmented Models Using SVMs

Pages 892–899 https://doi.org/10.1093/ietisy/e89-d.3.892

There has been significant interest in developing new forms of acoustic model, in particular models that can represent dependencies beyond those contained within a standard hidden Markov model (HMM). This paper discusses one such ...

article

Minimum Bayes Risk Estimation and Decoding in Large Vocabulary Continuous Speech Recognition

William Byrne

Pages 900–907 https://doi.org/10.1093/ietisy/e89-d.3.900

Minimum Bayes risk estimation and decoding strategies based on lattice segmentation techniques can be used to refine large vocabulary continuous speech recognition systems through the estimation of the parameters of the underlying hidden Markov models ...

article

Acoustic Model Adaptation Using First-Order Linear Prediction for Reverberant Speech

Pages 908–914 https://doi.org/10.1093/ietisy/e89-d.3.908

This paper describes a hands-free speech recognition technique based on acoustic model adaptation to reverberant speech. In hands-free speech recognition, the recognition accuracy is degraded by reverberation, since each segment of speech is affected by ...

article

Robust Speech Recognition by Using Compensated Acoustic Scores

Pages 915–921 https://doi.org/10.1093/ietisy/e89-d.3.915

This paper proposes a new compensation method of acoustic scores in the Viterbi search for robust speech recognition. This method introduces noise models to represent a wide variety of noises and realizes robust decoding together with conventional ...

article

A Non-stationary Noise Suppression Method Based on Particle Filtering and Polyak Averaging

Pages 922–930 https://doi.org/10.1093/ietisy/e89-d.3.922

This paper addresses a speech recognition problem in non-stationary noise environments: the estimation of noise sequences. To solve this problem, we present a particle filter-based sequential noise estimation method for front-end processing of speech ...

article

Verification of Speech Recognition Results Incorporating In-domain Confidence and Discourse Coherence Measures

Pages 931–938 https://doi.org/10.1093/ietisy/e89-d.3.931

Conventional confidence measures for assessing the reliability of ASR (automatic speech recognition) output are typically derived from "low-level" information which is obtained during speech recognition decoding. In contrast to these approaches, we ...

article

Genetic Algorithm Based Optimization of Partly-Hidden Markov Model Structure Using Discriminative Criterion

Pages 939–945 https://doi.org/10.1093/ietisy/e89-d.3.939

Discriminative modeling is applied to optimize the structure of a Partly-Hidden Markov Model (PHMM). PHMM was proposed in our previous work to deal with the complicated temporal changes of acoustic features. It can represent observation dependent ...

article

Improving Acoustic Model Precision by Incorporating a Wide Phonetic Context Based on a Bayesian Framework

Pages 946–953 https://doi.org/10.1093/ietisy/e89-d.3.946

Over the last decade, the Bayesian approach has increased in popularity in many application areas. It uses a probabilistic framework which encodes our beliefs or actions in situations of uncertainty. Information from several models can also be combined ...

article

A Hybrid HMM/BN Acoustic Model Utilizing Pentaphone-Context Dependency

Pages 954–961 https://doi.org/10.1093/ietisy/e89-d.3.954

The most widely used acoustic unit in current automatic speech recognition systems is the triphone, which includes the immediate preceding and following phonetic contexts. Although triphones have proved to be an efficient choice, it is believed that ...

article

Utterance-Based Selective Training for the Automatic Creation of Task-Dependent Acoustic Models

Pages 962–969 https://doi.org/10.1093/ietisy/e89-d.3.962

To obtain a robust acoustic model for a certain speech recognition task, a large amount of speech data is necessary. However, the preparation of speech data including recording and transcription is very costly and time-consuming. Although there are ...

article

Speech Recognition Based on Student's t-Distribution Derived from Total Bayesian Framework

Pages 970–980 https://doi.org/10.1093/ietisy/e89-d.3.970

We introduce a robust classification method based on the Bayesian predictive distribution (Bayesian Predictive Classification, referred to as BPC) for speech recognition. We and others have recently proposed a total Bayesian framework named Variational ...

article

Using Hybrid HMM/BN Acoustic Models: Design and Implementation Issues

Pages 981–988 https://doi.org/10.1093/ietisy/e89-d.3.981

In recent years, the number of studies investigating new directions in speech modeling that go beyond the conventional HMM has increased considerably. One promising approach is to use Bayesian Networks (BN) as speech models. Full recognition systems ...

article

ATR Parallel Decoding Based Speech Recognition System Robust to Noise and Speaking Styles

Pages 989–997 https://doi.org/10.1093/ietisy/e89-d.3.989

In this paper, we describe a parallel decoding-based ASR system developed at ATR that is robust to noise type, SNR and speaking style. It is difficult to recognize speech affected by various factors, especially when an ASR system contains only a single ...

article

Improving Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics in Noisy Environments Using Multi-Template Models

Pages 998–1005 https://doi.org/10.1093/ietisy/e89-d.3.998

This paper describes a method that uses multi-template unsupervised speaker adaptation based on HMM-Sufficient Statistics to improve adaptation performance while keeping adaptation time within a few seconds with just one arbitrary utterance. This ...

article

Production-Oriented Models for Speech Recognition

Pages 1006–1014 https://doi.org/10.1093/ietisy/e89-d.3.1006

Acoustic modeling in speech recognition uses very little knowledge of the speech production process. At many levels our models continue to model speech as a surface phenomenon. Typically, hidden Markov model (HMM) parameters operate primarily in the ...

article

PS-ZCPA Based Feature Extraction with Auditory Masking, Modulation Enhancement and Noise Reduction for Robust ASR

Pages 1015–1023 https://doi.org/10.1093/ietisy/e89-d.3.1015

A pitch-synchronous (PS) auditory feature extraction method based on ZCPA (Zero-Crossings Peak-Amplitudes) was proposed previously and showed more robustness than conventional ZCPA and MFCC based features. In this paper, firstly, a non-linear adaptive ...

article

Trigger-Based Language Model Adaptation for Automatic Transcription of Panel Discussions

Pages 1024–1031 https://doi.org/10.1093/ietisy/e89-d.3.1024

We present a novel trigger-based language model adaptation method oriented to the transcription of meetings. In meetings, the topic is focused and consistent throughout the whole session, therefore keywords can be correlated over long distances. The ...

article

Single-Channel Multiple Regression for In-Car Speech Enhancement

Pages 1032–1039 https://doi.org/10.1093/ietisy/e89-d.3.1032

We address issues for improving hands-free speech enhancement and speech recognition performance in different car environments using a single distant microphone. This paper describes a new single-channel in-car speech enhancement method that estimates ...

article

Gamma Modeling of Speech Power and Its On-Line Estimation for Statistical Speech Enhancement

Pages 1040–1049 https://doi.org/10.1093/ietisy/e89-d.3.1040

This study shows the effectiveness of using a gamma distribution in the speech power domain as a more general prior distribution for model-based speech enhancement approaches. This model is a super-set of the conventional Gaussian model of the complex ...

article

Robust Talker Direction Estimation Based on Weighted CSP Analysis and Maximum Likelihood Estimation

Pages 1050–1057 https://doi.org/10.1093/ietisy/e89-d.3.1050

This paper describes a new talker direction estimation method for front-end processing to capture distant-talking speech by using a microphone array. The proposed method consists of two algorithms: One is a TDOA (Time Delay Of Arrival) estimation ...

article

Text-Independent/Text-Prompted Speaker Recognition by Combining Speaker-Specific GMM with Speaker Adapted Syllable-Based HMM

Pages 1058–1065 https://doi.org/10.1093/ietisy/e89-d.3.1058

We presented a new text-independent/text-prompted speaker recognition method by combining speaker-specific Gaussian Mixture Model (GMM) with syllable-based HMM adapted by MLLR or MAP. The robustness of this speaker recognition method for speaking style'...

article

Comparative Study of Speaker Identification Methods: dPLRM, SVM and GMM

Pages 1066–1073 https://doi.org/10.1093/ietisy/e89-d.3.1066

A comparison of performances is made of three text-independent speaker identification methods based on dual Penalized Logistic Regression Machine (dPLRM), Support Vector Machine (SVM) and Gaussian Mixture Model (GMM) with experiments by 10 male ...

article

Nonparametric Speaker Recognition Method Using Earth Mover's Distance

Pages 1074–1081 https://doi.org/10.1093/ietisy/e89-d.3.1074

In this paper, we propose a distributed speaker recognition method using a nonparametric speaker model and Earth Mover's Distance (EMD). In distributed speaker recognition, the quantized feature vectors are sent to a server. The Gaussian mixture model (...

article

Context-Dependent Boundary Model for Refining Boundaries Segmentation of TTS Units

Pages 1082–1091 https://doi.org/10.1093/ietisy/e89-d.3.1082

For producing high quality synthesis, a concatenation-based Text-to-Speech (TTS) system usually requires a large number of segmental units to cover various acoustic-phonetic contexts. However, careful manual labeling and segmentation by human experts, ...

article

A Style Adaptation Technique for Speech Synthesis Using HSMM and Suprasegmental Features

Pages 1092–1099 https://doi.org/10.1093/ietisy/e89-d.3.1092

This paper proposes a technique for synthesizing speech with a desired speaking style and/or emotional expression, based on model adaptation in an HMM-based speech synthesis framework. Speaking styles and emotional expressions are characterized by many ...

article

Generating F0 Contours by Statistical Manipulation of Natural F0 Shapes

Takashi Saito

Pages 1100–1106 https://doi.org/10.1093/ietisy/e89-d.3.1100

This paper describes a method of generating F0 contours from natural F0 segmental shapes for speech synthesis. The extracted shapes of the F0 units are basically held invariant by eliminating any averaging operations in the analysis phase and by ...

article

Speech Analysis Based on Modeling the Effective Voice Source

Pages 1107–1115 https://doi.org/10.1093/ietisy/e89-d.3.1107

A new system identification based method has been proposed for accurate estimation of vocal tract parameters. An often encountered problem in using the conventional linear prediction analysis is due to the harmonic structure of the excitation source of ...

article

Implementation and Evaluation of an HMM-Based Korean Speech Synthesis System

Pages 1116–1119 https://doi.org/10.1093/ietisy/e89-d.3.1116

The development of a hidden Markov model (HMM)-based Korean speech synthesis system and its evaluation are described. Statistical HMM models for Korean speech units are trained with the hand-labeled speech database including the contextual information about ...

IEICE - Transactions on Information and Systems
