Environment Compensation Based on Maximum a Posteriori Estimation for Improved Speech Recognition

Shen, Haifeng; Guo, Jun; Liu, Gang; Huang, Pingmu; Li, Qunxia

doi:10.1007/11579427_87


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3789)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence


Abstract

In this paper, we describe an environment compensation approach based on maximum a posteriori (MAP) estimation, assuming that the noise can be modeled as a single Gaussian distribution. The approach uses prior information about the noise to deal with environmental variability. The acoustically distorted environment model in the cepstral domain is approximated by a first-order truncated vector Taylor series (VTS) expansion, and the clean speech model is trained with a Self-Organizing Map (SOM) neural network under the assumption that speech is well represented by a multivariate diagonal-covariance Gaussian mixture model (GMM). With this environment model approximation and effective clustering for the clean speech model, the noise estimate is refined using a batch EM algorithm under the MAP criterion. Experiments on large-vocabulary, speaker-independent continuous speech recognition show that the approach considerably improves recognition performance.
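The two ingredients named in the abstract, a first-order Taylor linearization of the environment model and a MAP estimate of Gaussian noise parameters, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the widely used additive-noise mismatch function y = x + log(1 + exp(n - x)) in the log-spectral domain (no channel term, no DCT to cepstra) and a conjugate Gaussian prior on the noise mean with known variance. The function names and the prior weight `tau` are hypothetical.

```python
import numpy as np

def mismatch(x, n):
    """Noisy log-spectrum for clean speech x and additive noise n.
    Assumed form (no channel term); the paper works in the cepstral domain."""
    return x + np.log1p(np.exp(n - x))

def vts_first_order(x0, n0):
    """Linearize mismatch() around (x0, n0).
    Returns (y0, A, B) with y ~= y0 + A*(x - x0) + B*(n - n0), elementwise."""
    y0 = mismatch(x0, n0)
    A = 1.0 / (1.0 + np.exp(n0 - x0))  # partial derivative dy/dx at (x0, n0)
    B = 1.0 - A                        # partial derivative dy/dn at (x0, n0)
    return y0, A, B

def map_gaussian_mean(samples, prior_mean, tau):
    """MAP mean of a Gaussian with known variance and a conjugate Gaussian prior.
    samples has shape (N, D); tau is a hypothetical prior weight that acts as
    the number of pseudo-observations behind prior_mean."""
    samples = np.atleast_2d(samples)
    return (tau * prior_mean + samples.sum(axis=0)) / (tau + samples.shape[0])

# Usage: compare the linearized model with the exact one for a small noise shift.
x0 = np.array([2.0, 1.0, 0.5])
n0 = np.array([0.0, 1.0, 2.0])
y0, A, B = vts_first_order(x0, n0)
n = n0 + 0.05
approx = y0 + B * (n - n0)
exact = mismatch(x0, n)

# Usage: shrink a noise-mean estimate toward a prior mean.
noise_frames = np.array([[1.0, 2.0], [3.0, 4.0]])
noise_mean_map = map_gaussian_mean(noise_frames, np.zeros(2), tau=2.0)
```

With `tau = 0` the MAP estimate reduces to the sample mean, and a larger `tau` pulls the estimate toward the prior, which is how prior noise information can stabilize estimation when few noisy frames are available.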

This research was sponsored by the National Natural Science Foundation of China (NSFC) under Grant No. 60475007, the Foundation of China Education Ministry for Century Spanning Talent, and the BUPT Education Foundation.



Author information

Authors and Affiliations

Beijing University of Posts and Telecommunications, 100876, Beijing, China
Haifeng Shen, Jun Guo, Gang Liu & Pingmu Huang
University of Science and Technology Beijing, 100083, Beijing, China
Qunxia Li


Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
Alexander Gelbukh
Tecnológico de Monterrey (ITESM), Campus Ciudad de México (CCM), Calle del Puente 222, Col. Ejidos de Huipulco, 14360 DF, Tlalpan, Mexico
Álvaro de Albornoz
Center for Intelligent Systems, Tecnológico de Monterrey, Campus Monterrey, 64849, Monterrey, N.L., Mexico
Hugo Terashima-Marín


Cite this paper

Shen, H., Guo, J., Liu, G., Huang, P., Li, Q. (2005). Environment Compensation Based on Maximum a Posteriori Estimation for Improved Speech Recognition. In: Gelbukh, A., de Albornoz, Á., Terashima-Marín, H. (eds) MICAI 2005: Advances in Artificial Intelligence. MICAI 2005. Lecture Notes in Computer Science, vol 3789. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11579427_87

DOI: https://doi.org/10.1007/11579427_87
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29896-0
Online ISBN: 978-3-540-31653-4
eBook Packages: Computer Science, Computer Science (R0)
