Abstract
Phonotactic and acoustic spoken language recognition (SLR) systems are currently the most widely used approaches to language recognition. Parallel phone recognition followed by vector space modeling (PPRVSM) is a typical phonotactic system. To improve performance, researchers have sought to extract more complementary information from the training data by combining multiple language-specific phone recognizers, different acoustic models, and different acoustic features. These methods achieve good performance, but they usually come at high computational cost and exploit only the complementary information in the training data. In this paper, we explore a novel approach to discriminative vector space model (VSM) training that uses a boosting framework to exploit the discriminative information in the test data effectively, in which an ensemble of VSMs is trained sequentially. The effectiveness of our boosting variant comes from its emphasis on high-confidence test data, which yields discriminatively trained models. Our variant also retains the original training data during VSM training. The discriminative boosting algorithm (DBA) is applied to the National Institute of Standards and Technology (NIST) language recognition evaluation (LRE) 2009 task and shows consistent performance improvements. The experimental results demonstrate that the proposed DBA achieves relative reductions in equal error rate (EER) of 1.8 %, 11.72 %, and 15.35 % over the baseline system for 30 s, 10 s, and 3 s test utterances, respectively.
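The abstract only outlines the sequential training loop, so the following is a minimal, hypothetical sketch of one plausible reading of it. Everything here is an illustrative assumption rather than the authors' implementation: LinearSVC stands in for the VSM classifier (the real system operates on phonotactic n-gram statistics from parallel phone recognizers), and the softmax confidence measure, CONF_THRESHOLD, and N_ROUNDS are invented for the example.

```python
import numpy as np
from sklearn.svm import LinearSVC  # stand-in for the paper's VSM classifier

CONF_THRESHOLD = 0.9  # assumed confidence cutoff; not specified in the abstract
N_ROUNDS = 3          # assumed number of boosting rounds

def softmax(scores):
    """Convert raw decision scores to pseudo-probabilities per utterance."""
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def discriminative_boosting(train_x, train_y, test_x):
    """Sequentially train an ensemble of VSMs. Each round pseudo-labels the
    test set, keeps only high-confidence utterances, and retrains on the
    original training data augmented with those utterances.
    Assumes a multiclass task (>2 languages), so decision_function returns
    one score per language."""
    ensemble = []
    aug_x, aug_y = train_x, train_y
    for _ in range(N_ROUNDS):
        vsm = LinearSVC().fit(aug_x, aug_y)
        ensemble.append(vsm)
        scores = vsm.decision_function(test_x)   # shape: (n_test, n_languages)
        confidence = softmax(scores).max(axis=1)
        pseudo_y = vsm.predict(test_x)
        keep = confidence > CONF_THRESHOLD
        # The original training data is always retained, as the abstract stresses;
        # only high-confidence test utterances are added on top of it.
        aug_x = np.vstack([train_x, test_x[keep]])
        aug_y = np.concatenate([train_y, pseudo_y[keep]])
    return ensemble
```

The abstract does not specify how the ensemble members' scores are fused into a final language decision, so that step is omitted from the sketch.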
Additional information
This project is supported by the National Natural Science Foundation of China (Nos. 61273268, 61370034, and 61403224).
Cite this article
Liu, WW., Cai, M., Zhang, WQ. et al. Discriminative Boosting Algorithm for Diversified Front-End Phonotactic Language Recognition. J Sign Process Syst 82, 229–239 (2016). https://doi.org/10.1007/s11265-015-1017-1