Continuous Tamil Speech Recognition technique under non stationary noisy environments

Kalamani, M.; Krishnamoorthi, M.; Valarmathi, R. S.

doi:10.1007/s10772-018-09580-8

Published: 30 November 2018

Volume 22, pages 47–58, (2019)

International Journal of Speech Technology

M. Kalamani¹,
M. Krishnamoorthi² &
R. S. Valarmathi¹

Abstract

In recent years, the need for Continuous Speech Recognition systems for the Tamil language has grown considerably. This work proposes an efficient Continuous Tamil Speech Recognition (CTSR) technique for non-stationary noisy environments. It consists of two stages: speech enhancement and modelling. In the first stage, a modified Modulation Magnitude Estimation based Spectral Subtraction with Chi-Square Distribution based Noise Estimation (SS–NE) algorithm is proposed to enhance noisy Tamil speech under various non-stationary noise environments. The enhanced speech signal is then segmented using a combination of its short-time signal energy and spectral centroid features, in order to extract the speech segments from the continuous speech. Twenty-six mel frequency cepstral coefficients per frame are found to be optimal and are used as the acoustic feature vector for each frame. Fuzzy C-Means (FCM) clustering is used to cluster the extracted feature vectors into discrete symbols; the evaluation results show that the optimal number of clusters ‘C’ is 5. Finally, Tamil speech from various speakers is recognized using an Expectation Maximization Gaussian Mixture Model (EM-GMM) with 16 component densities, operating on continuous measurements of the labelled features from FCM clustering in order to reduce the word error rate (WER). The simulation results show that the proposed FCM with EM-GMM model for CTSR improves recognition accuracy by 1.2 to 4.4% compared to existing algorithms under different noisy environments, by reducing the WER by 1.6 to 5.47%.
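
The Fuzzy C-Means step mentioned in the abstract can be illustrated with a minimal sketch of the standard FCM update rules. This is not the authors' implementation: the function names `fuzzy_c_means` and `hard_labels`, the fuzzifier `m = 2`, the random initialisation, and the stopping tolerance are all assumptions made for illustration, and the toy 2-D points stand in for real feature vectors.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard Fuzzy C-Means on a list of equal-length float vectors.

    Returns (centers, memberships); memberships[i][k] is the degree to
    which point i belongs to cluster k, and each row sums to 1.
    The fuzzifier m, iters, tol and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Membership update: u_ik is proportional to d_ik ** (-2 / (m - 1)).
        new_u = []
        for i in range(n):
            dist = [max(math.dist(points[i], centers[k]), 1e-12)
                    for k in range(c)]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
            total = sum(inv)
            new_u.append([v / total for v in inv])

        # Stop once no membership changes by more than tol.
        change = max(abs(new_u[i][k] - u[i][k])
                     for i in range(n) for k in range(c))
        u = new_u
        if change < tol:
            break

    return centers, u


def hard_labels(memberships):
    """Map each point to the index of its highest-membership cluster."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]


if __name__ == "__main__":
    # Toy data: two well-separated groups of 2-D points (not speech features).
    toy = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
           [10.0, 10.1], [10.2, 9.9], [9.9, 10.0]]
    centers, u = fuzzy_c_means(toy, c=2)
    print(hard_labels(u))
```

In the setting the abstract describes, `points` would presumably hold one 26-dimensional MFCC vector per frame and `c` would be 5, with the resulting cluster indices serving as the discrete symbols; how the paper initialises the clusters and chooses the fuzzifier is not stated in the abstract.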

Acknowledgements

The authors would like to thank the anonymous reviewers for all their valuable comments and suggestions.

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu, 638401, India
M. Kalamani & R. S. Valarmathi
Department of Computer Science and Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu, 638401, India
M. Krishnamoorthi

Corresponding author

Correspondence to M. Kalamani.

About this article

Cite this article

Kalamani, M., Krishnamoorthi, M. & Valarmathi, R. Continuous Tamil Speech Recognition technique under non stationary noisy environments. Int J Speech Technol 22, 47–58 (2019). https://doi.org/10.1007/s10772-018-09580-8

Received: 02 August 2018
Accepted: 27 November 2018
Published: 30 November 2018
Issue Date: 15 March 2019
DOI: https://doi.org/10.1007/s10772-018-09580-8
