Abstract
In the last few years, the need for Continuous Speech Recognition systems for the Tamil language has grown considerably. This work proposes an efficient Continuous Tamil Speech Recognition (CTSR) technique for non-stationary noisy environments. The technique consists of two stages: speech enhancement and modelling. In the first stage, a modified Modulation Magnitude Estimation based Spectral Subtraction with Chi-Square Distribution based Noise Estimation (SS–NE) algorithm is proposed to enhance the noisy Tamil speech signal under various non-stationary noise conditions. The enhanced signal is then segmented using a combination of its short-time signal energy and spectral centroid features, in order to extract the speech segments from the continuous speech. For each frame, 26 mel frequency cepstral coefficients, found to be the optimal number, are used as the acoustic feature vector. Fuzzy C-Means (FCM) clustering is used to cluster the extracted feature vectors into discrete symbols; the evaluation results show that the optimal number of clusters ‘C’ is 5. Finally, Tamil speech from various speakers is recognized using an Expectation Maximization Gaussian Mixture Model (EM-GMM) with 16 component densities, under continuous measurements of the labelled features from FCM clustering, in order to reduce the word error rate (WER). The simulation results show that, compared with existing algorithms under different noisy environments, the proposed FCM with EM-GMM model for CTSR improves recognition accuracy by 1.2 to 4.4% by reducing the WER by 1.6 to 5.47%.
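The clustering step can be illustrated with a minimal sketch of the standard Fuzzy C-Means updates. This is not the authors' implementation: the function names `fuzzy_c_means` and `hard_labels`, the fuzzifier `m = 2.0`, the random initialisation, and the stopping tolerance are all assumptions made for illustration. In the setting described above, each point would be a 26-dimensional MFCC vector and `c` would be 5.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Cluster `points` (a list of equal-length float lists) into `c` fuzzy clusters.

    Returns (centers, memberships). memberships[i][j] is the degree to which
    point i belongs to cluster j, and each row sums to 1.
    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalised so that each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Centre update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])

        # Membership update: u_ij is proportional to d_ij ** (-2 / (m - 1)).
        new_u = []
        for p in points:
            dists = [math.dist(p, ctr) for ctr in centers]
            if any(d == 0.0 for d in dists):
                # The point coincides with a centre: full membership there.
                zeros = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(zeros)
                new_u.append([z / total for z in zeros])
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                new_u.append([v / total for v in inv])

        # Stop once no membership value moved by more than `tol`.
        change = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if change < tol:
            break

    return centers, u


def hard_labels(memberships):
    """Map each membership row to the index of its strongest cluster."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]
```

Calling `hard_labels` on the returned memberships turns each frame into a discrete symbol, which corresponds to the labelled features that are passed on to the EM-GMM stage.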
Acknowledgements
The authors would like to thank the anonymous reviewers for all their valuable comments and suggestions.
Cite this article
Kalamani, M., Krishnamoorthi, M. & Valarmathi, R. Continuous Tamil Speech Recognition technique under non stationary noisy environments. Int J Speech Technol 22, 47–58 (2019). https://doi.org/10.1007/s10772-018-09580-8
DOI: https://doi.org/10.1007/s10772-018-09580-8