
Enhancements in Continuous Kannada ASR System by Background Noise Elimination

Circuits, Systems, and Signal Processing

Abstract

In this work, we present the advancements incorporated into our earlier continuous Kannada automatic speech recognition (ASR) spoken query system (SQS) for use in uncontrolled environments. The SQS comprises an interactive voice response system and ASR models developed using Kaldi. Because the continuous Kannada speech data was gathered in a noisy environment, a variety of background noises were present in the data used to train the ASR system. In the earlier SQS, background and other types of noise reduced speech recognition accuracy. This can be overcome by developing a robust noise reduction algorithm for enhancing degraded speech. In the enhanced SQS, a background noise reduction module is introduced before the speech feature extraction step. The proposed noise cancellation algorithm represents the degraded speech spectrum in the complex plane as a combination of the clean speech spectrum and noise model vectors. Experimental results show that the proposed noise suppression algorithm outperforms the traditional spectral subtraction algorithms and magnitude squared spectrum (MSS) estimators. With the proposed approach, no musical noise or other noise is audible in the enhanced NOIZEUS speech corpus or in the enhanced continuous Kannada speech data. The noise suppression algorithm is therefore applied to the degraded continuous Kannada speech data to enhance it. Using the noise suppression algorithm together with time delay neural network (TDNN) ASR modelling in the SQS yields an improvement of 1.87% in word error rate compared with the earlier deep neural network-hidden Markov model (DNN-HMM)-based SQS. The enhanced continuous Kannada SQS was tested online by 500 speakers/users from Karnataka state in a noisy environment.
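As a reminder of how the word error rate (WER) quoted above is conventionally defined, the following minimal sketch computes it as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and the placeholder word sequences are assumptions for illustration; the abstract does not say whether the 1.87% figure is an absolute or a relative change, and this sketch does not reproduce the authors' Kaldi scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder word sequences (hypothetical, not from the paper's data):
# one substitution and one deletion against four reference words.
print(word_error_rate("w1 w2 w3 w4", "w1 wX w3"))
```

Comparing two systems then amounts to computing this value for each system's transcripts over the same test set and taking the difference.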
The source code of the algorithms and ASR models used in this work is publicly available at https://sites.google.com/view/thimmarajayadavag/downloads.
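For readers unfamiliar with the traditional spectral subtraction baseline that the abstract compares against, the sketch below shows a basic magnitude spectral subtraction with overlap-add. It is not the authors' proposed algorithm and not their released code. The function name, the frame length, hop size, spectral floor, and the assumption that the first few frames contain only noise are all illustrative choices.

```python
import numpy as np


def spectral_subtraction(noisy, noise_frames=6, frame_len=256, hop=128, floor=0.02):
    """Basic magnitude spectral subtraction (illustrative baseline only).

    Assumes the first `noise_frames` frames contain noise only, and that
    hop == frame_len // 2 so the periodic Hann windows sum to one.
    """
    noisy = np.asarray(noisy, dtype=float)
    # Periodic Hann window: copies shifted by half a frame add up to 1.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame_len) / frame_len)

    n_frames = 1 + (len(noisy) - frame_len) // hop
    frames = np.stack(
        [noisy[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )

    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)

    # Noise magnitude estimate: average over the assumed noise-only frames.
    noise_mag = mag[:noise_frames].mean(axis=0)

    # Subtract the estimate, keeping a small spectral floor instead of
    # letting magnitudes go negative.
    clean_mag = np.maximum(mag - noise_mag, floor * noise_mag)

    # Resynthesize with the noisy phase and overlap-add the frames.
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    out = np.zeros(len(noisy))
    for i in range(n_frames):
        out[i * hop : i * hop + frame_len] += clean[i]
    return out


# Synthetic example (not the paper's data): white noise throughout,
# with a 440 Hz tone starting after a noise-only lead-in.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
tone = np.where(np.arange(fs) >= 2000, np.sin(2 * np.pi * 440 * t), 0.0)
noisy = tone + 0.1 * rng.standard_normal(fs)
enhanced = spectral_subtraction(noisy)
```

Isolated spectral peaks that survive the subtraction in this kind of baseline are what is usually heard as musical noise, the artifact the abstract reports as inaudible with the proposed approach.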




Cite this article

Yadava, G.T., Nagaraja, B.G. & Jayanna, H.S. Enhancements in Continuous Kannada ASR System by Background Noise Elimination. Circuits Syst Signal Process 41, 4041–4067 (2022). https://doi.org/10.1007/s00034-022-01973-0


  • DOI: https://doi.org/10.1007/s00034-022-01973-0
