Enhancements in Continuous Kannada ASR System by Background Noise Elimination

Yadava, G. Thimmaraja; Nagaraja, B. G.; Jayanna, H. S.

doi:10.1007/s00034-022-01973-0

Published: 16 February 2022

Volume 41, pages 4041–4067, (2022)

Circuits, Systems, and Signal Processing

G. Thimmaraja Yadava¹ (ORCID: orcid.org/0000-0002-3266-9732),
B. G. Nagaraja² &
H. S. Jayanna³

Abstract

In this work, we present recent enhancements to our previously developed continuous Kannada automatic speech recognition (ASR) spoken query system (SQS), which operates in an uncontrolled environment. The SQS comprises an interactive voice response system and ASR models developed using Kaldi. Because the continuous Kannada speech data were gathered in a degraded environment, a variety of background noises became part of the data used to train the ASR system. In the earlier SQS, background and other types of noise reduced the speech recognition accuracy. This can be overcome by developing a robust noise reduction algorithm for enhancing degraded speech. In the enhanced SQS, a background noise reduction module is introduced before the speech feature extraction step. The proposed noise cancellation algorithm represents the degraded speech spectrum in the complex plane as a combination of clean speech spectrum and noise model vectors. The experimental results show that the proposed noise suppression algorithm outperforms the traditional spectral subtraction algorithms and magnitude squared spectrum (MSS) estimators. The outputs of the proposed approach show no audible musical noise or other types of noise in the enhanced NOIZEUS speech corpora and continuous Kannada speech data. Therefore, the noise suppression algorithm is applied to enhance the degraded continuous Kannada speech data. Using the noise suppression algorithm and the time delay neural network (TDNN) ASR modelling technique in the SQS yields an improvement of 1.87% in word error rate compared with the earlier deep neural network - hidden Markov model (DNN-HMM)-based SQS. The enhanced continuous Kannada SQS was tested online by 500 speakers/users from the state of Karnataka under degraded conditions.
The source code of the algorithms and ASR models used in this work is publicly available at https://sites.google.com/view/thimmarajayadavag/downloads.
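
The abstract contrasts the proposed complex-plane formulation with traditional spectral subtraction. The following is a minimal sketch of that textbook baseline for a single frequency bin, not the authors' algorithm. It assumes the usual additive model Y = X + D in the complex plane, for which |Y|² = |X|² + |D|² + 2|X||D|cos(θ), where θ is the phase difference between clean speech and noise. The function names `noisy_power` and `power_spectral_subtraction` and the `floor` parameter are hypothetical and chosen only for illustration.

```python
import cmath
import math


def noisy_power(clean_mag, noise_mag, phase_diff):
    """Power |Y|^2 of Y = X + D when X and D differ in phase by phase_diff."""
    cross_term = 2.0 * clean_mag * noise_mag * math.cos(phase_diff)
    return clean_mag ** 2 + noise_mag ** 2 + cross_term


def power_spectral_subtraction(noisy_mags, noise_mags, floor=0.01):
    """Textbook power spectral subtraction over the bins of one frame.

    Subtracts the noise power estimate from the noisy power in each bin and
    ignores the cross term. Results below floor * |Y|^2 are clamped to that
    value (an assumed spectral floor) so the square root stays real.
    """
    estimates = []
    for y_mag, d_mag in zip(noisy_mags, noise_mags):
        power = y_mag ** 2 - d_mag ** 2
        power = max(power, floor * y_mag ** 2)
        estimates.append(math.sqrt(power))
    return estimates


# One bin with clean magnitude 3 and noise magnitude 4 (illustrative values).
clean_mag, noise_mag = 3.0, 4.0
for phase_diff in (math.pi / 2, 0.0, math.pi):
    noisy = cmath.rect(clean_mag, 0.0) + cmath.rect(noise_mag, phase_diff)
    estimate = power_spectral_subtraction([abs(noisy)], [noise_mag])[0]
    print(f"phase difference {phase_diff:.2f} rad: |Y| = {abs(noisy):.3f}, "
          f"estimated |X| = {estimate:.3f} (true |X| = {clean_mag})")
```

The estimate recovers the clean magnitude only when the two vectors are orthogonal, so that the cross term vanishes. When they are in phase the clean magnitude is overestimated, and when they are in opposite phase the subtraction goes negative and is clamped to the floor. This dependence on the phase difference is the limitation that treating the spectra as vectors in the complex plane makes visible.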

Author information

B. G. Nagaraja and H. S. Jayanna have contributed equally.

Authors and Affiliations

E&CE, Nitte Meenakshi Institute of Technology, Yelahanka, Bengaluru, Karnataka, 560064, India
G. Thimmaraja Yadava
E&CE, K.L.E. Institute of Technology, Opposite to Airport, Gokul Road, Hubballi, Karnataka, 580027, India
B. G. Nagaraja
IS&E, Siddaganga Institute of Technology, B. H. Road, Tumkur, Karnataka, 572103, India
H. S. Jayanna

About this article

Cite this article

Yadava, G.T., Nagaraja, B.G. & Jayanna, H.S. Enhancements in Continuous Kannada ASR System by Background Noise Elimination. Circuits Syst Signal Process 41, 4041–4067 (2022). https://doi.org/10.1007/s00034-022-01973-0

Received: 22 October 2020
Revised: 15 January 2022
Accepted: 18 January 2022
Published: 16 February 2022
Issue Date: July 2022
DOI: https://doi.org/10.1007/s00034-022-01973-0
