Speech Enhancement Algorithm Based on Microphone Array and Lightweight CRN for Hearing Aid
Abstract
1. Introduction
2. Algorithm Fundamentals
2.1. The Overall Structure of the Model
2.2. Beamforming Module
2.2.1. Impact of Feature Fusion Module on Model Performance
2.2.2. Complex Time-Frequency Long Short-Term Memory Networks
2.2.3. CRN-Based Beamforming Module
2.3. Postfilter Module
2.4. Inter-Module Mask Module
3. Experimental Setup
3.1. Database
3.1.1. Hearing Aid Experimental Data
3.1.2. CHiME-3 Experimental Data
3.2. Loss Function
3.3. Model Parameter Setting
4. Experimental Results and Analysis
4.1. Performance Comparison Experiments of Different Modules
4.2. Comparison Experiments on the Hearing Aid Dataset
4.2.1. Comparison of Algorithms Under Different Interference Types
4.2.2. Comparison of Algorithms Under Different Noise Types
4.3. Comparison Experiments on the CHiME-3 Dataset
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Disclaimer/Publisher’s Note: The statements, opinions, and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions, or products referred to in the content.
Layer Name | In_Channels | Out_Channels | Kernel_Size |
---|---|---|---|
CConv2D 1 | 4 | 12 | [3,3] |
CConv2D 2 | 12 | 24 | [3,3] |
CConv2D 3 | 24 | 48 | [3,3] |
CDeConv2D 1 | 96 | 48 | [4,3] |
CDeConv2D 2 | 72 | 36 | [4,3] |
CDeConv2D 3 | 48 | 24 | [3,3] |
Conv2d | 24 | 4 | [3,3] |
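The channel counts in the layer table are consistent with a U-Net-style layout in which each CDeConv2D layer receives the previous decoder output concatenated with the mirrored encoder output (48 + 48 = 96, 48 + 24 = 72, 36 + 12 = 48). The table itself does not state this, so treat it as an inference. The short check below encodes that assumption, plus the further assumption that whatever sits between CConv2D 3 and CDeConv2D 1 keeps 48 channels; the function name `check_skip_concat` is purely illustrative.

```python
# Layer table copied from above: (name, in_channels, out_channels)
ENCODER = [("CConv2D 1", 4, 12), ("CConv2D 2", 12, 24), ("CConv2D 3", 24, 48)]
DECODER = [("CDeConv2D 1", 96, 48), ("CDeConv2D 2", 72, 36), ("CDeConv2D 3", 48, 24)]
OUTPUT = ("Conv2d", 24, 4)


def check_skip_concat(encoder, decoder, output):
    """Return (layer, expected_in, listed_in) tuples under the ASSUMPTION that each
    decoder layer concatenates its input with the mirrored encoder output."""
    report = []
    # Assumption: the bottleneck preserves the last encoder layer's channel count.
    prev_out = encoder[-1][2]
    # Pair the first decoder layer with the last encoder layer, and so on.
    for (name, c_in, c_out), (_, _, skip_c) in zip(decoder, reversed(encoder)):
        expected = prev_out + skip_c  # channel-wise concatenation
        report.append((name, expected, c_in))
        prev_out = c_out
    # The final Conv2d is assumed to take the last decoder output directly (no skip).
    report.append((output[0], prev_out, output[1]))
    return report


for name, expected, listed in check_skip_concat(ENCODER, DECODER, OUTPUT):
    print(f"{name}: expected {expected} input channels, table lists {listed}")
```

Every listed input width matches the concatenation hypothesis, which is why it is a plausible reading; the actual wiring should still be confirmed against the model description in Section 2.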
Noise Type | PESQ (Front Mic) | PESQ (openMHA) | PESQ (Proposed) | STOI (Front Mic) | STOI (openMHA) | STOI (Proposed) |
---|---|---|---|---|---|---|
vacuum | 1.078 | 1.898 | 2.152 | 0.601 | 0.811 | 0.854 |
microwave | 1.201 | 2.051 | 2.21 | 0.641 | 0.835 | 0.904 |
kettle | 1.311 | 1.602 | 2.475 | 0.732 | 0.812 | 0.935 |
fan | 1.251 | 1.852 | 2.22 | 0.643 | 0.773 | 0.86 |
dishwasher | 1.352 | 1.756 | 2.012 | 0.656 | 0.912 | 0.925 |
hairdryer | 1.401 | 1.601 | 2.123 | 0.707 | 0.801 | 0.892 |
washing | 1.256 | 1.984 | 2.182 | 0.688 | 0.765 | 0.921 |
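The per-noise scores above can be condensed into a per-method summary. The sketch below only re-uses the numbers in the table; the unweighted mean over noise types is an assumed summary statistic chosen for illustration, not one reported by the authors, and the helper names are hypothetical.

```python
from statistics import mean

METHODS = ("Front Mic", "openMHA", "Proposed")

# Rows copied from the table: noise type -> (Front Mic, openMHA, Proposed)
PESQ = {
    "vacuum": (1.078, 1.898, 2.152),
    "microwave": (1.201, 2.051, 2.21),
    "kettle": (1.311, 1.602, 2.475),
    "fan": (1.251, 1.852, 2.22),
    "dishwasher": (1.352, 1.756, 2.012),
    "hairdryer": (1.401, 1.601, 2.123),
    "washing": (1.256, 1.984, 2.182),
}
STOI = {
    "vacuum": (0.601, 0.811, 0.854),
    "microwave": (0.641, 0.835, 0.904),
    "kettle": (0.732, 0.812, 0.935),
    "fan": (0.643, 0.773, 0.86),
    "dishwasher": (0.656, 0.912, 0.925),
    "hairdryer": (0.707, 0.801, 0.892),
    "washing": (0.688, 0.765, 0.921),
}


def column_means(table):
    """Unweighted average of each method's score over all noise types."""
    return {m: mean(row[i] for row in table.values()) for i, m in enumerate(METHODS)}


def wins(table, a, b):
    """Noise types on which method a scores strictly higher than method b."""
    ia, ib = METHODS.index(a), METHODS.index(b)
    return [noise for noise, row in table.items() if row[ia] > row[ib]]


for metric, table in (("PESQ", PESQ), ("STOI", STOI)):
    avgs = column_means(table)
    print(metric, {m: round(v, 3) for m, v in avgs.items()})
    print("  Proposed > openMHA on", len(wins(table, "Proposed", "openMHA")), "of", len(table), "noise types")
```

Counting per-row wins alongside the averages guards against a mean that is driven by one or two noise types.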
Algorithm | Parameter Size | MAC/s |
---|---|---|
CADUNet | 13.21 M | 35.3 G |
DeFTAN | 2.52 M | 42.7 G |
Proposed | 1.26 M | 13.3 G |
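The complexity comparison can be expressed as ratios relative to each baseline. This is a minimal sketch that only re-derives ratios from the listed figures, assuming the units are consistent across rows (parameters in millions, MACs/s in giga); `relative_cost` is a hypothetical helper.

```python
# Figures copied from the table: model -> (parameters [M], MACs/s [G])
COST = {"CADUNet": (13.21, 35.3), "DeFTAN": (2.52, 42.7), "Proposed": (1.26, 13.3)}


def relative_cost(model, baseline, costs=COST):
    """Return (parameter_ratio, mac_ratio) of model relative to baseline."""
    (p_m, c_m), (p_b, c_b) = costs[model], costs[baseline]
    return p_m / p_b, c_m / c_b


for base in ("CADUNet", "DeFTAN"):
    p, c = relative_cost("Proposed", base)
    print(f"Proposed vs {base}: {p:.1%} of the parameters, {c:.1%} of the MACs/s")
```

Reporting both ratios matters here because the ordering differs by metric: DeFTAN has far fewer parameters than CADUNet but a higher MACs/s figure.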
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Xi, J.; Xu, Z.; Zhang, W.; Zhao, L.; Xie, Y. Speech Enhancement Algorithm Based on Microphone Array and Lightweight CRN for Hearing Aid. Electronics 2024, 13, 4394. https://doi.org/10.3390/electronics13224394
Xi J, Xu Z, Zhang W, Zhao L, Xie Y. Speech Enhancement Algorithm Based on Microphone Array and Lightweight CRN for Hearing Aid. Electronics. 2024; 13(22):4394. https://doi.org/10.3390/electronics13224394
Chicago/Turabian Style: Xi, Ji, Zhe Xu, Weiqi Zhang, Li Zhao, and Yue Xie. 2024. "Speech Enhancement Algorithm Based on Microphone Array and Lightweight CRN for Hearing Aid" Electronics 13, no. 22: 4394. https://doi.org/10.3390/electronics13224394
APA Style: Xi, J., Xu, Z., Zhang, W., Zhao, L., & Xie, Y. (2024). Speech Enhancement Algorithm Based on Microphone Array and Lightweight CRN for Hearing Aid. Electronics, 13(22), 4394. https://doi.org/10.3390/electronics13224394