Speech dereverberation is a signal processing technique of key importance for successful hands-free speech acquisition in telecommunications and automatic speech recognition applications. Over the last few years, speech dereverberation has become a hot research topic, driven by consumer demand, the availability of Skype-based terminals that encourage hands-free operation, and the development of promising signal processing algorithms. Speech Dereverberation gathers together an overview, a mathematical formulation of the problem and the state-of-the-art solutions for dereverberation, presenting the most important current approaches to the problem of reverberation. It begins by providing a focused and digestible review of the relevant topics in room acoustics and also describes key performance measures for dereverberation. The algorithms are then explained together with relevant mathematical analysis and supporting examples that enable the reader to see the relative strengths and weaknesses of the various techniques, and that give a clear understanding of the open questions still to be addressed in this topic. Techniques rooted in speech enhancement are included, in addition to a substantial treatment of multichannel blind acoustic system identification and inversion. The TRINICON framework is shown in the context of dereverberation to be a powerful generalization of the signal processing for an important range of analysis and enhancement techniques. Speech Dereverberation offers the reader an overview of the subject area, as well as an in-depth text on the advanced signal processing involved. The book benefits the reader by providing a wealth of information in one place, defining the current state of the art and, lastly, encouraging further work on this topic by offering open research questions to exercise the curiosity of the reader. It is suitable for students at master's and doctoral level, as well as established researchers.
Cited By
- Gul S, Khan M and Shah S (2022). Preserving the beamforming effect for spatial cue-based pseudo-binaural dereverberation of a single source, Computer Speech and Language, 77:C, Online publication date: 1-Jan-2023.
- Bross A and Gannot S (2023). Training-Based Multiple Source Tracking Using Manifold-Learning and Recursive Expectation-Maximization, IEEE/ACM Transactions on Audio, Speech and Language Processing, 31, (1124-1140), Online publication date: 1-Jan-2023.
- Jälmby M, Elvander F and Waterschoot T (2023). Low-Rank Room Impulse Response Estimation, IEEE/ACM Transactions on Audio, Speech and Language Processing, 31, (957-969), Online publication date: 1-Jan-2023.
- Yang F (2021). Analysis of Deficient-Length Partitioned-Block Frequency-Domain Adaptive Filters, IEEE/ACM Transactions on Audio, Speech and Language Processing, 30, (456-467), Online publication date: 1-Jan-2022.
- Xue W, Moore A, Brookes M and Naylor P (2020). Speech Enhancement Based on Modulation-Domain Parametric Multichannel Kalman Filtering, IEEE/ACM Transactions on Audio, Speech and Language Processing, 29, (393-405), Online publication date: 1-Jan-2021.
- Hogg A, Evers C, Moore A and Naylor P (2021). Overlapping Speaker Segmentation Using Multiple Hypothesis Tracking of Fundamental Frequency, IEEE/ACM Transactions on Audio, Speech and Language Processing, 29, (1479-1490), Online publication date: 1-Jan-2021.
- Huang G, Chen J and Benesty J (2019). Design of Planar Differential Microphone Arrays With Fractional Orders, IEEE/ACM Transactions on Audio, Speech and Language Processing, 28, (116-130), Online publication date: 1-Jan-2020.
- Carbajal G, Serizel R, Vincent E and Humbert E (2020). Joint NN-Supported Multichannel Reduction of Acoustic Echo, Reverberation and Noise, IEEE/ACM Transactions on Audio, Speech and Language Processing, 28, (2158-2173), Online publication date: 1-Jan-2020.
- Tuna C, Canclini A, Borra F, Götz P, Antonacci F, Walther A, Sarti A and Habets E (2020). 3D Room Geometry Inference Using a Linear Loudspeaker Array and a Single Microphone, IEEE/ACM Transactions on Audio, Speech and Language Processing, 28, (1729-1744), Online publication date: 1-Jan-2020.
- Nahma L, Dam H, Yiu C and Nordholm S (2019). Robust broadband beamformer design for noise reduction and dereverberation, Multidimensional Systems and Signal Processing, 31:1, (135-155), Online publication date: 1-Jan-2020.
- Shojaei S and Haddadi F (2020). Blind three dimensional deconvolution via convex optimization, Multidimensional Systems and Signal Processing, 31:3, (1029-1049), Online publication date: 1-Jul-2020.
- Antonello N, De Sena E, Moonen M, Naylor P and van Waterschoot T (2019). Joint Acoustic Localization and Dereverberation Through Plane Wave Decomposition and Sparse Regularization, IEEE/ACM Transactions on Audio, Speech and Language Processing, 27:12, (1893-1905), Online publication date: 1-Dec-2019.
- Wang X, Cohen I, Chen J and Benesty J (2019). On Robust and High Directive Beamforming With Small-Spacing Microphone Arrays for Scattered Sources, IEEE/ACM Transactions on Audio, Speech and Language Processing, 27:4, (842-852), Online publication date: 1-Apr-2019.
- Seetharaman P, Mysore G, Pardo B, Smaragdis P and Gomes C. VoiceAssist, Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, (1-6)
- Dadvar P and Geravanchizadeh M (2019). Robust binaural speech separation in adverse conditions based on deep neural network with modified spatial features and training target, Speech Communication, 108:C, (41-52), Online publication date: 1-Apr-2019.
- Cauchi B, Siedenburg K, Santos J, Falk T, Doclo S and Goetze S (2019). Non-Intrusive Speech Quality Prediction Using Modulation Energies and LSTM-Network, IEEE/ACM Transactions on Audio, Speech and Language Processing, 27:7, (1151-1163), Online publication date: 1-Jul-2019.
- Jannati M and Sayadiyan A (2018). Part-Syllable Transformation-Based Voice Conversion with Very Limited Training Data, Circuits, Systems, and Signal Processing, 37:5, (1935-1957), Online publication date: 1-May-2018.
- Grimm S and Freudenberger J (2018). Wind noise reduction for a closely spaced microphone array in a car environment, EURASIP Journal on Audio, Speech, and Music Processing, 2018:1, (1-9), Online publication date: 1-Dec-2018.
- Dong H and Lee C (2018). Speech intelligibility improvement in noisy reverberant environments based on speech enhancement and inverse filtering, EURASIP Journal on Audio, Speech, and Music Processing, 2018:1, (1-13), Online publication date: 1-Dec-2018.
- Sun X, Zhou Y and Shu X. Multi-Channel Linear Prediction Speech Dereverberation Algorithm Based on QR-RLS Adaptive Filter, Proceedings of the 3rd International Conference on Multimedia Systems and Signal Processing, (109-113)
- Xia Y and Li S (2018). Identifiability of Multichannel Blind Deconvolution and Nonconvex Regularization Algorithm, IEEE Transactions on Signal Processing, 66:20, (5299-5312), Online publication date: 1-Oct-2018.
- Antonello N, De Sena E, Moonen M, Naylor P and van Waterschoot T. Joint Source Localization and Dereverberation by Sound Field Interpolation Using Sparse Regularization, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (6892-6896)
- Lee W, Wang S, Chen F, Lu X, Chien S and Tsao Y. Speech Dereverberation Based on Integrated Deep and Ensemble Learning Algorithm, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (5454-5458)
- Li Y, Lee K and Bresler Y (2017). Identifiability and Stability in Blind Deconvolution Under Minimal Assumptions, IEEE Transactions on Information Theory, 63:7, (4619-4633), Online publication date: 1-Jul-2017.
- Gannot S, Vincent E, Markovich-Golan S and Ozerov A (2017). A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation, IEEE/ACM Transactions on Audio, Speech and Language Processing, 25:4, (692-730), Online publication date: 1-Apr-2017.
- Doire C, Brookes M, Naylor P, Hicks C, Betts D, Dmour M and Jensen S (2017). Single-Channel Online Enhancement of Speech Corrupted by Reverberation and Noise, IEEE/ACM Transactions on Audio, Speech and Language Processing, 25:3, (572-587), Online publication date: 1-Mar-2017.
- Wu B, Li K, Yang M and Lee C (2017). A Reverberation-Time-Aware Approach to Speech Dereverberation Based on Deep Neural Networks, IEEE/ACM Transactions on Audio, Speech and Language Processing, 25:1, (102-111), Online publication date: 1-Jan-2017.
- Samarasinghe P, Abhayapala T and Chen H (2017). Estimating the Direct-to-Reverberant Energy Ratio Using a Spherical Harmonics-Based Spatial Correlation Model, IEEE/ACM Transactions on Audio, Speech and Language Processing, 25:2, (310-319), Online publication date: 1-Feb-2017.
- Remaggi L, Jackson P, Coleman P and Wang W (2017). Acoustic Reflector Localization, IEEE/ACM Transactions on Audio, Speech and Language Processing, 25:2, (296-309), Online publication date: 1-Feb-2017.
- Hafezi S, Moore A and Naylor P (2017). Augmented Intensity Vectors for Direction of Arrival Estimation in the Spherical Harmonic Domain, IEEE/ACM Transactions on Audio, Speech and Language Processing, 25:10, (1956-1968), Online publication date: 1-Oct-2017.
- Mohammadiha N and Doclo S (2016). Speech dereverberation using non-negative convolutive transfer function and spectro-temporal modeling, IEEE/ACM Transactions on Audio, Speech and Language Processing, 24:2, (276-289), Online publication date: 1-Feb-2016.
- Zhao H, Chen Y, Wang R and Malik H (2016). Anti-Forensics of Environmental-Signature-Based Audio Splicing Detection and Its Countermeasure via Rich-Features Classification, IEEE Transactions on Information Forensics and Security, 11:7, (1603-1617), Online publication date: 1-Jul-2016.
- Parada P, Sharma D, Lainez J, Barreda D, van Waterschoot T and Naylor P (2016). A single-channel non-intrusive C50 estimator correlated with speech recognition performance, IEEE/ACM Transactions on Audio, Speech and Language Processing, 24:4, (719-732), Online publication date: 1-Apr-2016.
- Kodrasi I and Doclo S (2016). Joint dereverberation and noise reduction based on acoustic multi-channel equalization, IEEE/ACM Transactions on Audio, Speech and Language Processing, 24:4, (680-693), Online publication date: 1-Apr-2016.
- Eaton J, Gaubitch N, Moore A and Naylor P (2016). Estimation of Room Acoustic Parameters, IEEE/ACM Transactions on Audio, Speech and Language Processing, 24:10, (1681-1693), Online publication date: 1-Oct-2016.
- Leglaive S, Badeau R and Richard G (2016). Multichannel Audio Source Separation With Probabilistic Reverberation Priors, IEEE/ACM Transactions on Audio, Speech and Language Processing, 24:12, (2453-2465), Online publication date: 1-Dec-2016.
- Braun S and Habets E (2015). A multichannel diffuse power estimator for dereverberation in the presence of multiple sources, EURASIP Journal on Audio, Speech, and Music Processing, 2015:1, (1-14), Online publication date: 1-Dec-2015.
- Thiergart O, Taseska M and Habets E (2014). An informed parametric spatial filter based on instantaneous direction-of-arrival estimates, IEEE/ACM Transactions on Audio, Speech and Language Processing, 22:12, (2182-2196), Online publication date: 1-Dec-2014.
- Rotili R, Principi E, Wöllmer M, Squartini S and Schuller B. Conversational speech recognition in non-stationary reverberated environments, Proceedings of the 2011 international conference on Cognitive Behavioural Systems, (50-59)
- Rotili R, Principi E, Squartini S and Schuller B. A real-time speech enhancement framework for multi-party meetings, Proceedings of the 5th international conference on Advances in nonlinear speech processing, (80-87)
- Javed H, Moore A and Naylor P. Spherical microphone array acoustic rake receivers, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (111-115)
- Schwartz O, Gannot S and Habets E. Joint maximum likelihood estimation of late reverberant and speech power spectral density in noisy environments, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (151-155)
- Kodrasi I, Jukić A and Doclo S. Robust sparsity-promoting acoustic multi-channel equalization for speech dereverberation, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (166-170)
- Parchami M, Zhu W and Champagne B. Speech dereverberation using linear prediction with estimation of early speech spectral variance, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (504-508)
- Cauchi B, Javed H, Gerkmann T, Doclo S, Goetze S and Naylor P. Perceptual and instrumental evaluation of the perceived level of reverberation, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (629-633)
- Germain F, Mysore G and Fujioka T. Equalization matching of speech recordings in real-world environments, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (609-613)
- Baghaki A, Ahmad M and Swamy M. A new two-stage method for single-microphone speech dereverberation, 2016 IEEE International Symposium on Circuits and Systems (ISCAS), (778-781)
- Dat T, Dennis J, Ren L and Terence N. A comparative study of multi-channel processing methods for noisy automatic speech recognition in urban environments, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (6465-6469)
Recommendations
Joint Dereverberation and Residual Echo Suppression of Speech Signals in Noisy Environments
Hands-free devices are often used in a noisy and reverberant environment. Therefore, the received microphone signal does not only contain the desired near-end speech signal but also interferences such as room reverberation that is caused by the near-end ...
Automatic speech recognition performance in different room acoustic environments with and without dereverberation preprocessing
The performance of recent dereverberation methods for reverberant speech preprocessing prior to Automatic Speech Recognition (ASR) is compared for an extensive range of room and source-receiver configurations. It is shown that room acoustic parameters ...