Abstract
For the last few decades, microphone-array speech enhancement has relied primarily on prior information about the system model, e.g., array geometry and source location. However, estimating the time delays needed to align the microphone inputs is strongly affected by reverberation and microphone mismatch. Time-alignment preprocessing, e.g., fixed beamforming (the first branch of the generalized sidelobe canceller), is therefore not desirable in general applications. Recently, interest has shifted to linear filtering, which requires only the second-order statistics of the noisy input and the estimated noise. This paper proposes a linear filter design based on a multichannel subspace approach for speech enhancement. The contribution of the proposed multichannel subspace methods is threefold. First, a linear filter is applied in the multichannel frequency domain using a spatiospectral correlation matrix. Second, three types of multichannel signal presence probability (MC-SPP) are derived in the subspace domain. Third, incorporating the MC-SPPs into the gain modification of the linear filter further improves noise reduction performance. Among the gain modifications, the one using the subspace probability related to the eigenvector corresponding to the maximum eigenvalue achieved the best noise reduction performance. In the evaluation, the proposed subspace-based methods improved the overall SNR by approximately 4 dB on average over the minimum variance distortionless response (MVDR) beamformer with state-of-the-art relative transfer function estimation, while maintaining a similar cepstral distance, in adverse noisy environments.
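As a rough illustration of filtering from second-order statistics alone, the following minimal sketch builds a subspace-domain linear filter for a single frequency bin from a noisy-input covariance matrix and a noise covariance estimate. It is not the paper's algorithm: the function name `subspace_filter`, the trade-off parameter `mu`, the Cholesky prewhitening step, and the simple multiplicative use of a presence probability `p` are all assumptions made for illustration, and the three MC-SPP estimators mentioned in the abstract are not reproduced here.

```python
import numpy as np

def subspace_filter(R_y, R_n, mu=1.0, p=None):
    """Hypothetical per-bin subspace filter (illustrative sketch only).

    R_y : (M, M) Hermitian covariance of the noisy multichannel input.
    R_n : (M, M) Hermitian positive-definite noise covariance estimate.
    mu  : assumed trade-off parameter; mu = 1 gives a Wiener-type gain.
    p   : optional (M,) presence probabilities per subspace component
          (placeholder for an SPP estimate, not the paper's MC-SPP).
    """
    # Prewhiten with the Cholesky factor of the noise covariance: R_n = L L^H.
    L = np.linalg.cholesky(R_n)
    L_inv = np.linalg.inv(L)
    R_w = L_inv @ R_y @ L_inv.conj().T

    # Eigendecomposition of the whitened (Hermitian) covariance,
    # eigenvalues returned in ascending order.
    lam, U = np.linalg.eigh(R_w)

    # Whitened noise has unit variance in every direction, so the
    # speech eigenvalue estimates are lam - 1, floored at zero.
    lam_s = np.maximum(lam - 1.0, 0.0)
    gains = lam_s / (lam_s + mu)

    # Assumed gain modification: scale each subspace gain by its probability.
    if p is not None:
        gains = gains * np.asarray(p)

    # Map back to the microphone domain: H = L U diag(gains) U^H L^{-1}.
    H = L @ (U * gains) @ U.conj().T @ L_inv
    return H, lam, U
```

The enhanced multichannel spectrum for that bin would then be `H @ y` for a noisy observation vector `y`. With `mu = 1`, `p = None`, and `R_y - R_n` positive semidefinite, this filter coincides with the multichannel Wiener-type filter `R_s @ inv(R_y)`, where `R_s = R_y - R_n`.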
Cite this article
Hong, J. A multichannel subspace approach with signal presence probability for speech enhancement. Multidim Syst Sign Process 30, 2045–2058 (2019). https://doi.org/10.1007/s11045-019-00640-z
DOI: https://doi.org/10.1007/s11045-019-00640-z