
A multichannel subspace approach with signal presence probability for speech enhancement

Multidimensional Systems and Signal Processing

Abstract

For the last few decades, speech enhancement based on microphone arrays has relied primarily on prior information about the system model, e.g., array geometry and source location. However, estimating the time delays needed to align the microphone inputs is strongly affected by reverberation and microphone mismatch, so time-alignment preprocessing, e.g., fixed beamforming (the first branch of the generalized sidelobe canceller), is not desirable in general applications. Recently, interest has shifted to linear filtering, which requires only the second-order statistics of the noisy input and the estimated noise. This paper proposes a linear filter design based on a multichannel subspace approach for speech enhancement. The contribution of the proposed multichannel subspace methods is threefold. First, a linear filter is applied in the multichannel frequency domain using a spatiospectral correlation matrix. Second, three types of multichannel signal presence probability (MC-SPP) are derived in the subspace domain. Third, incorporating the MC-SPPs into the gain modification of the linear filter further improves noise reduction performance. Among the gain modifications, the one using the subspace probability associated with the eigenvector of the maximum eigenvalue achieved the best noise reduction performance. In evaluations in adverse noisy environments, the proposed subspace-based methods improved overall SNR by approximately 4 dB on average while maintaining a similar cepstral distance, relative to the minimum variance distortionless response (MVDR) beamformer with state-of-the-art relative transfer function estimation.
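
The abstract does not give the filter equations, so the following is only a minimal sketch of a generic subspace-domain linear filter built from second-order statistics, not the paper's method. It assumes the noisy and noise correlation matrices of one frequency bin are already available, jointly diagonalizes them by noise prewhitening, and applies a Wiener-like gain per eigen-direction. The function name `subspace_filter`, the trade-off parameter `mu`, and the scalar `spp` (a placeholder for a signal presence probability that simply scales the gain) are hypothetical names introduced for illustration.

```python
import numpy as np

def subspace_filter(R_y, R_n, mu=1.0, spp=1.0):
    """Generic subspace filter from noisy (R_y) and noise (R_n) correlation matrices.

    Assumption: R_n is Hermitian positive definite. `mu` and `spp` are
    illustrative parameters, not taken from the paper.
    """
    # Prewhiten with the Cholesky factor of the noise matrix: R_n = L L^H.
    L = np.linalg.cholesky(R_n)
    L_inv = np.linalg.inv(L)
    R_w = L_inv @ R_y @ L_inv.conj().T
    R_w = 0.5 * (R_w + R_w.conj().T)  # enforce Hermitian symmetry numerically

    # In the whitened domain, noise contributes 1 to every eigenvalue.
    lam, U = np.linalg.eigh(R_w)
    lam_x = np.maximum(lam - 1.0, 0.0)  # estimated speech eigenvalues

    # Wiener-like gain per eigen-direction, scaled by the presence probability.
    gain = spp * lam_x / (lam_x + mu)

    # Map the gains back to the microphone domain: H = L U G U^H L^{-1}.
    H = (L @ U) @ np.diag(gain) @ (U.conj().T @ L_inv)
    return H, lam_x

# Synthetic check: one rank-1 source in spatially correlated noise, 4 microphones.
rng = np.random.default_rng(0)
M = 4
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # made-up steering vector
R_x = 2.0 * np.outer(a, a.conj())                           # rank-1 speech matrix
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R_n = B @ B.conj().T + M * np.eye(M)                        # positive definite noise matrix
H, lam_x = subspace_filter(R_x + R_n, R_n)
```

With `mu = 1` and `spp = 1`, and exact statistics, this filter coincides with the multichannel Wiener filter `R_x (R_x + R_n)^{-1}`; an enhanced snapshot would then be obtained as `H @ y` for a noisy observation vector `y`. How the paper's three MC-SPPs actually modify the gain is not stated in the abstract and is not reproduced here.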




Correspondence to Jungpyo Hong.


Cite this article

Hong, J. A multichannel subspace approach with signal presence probability for speech enhancement. Multidim Syst Sign Process 30, 2045–2058 (2019). https://doi.org/10.1007/s11045-019-00640-z
