Abstract
This paper proposes a voice activity detection (VAD) method for dual-channel noisy speech recognition, based on statistical models built from spatial cues and log energy. The spatial cues consist of the interaural time differences and interaural level differences of the dual-channel speech signals, and the statistical models for speech presence and speech absence are based on a Gaussian kernel density. To evaluate the proposed VAD method, speech recognition is performed using only the speech segments that it detects. Its performance is then compared with that of two conventional methods: a signal-to-noise ratio variance-based method and a phase vector-based method. The experiments show that the proposed VAD method outperforms both conventional methods, yielding relative word error rate reductions of 19.5% and 12.2%, respectively.
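The abstract does not spell out the decision rule, so the following is only a minimal sketch of how speech-presence and speech-absence models built from a Gaussian kernel density might be compared for one frame. It assumes each frame is summarized by a feature vector of (interaural time difference, interaural level difference, log energy), that the two models are Parzen-window estimates over labeled training frames, and that a log-likelihood ratio is thresholded. The function names, the shared isotropic bandwidth, the threshold, and all feature values are hypothetical and are not taken from the paper.

```python
import math


def gaussian_kde_log_density(x, samples, bandwidth):
    """Log of a Parzen-window density estimate at x.

    Uses an isotropic Gaussian kernel with one bandwidth for all
    dimensions (a simplifying assumption, not the paper's setting).
    """
    dim = len(x)
    log_norm = -0.5 * dim * math.log(2.0 * math.pi * bandwidth ** 2)
    # Log of each kernel, centered on one training sample, evaluated at x.
    log_terms = [
        log_norm
        - sum((xi - si) ** 2 for xi, si in zip(x, s)) / (2.0 * bandwidth ** 2)
        for s in samples
    ]
    # Log-sum-exp keeps the average of the kernels numerically stable.
    peak = max(log_terms)
    total = sum(math.exp(t - peak) for t in log_terms)
    return peak + math.log(total) - math.log(len(samples))


def is_speech(x, speech_samples, noise_samples, bandwidth=1.0, threshold=0.0):
    """Assumed decision rule: log-likelihood ratio against a threshold."""
    llr = (gaussian_kde_log_density(x, speech_samples, bandwidth)
           - gaussian_kde_log_density(x, noise_samples, bandwidth))
    return llr > threshold


# Hypothetical training frames: (ITD in ms, ILD in dB, log energy).
speech_frames = [(0.0, 0.1, 10.0), (0.05, -0.2, 9.0), (-0.03, 0.3, 11.0)]
noise_frames = [(0.4, 5.0, 4.0), (-0.5, -4.0, 3.5), (0.3, 3.0, 5.0)]

frame_a = (0.0, 0.2, 9.5)   # close to the hypothetical speech frames
frame_b = (0.4, 4.8, 4.2)   # close to the hypothetical noise frames
decision_a = is_speech(frame_a, speech_frames, noise_frames)
decision_b = is_speech(frame_b, speech_frames, noise_frames)
```

In practice the three features live on different scales, so a per-dimension bandwidth or normalized features would likely be needed; the single bandwidth here only keeps the sketch short.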
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Park, J.H., Shin, M.H., Kim, H.K. (2010). Statistical Model-Based Voice Activity Detection Using Spatial Cues and Log Energy for Dual-Channel Noisy Speech Recognition. In: Kim, Th., Vasilakos, T., Sakurai, K., Xiao, Y., Zhao, G., Ślęzak, D. (eds) Communication and Networking. FGCN 2010. Communications in Computer and Information Science, vol 120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17604-3_19
DOI: https://doi.org/10.1007/978-3-642-17604-3_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17603-6
Online ISBN: 978-3-642-17604-3
eBook Packages: Computer Science, Computer Science (R0)