Abstract
This paper presents a novel Voice Activity Detection (VAD) technique that can be easily applied to on-device isolated word recognition on a mobile device. The main speech features used are Linear Predictive Coding (LPC) features, which were correlated using the standard deviation of the signal. The output was then clustered using a modified K-means algorithm. On the same data, the new technique achieves a 90% recognition rate, an improvement over the 86.6% obtained by a previous algorithm based on the LPC residual signal. In some of the experiments, the technique achieved up to 97.7% recognition for female users. Its fast processing time makes it viable for mobile devices.
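To make the pipeline described in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It computes LPC coefficients per frame, takes their standard deviation as a one-dimensional feature, and splits the frames into two groups. Several things here are assumptions: the abstract gives no frame length, LPC order, or details of the correlation step, so the sketch omits the cross-correlation entirely and swaps in plain two-cluster K-means for the modified K-means. The function names, the 200-sample frames, the order of 8, the rule that the higher-valued cluster is "speech", and the synthetic test signal are all hypothetical.

```python
import random
from statistics import pstdev


def split_frames(signal, size, hop):
    """Cut the signal into fixed-length frames (a trailing partial frame is dropped)."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def lpc(frame, order):
    """LPC coefficients a_1..a_order (autocorrelation method, Levinson-Durbin)."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 1e-12:  # silent or perfectly predictable frame: stop early
            break
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:]


def two_means(values, iters=20):
    """Plain 1-D K-means with k=2; True marks membership of the higher cluster."""
    lo, hi = min(values), max(values)
    labels = [False] * len(values)
    for _ in range(iters):
        labels = [abs(v - hi) < abs(v - lo) for v in values]
        high = [v for v, is_high in zip(values, labels) if is_high]
        low = [v for v, is_high in zip(values, labels) if not is_high]
        if not high or not low:
            break
        lo, hi = sum(low) / len(low), sum(high) / len(high)
    return labels


def detect_speech(signal, size=200, hop=200, order=8):
    """Assumed decision rule: frames in the higher-feature cluster count as speech."""
    feats = [pstdev(lpc(f, order)) for f in split_frames(signal, size, hop)]
    return two_means(feats)


# Synthetic demo (hypothetical data, not the corpus used in the paper).
rng = random.Random(0)


def background(n):
    """Low-level white noise standing in for non-speech."""
    return [rng.uniform(-0.01, 0.01) for _ in range(n)]


def resonant_burst(n):
    """Stand-in for voiced speech: white noise through a two-pole resonator."""
    out, y1, y2 = [], 0.0, 0.0
    for _ in range(n):
        y = 1.8 * y1 - 0.9 * y2 + rng.uniform(-0.1, 0.1)
        out.append(y)
        y1, y2 = y, y1
    return out


signal = background(800) + resonant_burst(800) + background(800)
labels = detect_speech(signal)  # one boolean per 200-sample frame
```

The feature works in this toy case because white noise yields LPC coefficients close to zero, while a resonant, strongly correlated signal yields a few large coefficients and therefore a larger spread. Real recordings would likely need overlapping frames and some smoothing of the frame labels, neither of which is shown here.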
Acknowledgements
The authors wish to thank the Petroleum Technology Development Fund (PTDF) for its continued support and sponsorship of this research, as well as Dr. S Mustafa, Aishatu Mustafa and the colleagues who helped in conducting experiments.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Mustafa, M.K., Allen, T., Appiah, K. (2015). A Novel K-Means Voice Activity Detection Algorithm Using Linear Cross Correlation on the Standard Deviation of Linear Predictive Coding. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXXII. SGAI 2015. Springer, Cham. https://doi.org/10.1007/978-3-319-25032-8_24
DOI: https://doi.org/10.1007/978-3-319-25032-8_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25030-4
Online ISBN: 978-3-319-25032-8
eBook Packages: Computer Science, Computer Science (R0)