Abstract
The Weighted Correlation based Atom Decomposition (WCAD) algorithm is a technique for intonation modelling that uses a matching pursuit framework to decompose the F0 contour into a set of basic components, called atoms. The atoms attempt to model the physiological activation of the laryngeal muscles responsible for changes in F0. Recently, WCAD has been upgraded to use the orthogonal matching pursuit (OMP) algorithm, which gives qualitative improvements in the modelling of intonation. One possible application of the OMP-based WCAD is the automatic detection of stress in speech, which we undertake for the Hungarian language. We demonstrate a correlation between stress and atomic peaks, as well as between stress and atomic valleys on the previous syllable. The WCAD-based stress detection technique is compared to a baseline system using HMM/GMM stress/phrase models. A 7% improvement in F-measure over the baseline is observed when evaluating against a hand-made reference. Finally, we propose a hybrid approach which outperforms both individual systems (by 11% compared to the baseline).
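To illustrate the decomposition step mentioned in the abstract, the following is a minimal sketch of generic orthogonal matching pursuit over a dictionary of time-shifted, gamma-shaped atoms. It is not the WCAD implementation: WCAD selects atoms with a weighted correlation measure, whereas this sketch uses the plain inner product. The function names (gamma_atom, build_dictionary, omp) and all parameter values (k, theta, fs, signal length) are assumptions chosen only for illustration.

```python
import numpy as np


def gamma_atom(length, k=6, theta=0.02, fs=200.0):
    """Unit-norm gamma-shaped kernel. k, theta and fs are illustrative values."""
    t = np.arange(length) / fs
    g = t ** (k - 1) * np.exp(-t / theta)
    return g / np.linalg.norm(g)


def build_dictionary(n, atom):
    """Matrix whose columns are the atom shifted to every onset that fits in n samples."""
    m = len(atom)
    n_shifts = n - m + 1
    D = np.zeros((n, n_shifts))
    for s in range(n_shifts):
        D[s:s + m, s] = atom
    return D


def omp(signal, D, n_atoms):
    """Plain OMP: greedy atom selection, then a joint least-squares refit."""
    signal = np.asarray(signal, dtype=float)
    residual = signal.copy()
    support = []           # indices (onsets) of the selected atoms
    coeffs = np.zeros(0)   # amplitudes of the selected atoms
    for _ in range(n_atoms):
        corr = D.T @ residual
        if support:
            corr[support] = 0.0  # never pick the same atom twice
        best = int(np.argmax(np.abs(corr)))
        support.append(best)
        # Orthogonal step: refit all selected amplitudes together.
        coeffs, *_ = np.linalg.lstsq(D[:, support], signal, rcond=None)
        residual = signal - D[:, support] @ coeffs
    return support, coeffs, residual


# Synthetic contour: one positive and one negative atom (hypothetical data).
atom = gamma_atom(40)
D = build_dictionary(120, atom)
contour = 1.0 * D[:, 10] - 0.5 * D[:, 60]

support, coeffs, residual = omp(contour, D, n_atoms=2)
print("onsets:", support)
print("amplitudes:", np.round(coeffs, 3))
print("residual norm:", np.linalg.norm(residual))
```

The joint least-squares refit after each selection is what distinguishes OMP from basic matching pursuit, which only subtracts the newly selected atom and leaves earlier amplitudes unchanged. In a stress detection setting, the signs and positions of the recovered amplitudes would play the role of the atomic peaks and valleys referred to above.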
Notes
1. The WCAD implementation code is available on GitHub at https://github.com/dipteam/wcad.
Acknowledgments
This work was supported by the Hungarian National Innovation Office (OTKA-PD-112598, “Automatic Phonological Phrase and Prosodic Event Detection for the Extraction of Syntactic and Semantic/Pragmatic Information from Speech”) and by the Swiss National Science Foundation (No. CRSII2-147611/1, “SP2: SCOPES Project on Speech Prosody”).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Szaszák, G., Tündik, M.Á., Gerazov, B., Gjoreski, A. (2016). Combining Atom Decomposition of the F0 Track and HMM-based Phonological Phrase Modelling for Robust Stress Detection in Speech. In: Ronzhin, A., Potapova, R., Németh, G. (eds) Speech and Computer. SPECOM 2016. Lecture Notes in Computer Science, vol 9811. Springer, Cham. https://doi.org/10.1007/978-3-319-43958-7_19
DOI: https://doi.org/10.1007/978-3-319-43958-7_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43957-0
Online ISBN: 978-3-319-43958-7
eBook Packages: Computer Science (R0)