Abstract
Our recent work has focused on automatic speech recognition (ASR) of spoken-word archive documents [6, 7]. One of the important tasks is to structure the recognized documents, i.e., to segment them and to detect sentence boundaries. Prosodic features play a significant role in this structuring. In our previous work we bound prosodic information to the ASR events – words and noises. Many prosodic features (e.g., speech rate, vowel prominence, or prolongation of final syllables), however, require a higher time resolution than the word level [1]. We therefore propose a scheme that automatically syllabifies the recognized words and, by forced alignment of their phonetic content, provides the syllables (and their phonemes) with time stamps. We presume that words, non-speech events, syllables, and phonemes represent an appropriate hierarchical set of structuring units for processing various prosodic features.
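As an illustration of the two steps outlined above, the following minimal Python sketch first locates vowel (and diphthong) nuclei to syllabify a Czech word in its orthographic form, and then maps phone-level time stamps from a forced alignment onto those syllables. This is a simplified sketch under our own assumptions (plain orthography, no syllabic r/l, a crude cluster-splitting rule); it is not the scheme from the paper, and the function names and alignment format are hypothetical.

# Illustrative sketch only; not the paper's algorithm.
VOWELS = set("aeiouyáéíóúůý")      # Czech orthographic vowels
DIPHTHONGS = {"ou", "au", "eu"}    # each treated as a single nucleus

def find_nuclei(word):
    """Return (start, end) index pairs of syllable nuclei in `word`.
    Syllabic consonants (r, l as in 'vlk') are not handled here."""
    nuclei, i, w = [], 0, word.lower()
    while i < len(w):
        if w[i:i + 2] in DIPHTHONGS:
            nuclei.append((i, i + 2))
            i += 2
        elif w[i] in VOWELS:
            nuclei.append((i, i + 1))
            i += 1
        else:
            i += 1
    return nuclei

def syllabify(word):
    """Split `word` so that the last consonant of each intervocalic
    cluster opens the following syllable (crude maximal-onset rule)."""
    nuclei = find_nuclei(word)
    if len(nuclei) <= 1:
        return [word]
    cuts = [max(prev_end, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(nuclei, nuclei[1:])]
    bounds = [0] + cuts + [len(word)]
    return [word[a:b] for a, b in zip(bounds, bounds[1:])]

def syllable_times(phones, phones_per_syllable):
    """phones: ordered (label, start_s, end_s) tuples from a forced
    alignment; phones_per_syllable: phone counts per syllable.
    Returns one (start_s, end_s) pair per syllable."""
    times, i = [], 0
    for n in phones_per_syllable:
        times.append((phones[i][1], phones[i + n - 1][2]))
        i += n
    return times

print(syllabify("strukturalizace"))   # ['struk', 'tu', 'ra', 'li', 'za', 'ce']
# Word 'ruka' aligned as r-u-k-a with syllables ru|ka (2 + 2 phones):
align = [("r", 0.10, 0.16), ("u", 0.16, 0.24), ("k", 0.24, 0.30), ("a", 0.30, 0.41)]
print(syllable_times(align, [2, 2]))  # [(0.1, 0.24), (0.24, 0.41)]

In practice the syllable boundaries would be derived from the recognizer's phonetic output rather than from orthography, so the per-syllable phone counts follow directly from the syllabification and the forced alignment supplies the phone-level time stamps.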
References
Bachan, J., Wagner, A., Klessa, K., Demenko, G.: Consistency of prosodic annotation of spontaneous speech for technology needs. In: 7th Language & Technology Conference, pp. 125–129 (2015)
Dahl, G.E., Yu, D., Deng, L., Acero, A.: Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 20(1), 30–42 (2012)
Huici, H., Kairuz, H.A., Martens, H., Van Nuffelen, G., De Bodt, M.: Speech rate estimation in disordered speech based on spectral landmark detection. Biomed. Signal Process. Control 27, 1–6 (2016). http://www.sciencedirect.com/science/article/pii/S1746809416000069
Liang, F.M.: Word Hy-phen-a-tion by Com-put-er. Ph.D. thesis, Stanford University, Stanford, CA, USA (1983). AAI8329742
Matějů, L., Červa, P., Ždánský, J.: Investigation into the use of deep neural networks for LVCSR of Czech. In: ECMSM 2015, pp. 1–4 (2015)
Nouza, J., Blavka, K., Boháč, M., Červa, P., Ždánský, J., Silovský, J., Pražák, J.: Voice technology to enable sophisticated access to historical audio archive of the Czech radio. In: Grana, C., Cucchiara, R. (eds.) MM4CH 2011. CCIS, vol. 247, pp. 27–38. Springer, Heidelberg (2012)
Nouza, J., et al.: Making Czech historical radio archive accessible and searchable for wide public. J. Multimedia 7(2), 159–169 (2012). http://ojs.academypublisher.com/index.php/jmm/article/view/jmm0702159169
Nouza, J., Psutka, J., Uhlíř, J.: Phonetic alphabet for speech recognition of Czech. Radioengineering 6(4), 16–20 (1997)
Seps, L., Málek, J., Červa, P., Nouza, J.: Investigation of deep neural networks for robust recognition of nonlinearly distorted speech. In: INTERSPEECH, pp. 363–367 (2014)
Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. J. ACM 21(1), 168–173 (1974)
Yarra, C., Deshmukh, O.D., Ghosh, P.K.: A mode-shape classification technique for robust speech rate estimation and syllable nuclei detection. Speech Commun. 78, 62–71 (2016). http://www.sciencedirect.com/science/article/pii/S016763931600025X
Acknowledgment
This work was partly supported by the Student’s Grant Scheme at the Technical University of Liberec (SGS 2016).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Boháč, M., Matějů, L., Rott, M., Šafařík, R. (2016). Automatic Syllabification and Syllable Timing of Automatically Recognized Speech – for Czech. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2016. Lecture Notes in Computer Science, vol. 9924. Springer, Cham. https://doi.org/10.1007/978-3-319-45510-5_62
DOI: https://doi.org/10.1007/978-3-319-45510-5_62
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45509-9
Online ISBN: 978-3-319-45510-5
eBook Packages: Computer Science, Computer Science (R0)