Abstract
This paper investigates automatic detection of misannotated words in single-speaker read-speech corpora. A support vector machine (SVM) classifier is proposed to detect the misannotated words, and its performance is evaluated with respect to various word-level feature sets. The SVM classifier is shown to perform very well, achieving both high precision and high recall, with an F1 measure of almost 88%. This is a statistically significant improvement over the traditionally used outlier-based detection method.
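As a rough illustration of the evaluation measures named in the abstract, the sketch below computes precision, recall, and F1 for a binary "misannotated word" label and compares two detectors with McNemar's test. The function names and the toy labels are hypothetical and are not taken from the paper; the abstract does not state which significance test was used, so McNemar's test is an assumption made here for illustration only.

```python
import math


def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def mcnemar_p_value(y_true, pred_a, pred_b):
    """McNemar's test with continuity correction (assumed test, see lead-in).

    Only the words on which the two detectors disagree in correctness count.
    """
    triples = list(zip(y_true, pred_a, pred_b))
    a_only = sum(1 for t, a, b in triples if a == t and b != t)  # A right, B wrong
    b_only = sum(1 for t, a, b in triples if a != t and b == t)  # B right, A wrong
    if a_only + b_only == 0:
        return 1.0  # the detectors never disagree, so no evidence of a difference
    stat = (abs(a_only - b_only) - 1) ** 2 / (a_only + b_only)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2))


# Toy data (hypothetical): 1 = misannotated word, 0 = correctly annotated word.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
svm_pred = [1, 1, 0, 0, 0, 1, 1, 0]
outlier_pred = [1, 0, 0, 0, 1, 1, 0, 0]

print(precision_recall_f1(y_true, svm_pred))
print(mcnemar_p_value(y_true, svm_pred, outlier_pred))
```

With real data, `y_true` would hold the reference annotation-error labels for each word and the two prediction lists would come from the classifier and the baseline being compared.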
This work was supported by the Technology Agency of the Czech Republic, project No. TA01030476, and by the European Regional Development Fund (ERDF), project “New Technologies for Information Society” (NTIS), European Centre of Excellence, ED1.1.00/02.0090. Access to the MetaCentrum clusters provided under the programme LM2010005 is highly appreciated.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Matoušek, J., Tihelka, D. (2013). SVM-Based Detection of Misannotated Words in Read Speech Corpora. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_58
DOI: https://doi.org/10.1007/978-3-642-40585-3_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40584-6
Online ISBN: 978-3-642-40585-3
eBook Packages: Computer Science, Computer Science (R0)