Abstract
Researchers interested in the sounds of speech or the physical gestures of speakers make use of audio and video recordings in their work. Annotating these recordings imposes different requirements from annotating text. Special-purpose tools have been developed to display video and audio signals and to allow the creation of time-aligned annotations. This chapter reviews the most widely used of these tools for both manual and automatic generation of annotations on multimodal data.
Copyright information
© 2017 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Cassidy, S., Schmidt, T. (2017). Tools for Multimodal Annotation. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_7
DOI: https://doi.org/10.1007/978-94-024-0881-2_7
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-024-0879-9
Online ISBN: 978-94-024-0881-2
eBook Packages: Social Sciences (R0)