Abstract
The amount of speech data available on-line and in institutional repositories, including recordings of lectures, “podcasts”, news broadcasts, etc., has increased greatly in the past few years. Effective access to such data demands transcription. While current automatic speech recognition technology can help with this task, the results of automatic transcription alone are often unsatisfactory. Recently, approaches that combine automatic speech recognition with collaborative transcription have been proposed, in which geographically distributed users edit and correct automatically generated transcripts. These approaches, however, are based on traditional text-editor interfaces, which provide little satisfaction to the users who perform these time-consuming tasks, most often on a voluntary basis. We present a 3D “transcription game” interface which aims to improve the user experience of the transcription task and, ultimately, to create an extra incentive for users to engage in collaborative transcription in the first place.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Luz, S., Masoodian, M., Rogers, B. (2010). Supporting Collaborative Transcription of Recorded Speech with a 3D Game Interface. In: Setchi, R., Jordanov, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2010. Lecture Notes in Computer Science, vol 6279. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15384-6_42
DOI: https://doi.org/10.1007/978-3-642-15384-6_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15383-9
Online ISBN: 978-3-642-15384-6
eBook Packages: Computer Science (R0)