Speech technology on trial: Experiences from the August system

Published: 01 September 2000

Abstract

In this paper, the August spoken dialogue system is described. This experimental Swedish dialogue system, which featured an animated talking agent, was exposed to the general public during a trial period of six months. The construction of the system was partly motivated by the need to collect genuine speech data from people with little or no previous experience of spoken dialogue systems. A corpus of more than 10,000 utterances of spontaneous computer-directed speech was collected and empirical linguistic analyses were carried out. Acoustical, lexical and syntactical aspects of this data were examined. In particular, user behavior and user adaptation during error resolution were emphasized. Repetitive sequences in the database were analyzed in detail. Results suggest that computer-directed speech during error resolution is increased in duration, hyperarticulated and contains inserted pauses. Design decisions which may have influenced how the users behaved when they interacted with August are discussed and implications for the development of future systems are outlined.

Published In

Natural Language Engineering  Volume 6, Issue 3-4
September 2000
170 pages

Publisher

Cambridge University Press

United States

Cited By

  • (2021) A Systematic Cross-Corpus Analysis of Human Reactions to Robot Conversational Failures. Proceedings of the 2021 International Conference on Multimodal Interaction, pp. 112-120. doi:10.1145/3462244.3479887. Online publication date: 18-Oct-2021
  • (2018) All Work and No Play? Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, pp. 1-13. doi:10.1145/3173574.3173577. Online publication date: 21-Apr-2018
  • (2011) A two-stage domain selection framework for extensible multi-domain spoken dialogue systems. Proceedings of the SIGDIAL 2011 Conference, pp. 18-29. doi:10.5555/2132890.2132894. Online publication date: 17-Jun-2011
  • (2009) A review of ASR technologies for children's speech. Proceedings of the 2nd Workshop on Child, Computer and Interaction, pp. 1-8. doi:10.1145/1640377.1640384. Online publication date: 5-Nov-2009
  • (2008) Towards human-like spoken dialogue systems. Speech Communication 50(8-9): 630-645. doi:10.1016/j.specom.2008.04.002. Online publication date: 1-Aug-2008
  • (2006) Virtual humans. Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2, pp. 1543-1545. doi:10.5555/1597348.1597437. Online publication date: 16-Jul-2006
