Abstract
In this paper, we describe a user study evaluating the usability of an augmented reality (AR) multimodal interface (MMI). We have developed an AR MMI that combines free-hand gesture and speech input in a natural way using a multimodal fusion architecture. We describe the system architecture and present a study exploring the usability of the AR MMI compared with speech-only and 3D-hand-gesture-only interaction conditions. The interface was used in an AR application for selecting 3D virtual objects and changing their shape and color. For each interface condition, we measured task completion time, the number of user and system errors, and user satisfaction. We found that the MMI was more usable than the gesture-only interface condition, and users felt that the MMI was more satisfying to use than the speech-only interface condition; however, it was neither more effective nor more efficient than the speech-only interface. We discuss the implications of this research for designing AR MMIs and outline directions for future work. The findings could also help in developing MMIs for a wider range of AR applications, for example, AR navigation tasks, mobile AR interfaces, or AR games.
Acknowledgments
This work was supported by the DigiLog Miniature Augmented Reality Research Program funded by the KAIST Research Foundation, and by the Global Frontier R&D Program on Human-centered Interaction for Coexistence, funded by the National Research Foundation of Korea grant funded by the Korean Government (MSIP) (NRF-2010-0029751).
Cite this article
Lee, M., Billinghurst, M., Baek, W. et al. A usability study of multimodal input in an augmented reality environment. Virtual Reality 17, 293–305 (2013). https://doi.org/10.1007/s10055-013-0230-0