
Gaze-Contingent Auditory Displays for Improved Spatial Attention in Virtual Reality

Published: 28 April 2017

Abstract

Virtual reality simulations of group social interactions are important for many applications, including the virtual treatment of social phobias, crowd and group simulation, collaborative virtual environments (VEs), and entertainment. In such scenarios, when compared to the real world, audio cues are often impoverished. As a result, users cannot rely on subtle spatial audio-visual cues that guide attention and enable effective social interactions in real-world situations. We explored whether gaze-contingent audio enhancement techniques driven by inferring audio-visual attention in virtual displays could be used to enable effective communication in cluttered audio VEs. In all of our experiments, we hypothesized that visual attention could be used as a tool to modulate the quality and intensity of sounds from multiple sources to efficiently and naturally select spatial sound sources. For this purpose, we built a gaze-contingent display (GCD) that allowed tracking of a user’s gaze in real-time and modifying the volume of the speakers’ voices contingent on the current region of overt attention. We compared six different techniques for sound modulation with a base condition providing no attentional modulation of sound. The techniques were compared in terms of source recognition and preference in a set of user studies. Overall, we observed that users liked the ability to control the sounds with their eyes. They felt that a rapid change in attenuation with attention but not the elimination of competing sounds (partial rather than absolute selection) was most natural. In conclusion, audio GCDs offer potential for simulating rich, natural social, and other interactions in VEs. They should be considered for improving both performance and fidelity in applications related to social behaviour scenarios or when the user needs to work with multiple audio sources of information.
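
The gaze-contingent attenuation described in the abstract can be illustrated with a short sketch. The following Python snippet is a minimal illustration, not the authors' implementation: the gaze source, speaker positions, gain values, and smoothing constant are assumptions chosen for the example. On each update, the speaker nearest the current gaze point keeps full volume while competing speakers are attenuated but not silenced, mirroring the partial (rather than absolute) selection that participants preferred.

import math
from dataclasses import dataclass

@dataclass
class Speaker:
    name: str
    position: tuple        # (x, y) location in screen or world coordinates (illustrative)
    volume: float = 1.0    # current playback gain in [0, 1]

def update_volumes(speakers, gaze_xy, attended_gain=1.0,
                   unattended_gain=0.3, smoothing=0.5):
    """Attenuate every speaker except the one closest to the current gaze sample.

    unattended_gain > 0 keeps competing voices audible (partial selection);
    smoothing controls how rapidly gains move toward their targets.
    """
    # Treat the speaker nearest the gaze point as the attended source.
    attended = min(speakers, key=lambda s: math.dist(s.position, gaze_xy))
    for s in speakers:
        target = attended_gain if s is attended else unattended_gain
        # Exponential smoothing avoids abrupt gain jumps when gaze shifts.
        s.volume += smoothing * (target - s.volume)
    return attended

# Example: three virtual speakers and a single gaze sample near speaker C.
speakers = [Speaker("A", (0.2, 0.5)), Speaker("B", (0.5, 0.5)), Speaker("C", (0.8, 0.5))]
attended = update_volumes(speakers, gaze_xy=(0.78, 0.52))
print(attended.name, [round(s.volume, 2) for s in speakers])

In a real gaze-contingent display, this update would run every frame against the eye tracker's latest fixation estimate, and the resulting gains would drive per-source volume in the audio engine.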

      Published In

      ACM Transactions on Computer-Human Interaction, Volume 24, Issue 3
      June 2017, 244 pages
      ISSN: 1073-0516
      EISSN: 1557-7325
      DOI: 10.1145/3086563
      Publication rights licensed to ACM. ACM acknowledges that this contribution was co-authored by an affiliate of the Canadian National Government. As such, the Crown in Right of Canada retains an equal interest in the copyright. Reprint requests should be forwarded to ACM, and reprints must include clear attribution to ACM and National Research Council Canada -NRC.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 28 April 2017
      Accepted: 01 December 2016
      Revised: 01 December 2016
      Received: 01 August 2015
      Published in TOCHI Volume 24, Issue 3

      Author Tags

      1. Gaze-contingent displays
      2. sound modulation
      3. user experience
      4. visual-audio attention

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Article Metrics

      • Downloads (Last 12 months): 48
      • Downloads (Last 6 weeks): 8
      Reflects downloads up to 03 Feb 2025

      Cited By

      • (2024) Gamified immersive safety training in virtual reality: a mixed methods approach. Journal of Workplace Learning 36(7), 516-538. DOI: 10.1108/JWL-01-2024-0008. Online publication date: 25-Jun-2024.
      • (2022) Communication in Immersive Social Virtual Reality: A Systematic Review of 10 Years’ Studies. Proceedings of the Tenth International Symposium of Chinese CHI, 27-37. DOI: 10.1145/3565698.3565767. Online publication date: 22-Oct-2022.
      • (2022) Immersive virtual reality in the age of the Metaverse: A hybrid-narrative review based on the technology affordance perspective. The Journal of Strategic Information Systems 31(2), 101717. DOI: 10.1016/j.jsis.2022.101717. Online publication date: Jun-2022.
      • (2019) Audio-augmented museum experiences with gaze tracking. Proceedings of the 18th International Conference on Mobile and Ubiquitous Multimedia, 1-5. DOI: 10.1145/3365610.3368415. Online publication date: 26-Nov-2019.
      • (2019) A Systematic Review of a Virtual Reality System from the Perspective of User Experience. International Journal of Human–Computer Interaction, 1-18. DOI: 10.1080/10447318.2019.1699746. Online publication date: 13-Dec-2019.
