research-article

Scene-aware audio for 360° videos

Published: 30 July 2018

Abstract

Although 360° cameras ease the capture of panoramic footage, it remains challenging to add realistic 360° audio that blends into the captured scene and stays synchronized with the camera motion. We present a method for adding scene-aware spatial audio to 360° videos of typical indoor scenes, using only a conventional mono-channel microphone and a speaker. We observe that the late reverberation of a room's impulse response is usually spatially and directionally diffuse. Exploiting this fact, we propose a method that synthesizes the directional impulse response between any source and listening location by combining a synthesized early reverberation part with a measured late reverberation tail. The early reverberation is simulated using geometric acoustic simulation and then enhanced using a frequency modulation method to capture room resonances. The late reverberation is extracted from a recorded impulse response, using a carefully chosen time boundary that separates it from the early reverberation. In our validations, we show that our synthesized spatial audio closely matches recordings made with ambisonic microphones. Lastly, we demonstrate the strengths of our method in several applications.
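The central idea of the abstract, an impulse response whose early part comes from a simulation and whose late tail comes from a measurement, can be illustrated with a minimal pure-Python sketch. It assumes the geometric simulator outputs a list of (delay, gain) reflections, that the simulated and measured responses share one sample rate `fs`, and that the split time `t_mix` is already known. The function names, the raised-cosine crossfade, and the 5 ms default fade length are illustrative assumptions, not details taken from the paper; the sketch also omits the directional (ambisonic) encoding and the frequency-modulation step for room resonances.

```python
import math


def early_from_reflections(reflections, fs, length):
    """Place (delay_seconds, gain) reflections on a grid of `length` samples.

    `reflections` stands in for the output of a geometric acoustic simulator
    (e.g. image sources or ray tracing); this input format is an assumption.
    """
    h = [0.0] * length
    for delay, gain in reflections:
        n = int(round(delay * fs))
        if 0 <= n < length:  # drop reflections that fall outside the buffer
            h[n] += gain
    return h


def hybrid_impulse_response(early, late, fs, t_mix, fade=0.005):
    """Use `early` before t_mix seconds and the measured `late` IR after it.

    A raised-cosine crossfade of `fade` seconds, centred on t_mix, avoids a
    discontinuity at the splice point. Both inputs share sample rate fs.
    """
    n_mix = int(round(t_mix * fs))
    n_fade = max(1, int(round(fade * fs)))
    start = n_mix - n_fade // 2
    out = []
    for n in range(max(len(early), len(late))):
        head = early[n] if n < len(early) else 0.0
        tail = late[n] if n < len(late) else 0.0
        # w is the weight of the measured tail: 0 before the fade, 1 after it
        if n < start:
            w = 0.0
        elif n >= start + n_fade:
            w = 1.0
        else:
            w = 0.5 - 0.5 * math.cos(math.pi * (n - start) / n_fade)
        out.append((1.0 - w) * head + w * tail)
    return out


def convolve(x, h):
    """Direct-form linear convolution of a dry signal x with an impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


if __name__ == "__main__":
    fs = 1000  # toy sample rate in Hz
    early = early_from_reflections([(0.0, 1.0), (0.01, 0.5)], fs, 60)
    # toy stand-in for a measured tail: an exponentially decaying envelope
    late = [0.1 * math.exp(-n / 50.0) for n in range(200)]
    h = hybrid_impulse_response(early, late, fs, t_mix=0.05, fade=0.01)
    wet = convolve([1.0, 0.0, 0.5], h)
    print(len(h), len(wet))
```

Because the abstract treats the late tail as diffuse, a pipeline built along these lines could reuse one measured tail for every source and listener pair and re-simulate only the early part; a directional version would additionally need to encode each early reflection's arrival direction, which this sketch leaves out.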

Supplementary Material

MP4 File (111-567.mp4)
MP4 File (a111-li.mp4)




Information

Published In

ACM Transactions on Graphics, Volume 37, Issue 4
August 2018
1670 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3197517
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 July 2018
Published in TOG Volume 37, Issue 4


Author Tags

  1. 360° videos
  2. ambisonic audio

Qualifiers

  • Research-article

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 55
  • Downloads (last 6 weeks): 9
Reflects downloads up to 23 Dec 2024.

Citations

Cited By

  • (2024) LAVSS: Location-Guided Audio-Visual Spatial Audio Separation. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 5496-5507. DOI: 10.1109/WACV57701.2024.00542. Online publication date: 3-Jan-2024.
  • (2024) SVGC-AVA: 360-Degree Video Saliency Prediction With Spherical Vector-Based Graph Convolution and Audio-Visual Attention. IEEE Transactions on Multimedia 26, 3061-3076. DOI: 10.1109/TMM.2023.3306596. Online publication date: 2024.
  • (2024) Unified Audio-Visual Saliency Model for Omnidirectional Videos With Spatial Audio. IEEE Transactions on Multimedia 26, 764-775. DOI: 10.1109/TMM.2023.3271022. Online publication date: 2024.
  • (2024) Visually Guided Binaural Audio Generation with Cross-Modal Consistency. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7980-7984. DOI: 10.1109/ICASSP48485.2024.10446399. Online publication date: 14-Apr-2024.
  • (2023) Binaural SoundNet: Predicting Semantics, Depth and Motion With Binaural Sounds. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 1, 123-136. DOI: 10.1109/TPAMI.2022.3155643. Online publication date: 1-Jan-2023.
  • (2023) FoleyGAN: Visually Guided Generative Adversarial Network-Based Synchronous Sound Generation in Silent Videos. IEEE Transactions on Multimedia 25, 4508-4519. DOI: 10.1109/TMM.2022.3177894. Online publication date: 2023.
  • (2023) A Composite T60 Regression and Classification Approach for Speech Dereverberation. IEEE/ACM Transactions on Audio, Speech, and Language Processing 31, 1013-1023. DOI: 10.1109/TASLP.2023.3245423. Online publication date: 2023.
  • (2023) A Survey on 360° Images and Videos in Mixed Reality: Algorithms and Applications. Journal of Computer Science and Technology 38, 3, 473-491. DOI: 10.1007/s11390-023-3210-1. Online publication date: 30-May-2023.
  • (2022) Room Acoustic Properties Estimation from a Single 360° Photo. 2022 30th European Signal Processing Conference (EUSIPCO), 857-861. DOI: 10.23919/EUSIPCO55093.2022.9909598. Online publication date: 29-Aug-2022.
  • (2022) GWA: A Large High-Quality Acoustic Dataset for Audio Processing. ACM SIGGRAPH 2022 Conference Proceedings, 1-9. DOI: 10.1145/3528233.3530731. Online publication date: 27-Jul-2022.
