DOI: 10.1145/3264869.3264876
research-article

An Audio-Visual Method for Room Boundary Estimation and Material Recognition

Published: 26 October 2018

Abstract

In applications such as virtual and augmented reality, a plausible and coherent audio-visual reproduction can be achieved through a thorough understanding of the acoustics of the reference scene. This requires knowledge of the scene geometry and of the associated materials. In this paper, we present an audio-visual approach to acoustic scene understanding. We propose a novel material recognition algorithm that exploits information carried by acoustic signals, using acoustic absorption coefficients as features. The training dataset was constructed by combining information available in the literature with additional labeled data that we recorded in a small room with a short reverberation time (RT60). Classic machine learning methods are used to validate the model on data recorded in five rooms of different sizes and RT60s. The estimated materials are then used to label the room boundaries reconstructed by a vision-based method. Results show 89% agreement between the estimated and reference room volumes, and 80% agreement between the estimated and reference materials.
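The abstract describes recognizing boundary materials from their acoustic absorption coefficients, and the author tags name kNN as the classifier family. The sketch below illustrates that idea with a distance-weighted k-nearest-neighbour rule in plain Python. It is not the authors' implementation: the six octave bands (125 Hz to 4 kHz), the Euclidean distance, the inverse-distance vote weighting (the scheme scikit-learn's `KNeighborsClassifier` applies with `weights='distance'`), `k = 3`, and every training value are assumptions, and the coefficients are illustrative numbers rather than the paper's dataset.

```python
import math
from collections import defaultdict

# Assumed feature layout: one absorption coefficient per octave band (Hz).
BANDS_HZ = (125, 250, 500, 1000, 2000, 4000)

# Hypothetical labelled examples: (absorption coefficients, material).
# Illustrative values only, NOT the training data used in the paper.
TRAIN = [
    ((0.01, 0.01, 0.02, 0.02, 0.02, 0.03), "concrete"),
    ((0.02, 0.02, 0.02, 0.03, 0.03, 0.04), "concrete"),
    ((0.35, 0.25, 0.18, 0.12, 0.07, 0.04), "glass"),
    ((0.30, 0.22, 0.17, 0.10, 0.06, 0.05), "glass"),
    ((0.08, 0.24, 0.57, 0.69, 0.71, 0.73), "carpet"),
    ((0.05, 0.20, 0.50, 0.65, 0.70, 0.72), "carpet"),
]


def euclidean(a, b):
    """Euclidean distance between two absorption-coefficient vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(query, train=TRAIN, k=3, eps=1e-9):
    """Return the material label chosen by a distance-weighted k-NN vote.

    Each of the k nearest training examples votes for its label with
    weight 1 / (distance + eps); eps avoids division by zero when the
    query coincides with a training example.
    """
    if len(query) != len(BANDS_HZ):
        raise ValueError(f"expected {len(BANDS_HZ)} band values, got {len(query)}")
    # Keep the k training examples closest to the query.
    nearest = sorted(train, key=lambda item: euclidean(query, item[0]))[:k]
    votes = defaultdict(float)
    for features, label in nearest:
        votes[label] += 1.0 / (euclidean(query, features) + eps)
    # The label with the largest accumulated weight wins.
    return max(votes, key=votes.get)


if __name__ == "__main__":
    # A query with low absorption in every band.
    print(knn_predict((0.03, 0.03, 0.03, 0.03, 0.03, 0.04)))  # → concrete
```

Weighting votes by inverse distance lets a very close neighbour outweigh more distant ones, which can help when material classes overlap in absorption space.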

Cited By

  • (2023) A Composite T60 Regression and Classification Approach for Speech Dereverberation. IEEE/ACM Transactions on Audio, Speech, and Language Processing 31, 1013-1023. DOI: 10.1109/TASLP.2023.3245423. Online publication date: 2023.
  • (2022) Room Acoustic Properties Estimation from a Single 360° Photo. 2022 30th European Signal Processing Conference (EUSIPCO), 857-861. DOI: 10.23919/EUSIPCO55093.2022.9909598. Online publication date: 29 Aug 2022.
  • (2022) Human-Machine Cooperative Echolocation Using Ultrasound. IEEE Access 10, 125264-125278. DOI: 10.1109/ACCESS.2022.3224468. Online publication date: 2022.

Information

Published In

AVSU'18: Proceedings of the 2018 Workshop on Audio-Visual Scene Understanding for Immersive Multimedia
October 2018
46 pages
ISBN: 9781450359771
DOI: 10.1145/3264869
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. acoustic absorption coefficient
  2. audio-visual
  3. knn
  4. material recognition
  5. room boundary estimation

Qualifiers

  • Research-article

Conference

MM '18: ACM Multimedia Conference
October 26, 2018
Seoul, Republic of Korea

Article Metrics

  • Downloads (last 12 months): 18
  • Downloads (last 6 weeks): 1
Reflects downloads up to 27 Dec 2024.