research-article

An Audio-Visual Method for Room Boundary Estimation and Material Recognition

Authors:

Philip J. B. Jackson,

Adrian Hilton

AVSU'18: Proceedings of the 2018 Workshop on Audio-Visual Scene Understanding for Immersive Multimedia

Pages 3 - 9

https://doi.org/10.1145/3264869.3264876

Published: 26 October 2018

Abstract

In applications such as virtual and augmented reality, plausible and coherent audio-visual reproduction depends on a detailed understanding of the acoustics of the reference scene, which in turn requires knowledge of the scene geometry and the materials of its surfaces. In this paper, we present an audio-visual approach to acoustic scene understanding. We propose a novel material recognition algorithm that exploits the information carried by acoustic signals, using acoustic absorption coefficients as features. The training dataset was constructed by combining data available in the literature with additional labeled data that we recorded in a small room with a short reverberation time (RT60). Classic machine learning methods are used to validate the model on data recorded in five rooms of different sizes and RT60s. The estimated materials are then used to label the room boundaries, which are reconstructed by a vision-based method. Results show 89% and 80% agreement between the estimated and reference room volumes and materials, respectively.
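
To make the idea of using absorption coefficients as classification features concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a distance-weighted k-nearest-neighbour classifier (the abstract only says "classic machine learning methods"; the index terms list instance-based learning), six octave bands as the feature layout, and a tiny training set whose values are illustrative placeholders, not measurements from the paper. The names `BANDS_HZ`, `TRAINING`, and `classify_material` are hypothetical.

```python
import math
from collections import defaultdict

# Assumed feature layout: one absorption coefficient per octave band (Hz).
BANDS_HZ = (125, 250, 500, 1000, 2000, 4000)

# Hypothetical training set: (absorption coefficients, material label).
# The numbers are rough placeholders for illustration only.
TRAINING = [
    ((0.01, 0.01, 0.02, 0.02, 0.02, 0.03), "concrete"),
    ((0.02, 0.02, 0.03, 0.03, 0.03, 0.04), "concrete"),
    ((0.05, 0.10, 0.25, 0.40, 0.55, 0.60), "carpet"),
    ((0.08, 0.15, 0.30, 0.45, 0.60, 0.65), "carpet"),
    ((0.30, 0.20, 0.15, 0.10, 0.07, 0.05), "glass"),
    ((0.35, 0.25, 0.18, 0.12, 0.07, 0.04), "glass"),
]


def classify_material(features, training=TRAINING, k=3):
    """Label a surface by a distance-weighted vote among its k nearest neighbours."""
    if len(features) != len(BANDS_HZ):
        raise ValueError("expected one absorption coefficient per band")
    # Euclidean distance from the query to every training vector, nearest first.
    neighbours = sorted(
        (math.dist(features, vector), label) for vector, label in training
    )[:k]
    # Closer neighbours contribute larger votes; epsilon avoids division by zero.
    votes = defaultdict(float)
    for distance, label in neighbours:
        votes[label] += 1.0 / (distance + 1e-9)
    return max(votes, key=votes.get)


if __name__ == "__main__":
    estimated = (0.06, 0.12, 0.28, 0.42, 0.58, 0.62)  # hypothetical estimate
    print(classify_material(estimated))
```

In a full pipeline of the kind the abstract describes, the feature vector would come from absorption coefficients estimated from recorded acoustic signals, and the predicted label would be attached to the corresponding boundary of the visually reconstructed room.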

Cited By

Y. Li, Y. Liu, and D. Williamson. 2023. A Composite T60 Regression and Classification Approach for Speech Dereverberation. IEEE/ACM Transactions on Audio, Speech, and Language Processing 31 (2023), 1013--1023. https://doi.org/10.1109/TASLP.2023.3245423

M. Alawadh, Y. Wu, Y. Heng, L. Remaggi, M. Niranjan, and H. Kim. 2022. Room Acoustic Properties Estimation from a Single 360° Photo. In 2022 30th European Signal Processing Conference (EUSIPCO). 857--861. https://doi.org/10.23919/EUSIPCO55093.2022.9909598

H. Watanabe, M. Sumiya, and T. Terada. 2022. Human-Machine Cooperative Echolocation Using Ultrasound. IEEE Access 10 (2022), 125264--125278. https://doi.org/10.1109/ACCESS.2022.3224468

Index Terms

An Audio-Visual Method for Room Boundary Estimation and Material Recognition
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object detection
        Object recognition
  2. Machine learning
    1. Machine learning algorithms
      1. Feature selection
    2. Machine learning approaches
      1. Instance-based learning

Published In

AVSU'18: Proceedings of the 2018 Workshop on Audio-Visual Scene Understanding for Immersive Multimedia

October 2018

46 pages

ISBN: 9781450359771

DOI: 10.1145/3264869

General Chairs:
Adrian Hilton, University of Surrey, UK
Hong-Goo Kang, Yonsei University, South Korea
Hansung Kim, University of Surrey, UK
Kwanghoon Sohn, Yonsei University, South Korea

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Engineering and Physical Sciences Research Council

Conference

MM '18

Sponsor:

SIGMM

MM '18: ACM Multimedia Conference

October 26, 2018

Seoul, Republic of Korea
