research-article

3D attention-driven depth acquisition for object identification

Authors:

Daniel Cohen-Or,

Baoquan Chen

ACM Transactions on Graphics (TOG), Volume 35, Issue 6

Article No.: 238, Pages 1 - 14

https://doi.org/10.1145/2980179.2980224

Published: 05 December 2016

Abstract

We address the problem of autonomously exploring unknown objects in a scene by consecutive depth acquisitions. The goal is to reconstruct the scene while identifying its objects online from among a large collection of 3D shapes. Fine-grained shape identification demands a meticulous series of observations attending to varying views and parts of the object of interest. Inspired by the recent success of attention-based models for 2D recognition, we develop a 3D Attention Model that selects the best views to scan from, as well as the most informative regions in each view to focus on, to achieve efficient object recognition. The region-level attention leads to focus-driven features that are robust against object occlusion. The attention model, trained with the 3D shape collection, encodes the temporal dependencies among consecutive views with deep recurrent networks. This facilitates order-aware view planning that accounts for robot movement cost. To achieve instance identification, the shape collection is organized into a hierarchy associated with pre-trained hierarchical classifiers. The effectiveness of our method is demonstrated on an autonomous robot (PR) that explores a scene and identifies the objects to construct a 3D scene model.
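
The idea of order-aware view planning that accounts for movement cost can be illustrated with a small sketch. This is not the paper's recurrent attention model: it is a simple greedy stand-in that assumes a per-view informativeness score is already available and that candidate views lie on a circle around the object. The names `movement_cost`, `plan_views`, and `cost_weight`, and all numeric values, are hypothetical and chosen only for illustration.

```python
def movement_cost(a, b):
    """Shortest angular distance in degrees between two azimuths (assumed cost model)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def plan_views(scores, start, budget, cost_weight=0.002):
    """Greedily choose up to `budget` views, one at a time.

    scores: dict mapping a view azimuth (degrees) to an assumed
            informativeness score in [0, 1].
    start: azimuth at which the robot currently stands.
    cost_weight: hypothetical trade-off between information and travel.
    """
    current = start
    remaining = dict(scores)
    order = []
    while remaining and len(order) < budget:
        # Utility of a view = its score minus the weighted travel from
        # the current position, so the result depends on visiting order.
        best = max(
            remaining,
            key=lambda v: remaining[v] - cost_weight * movement_cost(current, v),
        )
        order.append(best)
        current = best
        del remaining[best]
    return order


# Hypothetical scores for four candidate views around an object.
scores = {0.0: 0.2, 90.0: 0.9, 180.0: 0.8, 270.0: 0.3}
cheap_travel = plan_views(scores, start=0.0, budget=3)
costly_travel = plan_views(scores, start=0.0, budget=3, cost_weight=0.01)
print(cheap_travel, costly_travel)
```

With a small `cost_weight` the plan favors the highest-scoring views; with a larger one it prefers nearby views first, which is the kind of trade-off the abstract refers to. A learned model would replace the fixed scores with predictions conditioned on the views observed so far.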

Index Terms

3D attention-driven depth acquisition for object identification
1. Computing methodologies
  1. Computer graphics
    1. Shape modeling
      1. Shape analysis

Published In

ACM Transactions on Graphics Volume 35, Issue 6

November 2016

1045 pages

ISSN:0730-0301

EISSN:1557-7368

DOI:10.1145/2980179

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States
