Abstract
Google’s Project Tango has made integrated depth sensing and onboard visual-inertial odometry available to mobile devices such as phones and tablets. In this work, we explore the problem of large-scale, real-time 3D reconstruction on a mobile device of this type. Solving this problem is a necessary prerequisite for many indoor applications, including navigation, augmented reality, and building scanning. The main challenges include dealing with noisy and low-frequency depth data and managing limited computational and memory resources. State-of-the-art approaches to large-scale dense reconstruction require large amounts of memory and high-performance GPU computing. Other existing 3D reconstruction approaches on mobile devices either build only a sparse reconstruction, offload their computation to other devices, or require long post-processing to extract the geometric mesh. In contrast, we can reconstruct and render a global mesh on the fly, using only the mobile device’s CPU, in very large (300 m\(^2\)) scenes, at a resolution of 2–3 cm. To achieve this, we divide the scene into spatial volumes indexed by a hash map. Each volume contains the truncated signed distance function for that area of space, as well as the mesh segment derived from the distance function. This approach allows us to focus computational and memory resources only on areas of the scene which are currently observed, and to leverage parallelization techniques for multi-core processing. Furthermore, we describe an on-device post-processing method for fusing datasets from multiple, independent trials in order to improve the quality and coverage of the reconstruction. We discuss how the particularities of the devices impact our algorithm and implementation decisions. Finally, we provide both qualitative and quantitative results on publicly available RGB-D datasets, and on datasets collected in real-time from two devices.
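To make the abstract's data layout concrete, the sketch below shows one plausible way to organize spatially hashed volumes in C++: integer chunk coordinates are hashed with a standard prime-multiplier spatial hash, and each chunk stores a small dense grid of truncated signed distance values, integration weights, and its extracted mesh segment. This is a minimal illustration only, not the authors' implementation; the chunk resolution (16 voxels per side), the 3 cm voxel size, and all names are assumptions.

```cpp
// Minimal sketch of a spatially hashed TSDF volume (illustrative, not the paper's code).
#include <cmath>
#include <cstddef>
#include <unordered_map>
#include <vector>

struct ChunkID {  // integer coordinates of a chunk in the world grid
  int x, y, z;
  bool operator==(const ChunkID& o) const {
    return x == o.x && y == o.y && z == o.z;
  }
};

struct ChunkHash {  // prime-multiplier spatial hash over chunk coordinates
  std::size_t operator()(const ChunkID& c) const {
    return (static_cast<std::size_t>(c.x) * 73856093u) ^
           (static_cast<std::size_t>(c.y) * 19349663u) ^
           (static_cast<std::size_t>(c.z) * 83492791u);
  }
};

struct Chunk {
  static constexpr int kVoxelsPerSide = 16;  // assumed chunk resolution
  std::vector<float> tsdf;                   // truncated signed distance per voxel
  std::vector<float> weight;                 // integration weight per voxel
  std::vector<float> mesh_vertices;          // mesh segment for this chunk (xyz triples)
  Chunk()
      : tsdf(kVoxelsPerSide * kVoxelsPerSide * kVoxelsPerSide, 0.f),
        weight(kVoxelsPerSide * kVoxelsPerSide * kVoxelsPerSide, 0.f) {}
};

// Chunks are allocated lazily, only where depth data is observed, so memory is
// spent only on the currently visible parts of the scene.
using ChunkMap = std::unordered_map<ChunkID, Chunk, ChunkHash>;

int main() {
  ChunkMap chunks;
  const float chunk_size_m = 0.48f;  // 16 voxels * 3 cm (assumed)
  // Look up (and lazily allocate) the chunk containing a world-space point.
  float px = 1.2f, py = 0.3f, pz = 2.7f;
  ChunkID id{static_cast<int>(std::floor(px / chunk_size_m)),
             static_cast<int>(std::floor(py / chunk_size_m)),
             static_cast<int>(std::floor(pz / chunk_size_m))};
  chunks[id];  // operator[] creates the chunk on first observation
  return 0;
}
```

Because only observed chunks exist in the hash map, the memory footprint scales with the surface area actually scanned rather than with the bounding volume of the scene, which is what allows very large scenes to fit on a mobile device.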
Amanatides, J., & Woo, A. (1987). A fast voxel traversal algorithm for ray tracing. Eurographics, 87, 3–10.
Bylow, E., Sturm, J., Kerl, C., Kahl, F., & Cremers D. (2013). Real-time camera tracking and 3D reconstruction using signed distance functions. In Robotics: Science and systems (RSS) conference 2013.
Chen, J., Bautembach, D., & Izadi, S. (2013). Scalable real-time volumetric surface reconstruction. ACM Transactions on Graphics (TOG), 32(4), 113.
Chen, Y., & Medioni, G. (1991, April). Object modeling by registration of multiple range images. In Proceedings., 1991 IEEE international conference on robotics and automation (Vol. 3, pp. 2724 –2729).
Chilimbi, T. M., Hill, M. D., & Larus, J. R. (2000). Making pointer-based data structures cache conscious. Computer, 33(12), 67–74.
Curless, B., & Levoy, M. (1996). A volumetric method for building complex models from range images. In SIGGRAPH 96 conference proceedings (pp. 303–312). ACM.
Elfes, A. (1989). Using occupancy grids for mobile robot perception and navigation. Computer, 22, 46–57.
Engel, J., Schöps, T., & Cremers, D. (2014, September). LSD-SLAM: Large-scale direct monocular SLAM. In European conference on computer vision (ECCV).
Garland, M., & Heckbert, P. S. (1997). Surface simplification using quadric error metrics. In Proceedings of the 24th annual conference on computer graphics and interactive techniques (pp. 209–216). ACM Press/Addison-Wesley Publishing Co.
Google. Project Tango (2014). https://www.google.com/atap/projecttango.
Handa, A., Patraucean, V., Badrinarayanan, V., Stent, S., & Cipolla, R. (2015). Scenenet: Understanding real world indoor scenes with synthetic data. In CoRR. arXiv:1511.07041.
Hesch, J. A., Kottas, D. G., Bowman, Sean L., & Roumeliotis, S. I. (2014). Camera-IMU-based localization: Observability analysis and consistency improvement. The International Journal of Robotics Research, 33(1), 182–201.
Kähler, O., Prisacariu, V. A., Ren, C. Y., Sun, X., Torr, P. H. S., & Murray, D. W. (2015). Very high frame rate volumetric integration of depth images on mobile devices. IEEE Transactions on Visualization and Computer Graphics, 21(11), 1241–1250.
Klein, G., & Murray, D. (2007). Parallel tracking and mapping for small AR workspaces. In 2007 6th IEEE and ACM international symposium on mixed and augmented reality, ISMAR.
Klingensmith, M., Dryanovski, I., Srinivasa, S., & Xiao, J. (2015, July). Chisel: Real time large scale 3d reconstruction onboard a mobile device using spatially hashed signed distance fields. In Proceedings of robotics: Science and systems, Rome.
Klingensmith, M., Herrmann, M., & Srinivasa, S. S. (2014). Object modeling and recognition from sparse: Noisy data via voxel depth carving. In ISER, number d.
Lepetit, V., Moreno-Noguer, F., & Fua, P. (2009). Epnp: An accurate o (n) solution to the PnP problem. International Journal of Computer Vision, 81(2), 155–166.
Lorensen, W. E., & Cline, H. E. (1987). Marching cubes: A high resolution 3D surface construction algorithm. In SIGGRAPH 1987, (Vol. 21 pp. 163–169). ACM.
Lynen, S., Bosse, M., Furgale, P., & Siegwart, R. (2014). Placeless place-recognition. In 2nd international conference on 3D vision (3DV)
Microsoft. Kinect for Windows. http://www.microsoft.com/en-us/kinectforwindows/.
Mourikis, A. I., & Roumeliotis, S. I. (2007). A multi-state constraint Kalman filter for vision-aided inertial navigation. In 2007 IEEE international conference on robotics and automation.
Nerurkar, E. D., Wu, K. J., & Roumeliotis, S. I. (2014). C-KLAM: Constrained keyframe-based localization and mapping. In 2014 IEEE international conference on robotics and automation (ICRA) (pp. 3638–3643).
Newcombe, R. A., Izadi, S., Hilliges, O., Molyneaux, D., Kim, D., & Davison, A. J. Pushmeet K., Jamie S., Steve H., & Andrew F. (2011) KinectFusion: Real-time dense surface mapping and tracking. In 2011 10th IEEE international symposium on mixed and augmented reality, ISMAR 2011 (pp. 127–136).
Newcombe, R. A., Lovegrove, S. J., & Davison, A. J. (2011). DTAM: Dense tracking and mapping in real-time. 2011 IEEE international conference on computer vision (ICCV).
Nguyen, C. V., Izadi, S., & Lovell, D. (2012). Modeling kinect sensor noise for improved 3D reconstruction and tracking. In Proceedings—2nd joint 3DIM/3DPVT conference: 3D imaging, modeling, processing, visualization and transmission, 3DIMPVT 2012 (pp. 524–530).
Nieß ner, M., Zollhöfer, M., Izadi, S., & Stamminger, M. (2013). Real-time 3D reconstruction at scale using voxel hashing. In ACM transactions on graphics (TOG).
Rusinkiewicz, S., Hall-Holt, O., & Levoy, M. (2002). Real-time 3D model acquisition. In ACM transactions on graphics (Vol. 21, pp. 438–446). ACM
Scherzer, D., Wimmer, M., & Purgathofer, W. (2011). A survey of real-time hard shadow mapping methods. In Computer graphics forum (Vol. 30, pp. 169–186). Wiley Online Library.
Schöps, T., Sattler, T., Häne, C., & Pollefeys, M. (2015). 3D modeling on the go: Interactive 3D reconstruction of large-scale scenes on mobile devices. In International conference on 3D vision (3DV).
Structure Sensor. http://structure.io/
Sturm, J., Engelhard, N., Endres, F., Burgard, W., & Cremers, D. (2012). A benchmark for the evaluation of RGB-D SLAM systems. In IEEE international conference on intelligent robots and systems (pp. 573–580).
Tanskanen, P., Kolev, K., Meier, L., Camposeco, F., Saurer, O., & Pollefeys, M. (2013). Live metric 3D reconstruction on mobile phones. In 2013 IEEE international conference on computer vision (pp. 65–72).
Teschner, M., Hiedelberger, B., Müller, M., Pomeranets, D., & Gross, M. (2003). 2003. In: Vmv: Optimized spatial hashing for collision detection of deformable objects.
Weise, T., Leibe, B., & Van Gool, L. (2008). Accurate and robust registration for in-hand modeling. In 26th IEEE conference on computer vision and pattern recognition, CVPR (pp. 1–8).
Whelan, T., Leutenegger, S., Salas-Moreno, R. F., Glocker, B., & Davison, A. J. (2015, July). ElasticFusion: Dense SLAM without a pose graph. In Robotics: Science and systems (RSS), Rome.
Whelan, T., Johannsson, H., Kaess, M., Leonard, J. J., & McDonald, J. (2013). Robust real-time visual odometry for dense RGB-D mapping. In 2013 IEEE international conference on robotics and automation (ICRA).
Whelan, T., & Kaess, M. (2013, November). Deformation-based loop closure for large scale dense RGB-D SLAM. In 2013 IEEE/RSJ international conference on intelligent robots and systems (IROS), Tokyo.
Wurm, K. M., Hornung, A., Bennewitz, M., Stachniss, C., & Burgard, W. (2010). OctoMap: A probabilistic, flexible, and compact 3D map representation for robotic systems. In Proceedings of the ICRA 2010 workshop on best practice in 3D perception and modeling for mobile manipulation.
Zeng, M., Zhao, F., Zheng, J., & Liu, X. (2013). Octree-based fusion for realtime 3D reconstruction. Graphical Models, 75(3), 126–136.
Acknowledgements
This work was done with the support of Google’s Advanced Technologies and Projects division (ATAP) for Project Tango. The authors thank Johnny Lee, Joel Hesch, Esha Nerurkar, Simon Lynen, Ryan Hickman, and other ATAP members for their close collaboration and support on this project.
Additional information
This work is supported in part by U.S. Army Research Office under Grant No. W911NF0910565, Federal Highway Administration (FHWA) under Grant No. DTFH61-12-H-00002, Google under Grant No. RF-CUNY-65789-00-43, Toyota USA Grant No. 1011344 and U.S. Office of Naval Research Grant No. N000141210613.
This is one of several papers published in Autonomous Robots comprising the “Special Issue on Robotics Science and Systems”.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Supplementary material 1 (MP4, 215,717 KB)
Cite this article
Dryanovski, I., Klingensmith, M., Srinivasa, S.S. et al. Large-scale, real-time 3D scene reconstruction on a mobile device. Auton Robot 41, 1423–1445 (2017). https://doi.org/10.1007/s10514-017-9624-2