research-article

Real-time, High-resolution Depth Upsampling on Embedded Accelerators

Authors: David Langerman, Alan George

ACM Transactions on Embedded Computing Systems (TECS), Volume 20, Issue 3

Article No.: 18, Pages 1 - 22

https://doi.org/10.1145/3436878

Published: 27 March 2021

Abstract

High-resolution, low-latency applications in computer vision are ubiquitous in today’s world of mixed-reality devices. These innovations provide a platform that can leverage improving depth-sensor and embedded-accelerator technology to enable higher-resolution, lower-latency processing of 3D scenes using depth-upsampling algorithms. This research demonstrates that filter-based upsampling algorithms are feasible for mixed-reality applications using low-power hardware accelerators. We parallelized and evaluated a depth-upsampling algorithm on two different devices: a reconfigurable-logic FPGA embedded within a low-power SoC and a fixed-logic embedded graphics processing unit. We demonstrate that both accelerators can meet the real-time latency requirement of 11 ms for mixed-reality applications.

Cited By

Thomas K A, Poddar S, Mondal H. (2022). A CNN Hardware Accelerator Using Triangle-based Convolution. ACM Journal on Emerging Technologies in Computing Systems 18:4 (1-23). Online publication date: 27-Jun-2022.
https://dl.acm.org/doi/10.1145/3544975

Index Terms

Real-time, High-resolution Depth Upsampling on Embedded Accelerators
1. Computer systems organization
  1. Embedded and cyber-physical systems
    1. Embedded systems
  2. Real-time systems
    1. Real-time system architecture
2. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel algorithms

Information

Published In

ACM Transactions on Embedded Computing Systems Volume 20, Issue 3

May 2021

217 pages

ISSN:1539-9087

EISSN:1558-3465

DOI:10.1145/3458920

Editor:
Tulika Mitra
National University of Singapore, Singapore

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 27 March 2021

Accepted: 01 November 2020

Revised: 01 October 2020

Received: 01 June 2020

Published in TECS Volume 20, Issue 3

Qualifiers

Research-article
Research
Refereed

Funding Sources

National Science Foundation

Bibliometrics

Total Citations: 1
Total Downloads: 173
Downloads (Last 12 months): 23
Downloads (Last 6 weeks): 1

Reflects downloads up to 11 Aug 2024
