Depth Estimation with Simplified Transformer

Yang, John; An, Le; Dixit, Anurag; Koo, Jinkyu; Park, Su Inn

arXiv:2204.13791 (cs)

[Submitted on 28 Apr 2022 (v1), last revised 27 May 2022 (this version, v3)]

Abstract: Transformers and their variants have recently shown state-of-the-art results in many vision tasks, ranging from image classification to dense prediction. Despite their success, limited work has been reported on improving model efficiency for deployment in latency-critical applications such as autonomous driving and robotic navigation. In this paper, we aim to improve upon existing vision transformers and propose a method for self-supervised monocular Depth Estimation with Simplified Transformer (DEST), which is efficient and particularly suitable for deployment on GPU-based platforms. Through strategic design choices, our model achieves a significant reduction in model size, complexity, and inference latency, while achieving superior accuracy compared to the state of the art. We also show that our design generalizes well to other dense prediction tasks without bells and whistles.

Comments:	Accepted for the CVPR 2022 Transformers For Vision (T4V) workshop
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2204.13791 [cs.CV]
	(or arXiv:2204.13791v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2204.13791

Submission history

From: Le An
[v1] Thu, 28 Apr 2022 21:39:00 UTC (18,642 KB)
[v2] Wed, 25 May 2022 17:04:44 UTC (18,642 KB)
[v3] Fri, 27 May 2022 23:14:52 UTC (18,642 KB)
