Zoom Better to See Clearer: Human and Object Parsing with Hierarchical Auto-Zoom Net

Xia, Fangting; Wang, Peng; Chen, Liang-Chieh; Yuille, Alan L.

arXiv:1511.06881 (cs)

[Submitted on 21 Nov 2015 (v1), last revised 28 Mar 2016 (this version, v5)]

Abstract:Parsing articulated objects, e.g. humans and animals, into semantic parts (e.g. body, head and arms) from natural images is a challenging and fundamental problem in computer vision. A major difficulty is the large variability in the scale and location of objects and their corresponding parts. Even small errors in estimating scale and location degrade the parsing output and cause errors in boundary details. To tackle these difficulties, we propose a "Hierarchical Auto-Zoom Net" (HAZN) for object part parsing, which adapts to the local scales of objects and parts. HAZN is a sequence of two "Auto-Zoom Nets" (AZNs), each employing fully convolutional networks that perform two tasks: (1) predict the locations and scales of object instances (the first AZN) or their parts (the second AZN); (2) estimate the part scores for the predicted object instance or part regions. Our model can adaptively "zoom" (resize) predicted image regions to their proper scales to refine the parsing.
We conduct extensive experiments on the PASCAL part datasets for humans, horses, and cows. For humans, our approach significantly outperforms the state of the art by 5% mIOU and is especially better at segmenting small instances and small parts. We obtain similar improvements for parsing cows and horses over alternative methods. In summary, our strategy of first zooming into objects and then zooming into parts is very effective. It also enables us to process different regions of the image at different scales adaptively so that, for example, we do not need to waste computational resources scaling the entire image.
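
The "zoom" step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names resize_nearest, zoom_and_parse, paste and parse_fn are hypothetical, the region box and target height are supplied by hand rather than predicted by a fully convolutional network, and nearest-neighbour resizing stands in for whatever interpolation a real system would use. It only shows the idea of resizing a predicted region to a chosen scale, parsing it there, and mapping the result back to the original resolution.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]
Box = Tuple[int, int, int, int]  # (top, left, height, width); assumed format


def resize_nearest(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Nearest-neighbour resize of a 2-D grid (stand-in for real interpolation)."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        [grid[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def zoom_and_parse(image: Grid, box: Box, target_h: int,
                   parse_fn: Callable[[Grid], Grid]) -> Grid:
    """Crop a region, zoom it to target_h rows, parse it, and map labels back.

    parse_fn is a hypothetical part parser that returns a label grid with the
    same shape as its input.
    """
    top, left, h, w = box
    crop = [row[left:left + w] for row in image[top:top + h]]
    ratio = target_h / h                      # zoom-in if > 1, zoom-out if < 1
    zoom_w = max(1, round(w * ratio))         # keep the aspect ratio
    zoomed = resize_nearest(crop, target_h, zoom_w)
    labels = parse_fn(zoomed)                 # parse at the zoomed scale
    return resize_nearest(labels, h, w)       # back to the region's own size


def paste(canvas: Grid, labels: Grid, box: Box) -> Grid:
    """Return a copy of canvas with labels written into the region given by box."""
    top, left, h, w = box
    out = [row[:] for row in canvas]
    for r in range(h):
        out[top + r][left:left + w] = labels[r]
    return out


if __name__ == "__main__":
    image = [[0, 0, 0, 0],
             [0, 5, 0, 0],
             [0, 0, 7, 0],
             [0, 0, 0, 0]]
    # Toy parser: label 1 wherever the pixel is non-zero.
    toy_parser = lambda g: [[1 if v > 0 else 0 for v in row] for row in g]
    region = (1, 1, 2, 2)
    region_labels = zoom_and_parse(image, region, target_h=4, parse_fn=toy_parser)
    full = paste([[0] * 4 for _ in range(4)], region_labels, region)
    for row in full:
        print(row)
```

In a hierarchical setting the same routine would be applied twice, first to object-level regions and then to part-level regions inside them, so that only the selected regions are rescaled rather than the entire image.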

Comments:	A shortened version has been submitted to ECCV 2016
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:1511.06881 [cs.CV]
	(or arXiv:1511.06881v5 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1511.06881

Submission history

From: Fangting Xia
[v1] Sat, 21 Nov 2015 13:32:26 UTC (3,521 KB)
[v2] Wed, 25 Nov 2015 00:39:14 UTC (74,139 KB)
[v3] Mon, 30 Nov 2015 02:32:33 UTC (75,777 KB)
[v4] Thu, 7 Jan 2016 23:48:34 UTC (75,949 KB)
[v5] Mon, 28 Mar 2016 21:53:31 UTC (8,418 KB)
