DOI: 10.1145/3123266.3123276

Unconstrained Fashion Landmark Detection via Hierarchical Recurrent Transformer Networks

Published: 19 October 2017

Abstract

Fashion landmarks are functional key points defined on clothes, such as the corners of the neckline, hemline, and cuff. They were recently introduced [18] as an effective visual representation for fashion image understanding. However, detecting fashion landmarks is challenging due to background clutter and variations in human pose and scale. To remove these variations, previous works usually assumed that clothing bounding boxes were provided as additional annotations in both training and testing; such annotations are expensive to obtain and often unavailable in practice. This work addresses unconstrained fashion landmark detection, where clothing bounding boxes are provided in neither training nor testing. To this end, we present a novel Deep LAndmark Network (DLAN), in which bounding boxes and landmarks are jointly estimated and trained iteratively in an end-to-end manner. DLAN contains two dedicated modules: a Selective Dilated Convolution for handling scale discrepancies, and a Hierarchical Recurrent Spatial Transformer for handling background clutter. To evaluate DLAN, we present a large-scale fashion landmark dataset, the Unconstrained Landmark Database (ULD), consisting of 30K images. Statistics show that ULD is more challenging than existing datasets in terms of image scale, background clutter, and human pose. Extensive experiments demonstrate the effectiveness of DLAN over state-of-the-art methods. DLAN also exhibits excellent generalization across different clothing categories and modalities, making it well suited for real-world fashion analysis.
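The Selective Dilated Convolution named in the abstract addresses scale discrepancies by computing features at several dilation (atrous) rates and selecting among them. The paper's actual module is not reproduced here; the following is only a hypothetical NumPy sketch of the general idea, in which the same kernel is applied at multiple dilation rates and the branch outputs are softly combined with softmax gates (the function names `dilated_conv2d` and `selective_dilated_conv`, and the `gate_logits` parameter, are all invented for this illustration):

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation):
    """Valid-mode 2D convolution with a dilated (atrous) kernel.

    A dilation rate d inserts d-1 gaps between kernel taps, enlarging the
    receptive field without adding parameters.
    """
    kh, kw = kernel.shape
    eff_h = (kh - 1) * dilation + 1  # effective kernel height
    eff_w = (kw - 1) * dilation + 1  # effective kernel width
    H, W = x.shape
    out = np.zeros((H - eff_h + 1, W - eff_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Strided slice picks out the dilated taps of the window.
            patch = x[i:i + eff_h:dilation, j:j + eff_w:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

def selective_dilated_conv(x, kernel, dilations, gate_logits):
    """Soft selection among parallel dilated branches (sketch only)."""
    weights = np.exp(gate_logits) / np.sum(np.exp(gate_logits))  # softmax
    outs = [dilated_conv2d(x, kernel, d) for d in dilations]
    # Crop every branch to the smallest spatial size so outputs align.
    h = min(o.shape[0] for o in outs)
    w = min(o.shape[1] for o in outs)
    return sum(wgt * o[:h, :w] for wgt, o in zip(weights, outs))
```

In a learned network the gates would be predicted from the input so that large garments favor large dilation rates; here they are simply passed in to keep the sketch self-contained.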

References

[1]
Lukas Bossard, Matthias Dantone, Christian Leistner, Christian Wengert, Till Quack, and Luc Van Gool. 2012. Apparel classification with style. In ACCV. 321--335.
[2]
Huizhong Chen, Andrew Gallagher, and Bernd Girod. 2012. Describing clothing by semantic attributes. In ECCV. 609--623.
[3]
Qiang Chen, Junshi Huang, Rogerio Feris, Lisa M Brown, Jian Dong, and Shuicheng Yan. 2015. Deep domain adaptation for describing people based on fine-grained clothing attributes. In CVPR. 5315--5324.
[4]
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In CVPR. 248--255.
[5]
Wei Di, Catherine Wah, Arpit Bhardwaj, Robinson Piramuthu, and Neel Sundaresan. 2013. Style finder: Fine-grained clothing style detection and retrieval. In CVPR Workshops. 8--13.
[6]
Jianlong Fu, Jinqiao Wang, Zechao Li, Min Xu, and Hanqing Lu. 2012. Efficient clothing retrieval with semantic-preserving visual phrases. In ACCV. 420--431.
[7]
Ross Girshick. 2015. Fast R-CNN. In ICCV. 1440--1448.
[8]
Max Jaderberg, Karen Simonyan, Andrew Zisserman, and others. 2015. Spatial transformer networks. In NIPS. 2017--2025.
[9]
Yushi Jing, David Liu, Dmitry Kislyuk, Andrew Zhai, Jiajing Xu, Jeff Donahue, and Sarah Tavel. 2015. Visual search at Pinterest. In KDD. 1889--1898.
[10]
Yannis Kalantidis, Lyndon Kennedy, and Li-Jia Li. 2013. Getting the look: clothing recognition and segmentation for automatic product suggestions in everyday photos. In ICMR. 105--112.
[11]
M Hadi Kiapour, Xufeng Han, Svetlana Lazebnik, Alexander C Berg, and Tamara L Berg. 2015. Where to buy it: Matching street clothing photos in online shops. In ICCV.
[12]
M Hadi Kiapour, Kota Yamaguchi, Alexander C Berg, and Tamara L Berg. 2014. Hipster wars: Discovering elements of fashion styles. In ECCV. 472--488.
[13]
Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, and Lawrence D Jackel. 1989. Backpropagation applied to handwritten zip code recognition. Neural Computation 1, 4 (1989), 541--551.
[14]
Xiaodan Liang, Chunyan Xu, Xiaohui Shen, Jianchao Yang, Si Liu, Jinhui Tang, Liang Lin, and Shuicheng Yan. 2015. Human parsing with contextualized convolutional neural network. In ICCV. 1386--1394.
[15]
Kevin Lin, Huei-Fang Yang, Jen-Hao Hsiao, and Chu-Song Chen. 2015. Deep learning of binary hash codes for fast image retrieval. In CVPR Workshops. 27--35.
[16]
Si Liu, Zheng Song, Guangcan Liu, Changsheng Xu, Hanqing Lu, and Shuicheng Yan. 2012. Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set. In CVPR. 3330--3337.
[17]
Ziwei Liu, Ping Luo, Shi Qiu, Xiaogang Wang, and Xiaoou Tang. 2016. DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. In CVPR. 1096--1104.
[18]
Ziwei Liu, Sijie Yan, Ping Luo, Xiaogang Wang, and Xiaoou Tang. 2016. Fashion landmark detection in the wild. In ECCV. 229--245.
[19]
Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In CVPR. 3431--3440.
[20]
Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS. 91--99.
[21]
Edgar Simo-Serra, Sanja Fidler, Francesc Moreno-Noguer, and Raquel Urtasun. 2015. Neuroaesthetics in fashion: Modeling the perception of beauty. In CVPR.
[22]
Edgar Simo-Serra and Hiroshi Ishikawa. 2016. Fashion style in 128 floats: joint ranking and classification using weak data for feature extraction. In CVPR. 298--307.
[23]
Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[24]
Jonathan J Tompson, Arjun Jain, Yann LeCun, and Christoph Bregler. 2014. Joint training of a convolutional network and a graphical model for human pose estimation. In NIPS. 1799--1807.
[25]
Alexander Toshev and Christian Szegedy. 2014. DeepPose: Human pose estimation via deep neural networks. In CVPR. 1653--1660.
[26]
Xianwang Wang and Tong Zhang. 2011. Clothes search in consumer photos via color matching and attribute learning. In ACM MM. 1353--1356.
[27]
Kota Yamaguchi, Tamara L Berg, and Luis E Ortiz. 2014. Chic or social: Visual popularity analysis in online fashion networks. In ACM MM. 773--776.
[28]
Kota Yamaguchi, M Hadi Kiapour, and Tamara L Berg. 2013. Paper doll parsing: Retrieving similar styles to parse clothing items. In ICCV. 3519--3526.
[29]
Kota Yamaguchi, M Hadi Kiapour, Luis E Ortiz, and Tamara L Berg. 2012. Parsing clothing in fashion photographs. In CVPR. 3570--3577.
[30]
Wei Yang, Ping Luo, and Liang Lin. 2014. Clothing co-parsing by joint image segmentation and labeling. In CVPR. 3182--3189.
[31]
Fisher Yu and Vladlen Koltun. 2015. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015).
[32]
Xiangxin Zhu and Deva Ramanan. 2012. Face detection, pose estimation, and landmark localization in the wild. In CVPR. 2879--2886.
[33]
C Lawrence Zitnick and Piotr Dollár. 2014. Edge boxes: Locating object proposals from edges. In ECCV. 391--405.



    Published In

    MM '17: Proceedings of the 25th ACM international conference on Multimedia
    October 2017
    2028 pages
    ISBN:9781450349062
    DOI:10.1145/3123266
    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. convolutional neural network
    2. deep learning
    3. landmark detection
    4. visual fashion understanding

    Qualifiers

    • Research-article

    Conference

    MM '17
    Sponsor:
    MM '17: ACM Multimedia Conference
    October 23 - 27, 2017
Mountain View, California, USA

    Acceptance Rates

    MM '17 Paper Acceptance Rate 189 of 684 submissions, 28%;
    Overall Acceptance Rate 995 of 4,171 submissions, 24%



    Cited By

    • (2024) Toward Fashion Intelligence in the Big Data Era: State-of-the-Art and Future Prospects. IEEE Transactions on Consumer Electronics 70:1, 36-57. DOI: 10.1109/TCE.2023.3285880. Feb 2024.
    • (2024) Multi-keypoints matching network for clothing detection. The Visual Computer. DOI: 10.1007/s00371-024-03337-y. 25 Mar 2024.
    • (2023) Multimodal Fashion Knowledge Extraction as Captioning. Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 52-62. DOI: 10.1145/3624918.3625315. 26 Nov 2023.
    • (2023) Improve Fashion Landmark Detection with Cross-Stage Connection Network. 2023 9th Annual International Conference on Network and Information Systems for Computers (ICNISC), 187-190. DOI: 10.1109/ICNISC60562.2023.00065. 27 Oct 2023.
    • (2023) Linking Garment with Person via Semantically Associated Landmarks for Virtual Try-On. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 17194-17204. DOI: 10.1109/CVPR52729.2023.01649. Jun 2023.
    • (2022) AABLSTM: A Novel Multi-task Based CNN-RNN Deep Model for Fashion Analysis. ACM Transactions on Multimedia Computing, Communications, and Applications 19:1, 1-18. DOI: 10.1145/3519029. 12 Mar 2022.
    • (2022) Cross-Rolling Attention Network for Fashion Landmark Detection. 2022 26th International Conference on Pattern Recognition (ICPR), 4837-4843. DOI: 10.1109/ICPR56361.2022.9956116. 21 Aug 2022.
    • (2022) Improving Fashion Attribute Classification Accuracy with Limited Labeled Data Using Transfer Learning. 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), 1210-1217. DOI: 10.1109/ICMLA55696.2022.00195. Dec 2022.
    • (2022) Deep Learning Approaches for Fashion Knowledge Extraction From Social Media: A Review. IEEE Access 10, 1545-1576. DOI: 10.1109/ACCESS.2021.3137893. 2022.
    • (2022) You can try without visiting: a comprehensive survey on virtually try-on outfits. Multimedia Tools and Applications 81:14, 19967-19998. DOI: 10.1007/s11042-022-12802-6. 10 Mar 2022.
