DOI: 10.1145/3123266.3123276

Unconstrained Fashion Landmark Detection via Hierarchical Recurrent Transformer Networks

Published: 19 October 2017

Abstract

Fashion landmarks are functional key points defined on clothes, such as the corners of the neckline, hemline, and cuff. They were recently introduced [18] as an effective visual representation for fashion image understanding. However, detecting fashion landmarks is challenging due to background clutter and variations in human pose and scale. To remove these variations, previous works usually assumed that clothing bounding boxes were provided as additional annotations in both training and testing; such annotations are expensive to obtain and often unavailable in practice. This work addresses unconstrained fashion landmark detection, where clothing bounding boxes are provided in neither training nor testing. To this end, we present a novel Deep LAndmark Network (DLAN), in which bounding boxes and landmarks are jointly estimated and trained iteratively in an end-to-end manner. DLAN contains two dedicated modules: a Selective Dilated Convolution for handling scale discrepancies, and a Hierarchical Recurrent Spatial Transformer for handling background clutter. To evaluate DLAN, we present a large-scale fashion landmark dataset, the Unconstrained Landmark Database (ULD), consisting of 30K images. Statistics show that ULD is more challenging than existing datasets in terms of image scale, background clutter, and human pose. Extensive experiments demonstrate the effectiveness of DLAN over state-of-the-art methods. DLAN also exhibits excellent generalization across different clothing categories and modalities, making it well suited for real-world fashion analysis.
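The Selective Dilated Convolution named in the abstract addresses scale discrepancies by computing features at several dilation (atrous) rates and selecting among them. The paper's actual module is not reproduced here; the following is only a hypothetical NumPy sketch of the general idea, in which the same kernel is applied at multiple dilation rates and the branch outputs are softly combined with softmax gates (the function names `dilated_conv2d` and `selective_dilated_conv`, and the `gate_logits` parameter, are all invented for this illustration):

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation):
    """Valid-mode 2D convolution with a dilated (atrous) kernel.

    A dilation rate d inserts d-1 gaps between kernel taps, enlarging the
    receptive field without adding parameters.
    """
    kh, kw = kernel.shape
    eff_h = (kh - 1) * dilation + 1  # effective kernel height
    eff_w = (kw - 1) * dilation + 1  # effective kernel width
    H, W = x.shape
    out = np.zeros((H - eff_h + 1, W - eff_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Strided slice picks out the dilated taps of the window.
            patch = x[i:i + eff_h:dilation, j:j + eff_w:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

def selective_dilated_conv(x, kernel, dilations, gate_logits):
    """Soft selection among parallel dilated branches (sketch only)."""
    weights = np.exp(gate_logits) / np.sum(np.exp(gate_logits))  # softmax
    outs = [dilated_conv2d(x, kernel, d) for d in dilations]
    # Crop every branch to the smallest spatial size so outputs align.
    h = min(o.shape[0] for o in outs)
    w = min(o.shape[1] for o in outs)
    return sum(wgt * o[:h, :w] for wgt, o in zip(weights, outs))
```

In a learned network the gates would be predicted from the input so that large garments favor large dilation rates; here they are simply passed in to keep the sketch self-contained.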

References

[1]
Lukas Bossard, Matthias Dantone, Christian Leistner, Christian Wengert, Till Quack, and Luc Van Gool. 2012. Apparel classification with style. In ACCV. 321--335.
[2]
Huizhong Chen, Andrew Gallagher, and Bernd Girod. 2012. Describing clothing by semantic attributes. In ECCV. 609--623.
[3]
Qiang Chen, Junshi Huang, Rogerio Feris, Lisa M Brown, Jian Dong, and Shuicheng Yan. 2015. Deep domain adaptation for describing people based on fine-grained clothing attributes. In CVPR. 5315--5324.
[4]
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In CVPR. 248--255.
[5]
Wei Di, Catherine Wah, Arpit Bhardwaj, Robinson Piramuthu, and Neel Sundaresan. 2013. Style finder: Fine-grained clothing style detection and retrieval. In CVPR Workshops. 8--13.
[6]
Jianlong Fu, Jinqiao Wang, Zechao Li, Min Xu, and Hanqing Lu. 2012. Efficient clothing retrieval with semantic-preserving visual phrases. In ACCV. 420--431.
[7]
Ross Girshick. 2015. Fast R-CNN. In ICCV. 1440--1448.
[8]
Max Jaderberg, Karen Simonyan, Andrew Zisserman, and others. 2015. Spatial transformer networks. In NIPS. 2017--2025.
[9]
Yushi Jing, David Liu, Dmitry Kislyuk, Andrew Zhai, Jiajing Xu, Jeff Donahue, and Sarah Tavel. 2015. Visual search at Pinterest. In KDD. 1889--1898.
[10]
Yannis Kalantidis, Lyndon Kennedy, and Li-Jia Li. 2013. Getting the look: clothing recognition and segmentation for automatic product suggestions in everyday photos. In ICMR. 105--112.
[11]
M Hadi Kiapour, Xufeng Han, Svetlana Lazebnik, Alexander C Berg, and Tamara L Berg. 2015. Where to buy it: Matching street clothing photos in online shops. In ICCV.
[12]
M Hadi Kiapour, Kota Yamaguchi, Alexander C Berg, and Tamara L Berg. 2014. Hipster wars: Discovering elements of fashion styles. In ECCV. 472--488.
[13]
Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, and Lawrence D Jackel. 1989. Backpropagation applied to handwritten zip code recognition. Neural Computation 1, 4 (1989), 541--551.
[14]
Xiaodan Liang, Chunyan Xu, Xiaohui Shen, Jianchao Yang, Si Liu, Jinhui Tang, Liang Lin, and Shuicheng Yan. 2015. Human parsing with contextualized convolutional neural network. In ICCV. 1386--1394.
[15]
Kevin Lin, Huei-Fang Yang, Jen-Hao Hsiao, and Chu-Song Chen. 2015. Deep learning of binary hash codes for fast image retrieval. In CVPR Workshops. 27--35.
[16]
Si Liu, Zheng Song, Guangcan Liu, Changsheng Xu, Hanqing Lu, and Shuicheng Yan. 2012. Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set. In CVPR. 3330--3337.
[17]
Ziwei Liu, Ping Luo, Shi Qiu, Xiaogang Wang, and Xiaoou Tang. 2016. DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. In CVPR. 1096--1104.
[18]
Ziwei Liu, Sijie Yan, Ping Luo, Xiaogang Wang, and Xiaoou Tang. 2016. Fashion landmark detection in the wild. In ECCV. 229--245.
[19]
Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In CVPR. 3431--3440.
[20]
Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS. 91--99.
[21]
Edgar Simo-Serra, Sanja Fidler, Francesc Moreno-Noguer, and Raquel Urtasun. 2015. Neuroaesthetics in fashion: Modeling the perception of beauty. In CVPR.
[22]
Edgar Simo-Serra and Hiroshi Ishikawa. 2016. Fashion style in 128 floats: joint ranking and classification using weak data for feature extraction. In CVPR. 298--307.
[23]
Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[24]
Jonathan J Tompson, Arjun Jain, Yann LeCun, and Christoph Bregler. 2014. Joint training of a convolutional network and a graphical model for human pose estimation. In NIPS. 1799--1807.
[25]
Alexander Toshev and Christian Szegedy. 2014. DeepPose: Human pose estimation via deep neural networks. In CVPR. 1653--1660.
[26]
Xianwang Wang and Tong Zhang. 2011. Clothes search in consumer photos via color matching and attribute learning. In ACM MM. 1353--1356.
[27]
Kota Yamaguchi, Tamara L Berg, and Luis E Ortiz. 2014. Chic or social: Visual popularity analysis in online fashion networks. In ACM MM. 773--776.
[28]
Kota Yamaguchi, M Hadi Kiapour, and Tamara L Berg. 2013. Paper doll parsing: Retrieving similar styles to parse clothing items. In ICCV. 3519--3526.
[29]
Kota Yamaguchi, M Hadi Kiapour, Luis E Ortiz, and Tamara L Berg. 2012. Parsing clothing in fashion photographs. In CVPR. 3570--3577.
[30]
Wei Yang, Ping Luo, and Liang Lin. 2014. Clothing co-parsing by joint image segmentation and labeling. In CVPR. 3182--3189.
[31]
Fisher Yu and Vladlen Koltun. 2015. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015).
[32]
Xiangxin Zhu and Deva Ramanan. 2012. Face detection, pose estimation, and landmark localization in the wild. In CVPR. 2879--2886.
[33]
C Lawrence Zitnick and Piotr Dollár. 2014. Edge boxes: Locating object proposals from edges. In ECCV. 391--405.



    Published In

    MM '17: Proceedings of the 25th ACM international conference on Multimedia
    October 2017
    2028 pages
    ISBN:9781450349062
    DOI:10.1145/3123266
    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. convolutional neural network
    2. deep learning
    3. landmark detection
    4. visual fashion understanding

    Qualifiers

    • Research-article

    Conference

    MM '17
    Sponsor:
    MM '17: ACM Multimedia Conference
    October 23 - 27, 2017
Mountain View, California, USA

    Acceptance Rates

    MM '17 Paper Acceptance Rate 189 of 684 submissions, 28%;
    Overall Acceptance Rate 995 of 4,171 submissions, 24%



    Cited By

    • (2024) Toward Fashion Intelligence in the Big Data Era: State-of-the-Art and Future Prospects. IEEE Transactions on Consumer Electronics 70:1, 36-57. DOI: 10.1109/TCE.2023.3285880. Feb 2024.
    • (2024) Multi-keypoints matching network for clothing detection. The Visual Computer. DOI: 10.1007/s00371-024-03337-y. 25 Mar 2024.
    • (2023) Multimodal Fashion Knowledge Extraction as Captioning. Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 52-62. DOI: 10.1145/3624918.3625315. 26 Nov 2023.
    • (2023) Improve Fashion Landmark Detection with Cross-Stage Connection Network. 2023 9th Annual International Conference on Network and Information Systems for Computers (ICNISC), 187-190. DOI: 10.1109/ICNISC60562.2023.00065. 27 Oct 2023.
    • (2023) Linking Garment with Person via Semantically Associated Landmarks for Virtual Try-On. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 17194-17204. DOI: 10.1109/CVPR52729.2023.01649. Jun 2023.
    • (2022) AABLSTM: A Novel Multi-task Based CNN-RNN Deep Model for Fashion Analysis. ACM Transactions on Multimedia Computing, Communications, and Applications 19:1, 1-18. DOI: 10.1145/3519029. 12 Mar 2022.
    • (2022) Cross-Rolling Attention Network for Fashion Landmark Detection. 2022 26th International Conference on Pattern Recognition (ICPR), 4837-4843. DOI: 10.1109/ICPR56361.2022.9956116. 21 Aug 2022.
    • (2022) Improving Fashion Attribute Classification Accuracy with Limited Labeled Data Using Transfer Learning. 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), 1210-1217. DOI: 10.1109/ICMLA55696.2022.00195. Dec 2022.
    • (2022) Deep Learning Approaches for Fashion Knowledge Extraction From Social Media: A Review. IEEE Access 10, 1545-1576. DOI: 10.1109/ACCESS.2021.3137893. 2022.
    • (2022) You can try without visiting: a comprehensive survey on virtually try-on outfits. Multimedia Tools and Applications 81:14, 19967-19998. DOI: 10.1007/s11042-022-12802-6. 10 Mar 2022.
