Deep Learning with Discriminative Margin Loss for Cross-Domain Consumer-to-Shop Clothes Retrieval
Abstract
1. Introduction
- A cross-domain discriminative loss function, called DML, is proposed to learn deep discriminative features for consumer-to-shop fashion search.
- DML learns a larger margin for negative pairs than for positive pairs, increasing the variation between classes while reducing the variation within each class (see the sketch after this list).
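To make the asymmetric-margin idea concrete, here is a minimal PyTorch-style sketch of a double-margin contrastive objective. It is an illustration under stated assumptions, not the paper's exact formulation; the function name `double_margin_loss` and the margin values `m_pos` and `m_neg` are placeholders, with `m_neg > m_pos` encoding the larger margin for negative pairs.

```python
import torch
import torch.nn.functional as F

def double_margin_loss(street_emb, shop_emb, labels, m_pos=0.3, m_neg=0.8):
    """Contrastive loss with a larger margin for negative pairs (m_neg > m_pos).

    street_emb, shop_emb: (N, D) embeddings of consumer and shop photos.
    labels: (N,) with 1 for matching product pairs and 0 for non-matching ones.
    """
    labels = labels.float()
    d = F.pairwise_distance(street_emb, shop_emb)  # Euclidean distance per pair
    # Positive pairs are penalized only when farther apart than m_pos.
    pos_term = labels * torch.clamp(d - m_pos, min=0).pow(2)
    # Negative pairs are penalized only when closer than the larger margin m_neg.
    neg_term = (1.0 - labels) * torch.clamp(m_neg - d, min=0).pow(2)
    return (pos_term + neg_term).mean()
```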
2. Related Work
2.1. Fashion Retrieval
2.2. Loss Functions
3. The Proposed Approach
3.1. Motivation
3.2. Cross-Domain Discriminative Margin Loss (DML)
3.3. Comparison to Other Loss Functions
4. Experiments
4.1. Datasets
4.2. Implementation Details
4.3. Experimental Results
4.4. Comparison with Other Loss Functions
4.5. Effects of the Margin Parameters on the Discriminative Margin Loss
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Hadi Kiapour, M.; Han, X.; Lazebnik, S.; Berg, A.; Berg, T. Where to buy it: Matching street clothing photos in online shops. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3343–3351.
2. Li, Z.; Li, Y.; Gao, Y.; Liu, Y. Fast cross-scenario clothing retrieval based on indexing deep features. In Pacific Rim Conference on Multimedia; Springer: Cham, Switzerland, 2016; pp. 107–118.
3. Liu, S.; Song, Z.; Liu, G.; Xu, C.; Lu, H.; Yan, S. Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3330–3337.
4. Wang, X.; Sun, Z.; Zhang, W.; Zhou, Y.; Jiang, Y. Matching user photos to online products with robust deep features. In Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, New York, NY, USA, 6–9 June 2016; pp. 7–14.
5. Kalantidis, Y.; Kennedy, L.; Li, L. Getting the look: Clothing recognition and segmentation for automatic product suggestions in everyday photos. In Proceedings of the 3rd ACM Conference on International Conference on Multimedia Retrieval, Dallas, TX, USA, 16–20 April 2013; pp. 105–112.
6. Ji, X.; Wang, W.; Zhang, M.; Yang, Y. Cross-domain image retrieval with attention modeling. In Proceedings of the 25th ACM International Conference on Multimedia, Mountain View, CA, USA, 23–27 October 2017; pp. 1654–1662.
7. Cheng, Z.; Wu, X.; Liu, Y.; Hua, X. Video2shop: Exact matching clothes in videos to online shopping images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4048–4056.
8. Wang, Z.; Gu, Y.; Zhang, Y.; Zhou, J.; Gu, X. Clothing retrieval with visual attention model. In Proceedings of the 2017 IEEE Visual Communications and Image Processing (VCIP), St. Petersburg, FL, USA, 10–13 December 2017; pp. 1–4.
9. Lasserre, J.; Bracher, C.; Vollgraf, R. Street2Fashion2Shop: Enabling Visual Search in Fashion e-Commerce Using Studio Images. In Proceedings of the International Conference on Pattern Recognition Applications and Methods; Springer: Cham, Switzerland, 2018; pp. 3–26.
10. Gajic, B.; Baldrich, R. Cross-domain fashion image retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1869–1871.
11. Kuang, Z.; Gao, Y.; Li, G.; Luo, P.; Chen, Y.; Lin, L.; Zhang, W. Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid. arXiv 2019, arXiv:1908.11754.
12. Park, S.; Shin, M.; Ham, S.; Choe, S.; Kang, Y. Study on Fashion Image Retrieval Methods for Efficient Fashion Visual Search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019.
13. Kucer, M.; Murray, N. A Detect-Then-Retrieve Model for Multi-Domain Fashion Item Retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019.
14. Chopra, A.; Sinha, A.; Gupta, H.; Sarkar, M.; Ayush, K.; Krishnamurthy, B. Powering Robust Fashion Retrieval With Information Rich Feature Embeddings. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019.
15. Miao, Y.; Li, G.; Bao, C.; Zhang, J.; Wang, J. ClothingNet: Cross-Domain Clothing Retrieval With Feature Fusion and Quadruplet Loss. IEEE Access 2020, 8, 142669–142679.
16. Liu, Z.; Luo, P.; Qiu, S.; Wang, X.; Tang, X. Deepfashion: Powering robust clothes recognition and retrieval with rich annotations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1096–1104.
17. Huang, J.; Feris, R.; Chen, Q.; Yan, S. Cross-domain image retrieval with a dual attribute-aware ranking network. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1062–1070.
18. Wang, H.; Wang, Y.; Zhou, Z.; Ji, X.; Gong, D.; Zhou, J.; Li, Z.; Liu, W. Cosface: Large margin cosine loss for deep face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 5265–5274.
19. Chopra, S.; Hadsell, R.; LeCun, Y. Learning a similarity metric discriminatively, with application to face verification. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), San Diego, CA, USA, 20–25 June 2005; pp. 539–546.
20. Hadsell, R.; Chopra, S.; LeCun, Y. Dimensionality reduction by learning an invariant mapping. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), New York, NY, USA, 17–22 June 2006; Volume 2, pp. 1735–1742.
21. Rao, Y.; Lu, J.; Zhou, J. Learning Discriminative Aggregation Network for Video-Based Face Recognition and Person Re-identification. Int. J. Comput. Vis. 2019, 127, 701–718.
22. Wang, J.; Song, Y.; Leung, T.; Rosenberg, C.; Wang, J.; Philbin, J.; Chen, B.; Wu, Y. Learning fine-grained image similarity with deep ranking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1386–1393.
23. Liu, W.; Wen, Y.; Yu, Z.; Li, M.; Raj, B.; Song, L. Sphereface: Deep hypersphere embedding for face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 212–220.
24. Wen, Y.; Zhang, K.; Li, Z.; Qiao, Y. A discriminative feature learning approach for deep face recognition. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 499–515.
25. Liu, W.; Wen, Y.; Yu, Z.; Yang, M. Large-margin softmax loss for convolutional neural networks. ICML 2016, 2, 7.
26. Wang, F.; Cheng, J.; Liu, W.; Liu, H. Additive margin softmax for face verification. IEEE Signal Process. Lett. 2018, 25, 926–930.
27. Deng, J.; Guo, J.; Xue, N.; Zafeiriou, S. Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 4690–4699.
28. Hoffer, E.; Ailon, N. Deep metric learning using triplet network. In International Workshop on Similarity-Based Pattern Recognition; Springer: Cham, Switzerland, 2015; pp. 84–92.
29. Ouahabi, A.; Taleb-Ahmed, A. Deep learning for real-time semantic segmentation: Application in ultrasound imaging. Pattern Recognit. Lett. 2021, 144, 27–34.
30. Xuan, H.; Souvenir, R.; Pless, R. Deep randomized ensembles for metric learning. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 723–734.
31. Shen, Y.; Xiao, T.; Li, H.; Yi, S.; Wang, X. End-to-end deep kronecker-product matching for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6886–6895.
32. Su, H.; Wang, P.; Liu, L.; Li, H.; Li, Z.; Zhang, Y. Where to Look and How to Describe: Fashion Image Retrieval with an Attentional Heterogeneous Bilinear Network. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 3254–3265.
33. Verma, S.; An, S.; Arora, C.; Rai, A. Diversity in fashion recommendation using semantic parsing. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 500–504.
34. Lasserre, J.; Rasch, K.; Vollgraf, R. Studio2shop: From studio photo shoots to fashion articles. arXiv 2018, arXiv:1807.00556.
| Dataset | DARN | DeepFashion: Consumer-to-Shop | DeepFashion: InShop |
|---|---|---|---|
| Distinct Training Products | 10,979 | 15,898 | 3997 |
| Training Street Photos | 50,528 | 98,768 | - |
| Training Shop Photos | 32,194 | 98,768 | 25,882 |
| Number of Positive Pairs | 50,528 | 98,768 | 13,528 |
| Number of Negative Pairs | 252,640 | 493,840 | 67,640 |
| Distinct Validation Products | 9635 | 8076 | - |
| Validation Street Photos | 6318 | 48,917 | - |
| Validation Shop Photos | 23,828 | 48,917 | - |
| Distinct Test Products | 9636 | 8077 | 3985 |
| Test Street Photos | 5966 | 47,734 | - |
| Test Shop Photos | 23,773 | 47,734 | 26,830 |
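The pair counts above imply five sampled negatives per positive street-shop pair (e.g., 252,640 = 5 × 50,528). As an illustration only, a hypothetical sampling routine consistent with that ratio could look like the following; the name `build_pairs` and the `products` layout are assumptions, not the authors' pipeline.

```python
import random

def build_pairs(products, negatives_per_positive=5, seed=0):
    """Build (street_photo, shop_photo, label) triples with a 5:1 negative ratio.

    products: dict mapping product_id -> {"street": [photos], "shop": [photos]}.
    """
    rng = random.Random(seed)
    ids = list(products)
    pairs = []
    for pid in ids:
        for street in products[pid]["street"]:
            # One matching shop photo per street photo (label 1).
            pairs.append((street, rng.choice(products[pid]["shop"]), 1))
            # Sampled shop photos of other products (label 0).
            for _ in range(negatives_per_positive):
                other = rng.choice(ids)
                while other == pid:  # resample until a different product is drawn
                    other = rng.choice(ids)
                pairs.append((street, rng.choice(products[other]["shop"]), 0))
    return pairs
```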
| Method | Top-1 Accuracy | Top-20 Accuracy | Top-50 Accuracy |
|---|---|---|---|
| FashionNet [16] | 0.073 | 0.188 | 0.228 |
| Triplet [8] | 0.109 | 0.378 | 0.499 |
| VAM+ImgDrop [8] | 0.137 | 0.439 | 0.569 |
| DREML [30] | 0.186 | 0.510 | 0.591 |
| KPM [31] | 0.213 | 0.541 | 0.652 |
| AHBN [32] | - | 0.603 | - |
| GRNet [11] | 0.257 | 0.644 | 0.750 |
| DML | 0.236 | 0.624 | 0.759 |
| Method | Top-1 Accuracy | Top-20 Accuracy | Top-50 Accuracy |
|---|---|---|---|
| FashionNet [16] | 0.529 | 0.764 | 0.796 |
| VAM [8] | 0.669 | 0.892 | 0.945 |
| DARN [6] | 0.382 | 0.675 | 0.717 |
| Diversity Fashion [33] | - | 0.784 | - |
| Studio2Shop [34] | - | 0.818 | - |
| GoogleNet [8] | 0.554 | 0.823 | 0.877 |
| DML | 0.712 | 0.875 | 0.921 |
| Loss | Accuracy (DeepFashion) | Accuracy (DARN) |
|---|---|---|
| Norm-Softmax | 0.32 | 0.46 |
| SphereFace (m = 1.35) | 0.55 | 0.59 |
| ArcFace (m = 0.50) | 0.57 | 0.61 |
| CosFace (m = 0.35) | 0.58 | 0.64 |
| DML | 0.62 | 0.73 |
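For context, the baseline losses in this table all inject a margin $m$ into the target-class logit of a normalized softmax. Their standard formulations, with $s$ the feature scale and $\theta_{y_i}$ the angle between embedding $x_i$ and its class weight $W_{y_i}$, are summarized below; the DML formulation itself is given in Section 3.2.

```latex
% Target-class logit under each margin-based softmax variant, where s is the
% feature scale and \theta_{y_i} is the angle between the embedding x_i and
% the weight vector W_{y_i} of its ground-truth class.
\begin{align*}
\text{Norm-Softmax:} \quad & s\cos\theta_{y_i}\\
\text{SphereFace:}   \quad & s\cos(m\,\theta_{y_i})\\
\text{ArcFace:}      \quad & s\cos(\theta_{y_i} + m)\\
\text{CosFace:}      \quad & s\,(\cos\theta_{y_i} - m)
\end{align*}
```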