DOI: 10.1145/3394171.3416289

Attention Based Beauty Product Retrieval Using Global and Local Descriptors

Published: 12 October 2020

Abstract

Beauty product retrieval has drawn increasing attention for its broad application prospects and enormous economic benefits. However, the task remains challenging due to the wide variation among products and, in particular, the disturbance of cluttered backgrounds. In this paper, we first introduce an attention mechanism into a global image descriptor, i.e., Maximum Activation of Convolutions (MAC), and propose Attention-based MAC (AMAC). With this enhancement, we suppress the negative effect of the background and highlight the foreground in an unsupervised manner. AMAC and local descriptors are then ensembled so that they complement each other and further improve performance. Furthermore, we fine-tune multiple retrieval methods on different datasets and adopt a query expansion strategy for additional gains. Extensive experiments conducted on a dataset containing more than half a million beauty products (Perfect-500K) demonstrate the effectiveness of the proposed method. Finally, our team (USTC-NELSLIP) won first place on the leaderboard of the 'AI Meets Beauty' Grand Challenge of ACM Multimedia 2020. The code is available at: https://github.com/gniknoil/Perfect500K-Beauty-Product-Retrieval-Challenge.
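The MAC descriptor and its attention-based variant described in the abstract can be sketched as follows. This is a minimal NumPy sketch: the channel-sum attention map and its max-normalization are illustrative assumptions standing in for the paper's unsupervised attention, not necessarily its exact formulation.

```python
import numpy as np

def mac(feature_map):
    """Maximum Activation of Convolutions (MAC): global max-pooling
    over the spatial dimensions of a CNN feature map of shape (C, H, W)."""
    return feature_map.reshape(feature_map.shape[0], -1).max(axis=1)

def amac(feature_map):
    """Attention-based MAC (AMAC) sketch: weight each spatial location by
    an unsupervised attention map (here, the channel-wise activation sum,
    normalized to [0, 1]) so that cluttered-background responses are
    suppressed before max-pooling. The exact weighting may differ from
    the paper's."""
    attention = feature_map.sum(axis=0)               # (H, W) saliency proxy
    attention = attention / (attention.max() + 1e-12)  # scale to [0, 1]
    weighted = feature_map * attention                 # broadcast over channels
    return weighted.reshape(feature_map.shape[0], -1).max(axis=1)

def l2n(v):
    """L2-normalize a descriptor before cosine-similarity comparison."""
    return v / (np.linalg.norm(v) + 1e-12)
```

In practice the feature map would come from the last convolutional layer of a pretrained backbone; the L2-normalized descriptors are then compared with cosine similarity.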

Supplementary Material

MP4 File (3394171.3416289.mp4)
We give a brief description of our proposed method for the AI Meets Beauty challenge. The challenge asks participants to find the images in the database most similar to a given candidate image. However, due to the variations between the candidate images and the database images, the performance of many popular image descriptors degrades greatly. To overcome this difficulty, we introduce an attention mechanism into global and regional descriptors (i.e., MAC and RMAC, respectively) to strengthen their robustness to such disturbance, and propose new descriptors (AMAC and GRMAC, respectively). The newly proposed descriptors perform better on this task than the original ones. Furthermore, we comprehensively fuse both global and local descriptors to gain further improvements. Extensive experiments conducted on the Perfect-500K dataset validate the effectiveness of our method. Finally, our team (USTC-NELSLIP) outperformed all competitors and won first place in the AI Meets Beauty challenge of ACM MM 2020.
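The descriptor fusion mentioned above can be sketched as a late fusion of similarity scores. This is a minimal NumPy sketch assuming L2-normalized descriptors and a weighted sum of cosine similarities; the fusion weights and the two-descriptor setup are illustrative assumptions, not the paper's tuned configuration.

```python
import numpy as np

def fuse_rank(queries, dbs, weights):
    """Late-fusion sketch: combine cosine-similarity scores from several
    descriptor types (e.g. a global AMAC descriptor plus local
    descriptors) by a weighted sum, then rank the database by the fused
    score. queries[i] is the query descriptor of type i (L2-normalized),
    dbs[i] is an (N, D_i) matrix of database descriptors of type i."""
    score = sum(w * (db @ q) for q, db, w in zip(queries, dbs, weights))
    return np.argsort(-score)  # best match first
```

Because the descriptors capture complementary cues (global layout vs. local detail), a weighted score sum tends to be more robust than any single descriptor's ranking.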


Cited By

  • (2023) Neural Image Popularity Assessment with Retrieval-augmented Transformer. In Proceedings of the 31st ACM International Conference on Multimedia, 2427--2436. DOI: 10.1145/3581783.3611918. Online publication date: 26 Oct 2023.


Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020
4889 pages
ISBN:9781450379885
DOI:10.1145/3394171

Publisher: Association for Computing Machinery, New York, NY, United States

Author Tags

  1. attention mechanism
  2. feature fusion
  3. image retrieval

Qualifiers

  • Short-paper

Funding Sources

  • USTC Research Funds of the Double First-Class Initiative
  • National Natural Science Foundation of China

Conference

MM '20

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

