research-article

Beauty Product Image Retrieval Based on Multi-Feature Fusion and Feature Aggregation

Authors:

Liang Lei

MM '18: Proceedings of the 26th ACM international conference on Multimedia

Pages 2063 - 2067

https://doi.org/10.1145/3240508.3266431

Published: 15 October 2018

Abstract

We propose a beauty product image retrieval method based on multi-feature fusion and feature aggregation. The key idea is to represent each image by a feature vector obtained through multi-feature fusion and feature aggregation. VGG16 and ResNet50 are chosen to extract image features, and CroW is adopted to perform deep feature aggregation. Drawing on the idea of transfer learning, we fine-tune VGG16 on the Perfect-500K data set to improve image retrieval performance. The proposed method won third prize in the Perfect Corp. Challenge 2018, with a best result of 0.270676 mAP. We released our code on GitHub: https://github.com/wangqi12332155/ACMMM-beauty-AI-challenge.
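The pipeline named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes non-negative (post-ReLU) convolutional feature maps of shape (channels, height, width), applies a simplified form of CroW weighting (spatial and channel weights, with the PCA whitening step omitted), and fuses the two descriptors by concatenation, which is an assumption because the abstract does not state the fusion operator. The function names `crow_aggregate`, `fuse`, and `rank` are hypothetical, and random arrays stand in for VGG16 and ResNet50 activations.

```python
import numpy as np


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    return v / (np.linalg.norm(v) + eps)


def crow_aggregate(fmap, a=2.0, b=2.0, eps=1e-6):
    """Simplified CroW pooling of a non-negative (K, H, W) feature map.

    Assumption: PCA whitening, normally applied afterwards, is omitted here.
    """
    k = fmap.shape[0]
    # Spatial weights: channel-summed activation map, normalized and root-scaled.
    s = fmap.sum(axis=0)
    z = (s ** a).sum() ** (1.0 / a)
    spatial = (s / (z + 1e-12)) ** (1.0 / b)
    # Channel weights: channels that fire rarely (sparse) get larger weights.
    q = (fmap > 0).reshape(k, -1).mean(axis=1)
    channel = np.log((k * eps + q.sum()) / (eps + q))
    # Weighted sum-pooling over spatial positions, then channel re-weighting.
    pooled = (fmap * spatial[None, :, :]).sum(axis=(1, 2))
    return l2_normalize(pooled * channel)


def fuse(descriptors):
    """Fuse per-network descriptors by concatenation (an assumed fusion rule)."""
    return l2_normalize(np.concatenate([l2_normalize(d) for d in descriptors]))


def rank(query, gallery):
    """Return gallery indices sorted by descending cosine similarity."""
    return np.argsort(-(gallery @ query))


# Random stand-ins for last-conv activations of two networks (hypothetical shapes).
rng = np.random.default_rng(0)
gallery = np.stack([
    fuse([
        crow_aggregate(np.maximum(rng.standard_normal((512, 7, 7)), 0)),
        crow_aggregate(np.maximum(rng.standard_normal((2048, 7, 7)), 0)),
    ])
    for _ in range(5)
])
print(rank(gallery[3], gallery))
```

Because every descriptor is L2-normalized, the dot product in `rank` equals cosine similarity, so a query that is itself in the gallery is ranked first.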

Published In

MM '18: Proceedings of the 26th ACM international conference on Multimedia

October 2018

2167 pages

ISBN:9781450356657

DOI:10.1145/3240508

General Chairs:
Susanne Boll, University of Oldenburg, Germany
Kyoung Mu Lee, Seoul National University, Korea
Jiebo Luo, University of Rochester, USA
Wenwu Zhu, Tsinghua University, China

Program Chairs:
Hyeran Byun, Yonsei University, Korea
Chang Wen Chen, State Univ. of New York at Buffalo, USA
Rainer Lienhart, University of Augsburg, Germany
Tao Mei, JD AI, China

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

the National Natural Science Foundation of China
the Guangdong Innovative Research Team Program

Conference

MM '18

Sponsor:

SIGMM

MM '18: ACM Multimedia Conference

October 22 - 26, 2018

Seoul, Republic of Korea

Acceptance Rates

MM '18 Paper Acceptance Rate 209 of 757 submissions, 28%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
