DOI: 10.1145/3503161.3548032
research-article
Open access

HyP2 Loss: Beyond Hypersphere Metric Space for Multi-label Image Retrieval

Published: 10 October 2022

Abstract

Image retrieval has become an increasingly appealing technique with broad prospects in multimedia applications, and deep hashing has become its dominant branch because it offers low storage cost and efficient retrieval. In this paper, we carry out an in-depth investigation of metric learning in deep hashing, aiming to establish a powerful metric space for multi-label scenarios. Pair-based losses suffer from high computational overhead and convergence difficulty, while proxy-based losses are theoretically incapable of expressing the profound label dependencies and exhibit conflicts in the hypersphere space they construct. To address these problems, we propose a novel metric learning framework with a Hybrid Proxy-Pair Loss (HyP² Loss) that constructs an expressive metric space with efficient training complexity w.r.t. the whole dataset. The proposed HyP² Loss optimizes the hypersphere space through learnable proxies and mines data-to-data correlations of irrelevant pairs, thereby combining the rich data correspondence of pair-based methods with the high efficiency of proxy-based methods. Extensive experiments on four standard multi-label benchmarks demonstrate that the proposed method outperforms the state of the art, is robust across different hash code lengths, and achieves significant performance gains with faster and more stable convergence. Our code is available at https://github.com/JerryXu0129/HyP2-Loss.
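To make the idea of a hybrid proxy-pair objective concrete, here is a simplified sketch in plain Python. It is not the authors' released implementation (see the repository linked above for that); it assumes that both terms use cosine similarity with a hinge at a hypothetical `margin`, that the pair term is weighted by a hypothetical `beta`, and that two samples form an "irrelevant" pair when their multi-hot label vectors share no label. The proxy term pulls each embedding toward the proxies of its own labels and pushes it below the margin for all other proxies; the pair term adds a data-to-data constraint only on irrelevant pairs inside a mini-batch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def hybrid_proxy_pair_loss(embeddings, labels, proxies, margin=0.5, beta=0.5):
    """Simplified hybrid proxy-pair loss over one mini-batch (illustrative sketch).

    embeddings: N real-valued vectors (continuous relaxation of hash codes)
    labels:     N multi-hot label vectors of length C
    proxies:    C learnable class proxies, same dimension as the embeddings
    margin:     hypothetical cosine threshold for negative proxies and irrelevant pairs
    beta:       hypothetical weight of the irrelevant-pair term
    """
    # Proxy term: pull each sample toward the proxies of its own labels,
    # and penalize similarity above `margin` to every other proxy.
    pos_sum, pos_cnt, neg_sum, neg_cnt = 0.0, 0, 0.0, 0
    for x, y in zip(embeddings, labels):
        for c, p in enumerate(proxies):
            s = cosine(x, p)
            if y[c]:
                pos_sum += 1.0 - s
                pos_cnt += 1
            else:
                neg_sum += max(0.0, s - margin)
                neg_cnt += 1
    proxy_term = pos_sum / max(pos_cnt, 1) + neg_sum / max(neg_cnt, 1)

    # Pair term: only irrelevant pairs (no shared label) are constrained,
    # penalizing their cosine similarity above `margin`.
    pair_sum, pair_cnt = 0.0, 0
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            if not any(a and b for a, b in zip(labels[i], labels[j])):
                pair_sum += max(0.0, cosine(embeddings[i], embeddings[j]) - margin)
                pair_cnt += 1
    pair_term = pair_sum / max(pair_cnt, 1)

    return proxy_term + beta * pair_term


# Two samples with disjoint labels collapsed onto the same point:
# both the proxy term and the irrelevant-pair term are active.
proxies = [[1.0, 0.0], [0.0, 1.0]]
print(hybrid_proxy_pair_loss([[1.0, 0.0], [1.0, 0.0]], [[1, 0], [0, 1]], proxies))  # 1.0
```

Restricting the data-to-data term to irrelevant pairs within a mini-batch keeps its cost tied to the batch size, while relevant samples are still drawn together indirectly through the proxies they share.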

Supplementary Material

MP4 File (MM22-fp1188.mp4)
Compared with common single-label image retrieval, multi-label image retrieval is more challenging because image features are more complex and the demands on image embeddings in the metric space are higher. Pair-based methods are the most commonly used in image retrieval. However, they suffer from high computational cost and convergence difficulty, problems that become more serious and unavoidable in multi-label scenarios. Proxy-based methods were proposed to improve model robustness with efficient training complexity in single-label scenarios, but they also fall short in multi-label tasks. In this paper, we theoretically analyze the primary reasons why proxy-based methods are inadequate for multi-label retrieval. We then propose the novel HyP2 Loss, which adds a crucial constraint term on irrelevant samples to the proxy loss while preserving its efficient training complexity, thereby compensating for the limitations of the hypersphere metric space.
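The efficiency argument can be made concrete by counting similarity evaluations per epoch. The sketch below rests on simplifying assumptions: a pair-based loss compares every unordered pair of training samples, a proxy-based loss compares each sample with each of the C class proxies, and the extra pair term of a hybrid loss is evaluated only inside mini-batches. The function name and the numbers in the example are hypothetical and are not taken from the paper's experiments.

```python
def similarity_evaluations(num_samples, num_classes, batch_size):
    """Per-epoch similarity evaluations under simplifying assumptions (illustrative).

    pair_based:  every unordered pair of training samples is compared
    proxy_based: every sample is compared with every class proxy
    hybrid:      proxy comparisons plus all unordered pairs inside each full
                 mini-batch (an upper bound, since only irrelevant pairs are penalized)
    """
    pair_based = num_samples * (num_samples - 1) // 2
    proxy_based = num_samples * num_classes
    full_batches = num_samples // batch_size  # a trailing partial batch is ignored
    in_batch_pairs = full_batches * (batch_size * (batch_size - 1) // 2)
    return {
        "pair_based": pair_based,
        "proxy_based": proxy_based,
        "hybrid": proxy_based + in_batch_pairs,
    }


print(similarity_evaluations(10240, 20, 64))
# {'pair_based': 52423680, 'proxy_based': 204800, 'hybrid': 527360}
```

In this toy setting the all-pairs count grows quadratically with the dataset, whereas the hybrid count stays within the same order of magnitude as the pure proxy loss, which is the sense in which an irrelevant-pair constraint can be added without giving up proxy-level efficiency.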


Cited By

  • (2024) Hashing Orthogonal Constraint Loss for Multi-Label Image Retrieval. Proceedings of 2024 ACM ICMR Workshop on Multimodal Video Retrieval, 27-32. DOI: 10.1145/3664524.3675367. Online publication date: 10-Jun-2024.
  • (2024) Deep Neighborhood-aware Proxy Hashing with Uniform Distribution Constraint for Cross-modal Retrieval. ACM Transactions on Multimedia Computing, Communications, and Applications 20:6, 1-23. DOI: 10.1145/3643639. Online publication date: 8-Mar-2024.
  • (2024) Hash-Based Remote Sensing Image Retrieval. IEEE Transactions on Geoscience and Remote Sensing 62, 1-23. DOI: 10.1109/TGRS.2024.3429350. Online publication date: 2024.

Information

Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia
October 2022
7537 pages
ISBN:9781450392037
DOI:10.1145/3503161
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deep hashing
  2. image retrieval
  3. metric learning
  4. multi-label

Qualifiers

  • Research-article

Funding Sources

  • SZSTC Grant
  • Shenzhen Key Laboratory

Conference

MM '22

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 242
  • Downloads (last 6 weeks): 30
Reflects downloads up to 09 Nov 2024

Citations

  • (2024) Deep global semantic structure-preserving hashing via corrective triplet loss for remote sensing image retrieval. Expert Systems with Applications: An International Journal 238:PD. DOI: 10.1016/j.eswa.2023.122105. Online publication date: 27-Feb-2024.
  • (2023) Unbiased risk estimator to multi-labeled complementary label learning. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 3732-3740. DOI: 10.24963/ijcai.2023/415. Online publication date: 19-Aug-2023.
  • (2023) Neural Image Popularity Assessment with Retrieval-augmented Transformer. Proceedings of the 31st ACM International Conference on Multimedia, 2427-2436. DOI: 10.1145/3581783.3611918. Online publication date: 26-Oct-2023.
  • (2023) Deep Semantic-Aware Proxy Hashing for Multi-Label Cross-Modal Retrieval. IEEE Transactions on Circuits and Systems for Video Technology 34:1, 576-589. DOI: 10.1109/TCSVT.2023.3285266. Online publication date: 12-Jun-2023.
  • (2023) Deep Adaptive Quadruplet Hashing With Probability Sampling for Large-Scale Image Retrieval. IEEE Transactions on Circuits and Systems for Video Technology 33:12, 7914-7927. DOI: 10.1109/TCSVT.2023.3281868. Online publication date: 1-Jun-2023.
  • (2023) Deep Hashing With Multi-Central Ranking Loss for Multi-Label Image Retrieval. IEEE Signal Processing Letters 30, 135-139. DOI: 10.1109/LSP.2023.3244516. Online publication date: 2023.
