DOI: 10.1145/3664647.3680564
Research Article

Robust Contrastive Cross-modal Hashing with Noisy Labels

Published: 28 October 2024

Abstract

Cross-modal hashing has emerged as a promising technique for retrieving relevant information across distinct media types thanks to its low storage cost and high retrieval efficiency. However, the success of most existing methods relies heavily on large-scale, well-annotated datasets, which are costly and scarce in the real world due to ubiquitous labeling noise. To tackle this problem, we propose a novel framework, termed Noise Resistance Cross-modal Hashing (NRCH), to learn hashing with noisy labels by overcoming two key challenges, i.e., noise overfitting and error accumulation. Specifically, i) to mitigate the overfitting caused by noisy labels, we present a novel Robust Contrastive Hashing loss (RCH) that targets homologous pairs instead of noisy positive pairs, thus avoiding overemphasizing noise. In other words, RCH enforces the model to focus on reliable positives instead of the unreliable ones constructed by noisy labels, thereby enhancing its robustness against noise; ii) to circumvent error accumulation, a Dynamic Noise Separator (DNS) is proposed to dynamically and accurately separate clean from noisy samples by adaptively fitting the loss distribution, thus alleviating the adverse influence of noise on iterative training. Finally, we conduct extensive experiments on four widely used benchmarks to demonstrate the robustness of NRCH against noisy labels for cross-modal retrieval. The code is available at: https://github.com/LonganWANG-cs/NRCH.git.
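The abstract describes the Dynamic Noise Separator only at a high level. As a rough illustration of how "adaptively fitting the loss distribution" to split clean from noisy samples can work in practice (a DivideMix-style heuristic, not necessarily the authors' exact method), consider the minimal sketch below; the function name separate_clean_noisy, the two-component Gaussian mixture, and the 0.5 posterior threshold are all assumptions for illustration.

    # Hypothetical sketch, NOT the authors' code: one common way to realize
    # "adaptively fitting the loss distribution" is a two-component Gaussian
    # mixture over per-sample losses, recomputed as training proceeds.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def separate_clean_noisy(per_sample_losses, threshold=0.5):
        """Fit a 2-component GMM to the loss distribution and flag as clean
        the samples that most likely belong to the low-mean component."""
        losses = np.asarray(per_sample_losses).reshape(-1, 1)
        gmm = GaussianMixture(n_components=2, max_iter=100, reg_covar=5e-4)
        gmm.fit(losses)
        clean_component = int(np.argmin(gmm.means_.ravel()))
        # Posterior probability that each sample belongs to the clean component.
        p_clean = gmm.predict_proba(losses)[:, clean_component]
        return p_clean > threshold  # boolean mask: True = treated as clean

    # Toy usage: 900 low-loss (clean) and 100 high-loss (noisy) samples.
    rng = np.random.default_rng(0)
    losses = np.concatenate([rng.normal(0.2, 0.05, 900), rng.normal(1.0, 0.2, 100)])
    mask = separate_clean_noisy(losses)
    print(f"{mask.sum()} of {mask.size} samples treated as clean")

In a pipeline like NRCH's, such a mask would be recomputed during training so the clean/noisy split tracks the evolving loss distribution rather than freezing early mistakes in place, which is the error-accumulation problem the abstract highlights.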



    Published In

    MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
    October 2024
    11719 pages
    ISBN: 9798400706868
    DOI: 10.1145/3664647


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. cross-modal hashing
    2. cross-modal retrieval
    3. noisy labels


    Funding Sources

    • Sichuan Science and Technology Planning Project under Grants
    • NSFC under Grants
    • Fundamental Research Funds for the Central Universities under Grants

    Conference

    MM '24: The 32nd ACM International Conference on Multimedia
    October 28 - November 1, 2024
    Melbourne VIC, Australia

    Acceptance Rates

    MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

