DOI: 10.1145/3664647.3680564
Research Article

Robust Contrastive Cross-modal Hashing with Noisy Labels

Published: 28 October 2024

Abstract

Cross-modal hashing has emerged as a promising technique for retrieving relevant information across distinct media types thanks to its low storage cost and high retrieval efficiency. However, the success of most existing methods relies heavily on large-scale, well-annotated datasets, which are costly and scarce in the real world due to ubiquitous labeling noise. To tackle this problem, we propose a novel framework, termed Noise Resistance Cross-modal Hashing (NRCH), to learn hashing with noisy labels by overcoming two key challenges, i.e., noise overfitting and error accumulation. Specifically, i) to mitigate the overfitting caused by noisy labels, we present a novel Robust Contrastive Hashing loss (RCH) that targets homologous pairs instead of noisy positive pairs, thus avoiding overemphasizing noise. In other words, RCH enforces the model to focus on reliable positives instead of the unreliable ones constructed by noisy labels, thereby enhancing its robustness against noise; ii) to circumvent error accumulation, a Dynamic Noise Separator (DNS) is proposed to dynamically and accurately separate clean from noisy samples by adaptively fitting the loss distribution, thus alleviating the adverse influence of noise on iterative training. Finally, we conduct extensive experiments on four widely used benchmarks to demonstrate the robustness of NRCH against noisy labels for cross-modal retrieval. The code is available at: https://github.com/LonganWANG-cs/NRCH.git.
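The abstract describes the Dynamic Noise Separator only at a high level. As a rough illustration of how "adaptively fitting the loss distribution" to split clean from noisy samples can work in practice (a DivideMix-style heuristic, not necessarily the authors' exact method), consider the minimal sketch below; the function name separate_clean_noisy, the two-component Gaussian mixture, and the 0.5 posterior threshold are all assumptions for illustration.

    # Hypothetical sketch, NOT the authors' code: one common way to realize
    # "adaptively fitting the loss distribution" is a two-component Gaussian
    # mixture over per-sample losses, recomputed as training proceeds.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def separate_clean_noisy(per_sample_losses, threshold=0.5):
        """Fit a 2-component GMM to the loss distribution and flag as clean
        the samples that most likely belong to the low-mean component."""
        losses = np.asarray(per_sample_losses).reshape(-1, 1)
        gmm = GaussianMixture(n_components=2, max_iter=100, reg_covar=5e-4)
        gmm.fit(losses)
        clean_component = int(np.argmin(gmm.means_.ravel()))
        # Posterior probability that each sample belongs to the clean component.
        p_clean = gmm.predict_proba(losses)[:, clean_component]
        return p_clean > threshold  # boolean mask: True = treated as clean

    # Toy usage: 900 low-loss (clean) and 100 high-loss (noisy) samples.
    rng = np.random.default_rng(0)
    losses = np.concatenate([rng.normal(0.2, 0.05, 900), rng.normal(1.0, 0.2, 100)])
    mask = separate_clean_noisy(losses)
    print(f"{mask.sum()} of {mask.size} samples treated as clean")

In a pipeline like NRCH's, such a mask would be recomputed during training so the clean/noisy split tracks the evolving loss distribution rather than freezing early mistakes in place, which is the error-accumulation problem the abstract highlights.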



    Published In

    MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
    October 2024
    11719 pages
    ISBN: 9798400706868
    DOI: 10.1145/3664647


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. cross-modal hashing
    2. cross-modal retrieval
    3. noisy labels


    Funding Sources

    • Sichuan Science and Technology Planning Project under Grants
    • NSFC under Grants
    • Fundamental Research Funds for the Central Universities under Grants

    Conference

    MM '24: The 32nd ACM International Conference on Multimedia
    October 28 - November 1, 2024
    Melbourne VIC, Australia

    Acceptance Rates

    MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

