Breaking Through the Noisy Correspondence: A Robust Model for Image-Text Matching

Published: 19 August 2024


Unleashing the power of image-text matching in real-world applications is hampered by noisy correspondence. Manually curating high-quality datasets is expensive and time-consuming, and datasets generated using diffusion models are not sufficiently well aligned. The most promising approach is to collect image-text pairs from the Internet, but this inevitably introduces noisy correspondence. To reduce the negative impact of noisy correspondence, we propose a novel model that first transforms the noisy correspondence filtering problem into a similarity distribution modeling problem by exploiting the powerful capabilities of pre-trained models. Specifically, we use a Gaussian Mixture Model to model the similarities computed by CLIP as a clean distribution and a noisy distribution, which lets us filter out most of the noisy correspondence in the dataset. We then use the relatively clean data to fine-tune the model. To further reduce the negative impact of the noisy correspondence that escapes filtering during fine-tuning, i.e., the small region where the two distributions overlap, we propose a distribution-sensitive dynamic margin ranking loss that further increases the distance between the two distributions. Through continuous iteration, the noisy correspondence gradually decreases and the model performance gradually improves. Our extensive experiments demonstrate the effectiveness and robustness of our model even under high noise rates.
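The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the similarity scores are synthetic stand-ins for CLIP similarities, the mixture is fitted with a small hand-written EM loop, and the function names, the synthetic means (0.32 and 0.15), and the 0.5 posterior threshold are all assumptions chosen for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(scores, n_iter=100, min_var=1e-6):
    """Fit a 1-D two-component Gaussian mixture to `scores` with EM.

    Needs at least two scores. Returns (weights, means, variances), ordered
    so that index 1 is the higher-mean component, treated here as "clean".
    """
    n = len(scores)
    ordered = sorted(scores)
    half = n // 2
    # Initialise the two means from the lower and upper halves of the scores.
    means = [sum(ordered[:half]) / half, sum(ordered[half:]) / (n - half)]
    grand_mean = sum(scores) / n
    grand_var = max(sum((s - grand_mean) ** 2 for s in scores) / n, min_var)
    variances = [grand_var, grand_var]
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: posterior probability that each score came from component 1.
        resp = []
        for s in scores:
            p0 = weights[0] * normal_pdf(s, means[0], variances[0])
            p1 = weights[1] * normal_pdf(s, means[1], variances[1])
            resp.append(p1 / (p0 + p1) if p0 + p1 > 0.0 else 0.5)

        # M-step: re-estimate weights, means, and variances.
        n1 = sum(resp)
        n0 = n - n1
        if n0 < 1e-9 or n1 < 1e-9:
            break  # one component collapsed; keep the last valid estimate
        means = [
            sum((1.0 - r) * s for r, s in zip(resp, scores)) / n0,
            sum(r * s for r, s in zip(resp, scores)) / n1,
        ]
        variances = [
            max(sum((1.0 - r) * (s - means[0]) ** 2 for r, s in zip(resp, scores)) / n0, min_var),
            max(sum(r * (s - means[1]) ** 2 for r, s in zip(resp, scores)) / n1, min_var),
        ]
        weights = [n0 / n, n1 / n]

    if means[0] > means[1]:  # enforce the ordering promised in the docstring
        weights.reverse()
        means.reverse()
        variances.reverse()
    return weights, means, variances


def clean_probability(score, params):
    """Posterior probability that `score` belongs to the higher-mean component."""
    weights, means, variances = params
    p0 = weights[0] * normal_pdf(score, means[0], variances[0])
    p1 = weights[1] * normal_pdf(score, means[1], variances[1])
    return p1 / (p0 + p1) if p0 + p1 > 0.0 else 0.5


def synthetic_scores(seed=0, n_clean=600, n_noisy=400):
    """Hypothetical stand-in for CLIP similarities of clean and noisy pairs."""
    rng = random.Random(seed)
    clean = [rng.gauss(0.32, 0.03) for _ in range(n_clean)]
    noisy = [rng.gauss(0.15, 0.03) for _ in range(n_noisy)]
    return clean + noisy


scores = synthetic_scores()
params = fit_two_component_gmm(scores)
# Keep only the pairs that the mixture assigns to the clean component.
kept = [i for i, s in enumerate(scores) if clean_probability(s, params) > 0.5]
print(len(kept), "of", len(scores), "pairs kept")
```

In practice the scores would come from a pre-trained model such as CLIP, and the retained pairs would feed the fine-tuning stage. The abstract does not specify the filtering threshold, so the value used here is only a placeholder.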





    Published In

    ACM Transactions on Information Systems  Volume 42, Issue 6
    November 2024
    467 pages
    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 19 August 2024
    Online AM: 29 April 2024
    Accepted: 15 April 2024
    Revised: 20 March 2024
    Received: 25 September 2023
    Published in TOIS Volume 42, Issue 6

    Author Tags

    1. Cross-modal retrieval
    2. image-text matching
    3. noisy correspondence
    4. similarity distribution modeling


    Funding Sources

    • National Natural Science Foundation of China
    • Shandong Provincial Natural Science Foundation
    • Science and Technology Innovation Program for Distinguished Young Scholars of Shandong Province Higher Education Institutions


