DOI: 10.1145/3652583.3658107

Prompt Expending for Single Positive Multi-Label Learning with Global Unannotated Categories

Published: 07 June 2024

Abstract

Multi-label learning (MLL) learns from samples associated with multiple labels, but providing detailed annotations for every sample in a real-world dataset is expensive and time-consuming. To address this challenge, single positive multi-label learning (SPML) has been studied in recent years: each sample is annotated with only one positive label, which is much easier and less costly. However, in many real-world scenarios the annotation process may leave global unannotated categories (GUCs), which exist in the label space but do not serve as the single positive label for any sample. Unfortunately, previous SPML approaches are less applicable to classifying GUCs due to the absence of supervised information. To solve this problem, we propose a novel prompt expanding framework that leverages a large-scale pretrained vision and language model, the Recognize Anything Model (RAM), to provide supervision signals for GUCs. Specifically, we first present a simple but effective strategy to generate reliable pseudo-labels for GUCs by utilizing the zero-shot predictions of RAM. Subsequently, we introduce additional prompts drawn from a large common-category list and fuse them with learnable weighting factors, expanding the semantic representation of GUCs. Experiments show that our method achieves state-of-the-art results on all four benchmarks. The code to reproduce the experiments is at: https://github.com/yingpenga/VLSPE
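As a rough illustration of the two steps the abstract describes — thresholding RAM's zero-shot scores into pseudo-labels for GUCs, and fusing several prompt embeddings with learnable weighting factors — the sketch below shows one plausible shape. This is not the authors' code: the function names, the fixed threshold, and the toy embeddings are all hypothetical, and in the real model the weighting factors would be trainable parameters rather than constants.

```python
import math

def softmax(weights):
    # Normalize the (learnable) weighting factors into fusion coefficients.
    m = max(weights)
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]

def pseudo_labels(zero_shot_scores, threshold=0.7):
    # Hypothetical rule: mark a GUC as pseudo-positive only when the
    # zero-shot tagging confidence clears a fixed threshold.
    return [1 if s >= threshold else 0 for s in zero_shot_scores]

def fuse_prompts(prompt_embeddings, weights):
    # Weighted average of several prompt embeddings; in the framework the
    # weights would be learned jointly with the rest of the model.
    alphas = softmax(weights)
    dim = len(prompt_embeddings[0])
    fused = [0.0] * dim
    for a, emb in zip(alphas, prompt_embeddings):
        for i in range(dim):
            fused[i] += a * emb[i]
    return fused

# Toy example: three GUCs scored by a zero-shot tagger, then two prompt
# embeddings fused with equal weights.
labels = pseudo_labels([0.91, 0.30, 0.84])                    # -> [1, 0, 1]
fused = fuse_prompts([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])    # -> [0.5, 0.5]
```

With equal weights the softmax reduces to a plain average, so the fused vector sits midway between the two prompt embeddings; unequal learned weights would pull it toward the more useful prompt.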



    Published In

    ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval
    May 2024
    1379 pages
    ISBN:9798400706196
    DOI:10.1145/3652583


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. pretrained vision and language model
    2. prompt expending
    3. single positive multi-label learning
    4. weakly supervised learning

    Qualifiers

    • Research-article

    Conference

    ICMR '24
    Acceptance Rates

    Overall Acceptance Rate 254 of 830 submissions, 31%

