DOI: 10.1145/3652583.3658107

Prompt Expending for Single Positive Multi-Label Learning with Global Unannotated Categories

Published: 07 June 2024

Abstract

Multi-label learning (MLL) learns from samples associated with multiple labels, but providing detailed annotations for every sample in a real-world dataset is expensive and time-consuming. To address this challenge, single positive multi-label learning (SPML) has been studied in recent years: each sample is annotated with only one positive label, which is much easier and less costly. However, in many real-world scenarios the annotation process may leave global unannotated categories (GUCs), which exist in the label space but do not serve as the single positive label for any sample. Unfortunately, previous SPML approaches are less applicable to classifying GUCs due to the absence of supervised information. To solve this problem, we propose a novel prompt expanding framework that leverages a large-scale pretrained vision and language model, the Recognize Anything Model (RAM), to provide supervision signals for GUCs. Specifically, we first present a simple but effective strategy to generate reliable pseudo-labels for GUCs by utilizing the zero-shot predictions of RAM. Subsequently, we introduce additional prompts drawn from a large common-category list and fuse them with learnable weighting factors, expanding the semantic representation of GUCs. Experiments show that our method achieves state-of-the-art results on all four benchmarks. The code to reproduce the experiments is at: https://github.com/yingpenga/VLSPE
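As a rough illustration of the two steps the abstract describes — thresholding RAM's zero-shot scores into pseudo-labels for GUCs, and fusing several prompt embeddings with learnable weighting factors — the sketch below shows one plausible shape. This is not the authors' code: the function names, the fixed threshold, and the toy embeddings are all hypothetical, and in the real model the weighting factors would be trainable parameters rather than constants.

```python
import math

def softmax(weights):
    # Normalize the (learnable) weighting factors into fusion coefficients.
    m = max(weights)
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]

def pseudo_labels(zero_shot_scores, threshold=0.7):
    # Hypothetical rule: mark a GUC as pseudo-positive only when the
    # zero-shot tagging confidence clears a fixed threshold.
    return [1 if s >= threshold else 0 for s in zero_shot_scores]

def fuse_prompts(prompt_embeddings, weights):
    # Weighted average of several prompt embeddings; in the framework the
    # weights would be learned jointly with the rest of the model.
    alphas = softmax(weights)
    dim = len(prompt_embeddings[0])
    fused = [0.0] * dim
    for a, emb in zip(alphas, prompt_embeddings):
        for i in range(dim):
            fused[i] += a * emb[i]
    return fused

# Toy example: three GUCs scored by a zero-shot tagger, then two prompt
# embeddings fused with equal weights.
labels = pseudo_labels([0.91, 0.30, 0.84])                    # -> [1, 0, 1]
fused = fuse_prompts([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])    # -> [0.5, 0.5]
```

With equal weights the softmax reduces to a plain average, so the fused vector sits midway between the two prompt embeddings; unequal learned weights would pull it toward the more useful prompt.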



    Published In

    ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval
    May 2024
    1379 pages
    ISBN:9798400706196
    DOI:10.1145/3652583


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. pretrained vision and language model
    2. prompt expending
    3. single positive multi-label learning
    4. weakly supervised learning

    Qualifiers

    • Research-article

    Conference

    ICMR '24
    Acceptance Rates

    Overall Acceptance Rate 254 of 830 submissions, 31%

