DOI: 10.1145/3664647.3680793
research-article

Multi-Label Learning with Block Diagonal Labels

Published: 28 October 2024 Publication History

Abstract

Collecting large-scale multi-label data with full labels is difficult in real-world scenarios. Many existing studies address the missing labels that result from annotation but ignore the difficulties encountered during the annotation process itself. We find that the high annotation workload can be attributed to two causes: (1) annotators are required to identify labels across widely varying visual concepts, and (2) exhaustively annotating the entire dataset with all labels is notably difficult and time-consuming. In this paper, we propose a new setting, block diagonal labels, that reduces the workload on both fronts. The numerous categories are divided into subsets based on semantics and relevance. Each annotator focuses only on their own subset of labels, so only a small set of highly relevant labels must be annotated per image. To handle the missing labels this produces, we introduce a simple yet effective method that requires no prior knowledge of the dataset. Specifically, we propose an Adaptive Pseudo-Labeling method that predicts the unknown labels with less noise. We provide a formal analysis of the advantages of our setting, and extensive experiments on multiple widely used benchmarks verify the effectiveness of our method.
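
The annotation pattern described in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation. It only builds a block diagonal observation mask under two assumptions that the abstract does not state: the label subsets are given as lists of column indices, and images are assigned to annotators in contiguous, roughly equal chunks. The names `block_diagonal_mask`, `label_groups`, and `partial_labels`, and the use of -1 to mark unknown entries, are hypothetical choices made for this example.

```python
import numpy as np


def block_diagonal_mask(num_images, label_groups, num_labels):
    """Build a 0/1 mask of shape (num_images, num_labels).

    mask[i, j] == 1 means label j was annotated for image i.
    Assumption: images are split into contiguous chunks, one chunk per
    label group, so the observed entries form blocks along the diagonal.
    """
    mask = np.zeros((num_images, num_labels), dtype=np.int8)
    image_chunks = np.array_split(np.arange(num_images), len(label_groups))
    for rows, cols in zip(image_chunks, label_groups):
        # Mark the (image chunk x label subset) block as observed.
        mask[np.ix_(rows, cols)] = 1
    return mask


# Toy example: 6 images, 6 labels, 3 hypothetical semantic label subsets.
label_groups = [[0, 1], [2, 3, 4], [5]]
mask = block_diagonal_mask(num_images=6, label_groups=label_groups, num_labels=6)

# Hide every label outside an image's block; -1 marks "unknown".
rng = np.random.default_rng(0)
full_labels = rng.integers(0, 2, size=(6, 6))
partial_labels = np.where(mask == 1, full_labels, -1)

observed_fraction = mask.mean()  # share of label entries that were annotated
print(mask)
print(f"observed fraction: {observed_fraction:.2f}")
```

In this toy case only 12 of the 36 image-label entries are annotated. The remaining entries are the unknown labels that a method such as pseudo-labeling would have to predict.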

Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024
11719 pages
ISBN:9798400706868
DOI:10.1145/3664647
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. missing labels
  2. multi-label classification
  3. pseudo labels

Qualifiers

  • Research-article

Conference

MM '24
MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
