DOI: 10.1145/3539618.3591780
Research article · Open access

Uncertainty Quantification for Extreme Classification

Published: 18 July 2023
Abstract

Uncertainty quantification is one of the most crucial tasks for obtaining trustworthy and reliable machine learning models for decision making. However, most research in this domain has focused on problems with small label spaces, ignoring eXtreme Multi-label Classification (XMC), an essential task for web-scale machine learning applications in the era of big data. Moreover, enormous label spaces can produce noisy retrieval results and make uncertainty quantification computationally intractable. In this paper, we investigate general uncertainty quantification approaches for tree-based XMC models within a probabilistic ensemble-based framework. In particular, we analyze label-level and instance-level uncertainty in XMC, and propose a general beam-search-based approximation framework that efficiently estimates uncertainty with a theoretical guarantee under long-tail XMC predictions. Empirical studies on six large-scale real-world datasets show that our framework not only outperforms single models in predictive performance, but also provides strong uncertainty-based baselines for label misclassification and out-of-distribution detection, with significant speedup. Moreover, built on deep XMC models, our framework yields further improvements over the state of the art together with uncertainty estimates.
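The paper gives the exact estimators; purely for intuition, the sketch below shows the standard Bayesian-ensemble entropy decomposition (total = aleatoric + epistemic) that frameworks of this kind rely on, applied at the label level with Bernoulli relevance probabilities. The function names and toy numbers are hypothetical, not the authors' code; the beam-search truncation discussed in the abstract would enter by assigning probability 0 to labels outside a member's beam.

```python
# A minimal sketch (assumptions noted above, not the authors' implementation)
# of label-level uncertainty decomposition over an ensemble of M XMC models.
import numpy as np

def bernoulli_entropy(p, eps=1e-12):
    """Entropy of a Bernoulli(p) label-relevance variable, elementwise."""
    return -(p * np.log(p + eps) + (1.0 - p) * np.log(1.0 - p + eps))

def label_uncertainty(member_probs):
    """Decompose uncertainty for one (instance, label) pair.

    member_probs: length-M sequence, the m-th ensemble member's estimate of
    P(label is relevant | instance). Under beam-search truncation, labels
    outside a member's top-k beam would contribute probability 0 here
    (the long-tail approximation that keeps this tractable at scale).
    """
    p = np.asarray(member_probs, dtype=float)
    total = bernoulli_entropy(p.mean())   # entropy of the ensemble mean
    data = bernoulli_entropy(p).mean()    # expected entropy (aleatoric)
    knowledge = total - data              # mutual information (epistemic)
    return total, data, knowledge

# Toy example: three members score one label for one instance.
# The members disagree, so epistemic (knowledge) uncertainty is nonzero.
print(label_uncertainty([0.9, 0.2, 0.6]))
```

By concavity of entropy, the epistemic term is always nonnegative; it vanishes exactly when all ensemble members agree, which is what makes it usable for misclassification and out-of-distribution detection as the abstract describes.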


        Published In

        SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
        July 2023
        3567 pages
        ISBN:9781450394086
        DOI:10.1145/3539618
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Author Tags

        1. bayesian ensemble
        2. extreme multi-label classification
        3. uncertainty quantification


        Acceptance Rates

        Overall Acceptance Rate 792 of 3,983 submissions, 20%
