DOI: 10.5555/3666122.3668150

PAC-Bayes generalization certificates for learned inductive conformal prediction

Published: 30 May 2024

Abstract

Inductive Conformal Prediction (ICP) provides a practical and effective approach for equipping deep learning models with uncertainty estimates in the form of set-valued predictions which are guaranteed to contain the ground truth with high probability. Despite the appeal of this coverage guarantee, these sets may not be efficient: the size and contents of the prediction sets are not directly controlled, and instead depend on the underlying model and choice of score function. To remedy this, recent work has proposed learning model and score function parameters using data to directly optimize the efficiency of the ICP prediction sets. While appealing, the generalization theory for such an approach is lacking: direct optimization of empirical efficiency may yield prediction sets that are either no longer efficient on test data, or no longer obtain the required coverage on test data. In this work, we use PAC-Bayes theory to obtain generalization bounds on both the coverage and the efficiency of set-valued predictors which can be directly optimized to maximize efficiency while satisfying a desired test coverage. In contrast to prior work, our framework allows us to utilize the entire calibration dataset to learn the parameters of the model and score function, instead of requiring a separate hold-out set for obtaining test-time coverage guarantees. We leverage these theoretical results to provide a practical algorithm for using calibration data to simultaneously fine-tune the parameters of a model and score function while guaranteeing test-time coverage and efficiency of the resulting prediction sets. We evaluate the approach on regression and classification tasks, and outperform baselines calibrated using a Hoeffding bound-based PAC guarantee on ICP, especially in the low-data regime.
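For orientation, the display below sketches the two standard ingredients the abstract combines: split/inductive conformal prediction and the PAC-Bayes-kl bound in the Maurer-Seeger form. The notation (score function $s$, threshold $\hat{q}$, prior $P$, posterior $Q$) is introduced here for illustration; these are textbook background statements, not the paper's own theorems.

Given a score function $s$ and calibration scores $s_i = s(x_i, y_i)$ for $i = 1, \dots, n$, standard ICP sets $\hat{q}$ to the $\lceil (n+1)(1-\alpha) \rceil$-th smallest calibration score and predicts

    \[ C(x) = \{\, y : s(x, y) \le \hat{q} \,\}, \qquad \Pr\big[\, y_{n+1} \in C(x_{n+1}) \,\big] \ge 1 - \alpha \quad \text{under exchangeability.} \]

PAC-Bayes gives, for any data-independent prior $P$, with probability at least $1 - \delta$ over an i.i.d. sample of size $n \ge 8$, simultaneously for all posteriors $Q$,

    \[ \mathrm{kl}\big( \hat{L}(Q) \,\big\|\, L(Q) \big) \le \frac{ \mathrm{KL}(Q \,\|\, P) + \ln\!\big( 2\sqrt{n}/\delta \big) }{ n }, \]

where $\hat{L}(Q)$ and $L(Q)$ are the empirical and true risks of the randomized predictor drawn from $Q$. Because the bound holds uniformly over $Q$, taking the risk to be the miscoverage (or inefficiency) of the prediction sets yields a certificate that survives optimizing $Q$ on the calibration data itself; this is the generic mechanism behind the hold-out-free guarantee the abstract describes, though the paper's exact bounds may differ in form.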


Published In

NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, December 2023. 80772 pages.
Publisher: Curran Associates Inc., Red Hook, NY, United States.

