
A simple approach to improve single-model deep uncertainty via distance-awareness

Published: 06 March 2024

Abstract

Accurate uncertainty quantification is a major challenge in deep learning, as neural networks can make overconfident errors and assign high-confidence predictions to out-of-distribution (OOD) inputs. The most popular approaches to estimating predictive uncertainty in deep learning combine predictions from multiple neural networks, such as Bayesian neural networks (BNNs) and deep ensembles. However, their practicality in real-time, industrial-scale applications is limited by their high memory and computational cost. Furthermore, ensembles and BNNs do not necessarily fix all the issues with the underlying member networks. In this work, we study principled approaches to improving the uncertainty properties of a single network, based on a single, deterministic representation. By formalizing uncertainty quantification as a minimax learning problem, we first identify distance awareness, i.e., the model's ability to quantify the distance of a testing example from the training data, as a necessary condition for a deep neural network (DNN) to achieve high-quality (i.e., minimax optimal) uncertainty estimation. We then propose the Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs with two changes: (1) applying spectral normalization to the hidden weights to enforce bi-Lipschitz smoothness in the representations, and (2) replacing the last output layer with a Gaussian process layer. On a suite of vision and language understanding benchmarks and on modern architectures (Wide-ResNet and BERT), SNGP consistently outperforms other single-model approaches in prediction, calibration, and out-of-domain detection. Furthermore, SNGP provides complementary benefits to popular techniques such as deep ensembles and data augmentation, making it a simple and scalable building block for probabilistic deep learning. Code is open-sourced at https://github.com/google/uncertainty-baselines.
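
To make the two-step recipe concrete, below is a minimal, hypothetical PyTorch sketch, not the authors' open-sourced implementation: hidden layers wrapped in spectral normalization to bound their Lipschitz constants, plus a random-Fourier-feature Gaussian process head whose posterior covariance is tracked with a simple Laplace-style precision update. The RandomFeatureGP class, layer sizes, and the ridge prior are illustrative assumptions.

import math
import torch
import torch.nn as nn

class RandomFeatureGP(nn.Module):
    # Last-layer Gaussian process approximated with random Fourier features;
    # the posterior covariance is estimated with a Laplace-style precision update.
    def __init__(self, in_dim, num_classes, num_features=1024, ridge=1.0):
        super().__init__()
        self.num_features = num_features
        # Fixed random projection approximating an RBF kernel.
        self.register_buffer("W", torch.randn(num_features, in_dim))
        self.register_buffer("b", 2 * math.pi * torch.rand(num_features))
        self.out = nn.Linear(num_features, num_classes, bias=False)  # trainable GP output weights
        self.register_buffer("precision", ridge * torch.eye(num_features))

    def features(self, h):
        return math.sqrt(2.0 / self.num_features) * torch.cos(h @ self.W.t() + self.b)

    def forward(self, h, update_precision=False):
        phi = self.features(h)
        if update_precision:  # accumulate Phi^T Phi, e.g. over the final training epoch
            self.precision += phi.t() @ phi
        return self.out(phi), phi

    def predictive_variance(self, phi):
        cov = torch.linalg.inv(self.precision)  # Laplace posterior covariance
        return (phi @ cov * phi).sum(-1)        # per-example logit variance

class SNGPClassifier(nn.Module):
    def __init__(self, in_dim=32, hidden=128, num_classes=10):
        super().__init__()
        # (1) Spectral normalization on hidden weights bounds each layer's Lipschitz
        #     constant, encouraging distance-preserving representations.
        self.body = nn.Sequential(
            nn.utils.spectral_norm(nn.Linear(in_dim, hidden)), nn.ReLU(),
            nn.utils.spectral_norm(nn.Linear(hidden, hidden)), nn.ReLU(),
        )
        # (2) A Gaussian process layer replaces the usual dense softmax output layer.
        self.head = RandomFeatureGP(hidden, num_classes)

    def forward(self, x, update_precision=False):
        return self.head(self.body(x), update_precision=update_precision)

At evaluation time, the tracked precision matrix yields a per-example logit variance that can be used to temper softmax confidence; the exact covariance update and mean-field adjustment used in the released code may differ from this sketch.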

Published In

The Journal of Machine Learning Research, Volume 24, Issue 1
January 2023
18881 pages
ISSN: 1532-4435
EISSN: 1533-7928
CC-BY 4.0

Publisher

JMLR.org

Publication History

Published: 06 March 2024
Accepted: 01 December 2022
Revised: 01 December 2022
Received: 01 May 2022
Published in JMLR Volume 24, Issue 1

Author Tags

  1. single model uncertainty
  2. deterministic uncertainty quantification
  3. probabilistic neural networks
  4. calibration
  5. out-of-distribution detection
