Abstract
During training, the weights of a Deep Neural Network (DNN) are optimized from a random initialization towards a near-optimal value that minimizes a loss function. Only this final state of the weights is typically kept for testing, while the wealth of information on the geometry of the weight space, accumulated during the descent towards the minimum, is discarded. In this work we propose to leverage this knowledge to compute the distributions of the weights of the DNN. These distributions can then be used to estimate the epistemic uncertainty of the DNN by aggregating predictions from an ensemble of networks sampled from them. To this end we introduce a method for tracking the trajectory of the weights during optimization, which requires no change in either the architecture or the training procedure. We evaluate our method, TRADI, on standard classification and regression benchmarks, and on out-of-distribution detection for classification and semantic segmentation. We achieve competitive results, while preserving computational efficiency in comparison to ensemble approaches.
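The training-time tracking and test-time sampling described above can be illustrated with a short sketch. This is a minimal approximation under stated assumptions, not the paper's exact Kalman-filter formulation: it keeps exponential moving averages of each weight and of its square during optimization, then samples networks from the resulting per-weight Gaussians. The function names, the decay factor, and the PyTorch-based bookkeeping are illustrative choices, not taken from the paper.

```python
import copy
import torch

def init_tracking(model):
    # One running mean and one running second moment per parameter (illustrative).
    mu = {n: p.detach().clone() for n, p in model.named_parameters()}
    sq = {n: p.detach().clone() ** 2 for n, p in model.named_parameters()}
    return mu, sq

@torch.no_grad()
def update_tracking(model, mu, sq, decay=0.99):
    # Call after every optimizer step; the EMA stands in for the Kalman-filter update.
    for n, p in model.named_parameters():
        mu[n].mul_(decay).add_(p.detach(), alpha=1 - decay)
        sq[n].mul_(decay).add_(p.detach() ** 2, alpha=1 - decay)

@torch.no_grad()
def sample_network(model, mu, sq, eps=1e-8):
    # Draw one ensemble member from the tracked per-weight Gaussians N(mu, sigma^2).
    sampled = copy.deepcopy(model)
    for n, p in sampled.named_parameters():
        var = (sq[n] - mu[n] ** 2).clamp_min(eps)
        p.copy_(mu[n] + var.sqrt() * torch.randn_like(p))
    return sampled
```

At test time, several networks drawn with `sample_network` are run on the same input and their softmax outputs averaged; the spread of the individual predictions (e.g. their variance, or the entropy of the averaged prediction) serves as the epistemic uncertainty estimate.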
This work was supported by ANR Project MOHICANS (ANR-15-CE39-0005). We would like to thank the Saclay-IA cluster and the CNRS Jean-Zay supercomputer.
Notes
- 1. Recent theoretical works [40] show connections between optimization and Kalman filtering (KF), reinforcing the validity of our approach.
References
NotMNIST dataset. http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html
Andrychowicz, M., et al.: Learning to learn by gradient descent by gradient descent. In: Advances in Neural Information Processing Systems, pp. 3981–3989 (2016)
Beluch, W.H., Genewein, T., Nürnberger, A., Köhler, J.M.: The power of ensembles for active learning in image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9368–9377 (2018)
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Heidelberg (2006)
Blundell, C., Cornebise, J., Kavukcuoglu, K., Wierstra, D.: Weight uncertainty in neural network. In: Bach, F., Blei, D. (eds.) Proceedings of the 32nd International Conference on Machine Learning. Proceedings of Machine Learning Research, 07–09 July 2015, vol. 37, pp. 1613–1622. PMLR, Lille, France (2015) http://proceedings.mlr.press/v37/blundell15.html
Blundell, C., Cornebise, J., Kavukcuoglu, K., Wierstra, D.: Weight uncertainty in neural networks. arXiv preprint arXiv:1505.05424 (2015)
Brostow, G.J., Shotton, J., Fauqueur, J., Cipolla, R.: Segmentation and recognition using structure from motion point clouds. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5302, pp. 44–57. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88682-2_5
Chen, C., Lu, C.X., Markham, A., Trigoni, N.: IONet: learning to cure the curse of drift in inertial odometry. In: The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18) (2018)
Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., Koltun, V.: CARLA: an open urban driving simulator. In: Proceedings of the 1st Annual Conference on Robot Learning, pp. 1–16 (2017)
Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016)
Gal, Y., Hron, J., Kendall, A.: Concrete dropout. In: NIPS (2017)
Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256 (2010)
Graves, A.: Practical variational inference for neural networks. In: Advances in Neural Information Processing Systems, pp. 2348–2356 (2011)
Grewal, M.S.: Kalman Filtering. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-04898-2_321
Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 1321–1330. JMLR.org (2017)
Haarnoja, T., Ajay, A., Levine, S., Abbeel, P.: Backprop KF: learning discriminative deterministic state estimators. In: Advances in Neural Information Processing Systems, pp. 4376–4384 (2016)
He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034 (2015)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Hendrycks, D., Basart, S., Mazeika, M., Mostajabi, M., Steinhardt, J., Song, D.: A benchmark for anomaly segmentation. arXiv preprint arXiv:1911.11132 (2019)
Hendrycks, D., Gimpel, K.: A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv preprint arXiv:1610.02136 (2016)
Hernández-Lobato, J.M., Adams, R.: Probabilistic backpropagation for scalable learning of Bayesian neural networks. In: International Conference on Machine Learning, pp. 1861–1869 (2015)
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
Izmailov, P., Podoprikhin, D., Garipov, T., Vetrov, D., Wilson, A.G.: Averaging weights leads to wider optima and better generalization. arXiv preprint arXiv:1803.05407 (2018)
Kalman, R.E.: A new approach to linear filtering and prediction problems. J. Basic Eng. 82(1), 35–45 (1960)
Kendall, A., Badrinarayanan, V., Cipolla, R.: Bayesian SegNet: model uncertainty in deep convolutional encoder-decoder architectures for scene understanding. arXiv preprint arXiv:1511.02680 (2015)
Kendall, A., Gal, Y.: What uncertainties do we need in Bayesian deep learning for computer vision? In: Advances in Neural Information Processing Systems, pp. 5574–5584 (2017)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, 14–16 April 2014, Conference Track Proceedings (2014)
Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images. Technical report, Citeseer (2009)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
Lakshminarayanan, B., Pritzel, A., Blundell, C.: Simple and scalable predictive uncertainty estimation using deep ensembles. In: Advances in Neural Information Processing Systems, pp. 6402–6413 (2017)
Lambert, J., Sener, O., Savarese, S.: Deep learning under privileged information using heteroscedastic dropout. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8886–8895 (2018)
Lan, J., Liu, R., Zhou, H., Yosinski, J.: LCA: loss change allocation for neural network training. In: Advances in Neural Information Processing Systems, pp. 3614–3624 (2019)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., et al.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
Lee, S., Purushwalkam, S., Cogswell, M., Crandall, D., Batra, D.: Why M heads are better than one: training a diverse ensemble of deep networks. arXiv preprint arXiv:1511.06314 (2015)
Liu, C., Gu, J., Kim, K., Narasimhan, S.G., Kautz, J.: Neural RGB→D sensing: depth and uncertainty from a video camera. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10986–10995 (2019)
Maddox, W., Garipov, T., Izmailov, P., Vetrov, D., Wilson, A.G.: A simple baseline for Bayesian uncertainty in deep learning. arXiv preprint arXiv:1902.02476 (2019)
Mukhoti, J., Gal, Y.: Evaluating Bayesian deep learning methods for semantic segmentation. CoRR abs/1811.12709 (2018). http://arxiv.org/abs/1811.12709
Neal, R.M.: Bayesian Learning for Neural Networks. Springer, Heidelberg (1996). https://doi.org/10.1007/978-1-4612-0745-0
Ollivier, Y.: The extended Kalman filter is a natural gradient descent in trajectory space. arXiv preprint arXiv:1901.00696 (2019)
Osband, I.: Risk versus uncertainty in deep learning: Bayes, bootstrap and the dangers of dropout (2016)
Osband, I., Aslanides, J., Cassirer, A.: Randomized prior functions for deep reinforcement learning. In: NeurIPS (2018)
Paszke, A., Chaurasia, A., Kim, S., Culurciello, E.: ENet: a deep neural network architecture for real-time semantic segmentation. arXiv preprint arXiv:1606.02147 (2016)
Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems, pp. 8024–8035 (2019)
Rahimi, A., Recht, B.: Random features for large-scale kernel machines. In: Advances in Neural Information Processing Systems, pp. 1177–1184 (2007)
Salimans, T., Kingma, D.P.: Weight normalization: a simple reparameterization to accelerate training of deep neural networks. In: Advances in Neural Information Processing Systems, pp. 901–909 (2016)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014), http://dl.acm.org/citation.cfm?id=2627435.2670313
Szegedy, C., et al.: Going deeper with convolutions. arXiv preprint arXiv:1409.4842 (2014)
Teye, M., Azizpour, H., Smith, K.: Bayesian uncertainty estimation for batch normalized deep networks. In: ICML (2018)
Wang, G., Peng, J., Luo, P., Wang, X., Lin, L.: Batch Kalman normalization: towards training deep neural networks with micro-batches. arXiv preprint arXiv:1802.03133 (2018)
Williams, C.K., Rasmussen, C.E.: Gaussian Processes for Machine Learning, vol. 2. MIT Press, Cambridge (2006)
Yang, G.: Scaling limits of wide neural networks with weight sharing: Gaussian process behavior, gradient independence, and neural tangent kernel derivation. arXiv preprint arXiv:1902.04760 (2019)
Yu, F., et al.: BDD100K: a diverse driving video database with scalable annotation tooling. arXiv preprint arXiv:1805.04687 (2018)
Zagoruyko, S., Komodakis, N.: Wide residual networks. arXiv preprint arXiv:1605.07146 (2016)
Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2881–2890 (2017)
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Franchi, G., Bursuc, A., Aldea, E., Dubuisson, S., Bloch, I. (2020). TRADI: Tracking Deep Neural Network Weight Distributions. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol. 12362. Springer, Cham. https://doi.org/10.1007/978-3-030-58520-4_7
DOI: https://doi.org/10.1007/978-3-030-58520-4_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58519-8
Online ISBN: 978-3-030-58520-4
eBook Packages: Computer Science, Computer Science (R0)