
On the Implicit Bias in Deep-Learning Algorithms

Published: 24 May 2023

Abstract

This article examines the implicit bias that arises when neural networks are trained with gradient-based methods.
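The phenomenon can be illustrated on the simplest possible model (a sketch for intuition, not an example taken from the article itself): gradient descent on an underdetermined linear least-squares problem, initialized at zero, converges not to an arbitrary interpolating solution but to the one of minimum ℓ2 norm. This is the classical instance of implicit bias: the algorithm, not an explicit regularizer, selects the solution.

```python
import numpy as np

# Underdetermined least squares: 5 equations, 20 unknowns,
# so infinitely many interpolating solutions exist.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 20))
b = rng.standard_normal(5)

# Gradient descent on f(w) = 0.5 * ||A w - b||^2, starting from w = 0.
# Updates lie in the row space of A, which pins down the limit point.
w = np.zeros(20)
lr = 0.01
for _ in range(20000):
    w -= lr * A.T @ (A @ w - b)

# The minimum-norm interpolating solution, via the Moore-Penrose pseudoinverse.
w_min_norm = np.linalg.pinv(A) @ b

print(np.allclose(A @ w, b, atol=1e-6))       # True: GD finds an interpolator
print(np.allclose(w, w_min_norm, atol=1e-4))  # True: specifically the min-norm one
```

For nonlinear networks no such closed-form characterization is available in general, which is precisely what makes the question studied here difficult.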

Supplemental Material

Supplemental material is available as a PDF file.


Cited By

  • Artificial Intelligence-Based Algorithms in Medical Image Scan Segmentation and Intelligent Visual Content Generation—A Concise Overview. Electronics 13, 4 (2024), 746. DOI: 10.3390/electronics13040746
  • Test Case Prioritization with Z-Score Based Neuron Coverage. In 2024 26th International Conference on Advanced Communications Technology (ICACT) (2024), 23–28. DOI: 10.23919/ICACT60172.2024.10471933
  • Inteligencia artificial generativa y educación [Generative artificial intelligence and education]. Education in the Knowledge Society (EKS) 25 (2024), e31942. DOI: 10.14201/eks.31942
  • Causal hybrid modeling with double machine learning—applications in carbon flux modeling. Machine Learning: Science and Technology 5, 3 (2024), 035021. DOI: 10.1088/2632-2153/ad5a60
  • TranSalNet+: Distortion-aware saliency prediction. Neurocomputing 600 (2024), 128155. DOI: 10.1016/j.neucom.2024.128155
  • Improved Test Input Prioritization Using Verification Monitors with False Prediction Cluster Centroids. Electronics 13, 1 (2023), 21. DOI: 10.3390/electronics13010021
  • Implementation of Power Grid Fault Diagnosis and Prediction Platform Based on Deep Learning Algorithms. In 2023 IEEE 15th International Conference on Computational Intelligence and Communication Networks (CICN) (2023), 482–487. DOI: 10.1109/CICN59264.2023.10402185


Published In

Communications of the ACM, Volume 66, Issue 6 (June 2023), 109 pages.
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3599937
Editor: James Larus

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

  • Research-article


Article Metrics

  • Downloads (Last 12 months)1,654
  • Downloads (Last 6 weeks)192
Reflects downloads up to 02 Sep 2024

