
On the Implicit Bias in Deep-Learning Algorithms

Published: 24 May 2023

Abstract

This article examines the implicit bias that arises when neural networks are trained with gradient-based methods.
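The phenomenon can be illustrated on the simplest possible model (a sketch for intuition, not an example taken from the article itself): gradient descent on an underdetermined linear least-squares problem, initialized at zero, converges not to an arbitrary interpolating solution but to the one of minimum ℓ2 norm. This is the classical instance of implicit bias: the algorithm, not an explicit regularizer, selects the solution.

```python
import numpy as np

# Underdetermined least squares: 5 equations, 20 unknowns,
# so infinitely many interpolating solutions exist.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 20))
b = rng.standard_normal(5)

# Gradient descent on f(w) = 0.5 * ||A w - b||^2, starting from w = 0.
# Updates lie in the row space of A, which pins down the limit point.
w = np.zeros(20)
lr = 0.01
for _ in range(20000):
    w -= lr * A.T @ (A @ w - b)

# The minimum-norm interpolating solution, via the Moore-Penrose pseudoinverse.
w_min_norm = np.linalg.pinv(A) @ b

print(np.allclose(A @ w, b, atol=1e-6))       # True: GD finds an interpolator
print(np.allclose(w, w_min_norm, atol=1e-4))  # True: specifically the min-norm one
```

For nonlinear networks no such closed-form characterization is available in general, which is precisely what makes the question studied here difficult.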

Supplemental Material

Supplemental material is available as a PDF file.


Cited By

  • Artificial Intelligence-Based Algorithms in Medical Image Scan Segmentation and Intelligent Visual Content Generation—A Concise Overview. Electronics 13, 4 (2024), 746. DOI: 10.3390/electronics13040746
  • Test Case Prioritization with Z-Score Based Neuron Coverage. In 2024 26th International Conference on Advanced Communications Technology (ICACT) (2024), 23–28. DOI: 10.23919/ICACT60172.2024.10471933
  • Inteligencia artificial generativa y educación [Generative artificial intelligence and education]. Education in the Knowledge Society (EKS) 25 (2024), e31942. DOI: 10.14201/eks.31942
  • Causal hybrid modeling with double machine learning—applications in carbon flux modeling. Machine Learning: Science and Technology 5, 3 (2024), 035021. DOI: 10.1088/2632-2153/ad5a60
  • TranSalNet+: Distortion-aware saliency prediction. Neurocomputing 600 (2024), 128155. DOI: 10.1016/j.neucom.2024.128155
  • Improved Test Input Prioritization Using Verification Monitors with False Prediction Cluster Centroids. Electronics 13, 1 (2023), 21. DOI: 10.3390/electronics13010021
  • Implementation of Power Grid Fault Diagnosis and Prediction Platform Based on Deep Learning Algorithms. In 2023 IEEE 15th International Conference on Computational Intelligence and Communication Networks (CICN) (2023), 482–487. DOI: 10.1109/CICN59264.2023.10402185


Published In

Communications of the ACM, Volume 66, Issue 6 (June 2023), 109 pages.
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3599937
Editor: James Larus

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

  • Research-article


Article Metrics

  • Downloads (Last 12 months)1,654
  • Downloads (Last 6 weeks)192
Reflects downloads up to 02 Sep 2024

