
Showing 1–7 of 7 results for author: Mukkamala, M C

Searching in archive cs.
  1. arXiv:2012.13161 [pdf, other]

    math.OC cs.CV cs.LG math.NA

    Global Convergence of Model Function Based Bregman Proximal Minimization Algorithms

    Authors: Mahesh Chandra Mukkamala, Jalal Fadili, Peter Ochs

    Abstract: Lipschitz continuity of the gradient mapping of a continuously differentiable function plays a crucial role in designing various optimization algorithms. However, many functions arising in practical applications such as low rank matrix factorization or deep neural network problems do not have a Lipschitz continuous gradient. This led to the development of a generalized notion known as the $L$-smad…

    Submitted 24 December, 2020; originally announced December 2020.

    Comments: 44 pages, 22 figures

    MSC Class: 90C25; 26B25; 49M27; 52A41; 65K05

  2. arXiv:1910.03638 [pdf, other]

    math.OC cs.CV cs.IR cs.LG

    Bregman Proximal Framework for Deep Linear Neural Networks

    Authors: Mahesh Chandra Mukkamala, Felix Westerkamp, Emanuel Laude, Daniel Cremers, Peter Ochs

    Abstract: A typical assumption for the analysis of first-order optimization methods is the Lipschitz continuity of the gradient of the objective function. However, for many practical applications this assumption is violated, including loss functions in deep learning. To overcome this issue, certain extensions based on generalized proximity measures known as Bregman distances were introduced. This initiated…

    Submitted 8 October, 2019; originally announced October 2019.

    Comments: 34 pages, 54 images

    MSC Class: 90C26; 26B25; 90C30; 49M27; 47J25; 65K05; 65F22

  3. arXiv:1905.09050 [pdf, other]

    math.OC cs.CV cs.IR stat.ML

    Beyond Alternating Updates for Matrix Factorization with Inertial Bregman Proximal Gradient Algorithms

    Authors: Mahesh Chandra Mukkamala, Peter Ochs

    Abstract: Matrix factorization is a popular non-convex optimization problem, for which alternating minimization schemes are mostly used. They usually suffer from the major drawback that the solution is biased towards one of the optimization variables. A remedy is non-alternating schemes. However, due to a lack of Lipschitz continuity of the gradient in matrix factorization problems, convergence cannot be gu…

    Submitted 6 December, 2019; v1 submitted 22 May, 2019; originally announced May 2019.

    Comments: Accepted at NeurIPS 2019. Paper url: http://papers.nips.cc/paper/8679-beyond-alternating-updates-for-matrix-factorization-with-inertial-bregman-proximal-gradient-algorithms

  4. arXiv:1904.03537 [pdf, other]

    math.OC cs.CV cs.LG math.NA

    Convex-Concave Backtracking for Inertial Bregman Proximal Gradient Algorithms in Non-Convex Optimization

    Authors: Mahesh Chandra Mukkamala, Peter Ochs, Thomas Pock, Shoham Sabach

    Abstract: Backtracking line-search is an old yet powerful strategy for finding better step sizes to be used in proximal gradient algorithms. The main principle is to locally find a simple convex upper bound of the objective function, which in turn controls the step size that is used. In case of inertial proximal gradient algorithms, the situation becomes much more difficult and usually leads to very restr…

    Submitted 5 November, 2019; v1 submitted 6 April, 2019; originally announced April 2019.

    Comments: 29 pages

    MSC Class: 90C25; 26B25; 49M27; 52A41; 65K05

  5. arXiv:1809.10749 [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    On the loss landscape of a class of deep neural networks with no bad local valleys

    Authors: Quynh Nguyen, Mahesh Chandra Mukkamala, Matthias Hein

    Abstract: We identify a class of over-parameterized deep neural networks with standard activation functions and cross-entropy loss which provably have no bad local valley, in the sense that from any point in parameter space there exists a continuous path on which the cross-entropy loss is non-increasing and gets arbitrarily close to zero. This implies that these networks have no sub-optimal strict local min…

    Submitted 23 December, 2018; v1 submitted 27 September, 2018; originally announced September 2018.

    Comments: Accepted at ICLR 2019

  6. arXiv:1803.00094 [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Neural Networks Should Be Wide Enough to Learn Disconnected Decision Regions

    Authors: Quynh Nguyen, Mahesh Chandra Mukkamala, Matthias Hein

    Abstract: In the recent literature the important role of depth in deep learning has been emphasized. In this paper we argue that sufficient width of a feedforward network is equally important by answering the simple question of under which conditions the decision regions of a neural network are connected. It turns out that for a class of activation functions including leaky ReLU, neural networks having a pyram…

    Submitted 8 June, 2018; v1 submitted 28 February, 2018; originally announced March 2018.

    Comments: Accepted at ICML 2018. Added discussion for non-pyramidal networks and ReLU activation function

  7. arXiv:1706.05507 [pdf, other]

    cs.LG cs.AI cs.CV cs.NE stat.ML

    Variants of RMSProp and Adagrad with Logarithmic Regret Bounds

    Authors: Mahesh Chandra Mukkamala, Matthias Hein

    Abstract: Adaptive gradient methods have recently become very popular, in particular as they have been shown to be useful in the training of deep neural networks. In this paper we analyze RMSProp, originally proposed for the training of deep neural networks, in the context of online convex optimization and show $\sqrt{T}$-type regret bounds. Moreover, we propose two variants SC-Adagrad and SC-RMSProp…

    Submitted 28 November, 2017; v1 submitted 17 June, 2017; originally announced June 2017.

    Comments: ICML 2017, 16 pages, 23 figures
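Entry 4 describes backtracking line-search as locally finding a simple convex upper bound of the objective that in turn controls the step size. As a rough illustration of that principle only, the sketch below implements standard Euclidean backtracking for a plain gradient step (no nonsmooth term, no inertia, no Bregman distance); it is not the convex-concave inertial Bregman method of that paper, and the function name and the parameters `L0`, `eta`, and `max_iter` are hypothetical.

```python
import numpy as np

def backtracking_gradient_step(f, grad_f, x, L0=1.0, eta=2.0, max_iter=50):
    """One gradient step with Euclidean backtracking (illustrative sketch only).

    Increases the local estimate L until the quadratic model
        f(x) + <g, y - x> + (L / 2) * ||y - x||^2
    is an upper bound of f at the trial point y = x - g / L.
    """
    g = grad_f(x)
    fx = f(x)
    L = L0
    for _ in range(max_iter):
        y = x - g / L                 # trial point with step size 1 / L
        d = y - x
        model = fx + g @ d + 0.5 * L * (d @ d)
        if f(y) <= model:             # model upper-bounds f at y: accept step
            return y, L
        L *= eta                      # bound violated: increase L, i.e. shrink the step
    raise RuntimeError("backtracking did not find an acceptable step size")
```

With `f(x) = 2‖x‖²`, whose gradient `4x` is 4-Lipschitz, and `x = [1, 1]`, the estimate doubles from `L0 = 1` to `L = 4`, at which point the step lands exactly on the minimizer `y = 0`. Any `L` at least as large as the gradient's Lipschitz constant passes the test, which is why the loop terminates whenever such a constant exists.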