
Showing 1–3 of 3 results for author: Modoranu, I

Searching in archive cs.
  1. arXiv:2405.15593  [pdf, other]

    cs.LG math.NA

    MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence

    Authors: Ionut-Vlad Modoranu, Mher Safaryan, Grigory Malinovsky, Eldar Kurtic, Thomas Robert, Peter Richtarik, Dan Alistarh

    Abstract: We propose a new variant of the Adam optimizer [Kingma and Ba, 2014] called MICROADAM that specifically minimizes memory overheads, while maintaining theoretical convergence guarantees. We achieve this by compressing the gradient information before it is fed into the optimizer state, thereby reducing its memory footprint significantly. We control the resulting compression error via a novel instanc…

    Submitted 24 May, 2024; originally announced May 2024.

  2. arXiv:2306.06098  [pdf, other]

    cs.LG math.NA math.OC

    Error Feedback Can Accurately Compress Preconditioners

    Authors: Ionut-Vlad Modoranu, Aleksei Kalinov, Eldar Kurtic, Elias Frantar, Dan Alistarh

    Abstract: Leveraging second-order information about the loss at the scale of deep networks is one of the main lines of approach for improving the performance of current optimizers for deep learning. Yet, existing approaches for accurate full-matrix preconditioning, such as Full-Matrix Adagrad (GGT) or Matrix-Free Approximate Curvature (M-FAC) suffer from massive storage costs when applied even to small-scal…

    Submitted 5 June, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

  3. arXiv:2010.02432  [pdf, other]

    cs.LG cs.CR

    A Panda? No, It's a Sloth: Slowdown Attacks on Adaptive Multi-Exit Neural Network Inference

    Authors: Sanghyun Hong, Yiğitcan Kaya, Ionuţ-Vlad Modoranu, Tudor Dumitraş

    Abstract: Recent increases in the computational demands of deep neural networks (DNNs), combined with the observation that most input samples require only simple models, have sparked interest in input-adaptive multi-exit architectures, such as MSDNets or Shallow-Deep Networks. These architectures enable faster inferences and could bring DNNs to low-power devices, e.g., in the Internet of Things (IoT). H…

    Submitted 25 February, 2021; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: Accepted to ICLR 2021 [Spotlight]; First two authors contributed equally