Showing 1–42 of 42 results for author: Khan, M E

Searching in archive cs.
  1. arXiv:2404.08168  [pdf, other]

    cs.LG stat.ML

    Conformal Prediction via Regression-as-Classification

    Authors: Etash Guha, Shlok Natarajan, Thomas Möllenhoff, Mohammad Emtiyaz Khan, Eugene Ndiaye

    Abstract: Conformal prediction (CP) for regression can be challenging, especially when the output distribution is heteroscedastic, multimodal, or skewed. Some of the issues can be addressed by estimating a distribution over the output, but in reality, such approaches can be sensitive to estimation error and yield unstable intervals. Here, we circumvent the challenges by converting regression to a classifica…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: International Conference on Learning Representations 2024

    Journal ref: International Conference on Learning Representations 2024

  2. arXiv:2402.17641  [pdf, other]

    cs.LG cs.AI cs.CL math.OC stat.ML

    Variational Learning is Effective for Large Deep Networks

    Authors: Yuesong Shen, Nico Daheim, Bai Cong, Peter Nickl, Gian Maria Marconi, Clement Bazan, Rio Yokota, Iryna Gurevych, Daniel Cremers, Mohammad Emtiyaz Khan, Thomas Möllenhoff

    Abstract: We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON's computational costs are nearly identical to Adam but its predictive uncertaint…

    Submitted 6 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Published at International Conference on Machine Learning (ICML), 2024. The first two authors contributed equally. Code is available here: https://github.com/team-approx-bayes/ivon

  3. arXiv:2402.00809  [pdf, other]

    cs.LG stat.ML

    Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI

    Authors: Theodore Papamarkou, Maria Skoularidou, Konstantina Palla, Laurence Aitchison, Julyan Arbel, David Dunson, Maurizio Filippone, Vincent Fortuin, Philipp Hennig, José Miguel Hernández-Lobato, Aliaksandr Hubin, Alexander Immer, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Agustinus Kristiadi, Yingzhen Li, Stephan Mandt, Christopher Nemeth, Michael A. Osborne, Tim G. J. Rudner, David Rügamer, Yee Whye Teh, Max Welling, Andrew Gordon Wilson, Ruqi Zhang

    Abstract: In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learni…

    Submitted 2 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  4. arXiv:2310.19273  [pdf, other]

    cs.LG cs.AI stat.ML

    The Memory Perturbation Equation: Understanding Model's Sensitivity to Data

    Authors: Peter Nickl, Lu Xu, Dharmesh Tailor, Thomas Möllenhoff, Mohammad Emtiyaz Khan

    Abstract: Understanding model's sensitivity to its training data is crucial but can also be challenging and costly, especially during training. To simplify such issues, we present the Memory-Perturbation Equation (MPE) which relates model's sensitivity to perturbation in its training data. Derived using Bayesian principles, the MPE unifies existing sensitivity measures, generalizes them to a wide-variety of…

    Submitted 16 January, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  5. arXiv:2310.12808  [pdf, other]

    cs.LG cs.AI cs.CL

    Model Merging by Uncertainty-Based Gradient Matching

    Authors: Nico Daheim, Thomas Möllenhoff, Edoardo Maria Ponti, Iryna Gurevych, Mohammad Emtiyaz Khan

    Abstract: Models trained on different datasets can be merged by a weighted-averaging of their parameters, but why does it work and when can it fail? Here, we connect the inaccuracy of weighted-averaging to mismatches in the gradients and propose a new uncertainty-based scheme to improve the performance by reducing the mismatch. The connection also reveals implicit assumptions in other schemes such as averag…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: Preprint. Under review

  6. arXiv:2306.15169  [pdf, other]

    cs.LG stat.ML

    Exploiting Inferential Structure in Neural Processes

    Authors: Dharmesh Tailor, Mohammad Emtiyaz Khan, Eric Nalisnick

    Abstract: Neural Processes (NPs) are appealing due to their ability to perform fast adaptation based on a context set. This set is encoded by a latent variable, which is often assumed to follow a simple distribution. However, in real-world settings, the context set may be drawn from richer distributions having multiple modes, heavy tails, etc. In this work, we provide a framework that allows NPs' latent vari…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: Uncertainty in Artificial Intelligence (UAI) 2023

  7. arXiv:2306.03566  [pdf, other]

    cs.LG stat.ML

    Memory-Based Dual Gaussian Processes for Sequential Learning

    Authors: Paul E. Chang, Prakhar Verma, S. T. John, Arno Solin, Mohammad Emtiyaz Khan

    Abstract: Sequential learning with Gaussian processes (GPs) is challenging when access to past data is limited, for example, in continual and active learning. In such cases, errors can accumulate over time due to inaccuracies in the posterior, hyperparameters, and inducing points, making accurate learning challenging. Here, we present a method to keep all such errors in check using the recently proposed dua…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: International Conference on Machine Learning (ICML) 2023

  8. arXiv:2304.14251  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Variational Bayes Made Easy

    Authors: Mohammad Emtiyaz Khan

    Abstract: Variational Bayes is a popular method for approximate inference but its derivation can be cumbersome. To simplify the process, we give a 3-step recipe to identify the posterior form by explicitly looking for linearity with respect to expectations of well-known distributions. We can then directly write the update by simply "reading-off" the terms in front of those expectations. The recipe makes t…

    Submitted 10 July, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

    Journal ref: Presented at the 5th Symposium on Advances in Approximate Bayesian Inference (AABI 2023)

  9. arXiv:2303.04397  [pdf, other]

    cs.LG stat.ML

    The Lie-Group Bayesian Learning Rule

    Authors: Eren Mehmet Kıral, Thomas Möllenhoff, Mohammad Emtiyaz Khan

    Abstract: The Bayesian Learning Rule provides a framework for generic algorithm design but can be difficult to use for three reasons. First, it requires a specific parameterization of exponential family. Second, it uses gradients which can be difficult to compute. Third, its update may not always stay on the manifold. We address these difficulties by proposing an extension based on Lie-groups where posterio…

    Submitted 8 March, 2023; originally announced March 2023.

    Comments: AISTATS 2023

  10. arXiv:2302.09738  [pdf, other]

    stat.ML cs.LG

    Simplifying Momentum-based Positive-definite Submanifold Optimization with Applications to Deep Learning

    Authors: Wu Lin, Valentin Duruisseaux, Melvin Leok, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt

    Abstract: Riemannian submanifold optimization with momentum is computationally challenging because, to ensure that the iterates remain on the submanifold, we often need to solve difficult differential equations. Here, we simplify such difficulties for a class of sparse or structured symmetric positive-definite matrices with the affine-invariant metric. We do so by proposing a generalized version of the Riem…

    Submitted 16 March, 2024; v1 submitted 19 February, 2023; originally announced February 2023.

    Comments: A long version of the ICML 2023 paper. Updated the main text to emphasize challenges of using existing Riemannian methods to estimate sparse and structured SPD matrices

  11. arXiv:2210.12282   

    cs.LG

    Bridging the Gap Between Target Networks and Functional Regularization

    Authors: Alexandre Piche, Valentin Thomas, Joseph Marino, Rafael Pardinas, Gian Maria Marconi, Christopher Pal, Mohammad Emtiyaz Khan

    Abstract: Bootstrapping is behind much of the successes of Deep Reinforcement Learning. However, learning the value function via bootstrapping often leads to unstable training due to fast-changing target values. Target Networks are employed to stabilize training by using an additional set of lagging parameters to estimate the target values. Despite the popularity of Target Networks, their effect on the opti…

    Submitted 3 January, 2024; v1 submitted 21 October, 2022; originally announced October 2022.

    Comments: The published version of this paper (TMLR 2023) is available at arXiv:2106.02613 and https://openreview.net/forum?id=BFvoemrmqX

  12. arXiv:2210.06592  [pdf, other]

    cs.LG

    Can Calibration Improve Sample Prioritization?

    Authors: Ganesh Tata, Gautham Krishna Gudur, Gopinath Chennupati, Mohammad Emtiyaz Khan

    Abstract: Calibration can reduce overconfident predictions of deep neural networks, but can calibration also accelerate training? In this paper, we show that it can when used to prioritize some examples for performing subset selection. We study the effect of popular calibration techniques in selecting better subsets of samples during training (also called sample prioritization) and observe that calibration…

    Submitted 15 November, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

  13. arXiv:2210.01620  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    SAM as an Optimal Relaxation of Bayes

    Authors: Thomas Möllenhoff, Mohammad Emtiyaz Khan

    Abstract: Sharpness-aware minimization (SAM) and related adversarial deep-learning methods can drastically improve generalization, but their underlying mechanisms are not yet fully understood. Here, we establish SAM as a relaxation of the Bayes objective where the expected negative-loss is replaced by the optimal convex lower bound, obtained by using the so-called Fenchel biconjugate. The connection enables…

    Submitted 10 December, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: Accepted at ICLR 2023. Changes: Link to source code (https://github.com/team-approx-bayes/bayesian-sam), fix a typo in Appendix D

  14. arXiv:2111.03412  [pdf, other]

    cs.LG stat.ML

    Dual Parameterization of Sparse Variational Gaussian Processes

    Authors: Vincent Adam, Paul E. Chang, Mohammad Emtiyaz Khan, Arno Solin

    Abstract: Sparse variational Gaussian process (SVGP) methods are a common choice for non-conjugate Gaussian process inference because of their computational benefits. In this paper, we improve their computational efficiency by using a dual parameterization where each data example is assigned dual parameters, similarly to site parameters used in expectation propagation. Our dual parameterization speeds-up in…

    Submitted 19 January, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

    Comments: Advances in Neural Information Processing Systems (NeurIPS 2021)

  15. arXiv:2107.10884  [pdf, other]

    stat.ML cs.LG

    Structured second-order methods via natural gradient descent

    Authors: Wu Lin, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt

    Abstract: In this paper, we propose new structured second-order methods and structured adaptive-gradient methods obtained by performing natural-gradient descent on structured parameter spaces. Natural-gradient descent is an attractive approach to design new algorithms in many settings such as gradient-free, adaptive-gradient, and second-order methods. Our structured methods not only enjoy a structural invar…

    Submitted 19 February, 2022; v1 submitted 22 July, 2021; originally announced July 2021.

    Comments: Fixed some typos and added a new figure. ICML 2021 workshop paper. A short version of arXiv:2102.07405 with a focus on optimization tasks

  16. arXiv:2107.08265  [pdf, other]

    stat.ML cs.LG

    Subset-of-Data Variational Inference for Deep Gaussian-Processes Regression

    Authors: Ayush Jain, P. K. Srijith, Mohammad Emtiyaz Khan

    Abstract: Deep Gaussian Processes (DGPs) are multi-layer, flexible extensions of Gaussian processes but their training remains challenging. Sparse approximations simplify the training but often require optimization over a large number of inducing inputs and their locations across layers. In this paper, we simplify the training by setting the locations to a fixed subset of data and sampling the inducing inpu…

    Submitted 17 July, 2021; originally announced July 2021.

    Comments: Accepted in the 37th Conference on Uncertainty in Artificial Intelligence (UAI 2021)

  17. arXiv:2107.04562  [pdf, other]

    stat.ML cs.LG

    The Bayesian Learning Rule

    Authors: Mohammad Emtiyaz Khan, Håvard Rue

    Abstract: We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide-range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and Kalman filter, as well as modern dee…

    Submitted 8 June, 2024; v1 submitted 9 July, 2021; originally announced July 2021.

    Journal ref: Journal of Machine Learning Research 24, no. 281 (2023): 1-46

  18. arXiv:2106.08769  [pdf, other]

    cs.LG cs.AI stat.ML

    Knowledge-Adaptation Priors

    Authors: Mohammad Emtiyaz Khan, Siddharth Swaroop

    Abstract: Humans and animals have a natural ability to quickly adapt to their surroundings, but machine-learning models, when subjected to changes, often require a complete retraining from scratch. We present Knowledge-adaptation priors (K-priors) to reduce the cost of retraining by enabling quick and accurate adaptation for a wide-variety of tasks and models. This is made possible by a combination of weigh…

    Submitted 27 October, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

  19. arXiv:2106.02613  [pdf, other]

    stat.ML cs.LG

    Bridging the Gap Between Target Networks and Functional Regularization

    Authors: Alexandre Piché, Valentin Thomas, Rafael Pardinas, Joseph Marino, Gian Maria Marconi, Christopher Pal, Mohammad Emtiyaz Khan

    Abstract: Bootstrapping is behind much of the successes of deep Reinforcement Learning. However, learning the value function via bootstrapping often leads to unstable training due to fast-changing target values. Target Networks are employed to stabilize training by using an additional set of lagging parameters to estimate the target values. Despite the popularity of Target Networks, their effect on the opti…

    Submitted 7 September, 2023; v1 submitted 4 June, 2021; originally announced June 2021.

    Comments: The first two authors contributed equally

  20. arXiv:2104.04975  [pdf, other]

    stat.ML cs.LG

    Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning

    Authors: Alexander Immer, Matthias Bauer, Vincent Fortuin, Gunnar Rätsch, Mohammad Emtiyaz Khan

    Abstract: Marginal-likelihood based model-selection, even though promising, is rarely used in deep learning due to estimation difficulties. Instead, most approaches rely on validation data, which may not be readily available. In this work, we present a scalable marginal-likelihood estimation method to select both hyperparameters and network architectures, based on the training data alone. Some hyperparamete…

    Submitted 15 June, 2021; v1 submitted 11 April, 2021; originally announced April 2021.

    Comments: ICML 2021

  21. arXiv:2102.07405  [pdf, other]

    stat.ML cs.LG

    Tractable structured natural gradient descent using local parameterizations

    Authors: Wu Lin, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt

    Abstract: Natural-gradient descent (NGD) on structured parameter spaces (e.g., low-rank covariances) is computationally challenging due to difficult Fisher-matrix computations. We address this issue by using local-parameter coordinates to obtain a flexible and efficient NGD method that works well for a wide-variety of structured parameterizations. We show four applications where our method (1) genera…

    Submitted 17 January, 2022; v1 submitted 15 February, 2021; originally announced February 2021.

    Comments: An extended version of the ICML 2021 paper. Note: A workshop (short) paper with a focus on optimization tasks can be found at arXiv:2107.10884

  22. arXiv:2007.04731  [pdf, other]

    cs.LG stat.ML

    Fast Variational Learning in State-Space Gaussian Process Models

    Authors: Paul E. Chang, William J. Wilkinson, Mohammad Emtiyaz Khan, Arno Solin

    Abstract: Gaussian process (GP) regression with 1D inputs can often be performed in linear time via a stochastic differential equation formulation. However, for non-Gaussian likelihoods, this requires application of approximate inference methods which can make the implementation difficult, e.g., expectation propagation can be numerically unstable and variational inference can be computationally inefficient.…

    Submitted 17 July, 2020; v1 submitted 9 July, 2020; originally announced July 2020.

    Comments: To appear in MLSP 2020

  23. arXiv:2004.14070  [pdf, other]

    stat.ML cs.LG

    Continual Deep Learning by Functional Regularisation of Memorable Past

    Authors: Pingbo Pan, Siddharth Swaroop, Alexander Immer, Runa Eschenhagen, Richard E. Turner, Mohammad Emtiyaz Khan

    Abstract: Continually learning new skills is important for intelligent systems, yet standard deep learning methods suffer from catastrophic forgetting of the past. Recent works address this with weight regularisation. Functional regularisation, although computationally expensive, is expected to perform better, but rarely does so in practice. In this paper, we fix this issue by using a new functional-regular…

    Submitted 8 January, 2021; v1 submitted 29 April, 2020; originally announced April 2020.

  24. arXiv:2002.10778  [pdf, other]

    cs.LG stat.ML

    Training Binary Neural Networks using the Bayesian Learning Rule

    Authors: Xiangming Meng, Roman Bachmann, Mohammad Emtiyaz Khan

    Abstract: Neural networks with binary weights are computation-efficient and hardware-friendly, but their training is challenging because it involves a discrete optimization problem. Surprisingly, ignoring the discrete nature of the problem and using gradient-based methods, such as the Straight-Through Estimator, still works well in practice. This raises the question: are there principled approaches which ju…

    Submitted 17 August, 2020; v1 submitted 25 February, 2020; originally announced February 2020.

    Comments: accepted by ICML 2020, the camera-ready version

  25. arXiv:2002.10060  [pdf, other]

    stat.ML cs.LG

    Handling the Positive-Definite Constraint in the Bayesian Learning Rule

    Authors: Wu Lin, Mark Schmidt, Mohammad Emtiyaz Khan

    Abstract: The Bayesian learning rule is a natural-gradient variational inference method, which not only contains many existing learning algorithms as special cases but also enables the design of new algorithms. Unfortunately, when variational parameters lie in an open constraint set, the rule may not satisfy the constraint and requires line-searches which could slow down the algorithm. In this work, we addr…

    Submitted 25 October, 2020; v1 submitted 23 February, 2020; originally announced February 2020.

    Comments: Fixed typos and updated the abstract (ICML 2020)

  26. arXiv:1910.13398  [pdf, ps, other]

    stat.ML cs.LG

    Stein's Lemma for the Reparameterization Trick with Exponential Family Mixtures

    Authors: Wu Lin, Mohammad Emtiyaz Khan, Mark Schmidt

    Abstract: Stein's method (Stein, 1973; 1981) is a powerful tool for statistical applications, and has had a significant impact in machine learning. Stein's lemma plays an essential role in Stein's method. Previous applications of Stein's lemma either required strong technical assumptions or were limited to Gaussian distributions with restricted covariance structures. In this work, we extend Stein's lemma to…

    Submitted 29 October, 2019; originally announced October 2019.

  27. arXiv:1909.06769  [pdf, other]

    cs.LG stat.ML

    VILD: Variational Imitation Learning with Diverse-quality Demonstrations

    Authors: Voot Tangkaratt, Bo Han, Mohammad Emtiyaz Khan, Masashi Sugiyama

    Abstract: The goal of imitation learning (IL) is to learn a good policy from high-quality demonstrations. However, the quality of demonstrations in reality can be diverse, since it is easier and cheaper to collect demonstrations from a mix of experts and amateurs. IL in such situations can be challenging, especially when the level of demonstrators' expertise is unknown. We propose a new IL method called \un… ▽ More

    Submitted 15 September, 2019; originally announced September 2019.

  28. arXiv:1906.02914  [pdf, other]

    stat.ML cs.LG

    Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations

    Authors: Wu Lin, Mohammad Emtiyaz Khan, Mark Schmidt

    Abstract: Natural-gradient methods enable fast and simple algorithms for variational inference, but due to computational difficulties, their use is mostly limited to minimal exponential-family (EF) approximations. In this paper, we extend their application to estimate structured approximations such as mixtures of EF distributions. Such approximations can fit complex, multimodal posterior distr…

    Submitted 6 November, 2020; v1 submitted 7 June, 2019; originally announced June 2019.

    Comments: Corrected some typos and updated the appendix (ICML 2019)

  29. arXiv:1906.02506  [pdf, other]

    stat.ML cs.LG

    Practical Deep Learning with Bayesian Principles

    Authors: Kazuki Osawa, Siddharth Swaroop, Anirudh Jain, Runa Eschenhagen, Richard E. Turner, Rio Yokota, Mohammad Emtiyaz Khan

    Abstract: Bayesian methods promise to fix many shortcomings of deep learning, but they are impractical and rarely match the performance of standard methods, let alone improve them. In this paper, we demonstrate practical training of deep networks with natural-gradient variational inference. By applying techniques such as batch normalisation, data augmentation, and distributed training, we achieve similar pe…

    Submitted 29 October, 2019; v1 submitted 6 June, 2019; originally announced June 2019.

    Comments: NeurIPS 2019

  30. arXiv:1906.01930  [pdf, other]

    stat.ML cs.AI cs.LG

    Approximate Inference Turns Deep Networks into Gaussian Processes

    Authors: Mohammad Emtiyaz Khan, Alexander Immer, Ehsan Abedi, Maciej Korzepa

    Abstract: Deep neural networks (DNN) and Gaussian processes (GP) are two powerful models with several theoretical connections relating them, but the relationship between their training methods is not well understood. In this paper, we show that certain Gaussian posterior approximations for Bayesian DNNs are equivalent to GP posteriors. This enables us to relate solutions and iterations of a deep-learning al…

    Submitted 19 July, 2020; v1 submitted 5 June, 2019; originally announced June 2019.

    Comments: published at NeurIPS 2019: https://papers.nips.cc/paper/8573-approximate-inference-turns-deep-networks-into-gaussian-processes.pdf

  31. arXiv:1905.10969  [pdf, other]

    stat.ML cs.LG

    Scalable Training of Inference Networks for Gaussian-Process Models

    Authors: Jiaxin Shi, Mohammad Emtiyaz Khan, Jun Zhu

    Abstract: Inference in Gaussian process (GP) models is computationally challenging for large data, and often difficult to approximate with a small number of inducing points. We explore an alternative approximation that employs stochastic inference networks for a flexible inference. Unfortunately, for such networks, minibatch training is difficult to be able to learn meaningful correlations over function out…

    Submitted 27 May, 2019; originally announced May 2019.

    Comments: ICML 2019. Update results added in the camera-ready version

  32. arXiv:1904.03920  [pdf, other]

    stat.ML cs.LG math.ST stat.CO

    A Generalization Bound for Online Variational Inference

    Authors: Badr-Eddine Chérief-Abdellatif, Pierre Alquier, Mohammad Emtiyaz Khan

    Abstract: Bayesian inference provides an attractive online-learning framework to analyze sequential data, and offers generalization guarantees which hold even with model mismatch and adversaries. Unfortunately, exact Bayesian inference is rarely feasible in practice and approximation methods are usually employed, but do such methods preserve the generalization properties of Bayesian inference? In this pape…

    Submitted 10 December, 2019; v1 submitted 8 April, 2019; originally announced April 2019.

    Comments: Published in the proceedings of ACML 2019

    Journal ref: Proceedings in Machine Learning Research, 2019, vol. 101, pp. 662-677

  33. TD-Regularized Actor-Critic Methods

    Authors: Simone Parisi, Voot Tangkaratt, Jan Peters, Mohammad Emtiyaz Khan

    Abstract: Actor-critic methods can achieve incredible performance on difficult reinforcement learning problems, but they are also prone to instability. This is partly due to the interaction between the actor and critic during learning, e.g., an inaccurate step taken by one of them might adversely affect the other and destabilize the learning. To avoid such issues, we propose to regularize the learning objec…

    Submitted 25 February, 2019; v1 submitted 19 December, 2018; originally announced December 2018.

  34. arXiv:1811.04504  [pdf, other]

    cs.LG cs.AI stat.ML

    SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient

    Authors: Aaron Mishkin, Frederik Kunstner, Didrik Nielsen, Mark Schmidt, Mohammad Emtiyaz Khan

    Abstract: Uncertainty estimation in large deep-learning models is a computationally challenging task, where it is difficult to form even a Gaussian approximation to the posterior distribution. In such situations, existing methods usually resort to a diagonal approximation of the covariance matrix despite the fact that these matrices are known to result in poor uncertainty estimates. To address this issue,…

    Submitted 11 January, 2019; v1 submitted 11 November, 2018; originally announced November 2018.

    Comments: NeurIPS 2018 final version

  35. arXiv:1807.04489  [pdf, other]

    stat.ML cs.IT cs.LG stat.CO

    Fast yet Simple Natural-Gradient Descent for Variational Inference in Complex Models

    Authors: Mohammad Emtiyaz Khan, Didrik Nielsen

    Abstract: Bayesian inference plays an important role in advancing machine learning, but faces computational challenges when applied to complex models such as deep neural networks. Variational inference circumvents these challenges by formulating Bayesian inference as an optimization problem and solving it using gradient-based optimization. In this paper, we argue in favor of natural-gradient approaches whic…

    Submitted 2 August, 2018; v1 submitted 12 July, 2018; originally announced July 2018.

    Comments: Camera-ready version

    Journal ref: International Symposium on Information Theory and Its Applications (ISITA), 2018

  36. arXiv:1806.04854  [pdf, other]

    stat.ML cs.AI cs.LG stat.CO

    Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam

    Authors: Mohammad Emtiyaz Khan, Didrik Nielsen, Voot Tangkaratt, Wu Lin, Yarin Gal, Akash Srivastava

    Abstract: Uncertainty computation in deep learning is essential to design robust and reliable systems. Variational inference (VI) is a promising approach for such computation, but requires more effort to implement and execute compared to maximum-likelihood methods. In this paper, we propose new natural-gradient algorithms to reduce such efforts for Gaussian mean-field VI. Our algorithms can be implemented w…

    Submitted 2 August, 2018; v1 submitted 13 June, 2018; originally announced June 2018.

    Comments: Camera ready version

    Journal ref: Thirty-fifth International Conference on Machine Learning, 2018

  37. arXiv:1805.08465  [pdf, other]

    cs.LG stat.ML

    Beyond Unfolding: Exact Recovery of Latent Convex Tensor Decomposition under Reshuffling

    Authors: Chao Li, Mohammad Emtiyaz Khan, Zhun Sun, Gang Niu, Bo Han, Shengli Xie, Qibin Zhao

    Abstract: Exact recovery of tensor decomposition (TD) methods is a desirable property in both unsupervised learning and scientific data analysis. The numerical defects of TD methods, however, limit their practical applications on real-world data. As an alternative, convex tensor decomposition (CTD) was proposed to alleviate these problems, but its exact-recovery property is not properly addressed so far. To…

    Submitted 28 January, 2020; v1 submitted 22 May, 2018; originally announced May 2018.

    Comments: AAAI-2020

  38. arXiv:1712.01038  [pdf, other]

    stat.ML cs.LG

    Vprop: Variational Inference using RMSprop

    Authors: Mohammad Emtiyaz Khan, Zuozhu Liu, Voot Tangkaratt, Yarin Gal

    Abstract: Many computationally-efficient methods for Bayesian deep learning rely on continuous optimization algorithms, but the implementation of these methods requires significant changes to existing code-bases. In this paper, we propose Vprop, a method for Gaussian variational inference that can be implemented with two minor changes to the off-the-shelf RMSprop optimizer. Vprop also reduces the memory req…

    Submitted 4 December, 2017; originally announced December 2017.

  39. arXiv:1711.05560  [pdf, other]

    stat.ML cs.LG

    Variational Adaptive-Newton Method for Explorative Learning

    Authors: Mohammad Emtiyaz Khan, Wu Lin, Voot Tangkaratt, Zuozhu Liu, Didrik Nielsen

    Abstract: We present the Variational Adaptive Newton (VAN) method which is a black-box optimization method especially suitable for explorative-learning tasks such as active learning and reinforcement learning. Similar to Bayesian methods, VAN estimates a distribution that can be used for exploration, but requires computations that are similar to continuous optimization methods. Our theoretical contribution…

    Submitted 15 November, 2017; originally announced November 2017.

  40. arXiv:1703.04265  [pdf, other]

    cs.LG

    Conjugate-Computation Variational Inference : Converting Variational Inference in Non-Conjugate Models to Inferences in Conjugate Models

    Authors: Mohammad Emtiyaz Khan, Wu Lin

    Abstract: Variational inference is computationally challenging in models that contain both conjugate and non-conjugate terms. Methods specifically designed for conjugate models, even though computationally efficient, find it difficult to deal with non-conjugate terms. On the other hand, stochastic-gradient methods can handle the non-conjugate terms but they usually ignore the conjugate structure of the mode…

    Submitted 13 April, 2017; v1 submitted 13 March, 2017; originally announced March 2017.

    Comments: Published in AI-Stats 2017. Fixed some typos. This version contains a short paragraph in the conclusions section which we could not add in the conference version due to space constraints

  41. arXiv:1511.00146  [pdf, other]

    stat.ML cs.LG stat.CO

    Faster Stochastic Variational Inference using Proximal-Gradient Methods with General Divergence Functions

    Authors: Mohammad Emtiyaz Khan, Reza Babanezhad, Wu Lin, Mark Schmidt, Masashi Sugiyama

    Abstract: Several recent works have explored stochastic gradient methods for variational inference that exploit the geometry of the variational-parameter space. However, the theoretical properties of these methods are not well-understood and these methods typically only apply to conditionally-conjugate models. We present a new stochastic method for variational inference which exploits the geometry of the va…

    Submitted 11 August, 2016; v1 submitted 31 October, 2015; originally announced November 2015.

    Comments: Published in UAI 2016. We have made the following change in this revision: instead of expressing convergence rate results in terms of the iterate difference, we state them in terms of the iterate distance divided by the step-size (a measure of first-order optimality). We also removed some claims about the performance with a fixed step size

  42. arXiv:1510.03592  [pdf, other]

    cs.AI

    UAVs using Bayesian Optimization to Locate WiFi Devices

    Authors: Mattia Carpin, Stefano Rosati, Mohammad Emtiyaz Khan, Bixio Rimoldi

    Abstract: We address the problem of localizing non-collaborative WiFi devices in a large region. Our main motive is to localize humans by localizing their WiFi devices, e.g. during search-and-rescue operations after a natural disaster. We use an active sensing approach that relies on Unmanned Aerial Vehicles (UAVs) to collect signal-strength measurements at informative locations. The problem is challenging…

    Submitted 14 October, 2015; v1 submitted 13 October, 2015; originally announced October 2015.