Showing 1–39 of 39 results for author: Amid, E

Searching in archive cs.
  1. arXiv:2410.06424  [pdf, other]

    cs.LG cs.CV

    Restructuring Vector Quantization with the Rotation Trick

    Authors: Christopher Fifty, Ronald G. Junkins, Dennis Duan, Aniketh Iger, Jerry W. Liu, Ehsan Amid, Sebastian Thrun, Christopher Ré

    Abstract: Vector Quantized Variational AutoEncoders (VQ-VAEs) are designed to compress a continuous input to a discrete latent space and reconstruct it with minimal distortion. They operate by maintaining a set of vectors -- often referred to as the codebook -- and quantizing each encoder output to the nearest vector in the codebook. However, as vector quantization is non-differentiable, the gradient to the…

    Submitted 8 October, 2024; originally announced October 2024.

  2. arXiv:2403.09086  [pdf, other]

    cs.LG

    Learning from straggler clients in federated learning

    Authors: Andrew Hard, Antonious M. Girgis, Ehsan Amid, Sean Augenstein, Lara McConnaughey, Rajiv Mathews, Rohan Anil

    Abstract: How well do existing federated learning algorithms learn from client devices that return model updates with a significant time delay? Is it even possible to learn effectively from clients that report back minutes, hours, or days after being scheduled? We answer these questions by developing Monte Carlo simulations of client latency that are guided by real-world applications. We study synchronous o…

    Submitted 14 March, 2024; originally announced March 2024.

  3. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  4. arXiv:2403.02697  [pdf, other]

    stat.ML cs.LG

    Noise misleads rotation invariant algorithms on sparse targets

    Authors: Manfred K. Warmuth, Wojciech Kotłowski, Matt Jones, Ehsan Amid

    Abstract: It is well known that the class of rotation invariant algorithms are suboptimal even for learning sparse linear problems when the number of examples is below the "dimension" of the problem. This class includes any gradient descent trained neural net with a fully-connected input layer (initialized with a rotationally symmetric distribution). The simplest sparse problem is learning a single feature…

    Submitted 5 March, 2024; originally announced March 2024.

  5. arXiv:2402.04163  [pdf, ps, other]

    cs.LG

    Tempered Calculus for ML: Application to Hyperbolic Model Embedding

    Authors: Richard Nock, Ehsan Amid, Frank Nielsen, Alexander Soen, Manfred K. Warmuth

    Abstract: Most mathematical distortions used in ML are fundamentally integral in nature: $f$-divergences, Bregman divergences, (regularized) optimal transport distances, integral probability metrics, geodesic distances, etc. In this paper, we unveil a grounded theory and tools which can help improve these distortions to better cope with ML requirements. We start with a generalization of Riemann integration…

    Submitted 27 September, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: Subsumed by paper "Hyperbolic Embeddings of Supervised Models" by Richard Nock, Ehsan Amid, Frank Nielsen, Alexander Soen and Manfred K. Warmuth, appearing at NeurIPS'24

    ACM Class: I.2.6

  6. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  7. arXiv:2311.13459  [pdf, other]

    cs.LG stat.ML

    The Tempered Hilbert Simplex Distance and Its Application To Non-linear Embeddings of TEMs

    Authors: Ehsan Amid, Frank Nielsen, Richard Nock, Manfred K. Warmuth

    Abstract: Tempered Exponential Measures (TEMs) are a parametric generalization of the exponential family of distributions maximizing the tempered entropy function among positive measures subject to a probability normalization of their power densities. Calculus on TEMs relies on a deformed algebra of arithmetic operators induced by the deformed logarithms used to define the tempered entropy. In this work, we…

    Submitted 22 November, 2023; originally announced November 2023.

  8. arXiv:2310.10971  [pdf, other]

    cs.LG cs.CV

    Context-Aware Meta-Learning

    Authors: Christopher Fifty, Dennis Duan, Ronald G. Junkins, Ehsan Amid, Jure Leskovec, Christopher Re, Sebastian Thrun

    Abstract: Large Language Models like ChatGPT demonstrate a remarkable capacity to learn new concepts during inference without any fine-tuning. However, visual models trained to detect new objects during inference have been unable to replicate this ability, and instead either perform poorly or require meta-training and/or fine-tuning on similar objects. In this work, we propose a meta-learning algorithm that…

    Submitted 25 March, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  9. arXiv:2310.02549  [pdf, other]

    cs.LG

    Heterogeneous Federated Learning Using Knowledge Codistillation

    Authors: Jared Lichtarge, Ehsan Amid, Shankar Kumar, Tien-Ju Yang, Rohan Anil, Rajiv Mathews

    Abstract: Federated Averaging, and many federated learning algorithm variants which build upon it, have a limitation: all clients must share the same model architecture. This results in unused modeling capacity on many clients, which limits model performance. To address this issue, we propose a method that involves training a small model on the entire pool and a larger model on a subset of clients with high…

    Submitted 3 October, 2023; originally announced October 2023.

  10. arXiv:2309.08825  [pdf, other]

    cs.LG cs.AI

    Distributionally Robust Post-hoc Classifiers under Prior Shifts

    Authors: Jiaheng Wei, Harikrishna Narasimhan, Ehsan Amid, Wen-Sheng Chu, Yang Liu, Abhishek Kumar

    Abstract: The generalization ability of machine learning models degrades significantly when the test distribution shifts away from the training distribution. We investigate the problem of training models that are robust to shifts caused by changes in the distribution of class-priors or group-priors. The presence of skewed training priors can often lead to the models overfitting to spurious features. Unlike…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: Camera ready version, accepted at ICLR 2023

  11. arXiv:2309.04015  [pdf, other]

    cs.LG math.OC

    Optimal Transport with Tempered Exponential Measures

    Authors: Ehsan Amid, Frank Nielsen, Richard Nock, Manfred K. Warmuth

    Abstract: In the field of optimal transport, two prominent subfields face each other: (i) unregularized optimal transport, "à-la-Kantorovich", which leads to extremely sparse plans but with algorithms that scale poorly, and (ii) entropic-regularized optimal transport, "à-la-Sinkhorn-Cuturi", which gets near-linear approximation algorithms but leads to maximally un-sparse plans. In this paper, we show that a…

    Submitted 16 February, 2024; v1 submitted 7 September, 2023; originally announced September 2023.

  12. arXiv:2306.07179  [pdf, other]

    cs.LG stat.ML

    Benchmarking Neural Network Training Algorithms

    Authors: George E. Dahl, Frank Schneider, Zachary Nado, Naman Agarwal, Chandramouli Shama Sastry, Philipp Hennig, Sourabh Medapati, Runa Eschenhagen, Priya Kasimbeg, Daniel Suo, Juhan Bae, Justin Gilmer, Abel L. Peirson, Bilal Khan, Rohan Anil, Mike Rabbat, Shankar Krishnan, Daniel Snider, Ehsan Amid, Kongtao Chen, Chris J. Maddison, Rakshith Vasudev, Michal Badura, Ankush Garg, Peter Mattson

    Abstract: Training algorithms, broadly construed, are an essential part of every deep learning pipeline. Training algorithm improvements that speed up training across a wide variety of workloads (e.g., better update rules, tuning protocols, learning rate schedules, or data selection schemes) could save time, save computational resources, and lead to better, more accurate, models. Unfortunately, as a communi…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: 102 pages, 8 figures, 41 tables

  13. arXiv:2306.05487  [pdf, ps, other]

    cs.LG stat.ML

    Boosting with Tempered Exponential Measures

    Authors: Richard Nock, Ehsan Amid, Manfred K. Warmuth

    Abstract: One of the most popular ML algorithms, AdaBoost, can be derived from the dual of a relative entropy minimization problem subject to the fact that the positive weights on the examples sum to one. Essentially, harder examples receive higher probabilities. We generalize this setup to the recently introduced tempered exponential measures (TEMs) where normalization is enforced on a specific power…

    Submitted 8 June, 2023; originally announced June 2023.

    ACM Class: I.2.6

  14. arXiv:2302.02055  [pdf, other]

    cs.LG

    Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction

    Authors: Christopher Fifty, Joseph M. Paggi, Ehsan Amid, Jure Leskovec, Ron Dror

    Abstract: Few-shot learning is a promising approach to molecular property prediction as supervised data is often very limited. However, many important molecular properties depend on complex molecular characteristics -- such as the various 3D geometries a molecule may adopt or the types of chemical interactions it can form -- that are not explicitly encoded in the feature space and must be approximated from…

    Submitted 6 October, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

  15. arXiv:2211.02765  [pdf, other]

    cs.LG

    Clustering above Exponential Families with Tempered Exponential Measures

    Authors: Ehsan Amid, Richard Nock, Manfred Warmuth

    Abstract: The link with exponential families has allowed $k$-means clustering to be generalized to a wide variety of data generating distributions in exponential families and clustering distortions among Bregman divergences. Getting the framework to work above exponential families is important to lift roadblocks like the lack of robustness of some population minimizers carved in their axiomatization. Curren…

    Submitted 4 November, 2022; originally announced November 2022.

    ACM Class: I.2.6

  16. arXiv:2209.07080  [pdf, other]

    cs.LG

    Layerwise Bregman Representation Learning with Applications to Knowledge Distillation

    Authors: Ehsan Amid, Rohan Anil, Christopher Fifty, Manfred K. Warmuth

    Abstract: In this work, we propose a novel approach for layerwise representation learning of a trained neural network. In particular, we form a Bregman divergence based on the layer's transfer function and construct an extension of the original Bregman PCA formulation by incorporating a mean vector and normalizing the principal directions with respect to the geometry of the local convex function around the…

    Submitted 15 September, 2022; originally announced September 2022.

  17. arXiv:2206.07181  [pdf, ps, other]

    cs.LG

    To Aggregate or Not? Learning with Separate Noisy Labels

    Authors: Jiaheng Wei, Zhaowei Zhu, Tianyi Luo, Ehsan Amid, Abhishek Kumar, Yang Liu

    Abstract: The rawly collected training data often comes with separate noisy labels collected from multiple imperfect annotators (e.g., via crowdsourcing). A typical way of using these separate labels is to first aggregate them into one and apply standard training methods. The literature has also studied extensively on effective aggregation approaches. This paper revisits this choice and aims to provide an a…

    Submitted 19 October, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: Paper under Review

  18. arXiv:2204.08345  [pdf, other]

    cs.SD cs.CR cs.LG eess.AS

    Extracting Targeted Training Data from ASR Models, and How to Mitigate It

    Authors: Ehsan Amid, Om Thakkar, Arun Narayanan, Rajiv Mathews, Françoise Beaufays

    Abstract: Recent work has designed methods to demonstrate that model updates in ASR training can leak potentially sensitive attributes of the utterances used in computing the updates. In this work, we design the first method to demonstrate information leakage about training data from trained ASR models. We design Noise Masking, a fill-in-the-blank style method for extracting targeted parts of training data…

    Submitted 27 June, 2022; v1 submitted 18 April, 2022; originally announced April 2022.

    Comments: Accepted to appear at Interspeech'22

  19. arXiv:2202.06438  [pdf, other]

    cs.LG

    Learning from Randomly Initialized Neural Network Features

    Authors: Ehsan Amid, Rohan Anil, Wojciech Kotłowski, Manfred K. Warmuth

    Abstract: We present the surprising result that randomly initialized neural networks are good feature extractors in expectation. These random features correspond to finite-sample realizations of what we call Neural Network Prior Kernel (NNPK), which is inherently infinite-dimensional. We conduct ablations across multiple architectures of varying sizes as well as initializations and activation functions. Our…

    Submitted 13 February, 2022; originally announced February 2022.

  20. arXiv:2202.00145  [pdf, other]

    cs.LG

    Step-size Adaptation Using Exponentiated Gradient Updates

    Authors: Ehsan Amid, Rohan Anil, Christopher Fifty, Manfred K. Warmuth

    Abstract: Optimizers like Adam and AdaGrad have been very successful in training large-scale neural networks. Yet, the performance of these methods is heavily dependent on a carefully tuned learning rate schedule. We show that in many large-scale applications, augmenting a given optimizer with an adaptive tuning method of the step-size greatly improves the performance. More precisely, we maintain a global s…

    Submitted 31 January, 2022; originally announced February 2022.

  21. arXiv:2112.00193  [pdf, other]

    cs.LG cs.CR

    Public Data-Assisted Mirror Descent for Private Model Training

    Authors: Ehsan Amid, Arun Ganesh, Rajiv Mathews, Swaroop Ramaswamy, Shuang Song, Thomas Steinke, Vinith M. Suriyakumar, Om Thakkar, Abhradeep Thakurta

    Abstract: In this paper, we revisit the problem of using in-distribution public data to improve the privacy/utility trade-offs for differentially private (DP) model training. (Here, public data refers to auxiliary data sets that have no privacy concerns.) We design a natural variant of DP mirror descent, where the DP gradients of the private/sensitive data act as the linear term, and the loss generated by t…

    Submitted 27 March, 2022; v1 submitted 30 November, 2021; originally announced December 2021.

    Comments: 20 pages, 8 figures, 3 tables

  22. arXiv:2111.05428  [pdf, other]

    cs.LG stat.ML

    Constrained Instance and Class Reweighting for Robust Learning under Label Noise

    Authors: Abhishek Kumar, Ehsan Amid

    Abstract: Deep neural networks have shown impressive performance in supervised learning, enabled by their ability to fit well to the provided training data. However, their performance is largely dependent on the quality of the training data and often degrades in the presence of noise. We propose a principled approach for tackling label noise with the aim of assigning importance weights to individual instanc…

    Submitted 9 November, 2021; originally announced November 2021.

    Comments: 27 pages, including Appendix

  23. arXiv:2109.04617  [pdf, other]

    cs.LG cs.AI cs.CV

    Efficiently Identifying Task Groupings for Multi-Task Learning

    Authors: Christopher Fifty, Ehsan Amid, Zhe Zhao, Tianhe Yu, Rohan Anil, Chelsea Finn

    Abstract: Multi-task learning can leverage information learned by one task to benefit the training of other tasks. Despite this capacity, naively training all tasks together in one model often degrades performance, and exhaustively searching through combinations of task groupings can be prohibitively expensive. As a result, efficiently identifying the tasks that would benefit from training together remains…

    Submitted 25 October, 2021; v1 submitted 9 September, 2021; originally announced September 2021.

    Comments: In NeurIPS 2021 (spotlight). Code is available at https://github.com/google-research/google-research/tree/master/tag

  24. arXiv:2106.06199  [pdf, other]

    cs.LG

    LocoProp: Enhancing BackProp via Local Loss Optimization

    Authors: Ehsan Amid, Rohan Anil, Manfred K. Warmuth

    Abstract: Second-order methods have shown state-of-the-art performance for optimizing deep neural networks. Nonetheless, their large memory requirement and high computational complexity, compared to first-order methods, hinder their versatility in a typical low-budget setup. This paper introduces a general framework of layerwise loss construction for multilayer neural networks that achieves a performance cl…

    Submitted 5 March, 2022; v1 submitted 11 June, 2021; originally announced June 2021.

    Journal ref: International Conference on Artificial Intelligence and Statistics (AISTATS) 2022

  25. arXiv:2104.01493  [pdf, other]

    cs.LG stat.ML

    Exponentiated Gradient Reweighting for Robust Training Under Label Noise and Beyond

    Authors: Negin Majidi, Ehsan Amid, Hossein Talebi, Manfred K. Warmuth

    Abstract: Many learning tasks in machine learning can be viewed as taking a gradient step towards minimizing the average loss of a batch of examples in each training iteration. When noise is prevalent in the data, this uniform treatment of examples can lead to overfitting to noisy examples with larger loss values and result in poor generalization. Inspired by the expert setting in on-line learning, we prese…

    Submitted 3 April, 2021; originally announced April 2021.

  26. arXiv:2102.10639  [pdf, other]

    cs.IT eess.SP

    Privacy-Preserving Wireless Federated Learning Exploiting Inherent Hardware Impairments

    Authors: Sina Rezaei Aghdam, Ehsan Amid, Marija Furdek, Alexandre Graell i Amat

    Abstract: We consider a wireless federated learning system where multiple data holder edge devices collaborate to train a global model via sharing their parameter updates with an honest-but-curious parameter server. We demonstrate that the inherent hardware-induced distortion perturbing the model updates of the edge devices can be exploited as a privacy-preserving mechanism. In particular, we model the dist…

    Submitted 29 August, 2021; v1 submitted 21 February, 2021; originally announced February 2021.

    Comments: 6 pages, 2 figures, submitted to IEEE 26th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD2021) SS4: Physical-Layer Methods for Security and Privacy in Beyond 5G/6G and Internet of Things Networks

  27. arXiv:2011.10893  [pdf, other]

    cs.CV cs.LG

    Rank-smoothed Pairwise Learning In Perceptual Quality Assessment

    Authors: Hossein Talebi, Ehsan Amid, Peyman Milanfar, Manfred K. Warmuth

    Abstract: Conducting pairwise comparisons is a widely used approach in curating human perceptual preference data. Typically raters are instructed to make their choices according to a specific set of rules that address certain dimensions of image quality and aesthetics. The outcome of this process is a dataset of sampled image pairs with their associated empirical preference probabilities. Training a model o…

    Submitted 21 November, 2020; originally announced November 2020.

    Journal ref: IEEE International Conference on Image Processing (ICIP) 2020

  28. arXiv:2010.15413  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO

    Measuring and Harnessing Transference in Multi-Task Learning

    Authors: Christopher Fifty, Ehsan Amid, Zhe Zhao, Tianhe Yu, Rohan Anil, Chelsea Finn

    Abstract: Multi-task learning can leverage information learned by one task to benefit the training of other tasks. Despite this capacity, naive formulations often degrade performance and in particular, identifying the tasks that would benefit from co-training remains a challenging design question. In this paper, we analyze the dynamics of information transfer, or transference, across tasks throughout traini…

    Submitted 10 September, 2021; v1 submitted 29 October, 2020; originally announced October 2020.

  29. arXiv:2010.08625  [pdf, other]

    cs.LG stat.ML

    A case where a spindly two-layer linear network whips any neural network with a fully connected input layer

    Authors: Manfred K. Warmuth, Wojciech Kotłowski, Ehsan Amid

    Abstract: It was conjectured that any neural network of any structure and arbitrary differentiable transfer functions at the nodes cannot learn the following problem sample efficiently when trained with gradient descent: The instances are the rows of a $d$-dimensional Hadamard matrix and the target is one of the features, i.e. very sparse. We essentially prove this conjecture: We show that after receiving a…

    Submitted 16 October, 2020; originally announced October 2020.

  30. arXiv:2002.10487  [pdf, other]

    cs.LG stat.ML

    Reparameterizing Mirror Descent as Gradient Descent

    Authors: Ehsan Amid, Manfred K. Warmuth

    Abstract: Most of the recent successful applications of neural networks have been based on training with gradient descent updates. However, for some small networks, other mirror descent updates learn provably more efficiently when the target is sparse. We present a general framework for casting a mirror descent update as a gradient descent update on a different set of parameters. In some cases, the mirror d…

    Submitted 22 June, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

  31. arXiv:1910.00204  [pdf, other]

    cs.LG stat.ML

    TriMap: Large-scale Dimensionality Reduction Using Triplets

    Authors: Ehsan Amid, Manfred K. Warmuth

    Abstract: We introduce "TriMap"; a dimensionality reduction technique based on triplet constraints, which preserves the global structure of the data better than the other commonly used methods such as t-SNE, LargeVis, and UMAP. To quantify the global accuracy of the embedding, we introduce a score that roughly reflects the relative placement of the clusters rather than the individual points. We empirically…

    Submitted 25 March, 2022; v1 submitted 1 October, 2019; originally announced October 2019.

  32. arXiv:1909.04803  [pdf, other]

    cs.LG stat.ML

    An Implicit Form of Krasulina's k-PCA Update without the Orthonormality Constraint

    Authors: Ehsan Amid, Manfred K. Warmuth

    Abstract: We shed new insights on the two commonly used updates for the online $k$-PCA problem, namely, Krasulina's and Oja's updates. We show that Krasulina's update corresponds to a projected gradient descent step on the Stiefel manifold of the orthonormal $k$-frames, while Oja's update amounts to a gradient descent step using the unprojected gradient. Following these observations, we derive a more…

    Submitted 10 September, 2019; originally announced September 2019.

  33. arXiv:1906.03361  [pdf, other]

    cs.LG stat.ML

    Robust Bi-Tempered Logistic Loss Based on Bregman Divergences

    Authors: Ehsan Amid, Manfred K. Warmuth, Rohan Anil, Tomer Koren

    Abstract: We introduce a temperature into the exponential function and replace the softmax output layer of neural nets by a high temperature generalization. Similarly, the logarithm in the log loss we use for training is replaced by a low temperature logarithm. By tuning the two temperatures we create loss functions that are non-convex already in the single layer case. When replacing the last layer of the n…

    Submitted 23 September, 2019; v1 submitted 7 June, 2019; originally announced June 2019.

    Journal ref: Neural Information Processing Systems 2019

  34. arXiv:1902.04107  [pdf, other]

    cs.LG stat.ML

    Divergence-Based Motivation for Online EM and Combining Hidden Variable Models

    Authors: Ehsan Amid, Manfred K. Warmuth

    Abstract: Expectation-Maximization (EM) is a prominent approach for parameter estimation of hidden (aka latent) variable models. Given the full batch of data, EM forms an upper-bound of the negative log-likelihood of the model at each iteration and updates to the minimizer of this upper-bound. We first provide a "model level" interpretation of the EM upper-bound as sum of relative entropy divergences to a s…

    Submitted 21 February, 2020; v1 submitted 11 February, 2019; originally announced February 2019.

  35. arXiv:1803.00854  [pdf, other]

    cs.LG

    A more globally accurate dimensionality reduction method using triplets

    Authors: Ehsan Amid, Manfred K. Warmuth

    Abstract: We first show that the commonly used dimensionality reduction (DR) methods such as t-SNE and LargeVis poorly capture the global structure of the data in the low dimensional embedding. We show this via a number of tests for the DR methods that can be easily applied by any practitioner to the dataset at hand. Surprisingly enough, t-SNE performs the best w.r.t. the commonly used measures that reward…

    Submitted 1 March, 2018; originally announced March 2018.

  36. arXiv:1705.07210  [pdf, other]

    cs.LG stat.ML

    Two-temperature logistic regression based on the Tsallis divergence

    Authors: Ehsan Amid, Manfred K. Warmuth, Sriram Srinivasan

    Abstract: We develop a variant of multiclass logistic regression that is significantly more robust to noise. The algorithm has one weight vector per class and the surrogate loss is a function of the linear activations (one per class). The surrogate loss of an example with linear activation vector $\mathbf{a}$ and class $c$ has the form $-\log_{t_1} \exp_{t_2} (a_c - G_{t_2}(\mathbf{a}))$ where the two tempe…

    Submitted 2 August, 2019; v1 submitted 19 May, 2017; originally announced May 2017.

    Journal ref: In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 2388-2396. 2019

  37. arXiv:1612.00086  [pdf, other]

    cs.LG stat.ML

    Semi-supervised Kernel Metric Learning Using Relative Comparisons

    Authors: Ehsan Amid, Aristides Gionis, Antti Ukkonen

    Abstract: We consider the problem of metric learning subject to a set of constraints on relative-distance comparisons between the data items. Such constraints are meant to reflect side-information that is not expressed directly in the feature vectors of the data items. The relative-distance constraints used in this work are particularly effective in expressing structures at finer level of detail than must-l…

    Submitted 3 December, 2016; v1 submitted 30 November, 2016; originally announced December 2016.

  38. arXiv:1611.09957  [pdf, other]

    cs.AI cs.LG stat.ML

    Low-dimensional Data Embedding via Robust Ranking

    Authors: Ehsan Amid, Nikos Vlassis, Manfred K. Warmuth

    Abstract: We describe a new method called t-ETE for finding a low-dimensional embedding of a set of objects in Euclidean space. We formulate the embedding problem as a joint ranking problem over a set of triplets, where each triplet captures the relative similarities between three objects in the set. By exploiting recent advances in robust ranking, t-ETE produces high-quality embeddings even in the presence…

    Submitted 16 May, 2017; v1 submitted 29 November, 2016; originally announced November 2016.

  39. arXiv:1505.05821  [pdf, other]

    cs.IR

    Optimizing the Information Retrieval Trade-off in Data Visualization Using $α$-Divergence

    Authors: Ehsan Amid, Onur Dikmen, Erkki Oja

    Abstract: Data visualization is one of the major applications of nonlinear dimensionality reduction. From the information retrieval perspective, the quality of a visualization can be evaluated by considering the extent that the neighborhood relation of each data point is maintained while the number of unrelated points that are retrieved is minimized. This property can be quantified as a trade-off between th…

    Submitted 29 March, 2016; v1 submitted 21 May, 2015; originally announced May 2015.