Showing 1–31 of 31 results for author: Anil, R

  1. arXiv:2403.09086  [pdf, other]

    cs.LG

    Learning from straggler clients in federated learning

    Authors: Andrew Hard, Antonious M. Girgis, Ehsan Amid, Sean Augenstein, Lara McConnaughey, Rajiv Mathews, Rohan Anil

    Abstract: How well do existing federated learning algorithms learn from client devices that return model updates with a significant time delay? Is it even possible to learn effectively from clients that report back minutes, hours, or days after being scheduled? We answer these questions by developing Monte Carlo simulations of client latency that are guided by real-world applications. We study synchronous o…

    Submitted 14 March, 2024; originally announced March 2024.

  2. arXiv:2403.08295  [pdf, other]

    cs.CL cs.AI

    Gemma: Open Models Based on Gemini Research and Technology

    Authors: Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, et al. (83 additional authors not shown)

    Abstract: This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Ge…

    Submitted 16 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  3. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  4. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  5. arXiv:2311.10085  [pdf, other]

    cs.LG cs.CL math.OC

    A Computationally Efficient Sparsified Online Newton Method

    Authors: Fnu Devvrit, Sai Surya Duvvuri, Rohan Anil, Vineet Gupta, Cho-Jui Hsieh, Inderjit Dhillon

    Abstract: Second-order methods hold significant promise for enhancing the convergence of deep neural network training; however, their large memory and computational demands have limited their practicality. Thus there is a need for scalable second-order methods that can efficiently train large models. In this paper, we introduce the Sparsified Online Newton (SONew) method, a memory-efficient second-order alg…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 30 pages. First two authors contributed equally. Accepted at NeurIPS 2023

  6. arXiv:2310.02549  [pdf, other]

    cs.LG

    Heterogeneous Federated Learning Using Knowledge Codistillation

    Authors: Jared Lichtarge, Ehsan Amid, Shankar Kumar, Tien-Ju Yang, Rohan Anil, Rajiv Mathews

    Abstract: Federated Averaging, and many federated learning algorithm variants which build upon it, have a limitation: all clients must share the same model architecture. This results in unused modeling capacity on many clients, which limits model performance. To address this issue, we propose a method that involves training a small model on the entire pool and a larger model on a subset of clients with high…

    Submitted 3 October, 2023; originally announced October 2023.

  7. arXiv:2306.07179  [pdf, other]

    cs.LG stat.ML

    Benchmarking Neural Network Training Algorithms

    Authors: George E. Dahl, Frank Schneider, Zachary Nado, Naman Agarwal, Chandramouli Shama Sastry, Philipp Hennig, Sourabh Medapati, Runa Eschenhagen, Priya Kasimbeg, Daniel Suo, Juhan Bae, Justin Gilmer, Abel L. Peirson, Bilal Khan, Rohan Anil, Mike Rabbat, Shankar Krishnan, Daniel Snider, Ehsan Amid, Kongtao Chen, Chris J. Maddison, Rakshith Vasudev, Michal Badura, Ankush Garg, Peter Mattson

    Abstract: Training algorithms, broadly construed, are an essential part of every deep learning pipeline. Training algorithm improvements that speed up training across a wide variety of workloads (e.g., better update rules, tuning protocols, learning rate schedules, or data selection schemes) could save time, save computational resources, and lead to better, more accurate, models. Unfortunately, as a communi…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: 102 pages, 8 figures, 41 tables

  8. arXiv:2305.10403  [pdf, other]

    cs.CL cs.AI

    PaLM 2 Technical Report

    Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, et al. (103 additional authors not shown)

    Abstract: We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on…

    Submitted 13 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  9. arXiv:2302.03764  [pdf, other]

    stat.ML cs.AI cs.LG

    Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions

    Authors: Vladimir Feinberg, Xinyi Chen, Y. Jennifer Sun, Rohan Anil, Elad Hazan

    Abstract: Adaptive regularization methods that exploit more than the diagonal entries exhibit state-of-the-art performance for many tasks, but can be prohibitive in terms of memory and running time. We find the spectra of the Kronecker-factored gradient covariance matrix in deep learning (DL) training tasks are concentrated on a small leading eigenspace that changes throughout training, motivating a low-ran…

    Submitted 16 October, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

    Comments: 22 pages, 6 figures, 7 tables, NeurIPS 2023

  10. arXiv:2210.15374  [pdf, other]

    cs.CV

    2T-UNET: A Two-Tower UNet with Depth Clues for Robust Stereo Depth Estimation

    Authors: Rohit Choudhary, Mansi Sharma, Rithvik Anil

    Abstract: Stereo correspondence matching is an essential part of the multi-step stereo depth estimation process. This paper revisits the depth estimation problem, avoiding the explicit stereo matching step using a simple two-tower convolutional neural network. The proposed algorithm is named 2T-UNet. The idea behind 2T-UNet is to replace cost volume construction with twin convolution towers. These tow…

    Submitted 27 October, 2022; originally announced October 2022.

  11. arXiv:2209.07080  [pdf, other]

    cs.LG

    Layerwise Bregman Representation Learning with Applications to Knowledge Distillation

    Authors: Ehsan Amid, Rohan Anil, Christopher Fifty, Manfred K. Warmuth

    Abstract: In this work, we propose a novel approach for layerwise representation learning of a trained neural network. In particular, we form a Bregman divergence based on the layer's transfer function and construct an extension of the original Bregman PCA formulation by incorporating a mean vector and normalizing the principal directions with respect to the geometry of the local convex function around the…

    Submitted 15 September, 2022; originally announced September 2022.

  12. arXiv:2209.05310  [pdf, other]

    cs.IR cs.LG

    On the Factory Floor: ML Engineering for Industrial-Scale Ads Recommendation Models

    Authors: Rohan Anil, Sandra Gadanho, Da Huang, Nijith Jacob, Zhuoshu Li, Dong Lin, Todd Phillips, Cristina Pop, Kevin Regan, Gil I. Shamir, Rakesh Shivanna, Qiqi Yan

    Abstract: For industrial-scale advertising systems, prediction of ad click-through rate (CTR) is a central problem. Ad clicks constitute a significant class of user engagements and are often used as the primary signal for the usefulness of ads to users. Additionally, in cost-per-click advertising systems where advertisers are charged per click, click rate expectations feed directly into value estimation. Ac…

    Submitted 12 September, 2022; originally announced September 2022.

    Comments: ORSUM - ACM RecSys, September 23, 2022, Seattle, WA

  13. arXiv:2207.06366  [pdf, other]

    cs.CL cs.LG

    N-Grammer: Augmenting Transformers with latent n-grams

    Authors: Aurko Roy, Rohan Anil, Guangda Lai, Benjamin Lee, Jeffrey Zhao, Shuyuan Zhang, Shibo Wang, Ye Zhang, Shen Wu, Rigel Swavely, Tao Yu, Phuong Dao, Christopher Fifty, Zhifeng Chen, Yonghui Wu

    Abstract: Transformer models have recently emerged as one of the foundational models in natural language processing, and as a byproduct, there is significant recent interest and investment in scaling these models. However, the training and inference costs of these large Transformer language models are prohibitive, thus necessitating more research in identifying more efficient variants. In this work, we prop…

    Submitted 13 July, 2022; originally announced July 2022.

    Comments: 8 pages, 2 figures

  14. arXiv:2206.10375  [pdf, other]

    cs.CV

    MEStereo-Du2CNN: A Novel Dual Channel CNN for Learning Robust Depth Estimates from Multi-exposure Stereo Images for HDR 3D Applications

    Authors: Rohit Choudhary, Mansi Sharma, Uma T V, Rithvik Anil

    Abstract: Display technologies have evolved over the years. It is critical to develop practical HDR capturing, processing, and display solutions to bring 3D technologies to the next level. Depth estimation of multi-exposure stereo image sequences is an essential task in the development of cost-effective 3D HDR video content. In this paper, we develop a novel deep architecture for multi-exposure stereo depth…

    Submitted 21 June, 2022; originally announced June 2022.

  15. arXiv:2202.06438  [pdf, other]

    cs.LG

    Learning from Randomly Initialized Neural Network Features

    Authors: Ehsan Amid, Rohan Anil, Wojciech Kotłowski, Manfred K. Warmuth

    Abstract: We present the surprising result that randomly initialized neural networks are good feature extractors in expectation. These random features correspond to finite-sample realizations of what we call Neural Network Prior Kernel (NNPK), which is inherently infinite-dimensional. We conduct ablations across multiple architectures of varying sizes as well as initializations and activation functions. Our…

    Submitted 13 February, 2022; originally announced February 2022.

  16. arXiv:2202.00145  [pdf, other]

    cs.LG

    Step-size Adaptation Using Exponentiated Gradient Updates

    Authors: Ehsan Amid, Rohan Anil, Christopher Fifty, Manfred K. Warmuth

    Abstract: Optimizers like Adam and AdaGrad have been very successful in training large-scale neural networks. Yet, the performance of these methods is heavily dependent on a carefully tuned learning rate schedule. We show that in many large-scale applications, augmenting a given optimizer with an adaptive tuning method of the step-size greatly improves the performance. More precisely, we maintain a global s…

    Submitted 31 January, 2022; originally announced February 2022.

  17. arXiv:2109.04617  [pdf, other]

    cs.LG cs.AI cs.CV

    Efficiently Identifying Task Groupings for Multi-Task Learning

    Authors: Christopher Fifty, Ehsan Amid, Zhe Zhao, Tianhe Yu, Rohan Anil, Chelsea Finn

    Abstract: Multi-task learning can leverage information learned by one task to benefit the training of other tasks. Despite this capacity, naively training all tasks together in one model often degrades performance, and exhaustively searching through combinations of task groupings can be prohibitively expensive. As a result, efficiently identifying the tasks that would benefit from training together remains…

    Submitted 25 October, 2021; v1 submitted 9 September, 2021; originally announced September 2021.

    Comments: In NeurIPS 2021 (spotlight). Code is available at https://github.com/google-research/google-research/tree/master/tag

  18. arXiv:2108.01624  [pdf, other]

    cs.LG cs.CL cs.CR

    Large-Scale Differentially Private BERT

    Authors: Rohan Anil, Badih Ghazi, Vineet Gupta, Ravi Kumar, Pasin Manurangsi

    Abstract: In this work, we study the large-scale pretraining of BERT-Large with differentially private SGD (DP-SGD). We show that combined with a careful implementation, scaling up the batch size to millions (i.e., mega-batches) improves the utility of the DP-SGD step for BERT; we also enhance its efficiency by using an increasing batch size schedule. Our implementation builds on the recent work of [SVK20],…

    Submitted 3 August, 2021; originally announced August 2021.

    Comments: 12 pages, 6 figures

  19. arXiv:2106.06199  [pdf, other]

    cs.LG

    LocoProp: Enhancing BackProp via Local Loss Optimization

    Authors: Ehsan Amid, Rohan Anil, Manfred K. Warmuth

    Abstract: Second-order methods have shown state-of-the-art performance for optimizing deep neural networks. Nonetheless, their large memory requirement and high computational complexity, compared to first-order methods, hinder their versatility in a typical low-budget setup. This paper introduces a general framework of layerwise loss construction for multilayer neural networks that achieves a performance cl…

    Submitted 5 March, 2022; v1 submitted 11 June, 2021; originally announced June 2021.

    Journal ref: International Conference on Artificial Intelligence and Statistics (AISTATS) 2022

  20. arXiv:2106.05237  [pdf, other]

    cs.CV cs.AI cs.LG

    Knowledge distillation: A good teacher is patient and consistent

    Authors: Lucas Beyer, Xiaohua Zhai, Amélie Royer, Larisa Markeeva, Rohan Anil, Alexander Kolesnikov

    Abstract: There is a growing discrepancy in computer vision between large-scale models that achieve state-of-the-art performance and models that are affordable in practical applications. In this paper we address this issue and significantly bridge the gap between these two types of models. Throughout our empirical investigation we do not aim to necessarily propose a new method, but strive to identify a robu…

    Submitted 21 June, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

    Comments: Lucas, Xiaohua, Amélie, Larisa, and Alex contributed equally; CVPR 2022

  21. arXiv:2102.06356  [pdf, other]

    cs.LG stat.ML

    A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes

    Authors: Zachary Nado, Justin M. Gilmer, Christopher J. Shallue, Rohan Anil, George E. Dahl

    Abstract: Recently the LARS and LAMB optimizers have been proposed for training neural networks faster using large batch sizes. LARS and LAMB add layer-wise normalization to the update rules of Heavy-ball momentum and Adam, respectively, and have become popular in prominent benchmarks and deep learning libraries. However, without fair comparisons to standard optimizers, it remains an open question whether L…

    Submitted 9 June, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

  22. arXiv:2010.15413  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO

    Measuring and Harnessing Transference in Multi-Task Learning

    Authors: Christopher Fifty, Ehsan Amid, Zhe Zhao, Tianhe Yu, Rohan Anil, Chelsea Finn

    Abstract: Multi-task learning can leverage information learned by one task to benefit the training of other tasks. Despite this capacity, naive formulations often degrade performance and in particular, identifying the tasks that would benefit from co-training remains a challenging design question. In this paper, we analyze the dynamics of information transfer, or transference, across tasks throughout traini…

    Submitted 10 September, 2021; v1 submitted 29 October, 2020; originally announced October 2020.

  23. arXiv:2010.13639  [pdf, other]

    cs.LG math.OC stat.ML

    Stochastic Optimization with Laggard Data Pipelines

    Authors: Naman Agarwal, Rohan Anil, Tomer Koren, Kunal Talwar, Cyril Zhang

    Abstract: State-of-the-art optimization is steadily shifting towards massively parallel pipelines with extremely large batch sizes. As a consequence, CPU-bound preprocessing and disk/memory/network operations have emerged as new performance bottlenecks, as opposed to hardware-accelerated gradient computations. In this regime, a recently proposed approach is data echoing (Choi et al., 2019), which takes repe…

    Submitted 26 October, 2020; originally announced October 2020.

    Comments: Published as a conference paper at NeurIPS 2020

  24. arXiv:2002.11803  [pdf, other]

    cs.LG stat.ML

    Disentangling Adaptive Gradient Methods from Learning Rates

    Authors: Naman Agarwal, Rohan Anil, Elad Hazan, Tomer Koren, Cyril Zhang

    Abstract: We investigate several confounding factors in the evaluation of optimization algorithms for deep learning. Primarily, we take a deeper look at how adaptive gradient methods interact with the learning rate schedule, a notoriously difficult-to-tune hyperparameter which has dramatic effects on the convergence and generalization of neural network training. We introduce a "grafting" experiment which de…

    Submitted 26 February, 2020; originally announced February 2020.

  25. arXiv:2002.09018  [pdf, other]

    cs.LG math.OC stat.ML

    Scalable Second Order Optimization for Deep Learning

    Authors: Rohan Anil, Vineet Gupta, Tomer Koren, Kevin Regan, Yoram Singer

    Abstract: Optimization in machine learning, both theoretical and applied, is presently dominated by first-order gradient methods such as stochastic gradient descent. Second-order optimization methods, that involve second derivatives and/or second order statistics of the data, are far less prevalent despite strong theoretical properties, due to their prohibitive computation, memory and communication costs. I…

    Submitted 5 March, 2021; v1 submitted 20 February, 2020; originally announced February 2020.

    Comments: 24 pages, Code available here: https://bit.ly/3uXXtKy

  26. arXiv:1906.03361  [pdf, other]

    cs.LG stat.ML

    Robust Bi-Tempered Logistic Loss Based on Bregman Divergences

    Authors: Ehsan Amid, Manfred K. Warmuth, Rohan Anil, Tomer Koren

    Abstract: We introduce a temperature into the exponential function and replace the softmax output layer of neural nets by a high temperature generalization. Similarly, the logarithm in the log loss we use for training is replaced by a low temperature logarithm. By tuning the two temperatures we create loss functions that are non-convex already in the single layer case. When replacing the last layer of the n…

    Submitted 23 September, 2019; v1 submitted 7 June, 2019; originally announced June 2019.

    Journal ref: Neural Information Processing Systems 2019

  27. arXiv:1902.08295  [pdf, other]

    cs.LG stat.ML

    Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

    Authors: Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, et al. (66 additional authors not shown)

    Abstract: Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models. Lingvo models are composed of modular building blocks that are flexible and easily extensible, and experiment configurations are centralized and highly customizable. Distributed training and quantized inference are supported directly w…

    Submitted 21 February, 2019; originally announced February 2019.

  28. arXiv:1901.11150  [pdf, other]

    cs.LG math.OC stat.ML

    Memory-Efficient Adaptive Optimization

    Authors: Rohan Anil, Vineet Gupta, Tomer Koren, Yoram Singer

    Abstract: Adaptive gradient-based optimizers such as Adagrad and Adam are crucial for achieving state-of-the-art performance in machine translation and language modeling. However, these methods maintain second-order statistics for each parameter, thus introducing significant memory overheads that restrict the size of the model being used as well as the number of examples in a mini-batch. We describe an effe…

    Submitted 11 September, 2019; v1 submitted 30 January, 2019; originally announced January 2019.

  29. TF-Ranking: Scalable TensorFlow Library for Learning-to-Rank

    Authors: Rama Kumar Pasumarthi, Sebastian Bruch, Xuanhui Wang, Cheng Li, Michael Bendersky, Marc Najork, Jan Pfeifer, Nadav Golbandi, Rohan Anil, Stephan Wolf

    Abstract: Learning-to-Rank deals with maximizing the utility of a list of examples presented to the user, with items of higher relevance being prioritized. It has several practical applications such as large-scale search, recommender systems, document summarization and question answering. While there is widespread support for classification and regression based learning, support for learning-to-rank in deep…

    Submitted 17 May, 2019; v1 submitted 30 November, 2018; originally announced December 2018.

    Comments: KDD 2019

  30. arXiv:1804.03235  [pdf, other]

    cs.LG cs.AI stat.ML

    Large scale distributed neural network training through online distillation

    Authors: Rohan Anil, Gabriel Pereyra, Alexandre Passos, Robert Ormandi, George E. Dahl, Geoffrey E. Hinton

    Abstract: Techniques such as ensembling and distillation promise model quality improvements when paired with almost any base model. However, due to increased test-time cost (for ensembles) and increased complexity of the training pipeline (for distillation), these techniques are challenging to use in industrial settings. In this paper we explore a variant of distillation which is relatively straightforward…

    Submitted 20 August, 2020; v1 submitted 9 April, 2018; originally announced April 2018.

    Comments: Clarify that implementations should use available parallelism in pseudo-code

  31. arXiv:1606.07792  [pdf, other]

    cs.LG cs.IR stat.ML

    Wide & Deep Learning for Recommender Systems

    Authors: Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, Hemal Shah

    Abstract: Generalized linear models with nonlinear feature transformations are widely used for large-scale regression and classification problems with sparse inputs. Memorization of feature interactions through a wide set of cross-product feature transformations is effective and interpretable, while generalization requires more feature engineering effort. With less feature engineering, deep neural networks…

    Submitted 24 June, 2016; originally announced June 2016.