- research-article, March 2024
Discovering salient neurons in deep NLP models
The Journal of Machine Learning Research (JMLR), Volume 24, Issue 1, Article No.: 362, Pages 17438–17477
While a lot of work has been done in understanding representations learned within deep NLP models and what knowledge they capture, work done towards analyzing individual neurons is relatively sparse. We present a technique called Linguistic Correlation ...
- research-article, March 2024
ProtoryNet - interpretable text classification via prototype trajectories
The Journal of Machine Learning Research (JMLR), Volume 24, Issue 1, Article No.: 264, Pages 12344–12382
We propose a novel interpretable deep neural network for text classification, called ProtoryNet, based on a new concept of prototype trajectories. Motivated by the prototype theory in modern linguistics, ProtoryNet makes a prediction by finding the most ...
- research-article, March 2024
Atlas: few-shot learning with retrieval augmented language models
- Gautier Izacard,
- Patrick Lewis,
- Maria Lomeli,
- Lucas Hosseini,
- Fabio Petroni,
- Timo Schick,
- Jane Dwivedi-Yu,
- Armand Joulin,
- Sebastian Riedel,
- Edouard Grave
The Journal of Machine Learning Research (JMLR), Volume 24, Issue 1, Article No.: 251, Pages 11912–11954
Large language models have shown impressive few-shot results on a wide range of tasks. However, when knowledge is key for such results, as is the case for tasks such as question answering and fact checking, massive parameter counts to store knowledge ...
- research-article, March 2024
PaLM: scaling language modeling with pathways
- Aakanksha Chowdhery,
- Sharan Narang,
- Jacob Devlin,
- Maarten Bosma,
- Gaurav Mishra,
- Adam Roberts,
- Paul Barham,
- Hyung Won Chung,
- Charles Sutton,
- Sebastian Gehrmann,
- Parker Schuh,
- Kensen Shi,
- Sashank Tsvyashchenko,
- Joshua Maynez,
- Abhishek Rao,
- Parker Barnes,
- Yi Tay,
- Noam Shazeer,
- Vinodkumar Prabhakaran,
- Emily Reif,
- Nan Du,
- Ben Hutchinson,
- Reiner Pope,
- James Bradbury,
- Jacob Austin,
- Michael Isard,
- Guy Gur-Ari,
- Pengcheng Yin,
- Toju Duke,
- Anselm Levskaya,
- Sanjay Ghemawat,
- Sunipa Dev,
- Henryk Michalewski,
- Xavier Garcia,
- Vedant Misra,
- Kevin Robinson,
- Liam Fedus,
- Denny Zhou,
- Daphne Ippolito,
- David Luan,
- Hyeontaek Lim,
- Barret Zoph,
- Alexander Spiridonov,
- Ryan Sepassi,
- David Dohan,
- Shivani Agrawal,
- Mark Omernick,
- Andrew M. Dai,
- Thanumalayan Sankaranarayana Pillai,
- Marie Pellat,
- Aitor Lewkowycz,
- Erica Moreira,
- Rewon Child,
- Oleksandr Polozov,
- Katherine Lee,
- Zongwei Zhou,
- Xuezhi Wang,
- Brennan Saeta,
- Mark Diaz,
- Orhan Firat,
- Michele Catasta,
- Jason Wei,
- Kathy Meier-Hellstern,
- Douglas Eck,
- Jeff Dean,
- Slav Petrov,
- Noah Fiedel
The Journal of Machine Learning Research (JMLR), Volume 24, Issue 1, Article No.: 240, Pages 11324–11436
Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular ...
- research-article, January 2022
Switch transformers: scaling to trillion parameter models with simple and efficient sparsity
The Journal of Machine Learning Research (JMLR), Volume 23, Issue 1, Article No.: 120, Pages 5232–5270
In deep learning, models typically reuse the same parameters for all inputs. Mixture of Experts (MoE) models defy this and instead select different parameters for each incoming example. The result is a sparsely-activated model--with an outrageous number ...
- research-article, January 2022
A statistical approach for optimal topic model identification
The Journal of Machine Learning Research (JMLR), Volume 23, Issue 1, Article No.: 58, Pages 2553–2572
Latent Dirichlet Allocation (LDA) is a popular machine-learning technique that identifies latent structures in a corpus of documents. This paper addresses the ongoing concern that formal procedures for determining the optimal LDA configuration do not ...
- research-article, January 2021
Further results on latent discourse models and word embeddings
The Journal of Machine Learning Research (JMLR), Volume 22, Issue 1, Article No.: 270, Pages 12376–12411
We discuss some properties of generative models for word embeddings. Namely, (Arora et al., 2016) proposed a latent discourse model implying the concentration of the partition function of the word vectors. This concentration phenomenon led to an ...
- research-article, January 2021
Beyond English-centric multilingual machine translation
- Angela Fan,
- Shruti Bhosale,
- Holger Schwenk,
- Zhiyi Ma,
- Ahmed El-Kishky,
- Siddharth Goyal,
- Mandeep Baines,
- Onur Celebi,
- Guillaume Wenzek,
- Vishrav Chaudhary,
- Naman Goyal,
- Tom Birch,
- Vitaliy Liptchinsky,
- Sergey Edunov,
- Edouard Grave,
- Michael Auli,
- Armand Joulin
The Journal of Machine Learning Research (JMLR), Volume 22, Issue 1, Article No.: 107, Pages 4839–4886
Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages. However, much of this work is English-Centric, training only on data which ...
- research-article, January 2021
LocalGAN: modeling local distributions for adversarial response generation
The Journal of Machine Learning Research (JMLR), Volume 22, Issue 1, Article No.: 101, Pages 4578–4606
This paper presents a new methodology for modeling the local semantic distribution of responses to a given query in the human-conversation corpus, and on this basis, explores a specified adversarial learning mechanism for training Neural Response ...
- research-article, January 2021
Bayesian text classification and summarization via a class-specified topic model
The Journal of Machine Learning Research (JMLR), Volume 22, Issue 1, Article No.: 89, Pages 3971–4018
We propose the class-specified topic model (CSTM) to deal with the tasks of text classification and class-specific text summarization. The model assumes that in addition to a set of latent topics that are shared across classes, there is a set of class-...
- research-article, January 2021
Residual energy-based models for text
The Journal of Machine Learning Research (JMLR), Volume 22, Issue 1, Article No.: 40, Pages 1840–1880
Current large-scale auto-regressive language models (Radford et al., 2019; Liu et al., 2018; Graves, 2013) display impressive fluency and can generate convincing text. In this work we start by asking the question: Can the generations of these models be ...
- research-article, January 2020
Exploring the limits of transfer learning with a unified text-to-text transformer
- Colin Raffel,
- Noam Shazeer,
- Adam Roberts,
- Katherine Lee,
- Sharan Narang,
- Michael Matena,
- Yanqi Zhou,
- Wei Li,
- Peter J. Liu
The Journal of Machine Learning Research (JMLR), Volume 21, Issue 1, Article No.: 140, Pages 5485–5551
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a ...
- article, February 2013
Ranked bandits in metric spaces: learning diverse rankings over large document collections
Most learning to rank research has assumed that the utility of different documents is independent, which results in learned ranking functions that return redundant results. The few approaches that avoid this have rather unsatisfyingly lacked theoretical ...
- article, January 2013
MAGIC summoning: towards automatic suggesting and testing of gestures with low probability of false positives during use
Gestures for interfaces should be short, pleasing, intuitive, and easily recognized by a computer. However, it is a challenge for interface designers to create gestures easily distinguishable from users' normal movements. Our tool MAGIC Summoning ...
- article, December 2012
Exploration in relational domains for model-based reinforcement learning
A fundamental problem in reinforcement learning is balancing exploration and exploitation. We address this problem in the context of model-based reinforcement learning in large stochastic relational domains by developing relational extensions of the ...
- article, December 2012
Security analysis of online centroid anomaly detection
Security issues are crucial in a number of machine learning applications, especially in scenarios dealing with human activity rather than natural phenomena (e.g., information ranking, spam detection, malware detection, etc.). In such cases, learning ...
- article, December 2012
Smoothing multivariate performance measures
Optimizing multivariate performance measure is an important task in Machine Learning. Joachims (2005) introduced a Support Vector Method whose underlying optimization problem is commonly solved by cutting plane methods (CPMs) such as SVM-Perf and BMRM. ...
- article, December 2012
SVDFeature: a toolkit for feature-based collaborative filtering
In this paper we introduce SVDFeature, a machine learning toolkit for feature-based collaborative filtering. SVDFeature is designed to efficiently solve the feature-based matrix factorization. The feature-based setting allows us to build factorization ...
- article, December 2012
Learning symbolic representations of hybrid dynamical systems
A hybrid dynamical system is a mathematical model suitable for describing an extensive spectrum of multi-modal, time-series behaviors, ranging from bouncing balls to air traffic controllers. This paper describes multi-modal symbolic regression (MMSR): a ...
- article, December 2012
Regularized bundle methods for convex and non-convex risks
Machine learning is most often cast as an optimization problem. Ideally, one expects a convex objective function to rely on efficient convex optimizers with nice guarantees such as no local optima. Yet, non-convexity is very frequent in practice and it ...