-
A Discriminative Latent-Variable Model for Bilingual Lexicon Induction
Authors:
Sebastian Ruder,
Ryan Cotterell,
Yova Kementchedjhieva,
Anders Søgaard
Abstract:
We introduce a novel discriminative latent variable model for bilingual lexicon induction. Our model combines the bipartite matching dictionary prior of Haghighi et al. (2008) with a representation-based approach (Artetxe et al., 2017). To train the model, we derive an efficient Viterbi EM algorithm. We provide empirical results on six language pairs under two metrics and show that the prior impro…
▽ More
We introduce a novel discriminative latent variable model for bilingual lexicon induction. Our model combines the bipartite matching dictionary prior of Haghighi et al. (2008) with a representation-based approach (Artetxe et al., 2017). To train the model, we derive an efficient Viterbi EM algorithm. We provide empirical results on six language pairs under two metrics and show that the prior improves the induced bilingual lexicons. We also demonstrate how previous work may be viewed as a similarly fashioned latent-variable model, albeit with a different prior.
△ Less
Submitted 10 March, 2024; v1 submitted 28 August, 2018;
originally announced August 2018.
-
On the Limitations of Unsupervised Bilingual Dictionary Induction
Authors:
Anders Søgaard,
Sebastian Ruder,
Ivan Vulić
Abstract:
Unsupervised machine translation---i.e., not assuming any cross-lingual supervision signal, whether a dictionary, translations, or comparable corpora---seems impossible, but nevertheless, Lample et al. (2018) recently proposed a fully unsupervised machine translation (MT) model. The model relies heavily on an adversarial, unsupervised alignment of word embedding spaces for bilingual dictionary ind…
▽ More
Unsupervised machine translation---i.e., not assuming any cross-lingual supervision signal, whether a dictionary, translations, or comparable corpora---seems impossible, but nevertheless, Lample et al. (2018) recently proposed a fully unsupervised machine translation (MT) model. The model relies heavily on an adversarial, unsupervised alignment of word embedding spaces for bilingual dictionary induction (Conneau et al., 2018), which we examine here. Our results identify the limitations of current unsupervised MT: unsupervised bilingual dictionary induction performs much worse on morphologically rich languages that are not dependent marking, when monolingual corpora from different domains or different embedding algorithms are used. We show that a simple trick, exploiting a weak supervision signal from identical words, enables more robust induction, and establish a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.
△ Less
Submitted 9 May, 2018;
originally announced May 2018.
-
Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate Label Spaces
Authors:
Isabelle Augenstein,
Sebastian Ruder,
Anders Søgaard
Abstract:
We combine multi-task learning and semi-supervised learning by inducing a joint embedding space between disparate label spaces and learning transfer functions between label embeddings, enabling us to jointly leverage unlabelled data and auxiliary, annotated datasets. We evaluate our approach on a variety of sequence classification tasks with disparate label spaces. We outperform strong single and…
▽ More
We combine multi-task learning and semi-supervised learning by inducing a joint embedding space between disparate label spaces and learning transfer functions between label embeddings, enabling us to jointly leverage unlabelled data and auxiliary, annotated datasets. We evaluate our approach on a variety of sequence classification tasks with disparate label spaces. We outperform strong single and multi-task baselines and achieve a new state-of-the-art for topic-based sentiment analysis.
△ Less
Submitted 9 April, 2018; v1 submitted 27 February, 2018;
originally announced February 2018.
-
Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm
Authors:
Bjarke Felbo,
Alan Mislove,
Anders Søgaard,
Iyad Rahwan,
Sune Lehmann
Abstract:
NLP tasks are often limited by scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction on a…
▽ More
NLP tasks are often limited by scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction on a dataset of 1246 million tweets containing one of 64 common emojis we obtain state-of-the-art performance on 8 benchmark datasets within sentiment, emotion and sarcasm detection using a single pretrained model. Our analyses confirm that the diversity of our emotional labels yield a performance improvement over previous distant supervision approaches.
△ Less
Submitted 7 October, 2017; v1 submitted 1 August, 2017;
originally announced August 2017.
-
Latent Multi-task Architecture Learning
Authors:
Sebastian Ruder,
Joachim Bingel,
Isabelle Augenstein,
Anders Søgaard
Abstract:
Multi-task learning (MTL) allows deep neural networks to learn from related tasks by sharing parameters with other networks. In practice, however, MTL involves searching an enormous space of possible parameter sharing architectures to find (a) the layers or subspaces that benefit from sharing, (b) the appropriate amount of sharing, and (c) the appropriate relative weights of the different task los…
▽ More
Multi-task learning (MTL) allows deep neural networks to learn from related tasks by sharing parameters with other networks. In practice, however, MTL involves searching an enormous space of possible parameter sharing architectures to find (a) the layers or subspaces that benefit from sharing, (b) the appropriate amount of sharing, and (c) the appropriate relative weights of the different task losses. Recent work has addressed each of the above problems in isolation. In this work we present an approach that learns a latent multi-task architecture that jointly addresses (a)--(c). We present experiments on synthetic data and data from OntoNotes 5.0, including four different tasks and seven different domains. Our extension consistently outperforms previous approaches to learning latent architectures for multi-task problems and achieves up to 15% average error reductions over common approaches to MTL.
△ Less
Submitted 19 November, 2018; v1 submitted 23 May, 2017;
originally announced May 2017.
-
Multi-Task Learning of Keyphrase Boundary Classification
Authors:
Isabelle Augenstein,
Anders Søgaard
Abstract:
Keyphrase boundary classification (KBC) is the task of detecting keyphrases in scientific articles and labelling them with respect to predefined types. Although important in practice, this task is so far underexplored, partly due to the lack of labelled data. To overcome this, we explore several auxiliary tasks, including semantic super-sense tagging and identification of multi-word expressions, a…
▽ More
Keyphrase boundary classification (KBC) is the task of detecting keyphrases in scientific articles and labelling them with respect to predefined types. Although important in practice, this task is so far underexplored, partly due to the lack of labelled data. To overcome this, we explore several auxiliary tasks, including semantic super-sense tagging and identification of multi-word expressions, and cast the task as a multi-task learning problem with deep recurrent neural networks. Our multi-task models perform significantly better than previous state of the art approaches on two scientific KBC datasets, particularly for long keyphrases.
△ Less
Submitted 26 April, 2017; v1 submitted 3 April, 2017;
originally announced April 2017.
-
Decorrelated Jet Substructure Tagging using Adversarial Neural Networks
Authors:
Chase Shimmin,
Peter Sadowski,
Pierre Baldi,
Edison Weik,
Daniel Whiteson,
Edward Goul,
Andreas Søgaard
Abstract:
We describe a strategy for constructing a neural network jet substructure tagger which powerfully discriminates boosted decay signals while remaining largely uncorrelated with the jet mass. This reduces the impact of systematic uncertainties in background modeling while enhancing signal purity, resulting in improved discovery significance relative to existing taggers. The network is trained using…
▽ More
We describe a strategy for constructing a neural network jet substructure tagger which powerfully discriminates boosted decay signals while remaining largely uncorrelated with the jet mass. This reduces the impact of systematic uncertainties in background modeling while enhancing signal purity, resulting in improved discovery significance relative to existing taggers. The network is trained using an adversarial strategy, resulting in a tagger that learns to balance classification accuracy with decorrelation. As a benchmark scenario, we consider the case where large-radius jets originating from a boosted resonance decay are discriminated from a background of nonresonant quark and gluon jets. We show that in the presence of systematic uncertainties on the background rate, our adversarially-trained, decorrelated tagger considerably outperforms a conventionally trained neural network, despite having a slightly worse signal-background separation power. We generalize the adversarial training technique to include a parametric dependence on the signal hypothesis, training a single network that provides optimized, interpolatable decorrelated jet tagging across a continuous range of hypothetical resonance masses, after training on discrete choices of the signal mass.
△ Less
Submitted 9 March, 2017;
originally announced March 2017.
-
Spikes as regularizers
Authors:
Anders Søgaard
Abstract:
We present a confidence-based single-layer feed-forward learning algorithm SPIRAL (Spike Regularized Adaptive Learning) relying on an encoding of activation spikes. We adaptively update a weight vector relying on confidence estimates and activation offsets relative to previous activity. We regularize updates proportionally to item-level confidence and weight-specific support, loosely inspired by t…
▽ More
We present a confidence-based single-layer feed-forward learning algorithm SPIRAL (Spike Regularized Adaptive Learning) relying on an encoding of activation spikes. We adaptively update a weight vector relying on confidence estimates and activation offsets relative to previous activity. We regularize updates proportionally to item-level confidence and weight-specific support, loosely inspired by the observation from neurophysiology that high spike rates are sometimes accompanied by low temporal precision. Our experiments suggest that the new learning algorithm SPIRAL is more robust and less prone to overfitting than both the averaged perceptron and AROW.
△ Less
Submitted 18 November, 2016;
originally announced November 2016.