Search | arXiv e-print repository

A Discriminative Latent-Variable Model for Bilingual Lexicon Induction

Authors: Sebastian Ruder, Ryan Cotterell, Yova Kementchedjhieva, Anders Søgaard

Abstract: We introduce a novel discriminative latent variable model for bilingual lexicon induction. Our model combines the bipartite matching dictionary prior of Haghighi et al. (2008) with a representation-based approach (Artetxe et al., 2017). To train the model, we derive an efficient Viterbi EM algorithm. We provide empirical results on six language pairs under two metrics and show that the prior impro… ▽ More We introduce a novel discriminative latent variable model for bilingual lexicon induction. Our model combines the bipartite matching dictionary prior of Haghighi et al. (2008) with a representation-based approach (Artetxe et al., 2017). To train the model, we derive an efficient Viterbi EM algorithm. We provide empirical results on six language pairs under two metrics and show that the prior improves the induced bilingual lexicons. We also demonstrate how previous work may be viewed as a similarly fashioned latent-variable model, albeit with a different prior. △ Less

Submitted 10 March, 2024; v1 submitted 28 August, 2018; originally announced August 2018.

Comments: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

arXiv:1805.03620 [pdf, other]

On the Limitations of Unsupervised Bilingual Dictionary Induction

Authors: Anders Søgaard, Sebastian Ruder, Ivan Vulić

Abstract: Unsupervised machine translation---i.e., not assuming any cross-lingual supervision signal, whether a dictionary, translations, or comparable corpora---seems impossible, but nevertheless, Lample et al. (2018) recently proposed a fully unsupervised machine translation (MT) model. The model relies heavily on an adversarial, unsupervised alignment of word embedding spaces for bilingual dictionary ind… ▽ More Unsupervised machine translation---i.e., not assuming any cross-lingual supervision signal, whether a dictionary, translations, or comparable corpora---seems impossible, but nevertheless, Lample et al. (2018) recently proposed a fully unsupervised machine translation (MT) model. The model relies heavily on an adversarial, unsupervised alignment of word embedding spaces for bilingual dictionary induction (Conneau et al., 2018), which we examine here. Our results identify the limitations of current unsupervised MT: unsupervised bilingual dictionary induction performs much worse on morphologically rich languages that are not dependent marking, when monolingual corpora from different domains or different embedding algorithms are used. We show that a simple trick, exploiting a weak supervision signal from identical words, enables more robust induction, and establish a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric. △ Less

Submitted 9 May, 2018; originally announced May 2018.

Comments: ACL 2018

arXiv:1802.09913 [pdf, other]

Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate Label Spaces

Authors: Isabelle Augenstein, Sebastian Ruder, Anders Søgaard

Abstract: We combine multi-task learning and semi-supervised learning by inducing a joint embedding space between disparate label spaces and learning transfer functions between label embeddings, enabling us to jointly leverage unlabelled data and auxiliary, annotated datasets. We evaluate our approach on a variety of sequence classification tasks with disparate label spaces. We outperform strong single and… ▽ More We combine multi-task learning and semi-supervised learning by inducing a joint embedding space between disparate label spaces and learning transfer functions between label embeddings, enabling us to jointly leverage unlabelled data and auxiliary, annotated datasets. We evaluate our approach on a variety of sequence classification tasks with disparate label spaces. We outperform strong single and multi-task baselines and achieve a new state-of-the-art for topic-based sentiment analysis. △ Less

Submitted 9 April, 2018; v1 submitted 27 February, 2018; originally announced February 2018.

Comments: To appear at NAACL 2018 (long)

arXiv:1708.00524 [pdf, other]

doi 10.18653/v1/D17-1169

Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm

Authors: Bjarke Felbo, Alan Mislove, Anders Søgaard, Iyad Rahwan, Sune Lehmann

Abstract: NLP tasks are often limited by scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction on a… ▽ More NLP tasks are often limited by scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction on a dataset of 1246 million tweets containing one of 64 common emojis we obtain state-of-the-art performance on 8 benchmark datasets within sentiment, emotion and sarcasm detection using a single pretrained model. Our analyses confirm that the diversity of our emotional labels yield a performance improvement over previous distant supervision approaches. △ Less

Submitted 7 October, 2017; v1 submitted 1 August, 2017; originally announced August 2017.

Comments: Accepted at EMNLP 2017. Please include EMNLP in any citations. Minor changes from the EMNLP camera-ready version. 9 pages + references and supplementary material

Journal ref: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

arXiv:1705.08142 [pdf, other]

Latent Multi-task Architecture Learning

Authors: Sebastian Ruder, Joachim Bingel, Isabelle Augenstein, Anders Søgaard

Abstract: Multi-task learning (MTL) allows deep neural networks to learn from related tasks by sharing parameters with other networks. In practice, however, MTL involves searching an enormous space of possible parameter sharing architectures to find (a) the layers or subspaces that benefit from sharing, (b) the appropriate amount of sharing, and (c) the appropriate relative weights of the different task los… ▽ More Multi-task learning (MTL) allows deep neural networks to learn from related tasks by sharing parameters with other networks. In practice, however, MTL involves searching an enormous space of possible parameter sharing architectures to find (a) the layers or subspaces that benefit from sharing, (b) the appropriate amount of sharing, and (c) the appropriate relative weights of the different task losses. Recent work has addressed each of the above problems in isolation. In this work we present an approach that learns a latent multi-task architecture that jointly addresses (a)--(c). We present experiments on synthetic data and data from OntoNotes 5.0, including four different tasks and seven different domains. Our extension consistently outperforms previous approaches to learning latent architectures for multi-task problems and achieves up to 15% average error reductions over common approaches to MTL. △ Less

Submitted 19 November, 2018; v1 submitted 23 May, 2017; originally announced May 2017.

Comments: To appear in Proceedings of AAAI 2019

arXiv:1704.00514 [pdf, ps, other]

Multi-Task Learning of Keyphrase Boundary Classification

Authors: Isabelle Augenstein, Anders Søgaard

Abstract: Keyphrase boundary classification (KBC) is the task of detecting keyphrases in scientific articles and labelling them with respect to predefined types. Although important in practice, this task is so far underexplored, partly due to the lack of labelled data. To overcome this, we explore several auxiliary tasks, including semantic super-sense tagging and identification of multi-word expressions, a… ▽ More Keyphrase boundary classification (KBC) is the task of detecting keyphrases in scientific articles and labelling them with respect to predefined types. Although important in practice, this task is so far underexplored, partly due to the lack of labelled data. To overcome this, we explore several auxiliary tasks, including semantic super-sense tagging and identification of multi-word expressions, and cast the task as a multi-task learning problem with deep recurrent neural networks. Our multi-task models perform significantly better than previous state of the art approaches on two scientific KBC datasets, particularly for long keyphrases. △ Less

Submitted 26 April, 2017; v1 submitted 3 April, 2017; originally announced April 2017.

Comments: ACL 2017

arXiv:1703.03507 [pdf, other]

doi 10.1103/PhysRevD.96.074034

Decorrelated Jet Substructure Tagging using Adversarial Neural Networks

Authors: Chase Shimmin, Peter Sadowski, Pierre Baldi, Edison Weik, Daniel Whiteson, Edward Goul, Andreas Søgaard

Abstract: We describe a strategy for constructing a neural network jet substructure tagger which powerfully discriminates boosted decay signals while remaining largely uncorrelated with the jet mass. This reduces the impact of systematic uncertainties in background modeling while enhancing signal purity, resulting in improved discovery significance relative to existing taggers. The network is trained using… ▽ More We describe a strategy for constructing a neural network jet substructure tagger which powerfully discriminates boosted decay signals while remaining largely uncorrelated with the jet mass. This reduces the impact of systematic uncertainties in background modeling while enhancing signal purity, resulting in improved discovery significance relative to existing taggers. The network is trained using an adversarial strategy, resulting in a tagger that learns to balance classification accuracy with decorrelation. As a benchmark scenario, we consider the case where large-radius jets originating from a boosted resonance decay are discriminated from a background of nonresonant quark and gluon jets. We show that in the presence of systematic uncertainties on the background rate, our adversarially-trained, decorrelated tagger considerably outperforms a conventionally trained neural network, despite having a slightly worse signal-background separation power. We generalize the adversarial training technique to include a parametric dependence on the signal hypothesis, training a single network that provides optimized, interpolatable decorrelated jet tagging across a continuous range of hypothetical resonance masses, after training on discrete choices of the signal mass. △ Less

Submitted 9 March, 2017; originally announced March 2017.

Journal ref: Phys. Rev. D 96, 074034 (2017)

arXiv:1611.06245 [pdf, other]

Spikes as regularizers

Authors: Anders Søgaard

Abstract: We present a confidence-based single-layer feed-forward learning algorithm SPIRAL (Spike Regularized Adaptive Learning) relying on an encoding of activation spikes. We adaptively update a weight vector relying on confidence estimates and activation offsets relative to previous activity. We regularize updates proportionally to item-level confidence and weight-specific support, loosely inspired by t… ▽ More We present a confidence-based single-layer feed-forward learning algorithm SPIRAL (Spike Regularized Adaptive Learning) relying on an encoding of activation spikes. We adaptively update a weight vector relying on confidence estimates and activation offsets relative to previous activity. We regularize updates proportionally to item-level confidence and weight-specific support, loosely inspired by the observation from neurophysiology that high spike rates are sometimes accompanied by low temporal precision. Our experiments suggest that the new learning algorithm SPIRAL is more robust and less prone to overfitting than both the averaged perceptron and AROW. △ Less

Submitted 18 November, 2016; originally announced November 2016.

Comments: Computing with Spikes at NIPS 2016

Showing 1–8 of 8 results for author: Søgaard, A