Article

Distributed representations of words and phrases and their compositionality

Authors: Ilya Sutskever, Jeffrey Dean

NIPS'13: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2

Pages 3111 - 3119

Published: 05 December 2013

Abstract

The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling frequent words, we obtain a significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling.
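
As a rough illustration of the two techniques named above, the sketch below follows the formulas given in the body of the paper (they are not reproduced in this abstract). Each occurrence of a word with relative frequency f is discarded with probability 1 - sqrt(t / f), and negative sampling scores one observed (input, context) pair against k sampled noise words. The function names, the default threshold of 1e-5, and the plain-list vectors are assumptions made for illustration, not the authors' implementation.

```python
import math


def discard_probability(freq: float, threshold: float = 1e-5) -> float:
    """Chance of dropping one occurrence of a word with relative frequency `freq`.

    Words rarer than `threshold` are never dropped (the value is clamped at 0).
    """
    if freq <= 0:
        raise ValueError("freq must be positive")
    return max(0.0, 1.0 - math.sqrt(threshold / freq))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v) -> float:
    return sum(a * b for a, b in zip(u, v))


def negative_sampling_loss(center, context, negatives) -> float:
    """Loss for one training pair: the negated negative-sampling objective.

    `center` is the input word vector, `context` the output vector of the
    observed context word, `negatives` the output vectors of k noise words.
    """
    loss = -math.log(sigmoid(dot(context, center)))
    for noise in negatives:
        loss -= math.log(sigmoid(-dot(noise, center)))
    return loss
```

With these definitions a word that makes up 1% of the corpus is dropped far more often than it is kept, while any word at or below the threshold frequency is always kept. The loss falls as the input vector aligns with the observed context vector and moves away from the noise vectors.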

An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
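
The phrase-finding step can be sketched with the bigram score described in the body of the paper, score(a, b) = (count(a b) - delta) / (count(a) * count(b)), where delta discounts pairs of infrequent words. The name `phrase_scores`, the delta value, and the toy sentence are hypothetical choices for this example. A real pipeline would also apply a score threshold and merge the selected bigrams into single tokens.

```python
from collections import Counter


def phrase_scores(tokens, delta: float = 1.0):
    """Score every adjacent word pair; higher scores suggest a phrase."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {
        pair: (count - delta) / (unigrams[pair[0]] * unigrams[pair[1]])
        for pair, count in bigrams.items()
    }


tokens = "air canada flies to canada and air canada lands".split()
scores = phrase_scores(tokens)
best_pair = max(scores, key=scores.get)  # the most phrase-like bigram
```

In this toy input, "air canada" occurs twice and outranks every bigram seen only once, because delta = 1 cancels out the score of single occurrences.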

Published In

NIPS'13: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2

December 2013

3236 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States
