DOI: 10.5555/2999792.2999959
Article

Distributed representations of words and phrases and their compositionality

Published: 05 December 2013

Abstract

The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling.
An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
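
The abstract does not spell out how phrases are found. A minimal sketch of one data-driven approach, in the spirit of the bigram scoring used in the full paper, rates each adjacent word pair by score(a, b) = (count(a b) - delta) / (count(a) * count(b)) and treats pairs that score above a threshold as phrases. The function name `find_phrases`, the toy corpus, and the `delta` and `threshold` values are illustrative assumptions, not values taken from this page.

```python
from collections import Counter


def find_phrases(tokens, delta=1.0, threshold=0.1):
    """Return a dict mapping candidate bigrams to their scores.

    score(a, b) = (count(a b) - delta) / (count(a) * count(b))
    `delta` discounts pairs built from very rare words; `threshold`
    is the cut-off above which a pair is kept. Both defaults are
    assumptions chosen for this toy corpus.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    phrases = {}
    for (a, b), n_ab in bigrams.items():
        score = (n_ab - delta) / (unigrams[a] * unigrams[b])
        if score > threshold:
            phrases[(a, b)] = score
    return phrases


# Toy corpus (hypothetical): "air canada" recurs, "to france" occurs once.
tokens = (
    "air canada flies to canada and air canada flies to france by air"
).split()
phrases = find_phrases(tokens)
```

On real data the pass would typically be repeated with a decreasing threshold so that longer phrases can form from already-merged bigrams; the thresholds themselves would need tuning against the corpus.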

Published In

NIPS'13: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2
December 2013
3236 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States

Qualifiers

  • Article

Cited By

  • (2025) Capsule: An Out-of-Core Training Mechanism for Colossal GNNs. Proceedings of the ACM on Management of Data, 3:1 (1-30). DOI: 10.1145/3709669. Online publication date: 11-Feb-2025
  • (2025) A Contrastive Framework with User, Item and Review Alignment for Recommendation. Proceedings of the Eighteenth ACM International Conference on Web Search and Data Mining (117-126). DOI: 10.1145/3701551.3703530. Online publication date: 10-Mar-2025
  • (2025) Review-Based Hyperbolic Cross-Domain Recommendation. Proceedings of the Eighteenth ACM International Conference on Web Search and Data Mining (146-155). DOI: 10.1145/3701551.3703486. Online publication date: 10-Mar-2025
  • (2025) Interdisciplinary Fairness in Imbalanced Research Proposal Topic Inference: A Hierarchical Transformer-based Method with Selective Interpolation. ACM Transactions on Knowledge Discovery from Data, 19:2 (1-21). DOI: 10.1145/3671149. Online publication date: 14-Feb-2025
  • (2025) Multi-factor embedding GNN-based traffic flow prediction considering intersection similarity. Neurocomputing, 620:C. DOI: 10.1016/j.neucom.2024.129193. Online publication date: 1-Mar-2025
  • (2025) FRGEM. Expert Systems with Applications: An International Journal, 262:C. DOI: 10.1016/j.eswa.2024.125589. Online publication date: 1-Mar-2025
  • (2025) Predicting miRNA-drug interactions via dual-channel network based on TCN and BiLSTM. Frontiers of Computer Science: Selected Publications from Chinese Universities, 19:5. DOI: 10.1007/s11704-024-3862-1. Online publication date: 1-May-2025
  • (2025) A multi-projection recurrent model for hypernym detection and discovery. Frontiers of Computer Science: Selected Publications from Chinese Universities, 19:4. DOI: 10.1007/s11704-024-3638-7. Online publication date: 1-Apr-2025
  • (2025) Code context-based reviewer recommendation. Frontiers of Computer Science: Selected Publications from Chinese Universities, 19:1. DOI: 10.1007/s11704-023-3256-9. Online publication date: 1-Jan-2025
  • (2024) VulSim. Proceedings of the 33rd USENIX Conference on Security Symposium (1777-1794). DOI: 10.5555/3698900.3699000. Online publication date: 14-Aug-2024
