
Further results on latent discourse models and word embeddings

Published: 01 January 2021

Abstract

We discuss some properties of generative models for word embeddings. In particular, Arora et al. (2016) proposed a latent discourse model implying the concentration of the partition function of the word vectors. This concentration phenomenon leads to an asymptotic linear relation between the pointwise mutual information (PMI) of pairs of words and the scalar product of their vectors. Here, we first revisit this concentration phenomenon and prove it under slightly weaker assumptions, for a set of random vectors symmetrically distributed around the origin. Second, we empirically evaluate the relation between PMI and scalar products of word vectors satisfying the concentration property. Our empirical results indicate that, in practice, this relation does not hold with arbitrarily small error. This observation is further supported by two theoretical results: (i) the error cannot be exactly zero because the corresponding shifted PMI matrix cannot be positive semidefinite; (ii) under mild assumptions, there exist pairs of words for which the error cannot be close to zero. We conclude that either natural language does not follow the assumptions of the considered generative model, or current word vector generation methods do not produce the hypothesized word embeddings.
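To make the empirical test described above concrete, the following minimal Python sketch (not the authors' code) computes an empirical PMI matrix from co-occurrence counts, fits the best affine relation between PMI values and scalar products of word vectors, and checks the spectrum of the resulting shifted PMI matrix, which would have to be positive semidefinite for the relation to hold exactly. The arrays `counts` and `vectors` are random placeholders standing in for corpus co-occurrence statistics and trained embeddings; they are assumptions of this sketch, not artifacts of the paper.

import numpy as np

# Toy illustration of the empirical test: random placeholders stand in for
# corpus co-occurrence counts and trained word embeddings.
rng = np.random.default_rng(0)
n, d = 50, 10                                    # vocabulary size, embedding dimension

counts = rng.poisson(lam=5.0, size=(n, n)) + 1   # +1 avoids log(0)
counts = counts + counts.T                       # symmetric co-occurrence counts

p_joint = counts / counts.sum()                  # empirical joint probabilities
p_marg = p_joint.sum(axis=1)                     # marginal probabilities
pmi = np.log(p_joint / np.outer(p_marg, p_marg)) # PMI(w, w')

vectors = rng.normal(size=(n, d))                # stand-in for word2vec/GloVe vectors
gram = vectors @ vectors.T                       # scalar products <v_w, v_w'>

# (1) Best affine fit PMI ~ a * <v_w, v_w'> + b, and its worst-case error.
a, b = np.polyfit(gram.ravel(), pmi.ravel(), deg=1)
residual = pmi - (a * gram + b)
print(f"max |PMI - (a<v,v'> + b)| = {np.abs(residual).max():.3f}")

# (2) If the fit were exact with a > 0, (PMI - b) / a = V V^T would be
# positive semidefinite; a negative eigenvalue certifies that it is not.
shifted = pmi - b                                # subtract the constant shift entrywise
min_eig = np.linalg.eigvalsh((shifted + shifted.T) / 2).min()
print(f"smallest eigenvalue of the shifted PMI matrix: {min_eig:.3f}")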

References

[1]
Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, and Andrej Risteski. A latent variable model approach to PMI-based word embeddings. Transactions of the Association for Computational Linguistics, 4:385-399, 2016.
[2]
Sanjeev Arora, Mikhail Khodak, Nikunj Saunshi, and Kiran Vodrahalli. A compressed sensing view of unsupervised text embeddings, bag-of-n-grams, and LSTMs. In Proceedings of the 6th International Conference on Learning Representations (ICLR), 2018a.
[3]
Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, and Andrej Risteski. Linear algebraic structure of word senses, with applications to polysemy. Transactions of the Association for Computational Linguistics, 6:483-495, 2018b.
[4]
Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, and Andrej Risteski. A latent variable model approach to PMI-based word embeddings, 2019. URL https://arxiv.org/abs/1502.03520.
[5]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171-4186, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics.
[6]
John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121-2159, 2011.
[7]
John Rupert Firth. A synopsis of linguistic theory, 1930-1955. In Studies in Linguistic Analysis, 1957.
[8]
Gene H. Golub and Charles F. Van Loan. Matrix Computations, 3rd edition. The Johns Hopkins University Press, Baltimore, MD, 1996.
[9]
Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735-1780, 1997.
[10]
Sammy Khalife, Leo Liberti, and Michalis Vazirgiannis. Geometry and analogies: a study and propagation method for word representations. In International Conference on Statistical Language and Speech Processing, pages 100-111. Springer, 2019.
[11]
Sammy Khalife, Douglas Gonçalves, and Leo Liberti. Distance geometry for word embeddings and applications. 2021.
[12]
Omer Levy and Yoav Goldberg. Neural word embedding as implicit matrix factorization. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2177-2185. Curran Associates, Inc., 2014. URL http://papers.nips.cc/paper/5477-neural-word-embedding-as-implicit-matrix-factorization.pdf.
[13]
Gian Marconi, Carlo Ciliberto, and Lorenzo Rosasco. Hyperbolic manifold regression. In International Conference on Artificial Intelligence and Statistics, pages 2570-2580. PMLR, 2020.
[14]
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013a.
[15]
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111-3119, 2013b.
[16]
Jeffrey Pennington, Richard Socher, and Christopher Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532-1543, 2014.
[17]
François Role and Mohamed Nadif. Handling the impact of low frequency events on co-occurrence based measures of word similarity. 2011.
[18]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998-6008, 2017.


Published In

The Journal of Machine Learning Research, Volume 22, Issue 1
January 2021
13310 pages
ISSN: 1532-4435
EISSN: 1533-7928
License: CC-BY 4.0

Publisher

JMLR.org

Publication History

Received: 01 December 2020
Revised: 01 May 2021
Accepted: 01 November 2021
Published: 01 January 2021
Published in JMLR Volume 22, Issue 1

Author Tags

1. generative models
2. latent variable models
3. asymptotic concentration
4. natural language processing
5. matrix factorization
