Abstract
In natural language processing, most text representation methods fall into one of two paradigms: static and dynamic. Each has distinct advantages, reflected in the cost of training, the scale of input data required, and the interpretability of the resulting model. Dynamic representation methods such as BERT have achieved excellent results on many tasks, but only on the basis of expensive pre-training. Moreover, this representation paradigm is a black box, and its intrinsic properties cannot be measured with standard word-similarity and analogy benchmarks. Most importantly, adequate computing resources and unlimited data are not always available. Static methods are solid alternatives in such scenarios: they can be trained efficiently with limited resources while retaining straightforward interpretability and verifiable intrinsic properties. Although many static embedding methods have been proposed, few attempts have been made to investigate the connections between these algorithms. It is therefore natural to ask which implementation is more efficient, and whether the merits of these algorithms can be combined into a generalized framework. In this paper, we explore answers to these questions by focusing on two popular static embedding models, Continuous Bag-of-Words (CBOW) and Skip-gram (SG), analyzing their merits and drawbacks in detail under both Negative Sampling (NS) and Hierarchical Softmax (HS). We then propose a novel learning framework that trains generalized static embeddings in a unified architecture. The proposed method is estimator-agnostic, so it can be optimized with NS, HS, or any other equivalent estimator. Experiments show that embeddings learned with the proposed framework outperform strong baselines on standard intrinsic evaluations. We also evaluate the method on three extrinsic tasks, where it achieves considerable improvements across all of them.
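To make the four baseline settings contrasted in the abstract concrete, the following minimal sketch (our illustration, not the authors' implementation) trains each CBOW/SG and NS/HS combination with the gensim library; the toy corpus and all hyperparameter values are placeholder assumptions.

# Illustrative only: the four word2vec configurations contrasted in the abstract.
from gensim.models import Word2Vec

# Toy corpus; in practice this would be a large tokenized text collection.
sentences = [
    ["static", "embeddings", "are", "cheap", "to", "train"],
    ["dynamic", "models", "like", "bert", "require", "expensive", "pretraining"],
]

configs = {
    "cbow_ns": dict(sg=0, hs=0, negative=5),  # CBOW with Negative Sampling
    "cbow_hs": dict(sg=0, hs=1, negative=0),  # CBOW with Hierarchical Softmax
    "sg_ns":   dict(sg=1, hs=0, negative=5),  # Skip-gram with Negative Sampling
    "sg_hs":   dict(sg=1, hs=1, negative=0),  # Skip-gram with Hierarchical Softmax
}

models = {
    name: Word2Vec(sentences, vector_size=100, window=5, min_count=1, **kwargs)
    for name, kwargs in configs.items()
}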
Acknowledgements
We would like to thank the editors and the anonymous reviewers for their insightful feedback, which helped greatly to improve the quality of this article.
This work was supported by the National Key R&D Program of China (No. 2018AAA0100300) and the Innovation Foundation of Science and Technology of Dalian through the project "Study on the Key Management and Privacy Preservation in VANET" (No. 2018J12GX045).
The views and conclusions contained in this paper are those of the authors and should not be interpreted as representing official policies or endorsements of any supporting organizations or governments.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Gong, N., Yao, N. GeSe: Generalized static embedding. Appl Intell 52, 10148–10160 (2022). https://doi.org/10.1007/s10489-021-03001-1