
A Brief Overview of Universal Sentence Representation Methods: A Linguistic View

Published: 26 March 2022

Abstract

Transferring the semantic information of a sentence into a computable numerical embedding is a fundamental problem in natural language processing. An informative universal sentence embedding can greatly promote subsequent natural language processing tasks. However, unlike universal word embeddings, no widely accepted general-purpose sentence-embedding technique has yet been developed. This survey summarizes current universal sentence-embedding methods, categorizes them into four groups from a linguistic view, and analyzes their reported performance. Sentence embeddings trained bottom-up from words are observed to show different, nearly opposite, performance patterns on downstream tasks compared with those trained from logical relationships between sentences. By comparing differences in training schemes within and between groups, we analyze possible essential reasons for the differing performance patterns. We additionally collect inspiring sentence-handling strategies from other models and propose potentially fruitful directions for future research.
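
To make the bottom-up group concrete, the following minimal Python sketch composes a sentence embedding by averaging pretrained word vectors, in the spirit of simple averaging baselines. It is an illustrative assumption, not a method taken from the survey: the tiny vocabulary, the three-dimensional vectors, and the function names are placeholders, whereas real systems use pretrained vectors such as GloVe or word2vec with hundreds of dimensions.

    import numpy as np

    # Toy pretrained word embeddings (placeholders; real systems would
    # load, e.g., 300-dimensional GloVe or word2vec vectors).
    word_vectors = {
        "cats": np.array([0.2, 0.8, 0.1]),
        "chase": np.array([0.9, 0.1, 0.4]),
        "mice": np.array([0.3, 0.7, 0.2]),
    }

    def embed_sentence(sentence):
        # Bottom-up composition: average the vectors of known words.
        tokens = sentence.lower().split()
        vectors = [word_vectors[t] for t in tokens if t in word_vectors]
        if not vectors:  # no known words -> zero vector
            return np.zeros(3)
        return np.mean(vectors, axis=0)

    def cosine(u, v):
        # Cosine similarity, the usual comparison metric for embeddings.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine(embed_sentence("cats chase mice"),
                 embed_sentence("mice chase cats")))  # 1.0: averaging ignores word order

As the final line shows, order-insensitive averaging assigns identical embeddings to sentences built from the same words, one limitation that motivates the more structured training schemes the survey reviews.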

Published In

ACM Computing Surveys, Volume 55, Issue 3
March 2023
772 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3514180

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 March 2022
Accepted: 01 August 2021
Revised: 01 April 2021
Received: 01 August 2019
Published in CSUR Volume 55, Issue 3

Author Tags

  1. Sentence embedding
  2. universal representation
  3. deep learning
  4. representation learning

Qualifiers

  • Survey
  • Refereed

Funding Sources

  • NSFC
  • NSF of Hunan Province
  • Science and Technology Innovation Program of Hunan Province
  • European Research Council

Cited By

  • (2024) Should Fairness be a Metric or a Model? A Model-based Framework for Assessing Bias in Machine Learning Pipelines. ACM Transactions on Information Systems 42, 4, 1–41. DOI: 10.1145/3641276. Online publication date: 22 March 2024.
  • (2024) From Model Performance to Claim: How a Change of Focus in Machine Learning Replicability Can Help Bridge the Responsibility Gap. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 1002–1013. DOI: 10.1145/3630106.3658951. Online publication date: 3 June 2024.
  • (2024) Fall-Attention: An Attention-Based Fall Detection Method for Adjoint Activities. IEEE Transactions on Mobile Computing 23, 7, 7895–7909. DOI: 10.1109/TMC.2023.3344125. Online publication date: July 2024.
  • (2024) Multi-label Classification of News Topics Based on Universal Sentence Encoder. In 2024 5th International Conference on Electronic Communication and Artificial Intelligence (ICECAI), 419–422. DOI: 10.1109/ICECAI62591.2024.10675181. Online publication date: 31 May 2024.
  • (2024) Trustworthy machine learning in the context of security and privacy. International Journal of Information Security 23, 3, 2287–2314. DOI: 10.1007/s10207-024-00813-3. Online publication date: 3 April 2024.
  • (2024) A Retrieval-Augmented Generation Strategy to Enhance Medical Chatbot Reliability. In Artificial Intelligence in Medicine, 213–223. DOI: 10.1007/978-3-031-66538-7_22. Online publication date: 9 July 2024.
  • (2023) Neural graph generation from graph statistics. In Proceedings of the 37th International Conference on Neural Information Processing Systems, 36324–36338. DOI: 10.5555/3666122.3667698. Online publication date: 10 December 2023.
  • (2023) Transparency is Crucial for User-Centered AI, or is it? How this Notion Manifests in the UK Press Coverage of GPT. In Proceedings of the 15th Biannual Conference of the Italian SIGCHI Chapter, 1–8. DOI: 10.1145/3605390.3605413. Online publication date: 20 September 2023.
  • (2023) Bridging Trustworthiness and Open-World Learning: An Exploratory Neural Approach for Enhancing Interpretability, Generalization, and Robustness. In Proceedings of the 31st ACM International Conference on Multimedia, 8719–8729. DOI: 10.1145/3581783.3612352. Online publication date: 26 October 2023.
  • (2023) Semantic-Aware Generator and Low-level Feature Augmentation for Few-shot Image Generation. In Proceedings of the 31st ACM International Conference on Multimedia, 5079–5088. DOI: 10.1145/3581783.3612219. Online publication date: 26 October 2023.
