Research article, DOI: 10.1145/3366424.3383758

Semantic Textual Similarity of Sentences with Emojis

Published: 20 April 2020

Abstract

We extend the task of semantic textual similarity to sentences that contain emojis. Emojis are ubiquitous on social media, yet they are often removed during pre-processing when datasets are curated for NLP tasks. We qualitatively assess the semantic information lost by discarding emojis and show a mechanism for accounting for emojis in a semantic task. We create a sentence similarity dataset of 4000 pairs of tweets with emojis, annotated for relatedness. The corpus contains tweets curated by common topic as well as by replacement of emojis; the latter lets us analyze how semantics differ across emojis. Through a qualitative analysis of the dataset, we aim to provide an understanding of the information lost when emojis are removed, and to present a method of using both emojis and words for downstream NLP tasks beyond sentiment analysis.
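
The effect of discarding emojis on a similarity score can be illustrated with a toy sketch. This is not the method of the paper, which the abstract does not specify; it assumes a simple bag-of-tokens cosine similarity, whitespace tokenization, and a crude non-ASCII filter standing in for emoji removal. The example sentences and the function names `tokens` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def tokens(text, keep_emojis=True):
    """Lowercase and split on whitespace.

    With keep_emojis=False, drop every token containing non-ASCII
    characters (an assumed, crude stand-in for emoji stripping).
    """
    toks = text.lower().split()
    if not keep_emojis:
        toks = [t for t in toks if t.isascii()]
    return toks


def cosine(a, b):
    """Cosine similarity between two token lists, using raw counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical pair that differs only in the emoji (emoji replacement).
s1 = "monday again 😭"
s2 = "monday again 🎉"

with_emojis = cosine(tokens(s1), tokens(s2))
without_emojis = cosine(tokens(s1, False), tokens(s2, False))

print(f"with emojis:    {with_emojis:.3f}")
print(f"without emojis: {without_emojis:.3f}")
```

Under these assumptions, stripping the emojis makes the two sentences indistinguishable, while keeping them as tokens lowers the score. A count-based measure cannot say how far apart the two emojis are in meaning, which is the kind of information an annotated relatedness dataset is meant to capture.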


Cited By

  • (2022) A Novel Emoji Based Deep Super Learner (EDSL) for Sentiment Classification. Proceedings of the 13th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2021), 312-325. DOI: 10.1007/978-3-030-96302-6_29. Online publication date: 22-Feb-2022.
  • (2021) Ensembles for Text-Based Sarcasm Detection. 2021 IEEE 19th Student Conference on Research and Development (SCOReD), 284-289. DOI: 10.1109/SCOReD53546.2021.9652768. Online publication date: 23-Nov-2021.
  • (2020) Automatic Text Summarization using Soft-Cosine Similarity and Centrality Measures. 2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA), 1021-1028. DOI: 10.1109/ICECA49313.2020.9297583. Online publication date: 5-Nov-2020.


Published In

WWW '20: Companion Proceedings of the Web Conference 2020
April 2020
854 pages
ISBN: 9781450370240
DOI: 10.1145/3366424

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. datasets
2. emoji
3. sentence similarity

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

WWW '20: The Web Conference 2020
April 20 - 24, 2020
Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%


Article Metrics

• Downloads (last 12 months): 12
• Downloads (last 6 weeks): 0
Reflects downloads up to 10 Nov 2024
