DOI:10.1007/978-3-031-76806-4_14
Article

Emoji Retrieval from Gibberish or Garbled Social Media Text: A Novel Methodology and a Case Study

Published: 17 December 2024

Abstract

Emojis are an integral part of social media conversations and are widely used on almost all social media platforms. However, social media data may be noisy and may include gibberish or garbled text that is difficult to detect and work with. Most naïve data preprocessing approaches recommend removing such text from social media posts before performing any data analysis or passing the data to a machine learning model. Such gibberish or garbled text may, however, have been one or more emojis in the original post, and failing to retrieve them may cause a loss of contextual meaning in the analyzed data. The work presented in this paper addresses this challenge by proposing a novel three-step, reverse engineering-based methodology for retrieving emojis from garbled or gibberish text in social media posts. Developing this methodology also helped to uncover the reasons why gibberish or garbled text can be generated during data mining of social media posts. To evaluate its effectiveness, the methodology was applied to a dataset of 509,248 Tweets about the Mpox outbreak that has been used in about 30 prior works in this field, none of which were able to retrieve the emojis in the original Tweets from the gibberish text present in this dataset. Using our methodology, we were able to retrieve a total of 157,748 emojis present in 76,914 Tweets in this dataset by processing the gibberish or garbled text.
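The abstract does not spell out the three steps, so the following Python sketch is not the authors' methodology. It illustrates one well-known cause of such garbling, under the assumption that the UTF-8 bytes of an emoji were decoded with a single-byte code page (here `cp1252`, chosen only for illustration). The function names `garble` and `recover` are hypothetical.

```python
def garble(text: str, wrong_codec: str = "cp1252") -> str:
    """Simulate the assumed failure: UTF-8 bytes misread under a single-byte codec."""
    return text.encode("utf-8").decode(wrong_codec)


def recover(garbled: str, wrong_codec: str = "cp1252") -> str:
    """Reverse the assumed failure by re-encoding with the wrong codec
    and decoding the resulting bytes as UTF-8.

    If the round trip is not possible (the text was not garbled this way),
    the input is returned unchanged.
    """
    try:
        return garbled.encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return garbled


if __name__ == "__main__":
    original = "Mpox cases rising \U0001F637"  # face with medical mask
    broken = garble(original)
    print(broken)            # the emoji appears as a run of Latin characters
    print(recover(broken))   # the emoji is restored
```

Under this assumption a four-byte emoji shows up as four unrelated Latin characters, which a preprocessing step that strips "gibberish" would simply delete. Real data may involve other codecs or repeated mis-decoding, and the sketch does not cover those cases.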
The effectiveness of this methodology is discussed in the paper through multiple metrics related to text readability and text coherence: the Flesch Reading Ease, Flesch-Kincaid Grade Score, Coleman-Liau Index, Automated Readability Index, Dale-Chall Readability Score, Text Standard, and Reading Time, each computed for the Tweets before and after the methodology was applied. The results showed that applying this methodology to the Tweets improved the readability and coherence scores. Finally, as a case study, the frequency of emoji usage in these Tweets about the Mpox outbreak was analyzed and the results are presented.
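To make the first of these metrics concrete, the sketch below computes the standard Flesch Reading Ease score, 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words). It is a hand-rolled illustration, not the tooling used in the paper. The syllable counter is a rough vowel-group heuristic (an assumption), and the function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): one syllable per run of vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease; higher scores indicate easier text."""
    # Sentences are split on ., ! and ?; empty fragments are discarded.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of ASCII letters and apostrophes; other symbols are ignored.
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


if __name__ == "__main__":
    print(round(flesch_reading_ease("The cat sat."), 2))
```

Because the word pattern here keeps only ASCII letters, garbled byte sequences are partly counted as odd "words" while restored emojis are ignored. This is one plausible reason a score could shift after recovery, though the abstract does not state the mechanism.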


Information & Contributors

Information

Published In

HCI International 2024 – Late Breaking Papers: 26th International Conference on Human-Computer Interaction, HCII 2024, Washington, DC, USA, June 29 – July 4, 2024, Proceedings, Part II
Jun 2024
404 pages
ISBN:978-3-031-76805-7
DOI:10.1007/978-3-031-76806-4

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Emoji
  2. Social Media
  3. Big Data
  4. Data Analysis
  5. Natural Language Processing

Qualifiers

  • Article
