Contrastive Learning of Emoji-Based Representations for Resource-Poor Languages

Choudhary, Nurendra; Singh, Rajat; Bindlish, Ishita; Shrivastava, Manish

doi:10.1007/978-3-031-23804-8_11


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13397)

Included in the following conference series:

International Conference on Computational Linguistics and Intelligent Text Processing


Abstract

The introduction of emojis (or emoticons) on social media platforms has given users greater potential for expression. We propose a novel method called Classification of Emojis using Siamese Network Architecture (CESNA) to learn emoji-based representations of resource-poor languages by jointly training them with resource-rich languages using a Siamese network.
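A Siamese network is trained on pairs of inputs. The sketch below shows one hypothetical way to form cross-lingual training pairs from tweets: each resource-rich sentence is paired with each resource-poor sentence, and a pair is labelled "similar" when the two emoji sets overlap enough. The emoji code-point ranges, the Jaccard overlap metric, and the 0.5 threshold are all assumptions made for illustration, not details taken from the paper.

```python
from itertools import product

# Hypothetical heuristic: treat code points in a few common emoji blocks as emojis.
# Real emoji detection (ZWJ sequences, skin-tone modifiers, flags) is more involved.
EMOJI_RANGES = [
    (0x1F300, 0x1FAFF),  # Misc Symbols and Pictographs through Symbols and Pictographs Extended-A
    (0x2600, 0x27BF),    # Miscellaneous Symbols and Dingbats
]


def extract_emojis(sentence: str) -> set[str]:
    """Return the set of characters in `sentence` that fall in EMOJI_RANGES."""
    return {ch for ch in sentence
            if any(lo <= ord(ch) <= hi for lo, hi in EMOJI_RANGES)}


def emoji_similarity(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two emoji sets (an assumed metric, not the paper's)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_pairs(rich: list[str], poor: list[str], threshold: float = 0.5):
    """Pair every resource-rich sentence with every resource-poor sentence.

    The label is 1 ("similar") if the emoji overlap is >= threshold, else 0.
    """
    pairs = []
    for s_rich, s_poor in product(rich, poor):
        sim = emoji_similarity(extract_emojis(s_rich), extract_emojis(s_poor))
        pairs.append((s_rich, s_poor, 1 if sim >= threshold else 0))
    return pairs
```

For example, pairing `"Great match today 😀⚽"` with a Hindi tweet containing the same two emojis yields label 1. Pairing it with a Hindi tweet containing only 😴 yields label 0.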

The CESNA model consists of twin bidirectional Long Short-Term Memory recurrent neural networks (Bi-LSTM RNNs) with shared parameters, joined by a contrastive loss function based on a similarity metric. The model learns representations of resource-poor and resource-rich languages in a common emoji space, using a similarity metric based on the emojis present in sentences from both languages. Hence, the model projects sentences with similar emojis closer to each other and sentences with different emojis farther from one another. Experiments on large-scale Twitter datasets of resource-rich languages (English and Spanish) and resource-poor languages (Hindi and Telugu) show that CESNA outperforms state-of-the-art emoji prediction approaches based on distributional semantics, semantic rules, lexicon lists, and deep neural network representations without shared parameters.
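The architecture described above can be sketched, under heavy simplification, as one shared encoder applied to both sentences of a pair, followed by a margin-based contrastive loss on the distance between their embeddings. Several assumptions apply:

- The encoder is a hypothetical stand-in (a mean of randomly initialised token vectors), not a Bi-LSTM.
- The loss follows the standard margin form of Hadsell, Chopra and LeCun (2006), using the convention y = 1 for similar pairs.
- The Euclidean distance and the margin of 1.0 are assumed.

The paper's exact metric and hyperparameters may differ. Only the forward pass is shown. Training would backpropagate this loss through the shared weights.

```python
import math
import random


def contrastive_loss(d: float, y: int, margin: float = 1.0) -> float:
    """Margin-based contrastive loss; assumed convention y=1 means "similar".

    Similar pairs are pulled together:        0.5 * d**2
    Dissimilar pairs are pushed past margin:  0.5 * max(0, margin - d)**2
    """
    if y == 1:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def euclidean(u: list[float], v: list[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


class SharedEncoder:
    """Hypothetical stand-in for the twin Bi-LSTM.

    It averages token vectors drawn from ONE embedding table, so both
    branches of the Siamese pair use exactly the same parameters.
    """

    def __init__(self, dim: int = 8, seed: int = 0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table: dict[str, list[float]] = {}

    def _vec(self, token: str) -> list[float]:
        # Lazily create a random vector the first time a token is seen.
        if token not in self.table:
            self.table[token] = [self.rng.gauss(0.0, 1.0) for _ in range(self.dim)]
        return self.table[token]

    def encode(self, sentence: str) -> list[float]:
        tokens = sentence.split()
        if not tokens:
            return [0.0] * self.dim
        vecs = [self._vec(t) for t in tokens]
        return [sum(col) / len(vecs) for col in zip(*vecs)]


def siamese_loss(encoder: SharedEncoder, s_rich: str, s_poor: str,
                 y: int, margin: float = 1.0) -> float:
    # The same encoder object (same weights) embeds both sentences.
    d = euclidean(encoder.encode(s_rich), encoder.encode(s_poor))
    return contrastive_loss(d, y, margin)
```

`siamese_loss` calls the same `encoder` for both sentences, so an English tweet and a Hindi tweet are mapped into a single vector space. This is the property the shared-parameter design is meant to provide.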

N. Choudhary and R. Singh contributed equally to this work.


Notes

1. The Many Tongues of Twitter, MIT Technology Review.


Author information

Authors and Affiliations

Language Technologies Research Centre (LTRC), Kohli Center on Intelligent Systems (KCIS), International Institute of Information Technology, Hyderabad, India
Nurendra Choudhary, Rajat Singh, Ishita Bindlish & Manish Shrivastava


Corresponding author

Correspondence to Nurendra Choudhary.

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh


About this paper

Cite this paper

Choudhary, N., Singh, R., Bindlish, I., Shrivastava, M. (2023). Contrastive Learning of Emoji-Based Representations for Resource-Poor Languages. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2018. Lecture Notes in Computer Science, vol 13397. Springer, Cham. https://doi.org/10.1007/978-3-031-23804-8_11


DOI: https://doi.org/10.1007/978-3-031-23804-8_11
Published: 26 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23803-1
Online ISBN: 978-3-031-23804-8
eBook Packages: Computer Science, Computer Science (R0)
