research-article

HindiPersonalityNet: Personality Detection in Hindi Conversational Data Using Deep Learning with Static Embedding

Authors:

Rohit Beniwal

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 8

Article No.: 117, Pages 1 - 13

https://doi.org/10.1145/3625228

Published: 07 August 2024

Abstract

Personality detection, together with other behavioral and cognitive assessments, can help explain why people act the way they do and is useful in online applications such as recommender systems, job screening, matchmaking, and counseling. Psychometric natural language processing, which relies on textual cues and distinctive markers of writing style in conversational utterances, can reveal signs of individual personality. This work presents HindiPersonalityNet, a text-based deep neural model that classifies Hindi conversations into three personality categories: ambivert, extrovert, and introvert. The model uses a gated recurrent unit (GRU) with BioWordVec embeddings for text classification and is trained and tested on a novel dataset, शख्सियत (pronounced Shakhsiyat), curated from the dialogues of Aarya, an Indian crime-thriller drama series. The model achieves an F1-score of 0.701 and shows the potential of conversational data from varied sources for understanding and predicting a person's personality traits. It captures both semantic and long-distance dependencies in conversations and establishes the dataset as a benchmark for personality detection in Hindi dialogue data. In addition, various static and dynamic word embeddings are compared comprehensively on this standardized dataset to determine the most suitable embedding method for personality detection.
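To make the architecture named in the abstract more concrete, the following is a minimal sketch of a GRU-based three-class text classifier over static word embeddings. It is not the author's implementation: the class name TinyGRUClassifier, the dimensions, the bias-free gates, and the randomly initialized (untrained) weights and embeddings are all assumptions made for illustration. In practice the embedding table would be loaded from pretrained vectors and the weights learned from labeled dialogues.

```python
import math
import random

LABELS = ["ambivert", "extrovert", "introvert"]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyGRUClassifier:
    """Hypothetical, untrained GRU classifier (illustration only)."""

    def __init__(self, vocab, embed_dim=8, hidden_dim=6, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Static embeddings: one fixed vector per word, independent of context.
        # Random stand-ins here; an assumption, not pretrained vectors.
        self.embed = {
            w: [rng.uniform(-0.1, 0.1) for _ in range(embed_dim)] for w in vocab
        }
        self.unk = [0.0] * embed_dim  # vector for out-of-vocabulary tokens
        # Input (W) and recurrent (U) weights for the update gate "z",
        # the reset gate "r", and the candidate state "h". Biases omitted.
        self.W = {g: rand_matrix(rng, hidden_dim, embed_dim) for g in "zrh"}
        self.U = {g: rand_matrix(rng, hidden_dim, hidden_dim) for g in "zrh"}
        self.out = rand_matrix(rng, len(LABELS), hidden_dim)

    def step(self, x, h):
        """One GRU time step: returns the next hidden state."""
        z = [sigmoid(a + b)
             for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b)
             for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b)
                for a, b in zip(matvec(self.W["h"], x),
                                matvec(self.U["h"], reset_h))]
        # The update gate interpolates between the old state and the candidate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def predict_proba(self, tokens):
        """Run the GRU over the tokens and apply softmax to the final state."""
        h = [0.0] * self.hidden_dim
        for tok in tokens:
            h = self.step(self.embed.get(tok, self.unk), h)
        logits = matvec(self.out, h)
        top = max(logits)
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, tokens):
        probs = self.predict_proba(tokens)
        return LABELS[probs.index(max(probs))]


if __name__ == "__main__":
    utterance = ["मैं", "ठीक", "हूँ"]  # a whitespace-tokenized example utterance
    model = TinyGRUClassifier(vocab=utterance)
    print(model.predict(utterance), model.predict_proba(utterance))
```

Because the weights are random, the predicted label carries no meaning; the sketch only shows how a recurrent hidden state carries information across an utterance before a softmax layer maps it to the three personality categories.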

Index Terms

HindiPersonalityNet: Personality Detection in Hindi Conversational Data Using Deep Learning with Static Embedding
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
  2. Machine learning
2. Information systems
  1. Information retrieval

Information & Contributors

Information

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing Volume 23, Issue 8

August 2024

343 pages

EISSN:2375-4702

DOI:10.1145/3613611

Editor: Imed Zitouni (Google, USA)

Guest Editors: Deepak Kumar Jain, Thierry Boumans, Stefano Berretti

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 August 2024

Online AM: 29 September 2023

Accepted: 16 September 2023

Revised: 14 August 2023

Received: 24 May 2023

Published in TALLIP Volume 23, Issue 8
