A Multi-input Multi-output Transformer-based Hybrid Neural Network for Multi-class Privacy Disclosure Detection

Mehdy, A K M Nuhil; Mehrpouyan, Hoda

arXiv:2108.08483 (cs)

[Submitted on 19 Aug 2021 (v1), last revised 20 Aug 2021 (this version, v2)]

Abstract: Concern about users' data privacy has risen to its highest level due to the massive increase in communication platforms and social networking sites and greater user participation in online public discourse. An increasing number of people exchange private information via emails, text messages, and social media without being aware of the risks and implications. Because a substantial amount of data is exchanged in textual form, researchers in the field of Natural Language Processing (NLP) have concentrated on creating tools and strategies to identify, categorize, and sanitize private information in text data. However, most detection methods rely solely on the presence of pre-identified keywords in the text and disregard the underlying meaning of the utterance in a specific context. Hence, in some situations, these tools and algorithms fail to detect disclosure, or the results they produce are misclassified. In this paper, we propose a multi-input, multi-output hybrid neural network that utilizes transfer learning, linguistics, and metadata to learn the hidden patterns. Our goal is to better classify disclosure/non-disclosure content in terms of the context of situation. We trained and evaluated our model on a human-annotated ground truth dataset containing a total of 5,400 tweets. The results show that, by jointly learning two separate tasks, the proposed model was able to identify privacy disclosure in tweets with an accuracy of 77.4% while classifying the information type of those tweets with an accuracy of 99%.
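The multi-input, multi-output idea described in the abstract can be illustrated with a small sketch. This is not the authors' architecture: the abstract does not give layer types, dimensions, class counts, or loss weights, so everything below (the `TwoHeadModel` class, the vector sizes, the two-class and three-class heads, the equal loss weights) is a hypothetical stand-in. The text vector stands in for a pretrained transformer embedding, and the weights are random rather than trained. The sketch only shows three inputs being fused into a shared representation that feeds two classification heads, with the two cross-entropy losses summed for joint learning.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (untrained, for illustration)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def cross_entropy(probs, label):
    return -math.log(probs[label])


class TwoHeadModel:
    """Hypothetical model: three inputs, one shared layer, two output heads."""

    def __init__(self, text_dim, ling_dim, meta_dim, hidden_dim,
                 n_disclosure, n_info_type, seed=0):
        rng = random.Random(seed)
        fused_dim = text_dim + ling_dim + meta_dim
        self.shared = init_layer(rng, fused_dim, hidden_dim)
        self.disclosure_head = init_layer(rng, hidden_dim, n_disclosure)
        self.info_type_head = init_layer(rng, hidden_dim, n_info_type)

    def forward(self, text_vec, ling_vec, meta_vec):
        # Fuse the three inputs by concatenation (an assumption).
        fused = text_vec + ling_vec + meta_vec
        hidden = relu(dense(fused, *self.shared))
        disclosure = softmax(dense(hidden, *self.disclosure_head))
        info_type = softmax(dense(hidden, *self.info_type_head))
        return disclosure, info_type


def joint_loss(disclosure_probs, info_type_probs,
               disclosure_label, info_type_label,
               w_disclosure=1.0, w_info_type=1.0):
    """Weighted sum of the two task losses (weights are assumptions)."""
    return (w_disclosure * cross_entropy(disclosure_probs, disclosure_label)
            + w_info_type * cross_entropy(info_type_probs, info_type_label))


# Usage with made-up feature vectors and labels.
rng = random.Random(1)
model = TwoHeadModel(text_dim=8, ling_dim=4, meta_dim=2, hidden_dim=6,
                     n_disclosure=2, n_info_type=3)
text_vec = [rng.random() for _ in range(8)]   # stand-in for a text embedding
ling_vec = [rng.random() for _ in range(4)]   # stand-in for linguistic features
meta_vec = [rng.random() for _ in range(2)]   # stand-in for tweet metadata
disclosure_probs, info_type_probs = model.forward(text_vec, ling_vec, meta_vec)
loss = joint_loss(disclosure_probs, info_type_probs,
                  disclosure_label=1, info_type_label=2)
```

Because both heads read the same hidden representation, a gradient step on the summed loss would update the shared layer using signal from both tasks, which is the usual motivation for joint learning.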

Comments:	20 pages
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2108.08483 [cs.LG]
	(or arXiv:2108.08483v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2108.08483
Journal reference:	2nd International Conference on Machine Learning Techniques and NLP (MLNLP 2021), September 18 - 19, 2021, Copenhagen, Denmark

Submission history

From: A K M Nuhil Mehdy
[v1] Thu, 19 Aug 2021 03:58:49 UTC (905 KB)
[v2] Fri, 20 Aug 2021 18:09:22 UTC (988 KB)
