Abstract
Online platforms have become an unparalleled storehouse of information. People use social question-and-answer (Q&A) websites such as Quora, Formspring, Stack Overflow, Twitter, and Beepl to ask questions, clarify doubts, and share ideas and expertise with others. A major issue with such Q&A websites is the growing number of inappropriate and insincere posts from users without a genuine motive: individuals share harmful and toxic content intended to make a statement rather than to seek helpful answers. In natural language processing (NLP), Bidirectional Encoder Representations from Transformers (BERT) has been a game-changer. It has dominated performance benchmarks and pushed researchers to experiment with and produce similar models, which in turn led to lighter language models that maintain efficiency and performance. This study uses pre-trained state-of-the-art language models to classify posted questions as sincere or insincere under limited computation. To work around the high computational cost of NLP, the BERT, XLNet, StructBERT, and DeBERTa models were trained on three samples of the data. The results show that, even with limited resources, recent transformer-based models outperform previous studies. Among the four, DeBERTa stands out with the highest balanced accuracy, macro F1-score, and weighted F1-score of 80%, 0.83, and 0.96, respectively.
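For readers unfamiliar with the three metrics reported above, the following is a minimal sketch of how balanced accuracy, macro F1, and weighted F1 are commonly defined for a binary task. It is not the authors' evaluation code: the function names `per_class_stats` and `summarize` are hypothetical, and the labels and predictions are invented toy data (0 = sincere, 1 = insincere) that assume a heavily imbalanced class distribution for illustration.

```python
def per_class_stats(y_true, y_pred):
    """Return {label: (recall, f1, support)} for every label in y_true."""
    stats = {}
    pairs = list(zip(y_true, y_pred))
    for label in sorted(set(y_true)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        stats[label] = (recall, f1, tp + fn)
    return stats


def summarize(y_true, y_pred):
    """Return (balanced accuracy, macro F1, weighted F1)."""
    stats = per_class_stats(y_true, y_pred)
    n_classes = len(stats)
    # Balanced accuracy: unweighted mean of per-class recall.
    balanced_accuracy = sum(r for r, _, _ in stats.values()) / n_classes
    # Macro F1: unweighted mean of per-class F1.
    macro_f1 = sum(f for _, f, _ in stats.values()) / n_classes
    # Weighted F1: per-class F1 averaged by class support.
    weighted_f1 = sum(f * s for _, f, s in stats.values()) / len(y_true)
    return balanced_accuracy, macro_f1, weighted_f1


# Invented toy data: 18 sincere (0) and 2 insincere (1) questions.
y_true = [0] * 18 + [1] * 2
# One sincere question is flagged, and one insincere question is missed.
y_pred = [0] * 17 + [1] + [1, 0]

balanced_accuracy, macro_f1, weighted_f1 = summarize(y_true, y_pred)
print(f"balanced accuracy = {balanced_accuracy:.3f}")
print(f"macro F1          = {macro_f1:.3f}")
print(f"weighted F1       = {weighted_f1:.3f}")
```

On this toy data the weighted F1 is noticeably higher than the macro F1, because the majority class dominates the support-weighted average. Under the imbalance assumption, that is one way a weighted score can sit well above the macro score.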
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chakraborty, S. et al. (2023). Quora Insincere Questions Classification Using Attention Based Model. In: Wah, Y.B., Berry, M.W., Mohamed, A., Al-Jumeily, D. (eds) Data Science and Emerging Technologies. DaSET 2022. Lecture Notes on Data Engineering and Communications Technologies, vol 165. Springer, Singapore. https://doi.org/10.1007/978-981-99-0741-0_26
DOI: https://doi.org/10.1007/978-981-99-0741-0_26
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-0740-3
Online ISBN: 978-981-99-0741-0
eBook Packages: Intelligent Technologies and Robotics (R0)