research-article

Topic enhanced word embedding for toxic content detection in Q&A sites

Authors:

Roy Ka-Wei Lee

ASONAM '19: Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining

Pages 1064 - 1071

https://doi.org/10.1145/3341161.3345332

Published: 15 January 2020

Abstract

Increasingly, users are adopting community question-and-answer (Q&A) sites to exchange information. Detecting and eliminating toxic and divisive content on these Q&A sites are paramount tasks to ensure a safe and constructive environment for users. Insincere questions, which are founded upon false premises, are one type of toxic content on Q&A sites. In this paper, we propose a novel deep learning framework that enhances pre-trained word embeddings with topical information for insincere question classification. We evaluate the proposed framework on a large real-world dataset from the Quora Q&A site and show that the topically enhanced word embedding achieves better results in toxic content classification. We also conduct an empirical study of the topics of insincere questions on Quora and find that topics on "religion", "gender" and "politics" have a higher proportion of insincere questions.
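
The abstract does not spell out how topical information is combined with the pre-trained word embeddings. As a minimal sketch only, one plausible reading is that each word's pre-trained vector is concatenated with a per-word topic distribution (for example, one derived from a topic model) before being passed to a classifier. The concatenation strategy, the fallback vectors for unknown words, the function names (`enhance_embedding`, `embed_question`), and all toy values below are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List


def enhance_embedding(word_vec: List[float], topic_vec: List[float]) -> List[float]:
    """Concatenate a word vector with a topic-distribution vector.

    Concatenation is an assumed way of combining the two signals.
    """
    return list(word_vec) + list(topic_vec)


def embed_question(
    tokens: List[str],
    word_vectors: Dict[str, List[float]],
    topic_vectors: Dict[str, List[float]],
    word_dim: int,
    num_topics: int,
) -> List[List[float]]:
    """Map each token to its topic-enhanced vector.

    Unknown words fall back to a zero word vector and a uniform topic
    distribution (both fallbacks are hypothetical choices).
    """
    zero_word = [0.0] * word_dim
    uniform_topic = [1.0 / num_topics] * num_topics
    return [
        enhance_embedding(
            word_vectors.get(tok, zero_word),
            topic_vectors.get(tok, uniform_topic),
        )
        for tok in tokens
    ]


# Toy 3-dimensional word vectors and 2-topic distributions (made-up values).
word_vectors = {"why": [0.1, 0.3, -0.2], "religion": [0.7, -0.1, 0.4]}
topic_vectors = {"why": [0.5, 0.5], "religion": [0.9, 0.1]}

matrix = embed_question(
    ["why", "religion", "unknown"], word_vectors, topic_vectors,
    word_dim=3, num_topics=2,
)
for row in matrix:
    print(row)
```

In a setup like this, the resulting per-token matrix would be the input to a sequence classifier; whether the framework in the paper concatenates, sums, or otherwise fuses the two representations is not stated in the abstract.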

Information & Contributors

Information

Published In

ASONAM '19: Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining

August 2019

1228 pages

ISBN: 9781450368681

DOI: 10.1145/3341161

Editors:

Francesca Spezzano (Boise State University)
Wei Chen (Microsoft Research, China)
Xiaokui Xiao (National University of Singapore, Singapore)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data

In-Cooperation

IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ASONAM '19: International Conference on Advances in Social Networks Analysis and Mining

Sponsor: SIGKDD

August 27 - 30, 2019

Vancouver, British Columbia, Canada

Acceptance Rates

ASONAM '19 Paper Acceptance Rate: 41 of 286 submissions, 14%

Overall Acceptance Rate: 116 of 549 submissions, 21%

Bibliometrics & Citations

Article Metrics

Total Citations: 5
Total Downloads: 145
Downloads (Last 12 months): 16
Downloads (Last 6 weeks): 2

Reflects downloads up to 30 Aug 2024

Cited By

Zuo W, Raman A, Mondragón R, Tyson G (2023). A First Look at User-Controlled Moderation on Web3 Social Media: The Case of Memo.cash. Proceedings of the 3rd International Workshop on Open Challenges in Online Social Networks, pp. 29-37. Online publication date: 4-Sep-2023.
https://dl.acm.org/doi/10.1145/3599696.3612901
Museng F, Jessica A, Wijaya N, Anderies A, Iswanto I (2022). Systematic Literature Review: Toxic Comment Classification. 2022 IEEE 7th International Conference on Information Technology and Digital Applications (ICITDA), pp. 1-7. Online publication date: 4-Nov-2022.
https://doi.org/10.1109/ICITDA55840.2022.9971338
Gontumukkala S, Godavarthi Y, Gonugunta B, Gupta D, Palaniswamy S (2022). Quora Question Pairs Identification and Insincere Questions Classification. 2022 13th International Conference on Computing Communication and Networking Technologies (ICCCNT), pp. 1-6. Online publication date: 3-Oct-2022.
https://doi.org/10.1109/ICCCNT54827.2022.9984492
Singh I, Goyal G, Chandel A (2022). AlexNet architecture based convolutional neural network for toxic comments classification. Journal of King Saud University - Computer and Information Sciences, 34:9, pp. 7547-7558. Online publication date: Oct-2022.
https://doi.org/10.1016/j.jksuci.2022.06.007
Ghosh K, Banerjee A, Bhattacharjee M, Chatterjee S (2021). Improved Twitter Sarcasm Detection by Addressing Imbalanced Class Problem. Advances in Smart Communication Technology and Information Processing, pp. 135-145. Online publication date: 16-Feb-2021.
https://doi.org/10.1007/978-981-15-9433-5_14
