DOI: 10.1145/3350546.3352518
research-article

Word Embedding based Clustering to Detect Topics in Social Media

Published: 14 October 2019

Abstract

Social media play an increasingly important role in reporting major events happening around the world. However, detecting events and topics of interest in social media is challenging because of the sheer volume of the data and the complex semantics of the language being processed. The paper proposes an online topic-discovery algorithm that incrementally groups short texts by combining their textual content with latent feature vector representations (word embeddings) of the words they contain. These embeddings are trained on very large external corpora and are used to improve the word-topic mapping learnt on a smaller corpus. Experimental results show that, by exploiting information from the external corpora, the approach achieves significant improvements over classical topic detection methods.
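The abstract describes the method only at a high level. As a rough, hypothetical illustration of the general idea (not the authors' algorithm), the Python sketch below performs single-pass online clustering of short posts. Each incoming post is scored against existing clusters with a weighted mix of bag-of-words cosine similarity and cosine similarity between averaged word embeddings, and it either joins the best cluster or starts a new one. The toy `EMBEDDINGS` table, the `threshold` and `alpha` parameters, and all function names are assumptions made for illustration; a real system would load vectors pre-trained on a large corpus, for example a word2vec model.

```python
import math
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical toy word vectors. A real system would load embeddings
# pre-trained on a large external corpus (e.g. a word2vec model).
EMBEDDINGS = {
    "flu": [0.9, 0.1, 0.0],
    "fever": [0.8, 0.2, 0.1],
    "cough": [0.85, 0.15, 0.05],
    "goal": [0.0, 0.9, 0.2],
    "match": [0.1, 0.85, 0.3],
    "team": [0.05, 0.8, 0.25],
}
DIM = 3


def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def term_cosine(a, b):
    """Cosine similarity of two bag-of-words Counters."""
    dot = sum(count * b[t] for t, count in a.items())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def embed(tokens):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


@dataclass
class Cluster:
    terms: Counter = field(default_factory=Counter)  # summed term counts
    vec_sum: list = field(default_factory=lambda: [0.0] * DIM)  # summed post vectors
    members: list = field(default_factory=list)  # raw texts, in arrival order

    def centroid(self):
        return [x / len(self.members) for x in self.vec_sum]

    def add(self, text, tokens, vec):
        self.terms.update(tokens)
        self.vec_sum = [a + b for a, b in zip(self.vec_sum, vec)]
        self.members.append(text)


def online_cluster(stream, threshold=0.5, alpha=0.3):
    """Single pass: each post joins its most similar cluster or starts a new one.

    Similarity = alpha * term cosine + (1 - alpha) * embedding cosine.
    """
    clusters = []
    for text in stream:
        tokens = text.lower().split()
        bow, vec = Counter(tokens), embed(tokens)
        best, best_sim = None, -1.0
        for c in clusters:
            sim = alpha * term_cosine(bow, c.terms) + (1 - alpha) * cosine(vec, c.centroid())
            if sim > best_sim:
                best, best_sim = c, sim
        if best is None or best_sim < threshold:
            best = Cluster()
            clusters.append(best)
        best.add(text, tokens, vec)
    return clusters


if __name__ == "__main__":
    posts = ["flu and fever", "bad cough today", "great goal in the match", "team wins match"]
    for i, c in enumerate(online_cluster(posts)):
        print(i, c.members)
```

With these toy vectors, "bad cough today" shares no words with "flu and fever" yet joins its cluster through embedding similarity, giving two clusters (health and sport). Setting `alpha=1.0`, so that only term overlap counts, instead yields four singleton clusters, which mirrors the motivation for bringing in externally trained embeddings.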

Cited By

  • (2024) Hate Speech in Indian Cyber Space at the Intersection of Law and Technology. Intersections Between Rights and Technology (156-191). DOI: 10.4018/979-8-3693-1127-1.ch009. Online publication date: 10-Jul-2024.
  • (2024) Machine-Learning-Based Approaches for Multi-Level Sentiment Analysis of Romanian Reviews. Mathematics 12:3 (456). DOI: 10.3390/math12030456. Online publication date: 31-Jan-2024.
  • (2024) Hybrid topic modeling method based on dirichlet multinomial mixture and fuzzy match algorithm for short text clustering. Journal of Big Data 11:1. DOI: 10.1186/s40537-024-00930-9. Online publication date: 9-May-2024.

Information

Published In

ACM Other conferences
WI '19: IEEE/WIC/ACM International Conference on Web Intelligence
October 2019
507 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Clustering
  2. Social Media
  3. Topic Detection
  4. Word Embedding

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WI '19

Acceptance Rates

Overall Acceptance Rate 118 of 178 submissions, 66%

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months): 96
  • Downloads (Last 6 weeks): 11
Reflects downloads up to 26 Dec 2024

Citations

  • (2024) Towards a crowdsourced framework for online hate speech moderation - a case study in the Indian political scenario. Companion Publication of the 16th ACM Web Science Conference (75-84). DOI: 10.1145/3630744.3663607. Online publication date: 21-May-2024.
  • (2024) Analyzing supply chain technology trends through network analysis and clustering techniques: a patent-based study. Annals of Operations Research 341:1 (313-348). DOI: 10.1007/s10479-024-06119-w. Online publication date: 26-Jun-2024.
  • (2024) Bridging the Gap: Condensing Knowledge Graphs for Metaphor Processing by Visualizing Relationships in Figurative and Literal Expressions. Machine Learning, Image Processing, Network Security and Data Sciences (339-351). DOI: 10.1007/978-3-031-62217-5_28. Online publication date: 11-Jun-2024.
  • (2024) Blockchain technology in circular economy: Unpacking the potential issues and critical echoes through data triangulation and natural language processing. Business Strategy and the Environment. DOI: 10.1002/bse.3763. Online publication date: 8-Apr-2024.
  • (2023) WhatsUp: An event resolution approach for co-occurring events in social media. Information Sciences 625 (553-577). DOI: 10.1016/j.ins.2023.01.001. Online publication date: May-2023.
  • (2022) Methods, Models and Tools for Improving the Quality of Textual Annotations. Modelling 3:2 (224-242). DOI: 10.3390/modelling3020015. Online publication date: 12-Apr-2022.
  • (2022) A Literature Review of Textual Hate Speech Detection Methods and Datasets. Information 13:6 (273). DOI: 10.3390/info13060273. Online publication date: 26-May-2022.