Can We Predict a Riot? Disruptive Event Detection Using Twitter

Published: 27 March 2017

Abstract

In recent years, there has been increased interest in real-world event detection using publicly accessible data made available through Internet technology such as Twitter, Facebook, and YouTube. In these highly interactive systems, the general public are able to post real-time reactions to “real world” events, thereby acting as social sensors of terrestrial activity. Automatically detecting and categorizing events, particularly small-scale incidents, from streamed data is a non-trivial task, but it would be of high value to public safety organisations such as local police, who need to respond accordingly. To address this challenge, we present an end-to-end integrated event detection framework that comprises five main components: data collection, pre-processing, classification, online clustering, and summarization. The integration of classification and clustering enables events to be detected together with related smaller-scale “disruptive events,” that is, incidents that threaten social safety and security or could disrupt social order. We evaluate the effectiveness of detecting events using a variety of features derived from Twitter posts, namely temporal, spatial, and textual content. We evaluate our framework on a large-scale, real-world dataset from Twitter. Furthermore, we apply our event detection system to a large corpus of tweets posted during the August 2011 riots in England. We use ground-truth data based on intelligence gathered by the London Metropolitan Police Service, which provides a record of actual terrestrial events and incidents during the riots, and show that our system can perform as well as terrestrial sources, and even better in some cases.
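
To make the five-stage flow concrete, the following is a minimal illustrative sketch in Python. It is not the authors' implementation: data collection is replaced by an in-memory list of posts, the trained classifier is replaced by a simple keyword filter, clustering uses bag-of-words cosine similarity with an arbitrary threshold of 0.3, and summarization picks the post closest to the cluster centroid. All function names, the stopword list, and the parameter values are assumptions made for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword list (assumption; real systems use a fuller one).
STOPWORDS = {"the", "a", "an", "in", "on", "at", "is", "are", "of", "to", "and"}


def preprocess(text):
    """Pre-processing: lowercase, strip URLs and @mentions, tokenize, drop stopwords."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return [t for t in re.findall(r"[a-z0-9#]+", text) if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def is_event_related(tokens, keywords):
    """Classification stand-in: a keyword filter instead of a trained model (assumption)."""
    return any(t in keywords for t in tokens)


def online_cluster(posts, keywords, threshold=0.3):
    """Online clustering: assign each event-related post to the most similar
    existing cluster, or open a new cluster if none is similar enough."""
    clusters = []  # each cluster: {"centroid": Counter, "posts": [str]}
    for post in posts:
        tokens = preprocess(post)
        if not is_event_related(tokens, keywords):
            continue  # filtered out by the classification stage
        vec = Counter(tokens)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["centroid"].update(vec)  # accumulate term counts
            best["posts"].append(post)
        else:
            clusters.append({"centroid": Counter(vec), "posts": [post]})
    return clusters


def summarize(cluster):
    """Summarization: return the post most similar to the cluster centroid."""
    return max(
        cluster["posts"],
        key=lambda p: cosine(Counter(preprocess(p)), cluster["centroid"]),
    )


if __name__ == "__main__":
    # "Data collection" stand-in: a fixed list instead of a live stream.
    sample_posts = [
        "Fire in Croydon high street",
        "Huge fire near Croydon station http://x.co/1",
        "Lovely lunch today",
        "Traffic accident on M4 motorway",
        "accident M4 blocks traffic",
    ]
    for c in online_cluster(sample_posts, keywords={"fire", "accident"}):
        print(len(c["posts"]), "posts:", summarize(c))
```

Running the classifier before clustering, as in this sketch, means only posts judged event-related incur the cost of similarity comparisons; the temporal and spatial features mentioned in the abstract are omitted here for brevity.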

Published In

ACM Transactions on Internet Technology, Volume 17, Issue 2
Special Issue on Advances in Social Computing and Regular Papers
May 2017
249 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/3068849
Editor: Munindar P. Singh
This work is licensed under a Creative Commons Attribution-ShareAlike International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 March 2017
Accepted: 01 September 2016
Revised: 01 July 2016
Received: 01 March 2016
Published in TOIT Volume 17, Issue 2

Author Tags

  1. Social media
  2. classification
  3. clustering
  4. evaluation
  5. event detection
  6. feature selection

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2024) A Survey on Event Tracking in Social Media Data Streams. Big Data Mining and Analytics 7:1 (217-243). DOI: 10.26599/BDMA.2023.9020021. Online publication date: Mar-2024
  • (2024) Characterizing, Modeling and Exploiting the Mobile Demand Footprint of Large Public Protests. Proceedings of the 2024 ACM on Internet Measurement Conference (516-529). DOI: 10.1145/3646547.3688421. Online publication date: 4-Nov-2024
  • (2024) Predicting Protests and Riots in Urban Environments With Satellite Imagery and Deep Learning. Transactions in GIS. DOI: 10.1111/tgis.13236. Online publication date: 30-Aug-2024
  • (2024) DiEvD: Disruptive Event Detection From Dynamic Datastreams Using Continual Machine Learning: A Case Study With Twitter. IEEE Transactions on Emerging Topics in Computing 12:3 (727-738). DOI: 10.1109/TETC.2023.3272973. Online publication date: Jul-2024
  • (2024) DiEvD-SF: Disruptive Event Detection Using Continual Machine Learning With Selective Forgetting. IEEE Transactions on Computational Social Systems 11:3 (4189-4201). DOI: 10.1109/TCSS.2024.3364544. Online publication date: Jun-2024
  • (2024) Context-Aware Civil Unrest Event Prediction Using Neutrosophic-Aspect-Based Sentiment Analysis, PSO, and Hierarchical LSTM. IEEE Transactions on Computational Social Systems 11:3 (3667-3677). DOI: 10.1109/TCSS.2023.3338509. Online publication date: Jun-2024
  • (2024) Towards a resilience assessment framework for the airport passenger terminal operations. Journal of Air Transport Management 114 (102508). DOI: 10.1016/j.jairtraman.2023.102508. Online publication date: Jan-2024
  • (2023) Towards a Transparent and an Environmental-Friendly Approach for Short Text Topic Detection: A Comparison of Methods for Performance, Transparency, and Carbon Footprint. Journal of Advances in Information Technology 14:6 (1240-1253). DOI: 10.12720/jait.14.6.1240-1253. Online publication date: 2023
  • (2023) The Emotional Impact of COVID-19 News Reporting: A Longitudinal Study Using Natural Language Processing. Human Behavior and Emerging Technologies 2023 (1-16). DOI: 10.1155/2023/7283166. Online publication date: 9-Mar-2023
  • (2023) Open-World Social Event Classification. Proceedings of the ACM Web Conference 2023 (1562-1571). DOI: 10.1145/3543507.3583291. Online publication date: 30-Apr-2023
