research-article

Automatic classification of Aurora-related tweets using machine learning methods

Authors:

Vyron Christodoulou,

Rosa Filgueira,

Elizabeth MacDonald,

Burcu Kosar

ICGDA '19: Proceedings of the 2019 2nd International Conference on Geoinformatics and Data Analysis

Pages 115 - 119

https://doi.org/10.1145/3318236.3318242

Published: 15 March 2019

Abstract

The constant flow of information from social media provides valuable data about all sorts of events at high temporal and spatial resolution. Over the past few years, as part of the GeoSocial project, we have been analyzing geological hazards and phenomena such as earthquakes, volcanic eruptions, landslides, floods and the aurora in real time, by geo-locating keyword-filtered tweets on a web map. To date, however, only keyword-based filtering has been applied, and it does not always filter out tweets that are unrelated to hazard events. This work therefore explores five learning-based classification techniques for automatic hazard-event classification of tweets about aurora sightings: a Linear SVM and four Deep Neural Networks (DNNs), namely a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), an RNN with Long Short-Term Memory (RNN-LSTM) and an RNN with Gated Recurrent Units (GRU). In addition, we trained the DNNs using pre-trained word2vec word embeddings. Finally, we evaluate the algorithms on two datasets, one from the Aurorasaurus application and one manually labeled at the BGS. We show that the DNNs, and especially the CNN, perform better on both datasets and that there is potential for improvement. Our code is also available online.
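The gap between keyword filtering and learning-based classification described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the toy tweets, the `KEYWORDS` set, and the function names are all invented for illustration, and the classifier is a plain bag-of-words model trained on the hinge loss without regularization, which is only a simplified stand-in for the Linear SVM named above.

```python
from collections import defaultdict

# Hypothetical keyword list; the real GeoSocial keywords are not given in the source.
KEYWORDS = {"aurora"}

# Invented toy tweets: +1 = aurora sighting, -1 = unrelated use of the word.
TRAIN = [
    ("aurora visible tonight over iceland sky", 1),
    ("aurora is my favourite disney princess", -1),
    ("green aurora lights dancing in the sky now", 1),
    ("new album by aurora is out", -1),
    ("just saw the northern lights aurora outside", 1),
    ("shooting in aurora colorado news", -1),
]


def tokenize(text):
    """Lower-case whitespace tokenizer returning the set of words (binary bag of words)."""
    return set(text.lower().split())


def keyword_filter(text, keywords=KEYWORDS):
    """Keyword-based filtering: keep the tweet if it contains any keyword."""
    return bool(tokenize(text) & keywords)


def train_hinge(samples, epochs=10, lr=1.0):
    """Fit a linear model by subgradient steps on the hinge loss (no regularization)."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in samples:
            tokens = tokenize(text)
            score = sum(weights[t] for t in tokens) + bias
            if label * score < 1:  # inside the margin: move toward the label
                for t in tokens:
                    weights[t] += lr * label
                bias += lr * label
    return weights, bias


def predict(text, weights, bias):
    """Return +1 (sighting) or -1 (unrelated) from the sign of the linear score."""
    score = sum(weights.get(t, 0.0) for t in tokenize(text)) + bias
    return 1 if score > 0 else -1


WEIGHTS, BIAS = train_hinge(TRAIN)

if __name__ == "__main__":
    for tweet in ("aurora visible in the sky tonight", "aurora is a disney princess"):
        print(tweet, "| keyword:", keyword_filter(tweet),
              "| classifier:", predict(tweet, WEIGHTS, BIAS))
```

Both example tweets pass the keyword filter because they contain the keyword, while the learned weights separate them by the surrounding words. A realistic setup would use far more labeled data and proper tokenization, and could swap the word sets for pre-trained embeddings.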


Index Terms

Automatic classification of Aurora-related tweets using machine learning methods
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Neural networks
2. Human-centered computing
  1. Collaborative and social computing
    1. Collaborative and social computing theory, concepts and paradigms
      1. Social media


Published In

ICGDA '19: Proceedings of the 2019 2nd International Conference on Geoinformatics and Data Analysis

March 2019

156 pages

ISBN:9781450362450

DOI:10.1145/3318236

Copyright © 2019 ACM.

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

In-Cooperation

Department of Informatics, University of Oslo

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICGDA 2019

ICGDA 2019: 2019 2nd International Conference on Geoinformatics and Data Analysis & 2019 2nd International Conference on Software and Services Engineering

March 15 - 17, 2019

Prague, Czech Republic
