research-article

The climate change Twitter dataset

Published: 15 October 2022

Abstract

This work creates and makes publicly available the most comprehensive dataset to date on climate change and human opinions expressed on Twitter. It has the longest temporal coverage, spanning more than 13 years, includes over 15 million tweets spatially distributed across the world, and provides the geolocation of most tweets. Seven dimensions of information are tied to each tweet, namely geolocation, user gender, climate change stance, sentiment, aggressiveness, deviation from historic temperature, and topic, and the dataset is accompanied by information on environmental disaster events. These dimensions were produced by testing and evaluating a wide range of state-of-the-art machine learning algorithms and methods, both supervised and unsupervised, including BERT, RNN, LSTM, CNN, SVM, Naive Bayes, VADER, TextBlob, Flair, and LDA.
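To make the per-tweet structure concrete, the sketch below models one dataset row as a Python dataclass holding the seven dimensions named above plus an optional disaster link, and computes a temperature deviation as the observed value minus a historical mean. The field names, the example label values, and the baseline definition (a plain mean of past values for the same place and period) are assumptions for illustration only; they are not the dataset's published schema or the authors' exact temperature method.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class TweetRecord:
    """Hypothetical layout of one dataset row. Field names and label values
    are illustrative assumptions, not the published column names."""
    tweet_id: str
    created_at: str                         # ISO date, e.g. "2019-09-20"
    lat: Optional[float]                    # geolocation (may be missing)
    lng: Optional[float]
    gender: Optional[str]                   # inferred user gender
    stance: str                             # e.g. "believer", "denier", "neutral"
    sentiment: float                        # polarity score
    aggressive: bool                        # aggressiveness flag
    temperature_deviation: Optional[float]  # deviation from historic temperature
    topic: int                              # topic id assigned by the topic model
    disaster: Optional[str] = None          # linked disaster event, if any


def temperature_deviation(observed: float, history: list[float]) -> float:
    """Observed temperature minus the mean of past values for the same place
    and period. The baseline definition here is an assumption."""
    if not history:
        raise ValueError("need at least one historical value")
    return observed - mean(history)


# Illustrative usage with made-up numbers.
dev = temperature_deviation(21.5, [19.0, 20.0, 21.0])
print(dev)  # 1.5

row = TweetRecord(
    tweet_id="1", created_at="2019-09-20", lat=40.6, lng=22.9,
    gender="female", stance="believer", sentiment=0.4, aggressive=False,
    temperature_deviation=dev, topic=3,
)
```

A dataclass keeps every dimension explicit and typed; optional fields reflect that not every tweet carries a geolocation or a disaster link.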

Highlights

Create the most extensive dataset for climate change and human opinions via Twitter.
Make it publicly available.
Link 7 dimensions of information to each of the 15 million geolocated tweets.
Include Gender, Stance, Sentiment, Aggressiveness, Temperature, Topics, Disasters.
Use BERT, RNN, LSTM, CNN, SVM, Naive Bayes, VADER, TextBlob, Flair, and LDA.
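Naive Bayes is the simplest of the supervised methods listed above. As a rough illustration of how such a model could assign a stance label to a tweet, the sketch below implements multinomial Naive Bayes with add-one (Laplace) smoothing from scratch. The training tweets and the "believer"/"denier" labels are invented toy data, and whitespace tokenization is a deliberate simplification; this is not the authors' pipeline, which compared many models.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "MultinomialNB":
        self.class_counts = Counter(labels)        # documents per label
        self.word_counts = defaultdict(Counter)    # label -> token counts
        self.vocab: set[str] = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()           # naive whitespace tokenizer
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(docs)
        return self

    def _log_posterior(self, tokens: list[str], label: str) -> float:
        # log P(label) + sum of log P(token | label) over known tokens
        score = math.log(self.class_counts[label] / self.total_docs)
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:                  # skip words never seen in training
                score += math.log((counts[tok] + 1) / denom)
        return score

    def predict(self, doc: str) -> str:
        tokens = doc.lower().split()
        return max(self.class_counts, key=lambda c: self._log_posterior(tokens, c))


# Toy, invented examples purely for illustration.
docs = [
    "climate change is real and urgent",
    "we must act on global warming",
    "global warming is a hoax",
    "climate change is a scam",
]
labels = ["believer", "believer", "denier", "denier"]

model = MultinomialNB().fit(docs, labels)
print(model.predict("warming is a hoax"))  # denier
```

Working in log space avoids numerical underflow when multiplying many small word probabilities, and smoothing prevents a single unseen class-word pair from zeroing out a label.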

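Cited By

  • (2024) Harnessing Empathy and Ethics for Relevance Detection and Information Categorization in Climate and COVID-19 Tweets. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, pp. 4091-4095. DOI: 10.1145/3627673.3679937. Online publication date: 21-Oct-2024.
  • (2024) Inferring Climate Change Stances from Multimodal Tweets. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2467-2471. DOI: 10.1145/3626772.3657950. Online publication date: 10-Jul-2024.
  • (2024) Social media sentiment analysis and opinion mining in public security. Journal of King Saud University - Computer and Information Sciences 35:9. DOI: 10.1016/j.jksuci.2023.101776. Online publication date: 1-Feb-2024.
  • (2024) Automatic Topic Title Assignment with Word Embedding. Journal of Classification 41:3, pp. 650-677. DOI: 10.1007/s00357-024-09476-0. Online publication date: 1-Nov-2024.
  • (2024) Overview of the CLEF 2024 LongEval Lab on Longitudinal Evaluation of Model Performance. Experimental IR Meets Multilinguality, Multimodality, and Interaction, pp. 208-230. DOI: 10.1007/978-3-031-71908-0_10. Online publication date: 9-Sep-2024.
  • (2024) LongEval: Longitudinal Evaluation of Model Performance at CLEF 2024. Advances in Information Retrieval, pp. 60-66. DOI: 10.1007/978-3-031-56072-9_8. Online publication date: 24-Mar-2024.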


Information

Published In

Expert Systems with Applications: An International Journal, Volume 204, Issue C, October 2022, 1098 pages

Publisher

Pergamon Press, Inc., United States

Author Tags

  1. Climate change
  2. Machine learning
  3. Sentiment analysis
  4. Topic modeling
  5. Twitter
