DOI:10.1145/3301326.3301373
research-article

A Framework for User Characterization based on Tweets Using Machine Learning Algorithms

Published: 14 December 2018

Abstract

Twitter, with hundreds of millions of users, is one of the most popular and commercially significant social networking sites. Twitter lets its users post short messages and update their status. Tweets can be seen instantly by a user's followers and even by people without Twitter accounts, so most of the content posted on Twitter is publicly accessible. Many political actors use Twitter, including those pursuing extremist aims such as radicalization, mobilization, and recruitment. A large number of extremist organizations use Twitter to issue press releases and public declarations and to confirm their attacks or explain the motivation behind them. Several works have identified extremist content in Twitter data, but identifying extremist users from their tweets has received little attention because of publication barriers and the unavailability of data. In this research, we propose a model that characterizes a user as extremist or non-extremist. Pre-processing is done with natural language processing techniques, and feature selection is performed with a bag-of-words model. TF-IDF and word-length features are computed to obtain feature vectors, with TF-IDF weighting each term by its significance across the whole document collection. To validate the proposed model, we apply multinomial naïve Bayes (NB) classification to crisis-related tweets and to a Kaggle dataset of tweets published by several accounts associated with the Islamic State of Iraq and Sham (ISIS). This paper thus presents a novel method for characterizing users based on the tweets they post. Evaluation results show that the proposed method achieves its best accuracy with the word-length feature extraction approach.
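The pipeline the abstract outlines (NLP pre-processing, bag-of-words features weighted by TF-IDF, a word-length feature, and multinomial naïve Bayes) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list, the smoothed IDF formula (the variant scikit-learn uses by default), the reading of "word length" as a histogram of token lengths, the majority-vote rule that turns per-tweet predictions into a user label, and the toy training tweets are all hypothetical. Names such as `BagOfWordsTfidf` and `characterize_user` are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the paper does not publish its own.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "rt"}


def preprocess(text):
    """Lower-case, strip URLs and @mentions, tokenize on letters, drop stop words."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return [tok for tok in re.findall(r"[a-z]+", text) if tok not in STOP_WORDS]


class BagOfWordsTfidf:
    """Bag-of-words weighting with smoothed idf(t) = ln((1 + n) / (1 + df(t))) + 1."""

    def fit(self, docs):
        n = len(docs)
        df = Counter(term for doc in docs for term in set(doc))
        self.idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
        return self

    def transform(self, doc):
        # Raw term frequency times idf; terms unseen during fit are ignored.
        tf = Counter(term for term in doc if term in self.idf)
        return {term: count * self.idf[term] for term, count in tf.items()}


def word_length_features(doc):
    """Assumed reading of 'word length': how many tokens have each length."""
    return dict(Counter(f"len_{len(tok)}" for tok in doc))


class MultinomialNaiveBayes:
    """Multinomial naive Bayes over non-negative feature weights, Laplace-smoothed."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        self.vocab = {term for vec in vectors for term in vec}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            rows = [vec for vec, y in zip(vectors, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(vectors))
            totals = Counter()
            for vec in rows:
                totals.update(vec)  # adds each feature's weight to the class total
            denom = sum(totals.values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {t: math.log((totals[t] + self.alpha) / denom) for t in self.vocab}
        return self

    def predict(self, vec):
        def score(c):
            # Unnormalized log posterior: log prior + sum of weight * log likelihood.
            return self.log_prior[c] + sum(
                w * self.log_lik[c][t] for t, w in vec.items() if t in self.vocab
            )
        return max(self.classes, key=score)


def characterize_user(tweets, featurize, model):
    """Label a user by majority vote over per-tweet predictions (assumed rule)."""
    votes = Counter(model.predict(featurize(preprocess(t))) for t in tweets)
    return votes.most_common(1)[0][0]


# Made-up toy tweets for illustration only; these are not the paper's datasets.
train = [
    ("join the fight, recruit brothers now", "extremist"),
    ("our fighters claim the attack", "extremist"),
    ("flood relief volunteers needed downtown", "non-extremist"),
    ("donate water for earthquake victims", "non-extremist"),
]
docs = [preprocess(text) for text, _ in train]
labels = [label for _, label in train]
tfidf = BagOfWordsTfidf().fit(docs)
model = MultinomialNaiveBayes().fit([tfidf.transform(d) for d in docs], labels)
print(characterize_user(["brothers join the fight", "recruit fighters now"], tfidf.transform, model))
```

To try the word-length representation instead, fit a second model on `[word_length_features(d) for d in docs]` and pass `word_length_features` as `featurize`; with toy data this small, neither model's predictions say anything about real performance.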




    Information

    Published In

    ACM Other conferences
    ICNCC '18: Proceedings of the 2018 VII International Conference on Network, Communication and Computing
    December 2018
    372 pages
    ISBN:9781450365536
    DOI:10.1145/3301326

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 14 December 2018


    Author Tags

    1. Feature Selection
    2. Machine Learning Algorithms
    3. Natural Language Processing
    4. Radicalization
    5. Supervised Learning
    6. Text Mining
    7. Twitter
    8. User Characterization

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICNCC 2018

    Bibliometrics & Citations

    Article Metrics

    • Downloads (Last 12 months): 11
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 07 Nov 2024

    Cited By

    • (2022) A Large-Scale Dataset of Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave. Data, 7:8 (109). DOI:10.3390/data7080109. Online publication date: 4-Aug-2022.
    • (2022) Twitter Big Data as a Resource for Exoskeleton Research: A Large-Scale Dataset of about 140,000 Tweets from 2017–2022 and 100 Research Questions. Analytics, 1:2 (72-97). DOI:10.3390/analytics1020007. Online publication date: 23-Sep-2022.
    • (2022) A Framework for Cybercrime Prediction on Twitter Tweets Using Text-Based Machine Learning Algorithm. 2022 5th International Conference on Pattern Recognition and Artificial Intelligence (PRAI), pp. 235-240. DOI:10.1109/PRAI55851.2022.9904212. Online publication date: 19-Aug-2022.
    • (2022) Full/Regular Research Paper submission to (CSCI-RTCW): Multi Class Classification of Online Radicalization Using Transformer Models. 2022 International Conference on Computational Science and Computational Intelligence (CSCI), pp. 1034-1038. DOI:10.1109/CSCI58124.2022.00183. Online publication date: Dec-2022.
    • (2022) RadScore: An Automated Technique to Measure Radicalness Score of Online Social Media Users. Cybernetics and Systems, 54:4 (406-431). DOI:10.1080/01969722.2022.2059134. Online publication date: 12-Apr-2022.
    • (2022) A survey on extremism analysis using natural language processing: definitions, literature review, trends and challenges. Journal of Ambient Intelligence and Humanized Computing, 14:8 (9869-9905). DOI:10.1007/s12652-021-03658-z. Online publication date: 12-Jan-2022.
