research-article

A visual approach for age and gender identification on Twitter

Published: 01 January 2018

Abstract

The goal of Author Profiling (AP) is to identify demographic attributes (e.g., age, gender) of a given set of authors by analyzing their written texts. Recently, the AP task has gained interest in many problems related to computer forensics, psychology, and marketing, and especially in those related to social media exploitation. Social media data is shared through a wide range of modalities (e.g., text, images, and audio), which together offer a rich source of insights about users. Nevertheless, most current work on AP using social media data has been devoted to analyzing textual information only, and very few works have started exploring gender identification using visual information. In contrast, this paper focuses on exploiting the visual modality to perform both age and gender identification in social media, specifically on Twitter. Our goal is to evaluate the relevance of visual information for solving the AP task. Accordingly, we have extended the Twitter corpus from PAN 2014 by incorporating the images posted by all of its users, distinguishing between tweeted and retweeted images. The experiments performed provide evidence on the usefulness of visual information in comparison with traditional textual representations for the AP task.


Cited By

  • (2023) A survey of machine learning-based author profiling from texts analysis in social networks, Multimedia Tools and Applications 82 (24), 36653-36686. DOI: 10.1007/s11042-023-14711-8. Online publication date: 1-Oct-2023

Published In

Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, Volume 34, Issue 5
Intelligent and Fuzzy Systems applied to Language & Knowledge Engineering
2018
528 pages

Publisher

IOS Press

Netherlands

Author Tags

  1. Visual author profiling
  2. age identification
  3. gender identification
  4. social media
  5. Twitter
  6. CNN representation

Qualifiers

  • Research-article
