
Modeling Topics and Behavior of Microbloggers: An Integrated Approach

Published: 20 April 2017

Abstract

Microblogging data comprises both user-generated content and user behavior. Modeling it requires accounting for personal and background topics, as well as for how these topics generate the observed content and behavior. In this article, we propose the Generalized Behavior-Topic (GBT) model, which simultaneously models background topics and users’ topical interests in microblogging data. GBT considers multiple topical communities (or realms), each with its own background topical interests, while learning each user’s personal topics and the user’s dependence on realms when generating both content and behavior. This differentiates GBT from previous work, which considers either a single realm or content data only. By associating user behavior with the latent background and personal topics, GBT explains user behavior in terms of both types of topics. GBT further differs from earlier work in that it models multiple types of behavior jointly. Our experiments on two Twitter datasets show that GBT effectively mines representative topics for each realm. We also demonstrate that GBT significantly outperforms other state-of-the-art models in modeling content topics and in user profiling.

    Published In

    ACM Transactions on Intelligent Systems and Technology  Volume 8, Issue 3
    Special Issue: Mobile Social Multimedia Analytics in the Big Data Era and Regular Papers
    May 2017
    320 pages
    ISSN:2157-6904
    EISSN:2157-6912
    DOI:10.1145/3040485
    Editor: Yu Zheng
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 20 April 2017
    Accepted: 01 August 2016
    Revised: 01 September 2015
    Received: 01 May 2015
    Published in TIST Volume 8, Issue 3

    Author Tags

    1. Social media
    2. behavior mining
    3. microblogging
    4. probabilistic graphic model
    5. topic modeling
    6. user behavior

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • IDM Programme Office
    • Singapore National Research Foundation
    • International Research Centre @ Singapore Funding Initiative and administered
    • Living Analytics Research Centre, Singapore Management University

    Article Metrics

    • Downloads (Last 12 months): 10
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 02 Sep 2024

    Cited By

    • (2021) Analysis of Simple K-Mean and Parallel K-Mean Clustering for Software Products and Organizational Performance Using Education Sector Dataset. Scientific Programming 2021. DOI: 10.1155/2021/9988318. Online publication date: 1-Jan-2021
    • (2020) An User Intention Mining Model based on Fractal Time Series Pattern. Fractals. DOI: 10.1142/S0218348X20400174. Online publication date: 6-May-2020
    • (2020) Video Stream Distribution Scheme Based on Edge Computing Network and User Interest Content Model. IEEE Access 8 (30734-30744). DOI: 10.1109/ACCESS.2020.2971025. Online publication date: 2020
    • (2020) Learning Deep Topics of Interest. New Trends in Computational Vision and Bio-inspired Computing (1517-1532). DOI: 10.1007/978-3-030-41862-5_156. Online publication date: 2020
    • (2018) Using Stigmergy to Distinguish Event-Specific Topics in Social Discussions. Sensors 18:7 (2117). DOI: 10.3390/s18072117. Online publication date: 2-Jul-2018
    • (2018) Content Popularity Prediction and Caching for ICN: A Deep Learning Approach With SDN. IEEE Access 6 (5075-5089). DOI: 10.1109/ACCESS.2017.2781716. Online publication date: 2018
