
When Less Is More: Systematic Analysis of Cascade-Based Community Detection

Published: 08 January 2022

    Abstract

    Information diffusion, spreading of infectious diseases, and spreading of rumors are fundamental processes occurring in real-life networks. In many practical cases, one can observe when nodes become infected, but the underlying network, over which a contagion or information propagates, is hidden. Inferring properties of the underlying network is important since these properties can be used for constraining infections, forecasting, viral marketing, and so on. Moreover, for many applications, it is sufficient to recover only coarse high-level properties of this network rather than all its edges. This article conducts a systematic and extensive analysis of the following problem: Given only the infection times, find communities of highly interconnected nodes. This task significantly differs from the well-studied community detection problem since we do not observe a graph to be clustered. We carry out a thorough comparison between existing and new approaches on several large datasets and cover methodological challenges specific to this problem. One of the main conclusions is that the most stable performance and the most significant improvement on the current state-of-the-art are achieved by our proposed simple heuristic approaches agnostic to a particular graph structure and epidemic model. We also show that some well-known community detection algorithms can be enhanced by including edge weights based on the cascade data.
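    To make the problem setting concrete, the following toy sketch shows one way edge weights could be derived from cascade data and then used to group nodes. It is an illustration under stated assumptions, not the article's method: the co-occurrence weighting, the threshold rule, the function names, and the example cascades are all hypothetical, and the sketch uses only cascade membership, ignoring the infection times themselves.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_weights(cascades):
    """Count, for each unordered node pair, the cascades that contain both nodes.

    Assumption: each cascade is a list of (node, infection_time) pairs.
    The times are ignored here; only membership in a cascade is used.
    """
    weights = defaultdict(int)
    for cascade in cascades:
        nodes = sorted({node for node, _time in cascade})
        for u, v in combinations(nodes, 2):
            weights[(u, v)] += 1
    return dict(weights)


def threshold_communities(nodes, weights, min_weight):
    """Merge nodes joined by a pair with weight >= min_weight (union-find)."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the root, halving the path along the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (u, v), w in weights.items():
        if w >= min_weight:
            parent[find(u)] = find(v)

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical cascades: two tight groups plus one cascade bridging c and d.
cascades = [
    [("a", 0.0), ("b", 0.4), ("c", 1.1)],
    [("b", 0.0), ("a", 0.3)],
    [("c", 0.0), ("a", 0.2), ("b", 0.9)],
    [("d", 0.0), ("e", 0.5)],
    [("e", 0.0), ("d", 0.1), ("f", 0.7)],
    [("f", 0.0), ("d", 0.6), ("e", 0.8)],
    [("c", 0.0), ("d", 2.0)],
]

nodes = sorted({node for cascade in cascades for node, _time in cascade})
weights = cooccurrence_weights(cascades)
communities = threshold_communities(nodes, weights, min_weight=2)
print(communities)
```

    In this example the single bridging cascade gives the pair (c, d) a weight of 1, which falls below the threshold, so the two groups stay separate. The same weights could instead be passed as edge weights to a standard weighted community detection algorithm.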


    Cited By

    • (2024) Bayesian Graph Local Extrema Convolution with Long-tail Strategy for Misinformation Detection. ACM Transactions on Knowledge Discovery from Data 18:4 (1-21). DOI: 10.1145/3639408. Online publication date: 12-Feb-2024
    • (2023) Inferring Diffusion Network from Information Cascades using Transitive Influence. Journal of Information Systems and Telecommunication (JIST) 11:44 (307-319). DOI: 10.61186/jist.33656.11.44.307. Online publication date: 16-Dec-2023
    • (2023) Almost Exact Recovery in Gossip Opinion Dynamics Over Stochastic Block Models. 2023 62nd IEEE Conference on Decision and Control (CDC) (2421-2426). DOI: 10.1109/CDC49753.2023.10383465. Online publication date: 13-Dec-2023
    • (2023) Community structure recovery and interaction probability estimation for gossip opinion dynamics. Automatica (Journal of IFAC) 154:C. DOI: 10.1016/j.automatica.2023.111105. Online publication date: 1-Aug-2023

    Information

    Published In

    ACM Transactions on Knowledge Discovery from Data, Volume 16, Issue 4
    August 2022
    529 pages
    ISSN: 1556-4681
    EISSN: 1556-472X
    DOI: 10.1145/3505210

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 January 2022
    Accepted: 01 October 2021
    Revised: 01 October 2021
    Received: 01 May 2021
    Published in TKDD Volume 16, Issue 4

    Author Tags

    1. Community detection
    2. information propagation
    3. epidemic cascades
    4. diffusion
    5. network inference
    6. likelihood optimization

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • Ministry of Education and Science of the Russian Federation in the framework of MegaGrant
    • Russian President
