Abstract
When a piece of information (microblog, photograph, video, link, etc.) starts to spread in a social network, an important question arises: will it spread to “viral” proportions, where “viral” can be defined as an order-of-magnitude increase? Several previous studies have established that cascade size and frequency are related through a power law, which leads to a severe class imbalance in this classification problem. In this paper, we devise a suite of measurements based on “structural diversity”, the variety of social contexts (communities) in which individuals partaking in a given cascade engage. We demonstrate that these measures are able to distinguish viral from non-viral cascades despite the severe imbalance of the data. Further, we leverage these measurements as features in a classification approach, successfully predicting microblogs that grow from 50 to 500 reposts with precision of 0.69 and recall of 0.52 for the viral class, even though this class comprises under 2% of samples. This significantly outperforms our baseline approach as well as the current state of the art. We also show that this approach performs well for identifying whether cascades observed for 60 min will grow to 500 reposts, and we demonstrate how to trade off between precision and recall.
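To make the notion of structural diversity concrete, the following is a minimal sketch of two diversity measures computed over the early adopters of a cascade. It assumes a community label is already available for each user (for example, from a community detection algorithm). The function name `community_diversity`, the `community_of` mapping, and the choice of community count and Shannon entropy are illustrative assumptions; they are not necessarily the measurements used in the paper.

```python
import math
from collections import Counter


def community_diversity(adopters, community_of):
    """Return (distinct community count, Shannon entropy in bits).

    adopters: iterable of user ids that took part in the cascade so far.
    community_of: dict mapping user id -> community label (hypothetical input).
    Users without a known community label are ignored.
    """
    counts = Counter(community_of[u] for u in adopters if u in community_of)
    total = sum(counts.values())
    if total == 0:
        return 0, 0.0
    # Entropy is 0 when all adopters share one community and grows as
    # adopters are spread more evenly across many communities.
    entropy = sum((c / total) * math.log2(total / c) for c in counts.values())
    return len(counts), entropy


if __name__ == "__main__":
    labels = {"a": 0, "b": 0, "c": 1, "d": 1}
    print(community_diversity(["a", "b", "c", "d"], labels))
```

Values such as these could serve as input features to a classifier that separates viral from non-viral cascades. Under this assumption, a cascade whose first adopters span many communities would score higher on both measures than one confined to a single community.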
Notes
This was their highest-performing set of features for predicting cascades that grew from 50 to 367 and 100 to 417 reposts. We also included the baseline feature in this set as we found it improved the effectiveness of this approach.
Acknowledgments
Some of the authors of this paper are supported by the AFOSR Young Investigator Program (YIP) Grant FA9550-15-1-0159, ARO Grant W911NF-15-1-0282, and the DoD Minerva program. Portions of this work were also disclosed in US provisional Patent 62/201,517. A non-provisional patent is currently being filed.
Cite this article
Guo, R., Shaabani, E., Bhatnagar, A. et al. Toward early and order-of-magnitude cascade prediction in social networks. Soc. Netw. Anal. Min. 6, 64 (2016). https://doi.org/10.1007/s13278-016-0372-7