Abstract
Social activity networks are formed from activities among users (such as wall posts, tweets, and emails), where any activity between two users adds an edge to the network graph. These networks are streaming and contain a massive volume of edges. A streaming graph is a stream of edges that evolves continuously over time. This paper proposes a sampling algorithm for social activity networks that operates in a streaming fashion. The proposed algorithm uses a set of fixed structure learning automata. Each node of the original activity graph is equipped with a learning automaton that decides whether its corresponding node should be added to the sample set. The proposed algorithm is compared with the best streaming sampling algorithm reported so far in terms of the Kolmogorov-Smirnov (KS) test and normalized L1 and L2 distances, over real-world activity networks and synthetic networks presented as sequences of edges. In these experiments, the proposed algorithm outperforms that baseline.
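As an illustration of the KS comparison mentioned in the abstract, the following minimal sketch computes the two-sample KS statistic between the degree sequences of a full edge stream and of a sampled subset. It is not the authors' evaluation code: the abstract does not say which graph properties are compared, so the use of degree sequences is an assumption, and the helper names `degrees` and `ks_statistic` and the toy edge lists are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def degrees(edge_stream):
    """Return the sorted degree sequence of a graph given as a stream of edges.

    Assumption: every streamed edge (u, v) counts once for each endpoint,
    so repeated activities between the same pair add to the degree.
    """
    deg = Counter()
    for u, v in edge_stream:
        deg[u] += 1
        deg[v] += 1
    return sorted(deg.values())


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to compare them at every value that occurs in either sample.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical toy data: a full activity stream and a sampled subset of it.
stream = [(1, 2), (1, 3), (1, 4), (2, 3), (4, 5)]
sampled = [(1, 2), (1, 3), (2, 3)]

d_full = degrees(stream)
d_samp = degrees(sampled)
ks = ks_statistic(d_full, d_samp)
print(f"KS statistic between degree distributions: {ks:.3f}")
```

A value near 0 means the sampled degree distribution closely tracks the original one, and a value near 1 means the two distributions barely overlap.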
Acknowledgments
The authors acknowledge the use of high-performance computers provided by the High Performance Computing Research Center (HPCRC) at Amirkabir University of Technology in the completion of this work.
Cite this article
Ghavipour, M., Meybodi, M.R. A streaming sampling algorithm for social activity networks using fixed structure learning automata. Appl Intell 48, 1054–1081 (2018). https://doi.org/10.1007/s10489-017-1005-1
DOI: https://doi.org/10.1007/s10489-017-1005-1