research-article

Mining user–user communities for a weighted bipartite network using spark GraphFrames and Flink Gelly

Published: 01 June 2021

Abstract

Large-scale graph processing is a recently developed and significant research area within big data analytics. Distributed graph analytics helps reveal intuitive insights into node interactions in large-scale network data, and distributed graph computing is an emerging area of graph data mining that explores crucial node relationships in a given graph dataset. In this paper, we propose a new method to discover top-k user–user communities in a weighted bipartite network by defining a weighted similarity measure. We extend a structural similarity metric, the Otsuka–Ochiai coefficient, by adding node weights, and use it to quantify the similarity between distinct items of a user–item network. Top-k user–user communities are then mined based on the similarity of items under this weighted measure. We present two algorithms, TUCSGF and TUCFlink, that mine top-k user–user communities in a distributed manner based on the strength of the item-to-item similarities. We execute TUCSGF on Apache Spark using Spark GraphFrames, and we implement TUCFlink on Apache Flink using the functionality of Flink Gelly. We also explore two real-world network applications, an online learning network and a chain-of-hospitals network, along with the various graph methods to be applied to both. We systematically evaluate the execution time, memory consumption, and CPU usage of TUCSGF and TUCFlink on three distinct datasets. TUCFlink performs far better than TUCSGF in computing time. Applying distributed graph analytics to complex networks with distributed graph processing tools such as GraphX, GraphFrames, and Gelly provides more intuitive insights into distinct types of node interactions in graph data mining.
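
The abstract does not state the weighted similarity measure itself, so the following is only a minimal single-machine sketch of the general idea, not the paper's TUCSGF or TUCFlink algorithm. It assumes that the weighted extension of the Otsuka–Ochiai coefficient takes a cosine-like form over edge weights, and that a "community" for an item pair is the set of users connected to both items. The toy edge list and all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: (user, item, weight) edges of a weighted bipartite network.
EDGES = [
    ("u1", "a", 2.0), ("u2", "a", 1.0),
    ("u1", "b", 2.0), ("u2", "b", 1.0), ("u3", "b", 1.0),
    ("u3", "c", 3.0), ("u4", "c", 1.0),
]

def item_profiles(edges):
    """Map each item to a {user: weight} dictionary."""
    profiles = {}
    for user, item, weight in edges:
        profiles.setdefault(item, {})[user] = weight
    return profiles

def ochiai(p, q):
    """Unweighted Otsuka-Ochiai coefficient: |A n B| / sqrt(|A| * |B|)."""
    shared = p.keys() & q.keys()
    return len(shared) / sqrt(len(p) * len(q))

def weighted_ochiai(p, q):
    """Assumed weighted variant: weights replace set membership (cosine form)."""
    shared = p.keys() & q.keys()
    numerator = sum(p[u] * q[u] for u in shared)
    denominator = sqrt(sum(w * w for w in p.values()) *
                       sum(w * w for w in q.values()))
    return numerator / denominator if denominator else 0.0

def top_k_communities(edges, k):
    """Rank item pairs by weighted similarity and return the top k.

    Each result is (similarity, (item_i, item_j), shared_users); the shared
    users are treated here as the user-user community for that item pair.
    """
    profiles = item_profiles(edges)
    scored = []
    for i, j in combinations(sorted(profiles), 2):
        similarity = weighted_ochiai(profiles[i], profiles[j])
        if similarity > 0:
            users = sorted(profiles[i].keys() & profiles[j].keys())
            scored.append((similarity, (i, j), users))
    # Highest similarity first; ties broken by item pair for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored[:k]

if __name__ == "__main__":
    for similarity, pair, users in top_k_communities(EDGES, 2):
        print(pair, round(similarity, 4), users)
```

In a distributed setting, the pairwise loop would presumably be replaced by joins or message passing over the graph (for example through GraphFrames or Gelly), but that mapping is not described in the abstract and is left out of this sketch.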


Published In

The Journal of Supercomputing, Volume 77, Issue 6
Jun 2021
1189 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 June 2021
Accepted: 22 October 2020

Author Tags

  1. Weighted bipartite graph
  2. Community detection
  3. Spark GraphFrames
  4. Flink Gelly
  5. User–user communities
