research-article

KTMiner: Distributed k-truss detection in big graphs

Published: 01 July 2019

Abstract

Discovering cohesive subgraphs is an important problem in the analysis of massive graphs. A newer type of cohesive subgraph, the k-truss, has gained considerable attention in recent years. Although different methods have been proposed to extract k-truss subgraphs, they cannot handle big graphs and are inefficient. To find k-truss subgraphs in big graphs, we propose a novel, efficient distributed algorithm called KTMiner, based on the MapReduce paradigm and the key–value structure. KTMiner is deployed on Spark, a big data framework. Given a specific value of k, KTMiner finds the edges belonging to the k-truss subgraphs. It comprises three consecutive phases. First, a novel distributed k-core routine prunes unnecessary vertices from the input graph. Then, the triangle information for each edge is produced in the form of a new data structure called the Triangle Set (TSet). Finally, a distributed iterative procedure detects the desired k-truss subgraphs. KTMiner caches the reusable data in distributed memory. In addition, it achieves efficient load balancing through appropriately designed data structures, which results in fine-grained parallelism. Experiments on real-world graphs show the superiority of our solution over state-of-the-art methods.
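The three phases can be illustrated with a minimal single-machine sketch in Python. This is not the distributed Spark implementation described in the paper. It assumes the common convention that every edge of a k-truss lies in at least k − 2 triangles (so every vertex has degree at least k − 1), that vertex identifiers are mutually comparable (for example, integers), and it models each edge's triangle set simply as the set of third vertices closing a triangle over that edge. The names `k_truss_edges`, `adj`, and `tset` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def k_truss_edges(edges, k):
    """Return the edges (as sorted 2-tuples) that survive in the k-truss."""
    # Build an undirected adjacency map, ignoring self-loops and duplicates.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    adj = dict(adj)

    # Phase 1 (k-core style pruning): repeatedly drop vertices whose degree
    # is below k - 1, since they cannot belong to any k-truss.
    while True:
        low = [v for v, nbrs in adj.items() if len(nbrs) < k - 1]
        if not low:
            break
        for v in low:
            for w in adj.pop(v):
                adj[w].discard(v)

    # Phase 2 (triangle sets): for each remaining edge (u, v) with u < v,
    # record the vertices w that close a triangle u-v-w.
    tset = {}
    for u, nbrs in adj.items():
        for v in nbrs:
            if u < v:
                tset[(u, v)] = adj[u] & adj[v]

    # Phase 3 (iterative peeling): remove edges supported by fewer than
    # k - 2 triangles and update the triangle sets of the affected edges.
    while True:
        weak = [e for e, third in tset.items() if len(third) < k - 2]
        if not weak:
            break
        for u, v in weak:
            for w in tset.pop((u, v)):
                # Triangle u-v-w is destroyed, so edges (u, w) and (v, w)
                # each lose one supporting vertex.
                for a, b, gone in ((u, w, v), (v, w, u)):
                    key = (a, b) if a < b else (b, a)
                    if key in tset:
                        tset[key].discard(gone)
    return set(tset)


# Toy graph: a 4-clique on {1, 2, 3, 4}, a triangle {4, 5, 6}, and a pendant edge (6, 7).
demo_edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
              (4, 5), (4, 6), (5, 6), (6, 7)]
print(sorted(k_truss_edges(demo_edges, 4)))
# prints [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

On the toy graph, only the six edges of the 4-clique survive for k = 4, because each of them lies in two triangles while the edges of the attached triangle lie in only one. In a distributed setting, the per-round filters in phases 1 and 3 are the natural candidates for parallel execution over key–value pairs, which is the role the abstract assigns to the MapReduce paradigm.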

Highlights

High-performance, scalable k-core and k-truss detection based on Apache Spark APIs.
Memory caching in a distributed iterative algorithm with efficient join operation.
Outperforming other distributed competitors with respect to response time and failure.

Cited By

  • (2024) A Survey of Distributed Graph Algorithms on Massive Graphs, ACM Computing Surveys, 10.1145/3694966, 57:2, (1-39), Online publication date: 10-Oct-2024
  • (2022) Financial Data Mining Model Based on K-Truss Community Query Model and Artificial Intelligence, Computational Intelligence and Neuroscience, 10.1155/2022/9467623, 2022, Online publication date: 1-Jan-2022


Published In

Information Systems, Volume 83, Issue C
Jul 2019
230 pages

Publisher

Elsevier Science Ltd.
United Kingdom

Author Tags

1. Cohesive subgraphs
2. K-truss
3. Big graphs
4. Triangle counting
5. K-core
6. Distributed algorithm

Qualifiers

• Research-article
