research-article

KTMiner: Distributed k-truss detection in big graphs

Authors:

Hassan Haghighi

Volume 83, Issue C

Pages 195 - 216

https://doi.org/10.1016/j.is.2019.03.014

Published: 01 July 2019

Abstract

Discovering cohesive subgraphs is an important problem in the analysis of massive graphs. A new type of cohesive subgraph, the k-truss, has gained a lot of attention in recent years. Although different methods have been proposed to extract k-truss subgraphs, they are unable to deal with big graphs and they suffer from a lack of efficiency. To address the problem of finding k-truss subgraphs in big graphs, we propose a novel, efficient distributed algorithm, called KTMiner, based on the MapReduce paradigm and the key–value structure. KTMiner is deployed on the Spark platform, a big data framework. Given a specific k value, KTMiner finds the edges belonging to the k-truss subgraphs. It consists of three consecutive phases. First, a novel distributed k-core routine is employed to prune unnecessary vertices from the input graph. Then, the triangle information per edge is produced in the form of a new data structure, called Triangle Set (TSet). Finally, using a distributed iterative procedure, the desired k-truss subgraphs are detected. KTMiner caches the reusable data in distributed memory. In addition, it forms an efficient load balancing mechanism by designing appropriate data structures. This results in fine-grained parallelism. Experiments on real-world graphs show the superiority of our solution over state-of-the-art methods.
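
The three phases described in the abstract can be illustrated on a single machine. The following is a minimal Python sketch, not the authors' Spark implementation. It assumes the common definition in which every edge of a k-truss lies in at least k-2 triangles of that subgraph, and it assumes the pruning step removes vertices of degree below k-1 (any vertex of a k-truss has at least k-1 neighbours). The function name `k_truss_edges` and the `tset` dictionary are illustrative names, and vertex identifiers are assumed to be mutually comparable.

```python
from collections import defaultdict


def k_truss_edges(edges, k):
    """Return the edges (as sorted tuples) of the k-truss of an undirected graph."""
    # Build adjacency sets, ignoring self-loops and duplicate edges.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Phase 1 (assumed pruning rule): repeatedly drop vertices of degree < k-1.
    stack = [v for v in adj if len(adj[v]) < k - 1]
    removed = set(stack)
    while stack:
        v = stack.pop()
        for w in adj[v]:
            adj[w].discard(v)
            if w not in removed and len(adj[w]) < k - 1:
                removed.add(w)
                stack.append(w)
        adj[v] = set()

    # Phase 2: for each surviving edge, store the third vertices of its triangles.
    tset = {}
    for u in adj:
        for v in adj[u]:
            if u < v:
                tset[(u, v)] = adj[u] & adj[v]

    def key(a, b):
        return (a, b) if a < b else (b, a)

    # Phase 3: iteratively remove edges supported by fewer than k-2 triangles.
    pending = [e for e, t in tset.items() if len(t) < k - 2]
    while pending:
        u, v = pending.pop()
        if (u, v) not in tset:
            continue  # this edge was already removed
        for w in tset.pop((u, v)):
            # Triangle (u, v, w) disappears: update the two remaining edges.
            for e, gone in ((key(u, w), v), (key(v, w), u)):
                t = tset.get(e)
                if t is not None:
                    t.discard(gone)
                    if len(t) < k - 2:
                        pending.append(e)
    return set(tset)


# Example graph: a 4-clique on {1,2,3,4}, a triangle {4,5,6}, and a pendant edge (6,7).
example = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
           (4, 5), (5, 6), (4, 6), (6, 7)]
truss4 = k_truss_edges(example, 4)
```

In this sketch the per-edge sets play the role that the abstract assigns to TSet: once they are built, the iterative phase only updates and consults these sets and never recomputes triangles from the adjacency lists. A distributed version would have to partition and cache this structure, which the sketch does not attempt.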

Highlights

• High performance and scalable k-core and k-truss detection based on Apache Spark APIs.
• Memory caching in a distributed iterative algorithm with efficient join operation.
• Outperforming other distributed competitors with respect to response time and failure.


Cited By

Meng L., Shao Y., Yuan L., Lai L., Cheng P., Li X., Yu W., Zhang W., Lin X., Zhou J. (2024) A Survey of Distributed Graph Algorithms on Massive Graphs. ACM Computing Surveys 57:2 (1-39). DOI: 10.1145/3694966. Online publication date: 10-Oct-2024.
https://dl.acm.org/doi/10.1145/3694966

Han Z., Li F., Wang G. (2022) Financial Data Mining Model Based on K-Truss Community Query Model and Artificial Intelligence. Computational Intelligence and Neuroscience 2022. DOI: 10.1155/2022/9467623. Online publication date: 1-Jan-2022.
https://dl.acm.org/doi/10.1155/2022/9467623


Published In


Information Systems Volume 83, Issue C

Jul 2019

230 pages

ISSN:0306-4379


Elsevier Ltd.

Publisher

Elsevier Science Ltd.

United Kingdom
