ScaleSCAN: Scalable Density-Based Graph Clustering

Shiokawa, Hiroaki; Takahashi, Tomokatsu; Kitagawa, Hiroyuki

doi:10.1007/978-3-319-98809-2_2

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11029)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

How can we efficiently find clusters (a.k.a. communities) in a graph with millions or even billions of edges? The density-based graph clustering algorithm SCAN is one of the fundamental graph clustering algorithms; it finds densely connected nodes as clusters. Although SCAN is used in many applications because of its effectiveness, applying it to large-scale graphs is computationally expensive, since SCAN must perform computations for all nodes and edges. In this paper, we propose a novel density-based graph clustering algorithm, ScaleSCAN, that tackles this problem on a multicore CPU. ScaleSCAN integrates efficient node pruning methods with parallel computation schemes on the multicore CPU to avoid exhaustive computation over nodes and edges. As a result, ScaleSCAN detects exactly the same clusters as SCAN in much shorter computation time. Extensive experiments on both real-world and synthetic graphs demonstrate the performance superiority of ScaleSCAN over the state-of-the-art methods.
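The abstract does not spell out what "densely connected" means in SCAN. As background, the original SCAN formulation (Xu et al., 2007) scores each edge by the structural similarity of its endpoints' closed neighborhoods and treats a node as a core when enough of its neighbors are similar to it. The following is a minimal Python sketch of that baseline criterion only, not of ScaleSCAN's pruning or parallel schemes; the function names, the toy graph, and the parameter values are illustrative assumptions.

```python
from math import sqrt


def structural_similarity(adj, u, v):
    """Similarity of the closed neighborhoods (neighbors plus the node itself) of u and v."""
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / sqrt(len(nu) * len(nv))


def core_nodes(adj, eps, mu):
    """Return nodes whose eps-neighborhood, counting the node itself, has at least mu members."""
    cores = set()
    for u in adj:
        # The node itself always belongs to its own eps-neighborhood (similarity 1).
        eps_neighborhood = {u} | {
            v for v in adj[u] if structural_similarity(adj, u, v) >= eps
        }
        if len(eps_neighborhood) >= mu:
            cores.add(u)
    return cores


# Hypothetical toy graph: two triangles joined by the bridge edge 2-3.
adj = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4, 5},
    4: {3, 5},
    5: {3, 4},
}

# eps and mu are illustrative choices, not values from the paper.
cores = core_nodes(adj, eps=0.7, mu=3)
```

This baseline evaluates the similarity of every edge from both endpoints, which is the exhaustive computation the abstract says ScaleSCAN is designed to avoid.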

Notes

1. The source code of ScaleSCAN is available on our website.

Acknowledgement

This work was supported by JSPS KAKENHI Early-Career Scientists Grant Number JP18K18057, JST ACT-I, and the Interdisciplinary Computational Science Program in CCS, University of Tsukuba.

Author information

Authors and Affiliations

Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan
Hiroaki Shiokawa & Hiroyuki Kitagawa
Center for Artificial Intelligence Research, University of Tsukuba, Tsukuba, Japan
Hiroaki Shiokawa & Hiroyuki Kitagawa
Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Japan
Tomokatsu Takahashi

Corresponding author

Correspondence to Hiroaki Shiokawa.

Editor information

Editors and Affiliations

Clausthal University of Technology, Clausthal-Zellerfeld, Germany
Sven Hartmann
Victoria University of Wellington, Wellington, New Zealand
Hui Ma
Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
University of Regensburg, Regensburg, Germany
Günther Pernul
Johannes Kepler University, Linz, Austria
Roland R. Wagner

About this paper

Cite this paper

Shiokawa, H., Takahashi, T., Kitagawa, H. (2018). ScaleSCAN: Scalable Density-Based Graph Clustering. In: Hartmann, S., Ma, H., Hameurlain, A., Pernul, G., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2018. Lecture Notes in Computer Science, vol 11029. Springer, Cham. https://doi.org/10.1007/978-3-319-98809-2_2

DOI: https://doi.org/10.1007/978-3-319-98809-2_2
Published: 09 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98808-5
Online ISBN: 978-3-319-98809-2
eBook Packages: Computer Science, Computer Science (R0)
