DOI: 10.1145/3524059.3532365

MASTIFF: structure-aware minimum spanning tree/forest

Published: 28 June 2022

Abstract

The Minimum Spanning Forest (MSF) problem finds usage in many different applications. While theoretical analysis shows that linear-time solutions exist, in practice, parallel MSF algorithms remain computationally demanding due to the continuously increasing size of data sets.
In this paper, we study the MSF algorithm from the perspective of graph structure and investigate the implications of the power-law degree distribution of real-world graphs on this algorithm.
We introduce MASTIFF, a structure-aware MSF algorithm that optimizes work efficiency by (1) dynamically tracking the largest forest component of each graph component and exempting it from processing, and (2) avoiding topology-related operations such as relabeling and merging neighbour lists.
Evaluations on 2 different processor architectures with up to 128 cores and on graphs of up to 124 billion edges show that MASTIFF is 3.4 to 5.9 times faster than previous work.
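
The two ideas named in the abstract can be illustrated with a small sequential sketch. This is not the MASTIFF implementation: it is a plain Borůvka-style loop written under our own assumptions, and the function name `msf_boruvka_skip_largest`, the edge-list input format, and the use of a union-find structure are all hypothetical choices made for illustration. The sketch tracks a single globally largest forest component (rather than one per graph component), skips its vertices when scanning for minimum outgoing edges, and never relabels vertices or merges neighbour lists. Skipping is safe because any edge leaving the exempted component is also an outgoing edge of some non-exempted component, which will consider it.

```python
from collections import defaultdict


def msf_boruvka_skip_largest(n, edges):
    """Sequential Boruvka-style MSF sketch (illustrative, not the paper's code).

    n     -- number of vertices, labelled 0..n-1
    edges -- iterable of (u, v, weight) for an undirected graph
    Returns (total_weight, chosen_edges).
    """
    parent = list(range(n))
    size = [1] * n

    def find(x):
        # Union-find lookup with path halving; the adjacency lists are
        # never relabelled or merged.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    adj = defaultdict(list)
    for u, v, w in edges:
        if u != v:  # ignore self-loops
            adj[u].append((v, w))
            adj[v].append((u, w))

    chosen = []
    largest = None  # root of the largest forest component; none in round 1
    while True:
        best = {}  # component root -> (tie-break key, u, v, w)
        for u in range(n):
            ru = find(u)
            if ru == largest:
                continue  # exempt the largest component from scanning
            for v, w in adj[u]:
                if find(v) == ru:
                    continue  # edge is internal to the component
                key = (w, min(u, v), max(u, v))  # total order on edges
                if ru not in best or key < best[ru][0]:
                    best[ru] = (key, u, v, w)
        if not best:
            break  # no non-exempt component has an outgoing edge
        for _, u, v, w in best.values():
            ru, rv = find(u), find(v)
            if ru == rv:
                continue  # already joined earlier in this round
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru  # union by size
            size[ru] += size[rv]
            chosen.append((u, v, w))
        # Recompute the largest component (O(n) here, for simplicity).
        largest = max((find(x) for x in range(n)), key=lambda r: size[r])
    return sum(w for _, _, w in chosen), chosen


if __name__ == "__main__":
    demo = [(0, 1, 1), (1, 2, 2), (0, 2, 3), (2, 3, 4), (5, 6, 7)]
    total, forest = msf_boruvka_skip_largest(7, demo)
    print(total, len(forest))
```

On skewed-degree graphs the largest component quickly absorbs the high-degree vertices, so exempting it avoids rescanning most of the edges in later rounds; the sketch only hints at that effect and makes no claim about the paper's parallel design or its measured speedups.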

Cited By

  • (2023) Engineering Massively Parallel MST Algorithms. 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 691-701. DOI: 10.1109/IPDPS54959.2023.00075. Online publication date: May 2023.
  • (2023) On Overcoming HPC Challenges of Trillion-Scale Real-World Graph Datasets. 2023 IEEE International Conference on Big Data (BigData), 215-220. DOI: 10.1109/BigData59044.2023.10386309. Online publication date: 15 Dec 2023.

Published In

ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing
June 2022
514 pages
ISBN: 9781450392815
DOI: 10.1145/3524059

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. graph algorithms
  2. high performance computing
  3. minimum spanning forest
  4. minimum spanning tree
  5. real-world graphs

Qualifiers

  • Research-article

Funding Sources

  • DiPET (CHIST-ERA project)
  • High Performance Computing center of Queen's University Belfast and the Kelvin-2 supercomputer
  • EPSRC

Conference

ICS '22

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Article Metrics

  • Downloads (Last 12 months): 34
  • Downloads (Last 6 weeks): 20
Reflects downloads up to 25 Dec 2024
