research-article

SPIRIT: a framework for creating distributed recursive tree applications

Authors:

Milind Kulkarni

ICS '17: Proceedings of the International Conference on Supercomputing

Article No.: 3, Pages 1 - 11

https://doi.org/10.1145/3079079.3079095

Published: 14 June 2017

Abstract

An important set of applications, from diverse domains such as cosmological simulations, data mining, and computer graphics, involves repeated, depth-first traversal of trees. As these applications operate over massive data sets, it is often necessary to distribute the trees to process all of the data. In this paper, we introduce SPIRIT, a framework to ease the writing of distributed tree applications. SPIRIT automates the challenging tasks of distributing the tree, optimizing communication, and parallelizing independent computations. It exploits the algorithmic pattern common to tree traversals to schedule parallel computations effectively and improve locality. As a result, we identify systematic ways of exploiting pipeline parallelism in these applications and show how this parallelism can be complemented by selective application of data parallelism to provide greater speed-ups without requiring excessive data replication. SPIRIT is packaged into a set of application programming interfaces (APIs) that developers can use to create scalable applications. An evaluation of SPIRIT on various tree traversal algorithms shows that the system scales. We also find that SPIRIT implementations perform substantially less communication and achieve significant performance improvements over implementations in other distributed graph systems, and are competitive against state-of-the-art, hand-tuned, application-specific implementations.
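To make the "repeated, depth-first traversal" pattern concrete, the following is a minimal single-machine sketch in Python. It is not SPIRIT's API and is not taken from the paper: the names `Node`, `build`, `count_within`, and `count_all`, the 1-D range-count problem, and the assumption of a non-empty input are all hypothetical choices made for illustration. Each query walks the same tree depth-first and truncates at subtrees that cannot contain a match.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    lo: float                        # smallest point stored under this node
    hi: float                        # largest point stored under this node
    points: List[float]              # non-empty only at leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: List[float], leaf_size: int = 2) -> Node:
    """Build a balanced binary tree over 1-D points (assumes a non-empty list)."""
    pts = sorted(points)
    if len(pts) <= leaf_size:
        return Node(pts[0], pts[-1], pts)
    mid = len(pts) // 2
    return Node(pts[0], pts[-1], [],
                build(pts[:mid], leaf_size), build(pts[mid:], leaf_size))


def count_within(node: Node, q: float, r: float) -> int:
    """One depth-first traversal: count points p with |p - q| <= r."""
    # Truncate the traversal when the whole subtree lies outside [q - r, q + r].
    if node.hi < q - r or node.lo > q + r:
        return 0
    if node.left is None:            # leaf: test the stored points directly
        return sum(1 for p in node.points if abs(p - q) <= r)
    return count_within(node.left, q, r) + count_within(node.right, q, r)


def count_all(root: Node, queries: List[float], r: float) -> List[int]:
    """Repeated traversal: every query walks the same tree independently."""
    return [count_within(root, q, r) for q in queries]
```

In this sketch the traversals in `count_all` share no mutable state, which is the kind of independence the abstract refers to when it mentions parallelizing independent computations. How SPIRIT actually partitions the tree and schedules traversals is not shown here.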

Cited By

Liu J, Robson M, Quinn T, Kulkarni M, Eigenmann R, Ding C, McKee S. (2019). Efficient GPU tree walks for effective distributed n-body simulations. Proceedings of the ACM International Conference on Supercomputing, pp. 24-34. Online publication date: 26-Jun-2019.
https://dl.acm.org/doi/10.1145/3330345.3330348

Index Terms

SPIRIT: a framework for creating distributed recursive tree applications
1. Computing methodologies
  1. Distributed computing methodologies
    1. Distributed programming languages
  2. Parallel computing methodologies
    1. Parallel algorithms
      1. Massively parallel algorithms
2. Software and its engineering
  1. Software notations and tools
    1. Development frameworks and environments
      1. Application specific development environments

Published In

ICS '17: Proceedings of the International Conference on Supercomputing

June 2017

300 pages

ISBN:9781450350204

DOI:10.1145/3079079

General Chairs:
William D. Gropp, University of Illinois at Urbana-Champaign, Illinois
Pete Beckman, Argonne National Laboratory/Northwestern University, Illinois

Program Chairs:
Zhiyuan Li, Purdue University, West Lafayette, Indiana
Francisco J. Cazorla, IIIA-CSIC and Barcelona Supercomputing Center, Barcelona, Spain

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICS '17

Sponsor:

SIGARCH

ICS '17: 2017 International Conference on Supercomputing

June 14 - 16, 2017

Chicago, Illinois

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 1
Total Downloads: 470
Downloads (Last 12 months): 118
Downloads (Last 6 weeks): 3

Reflects downloads up to 09 Nov 2024
