DOI: 10.1145/3079079.3079095
research-article
Public Access

SPIRIT: a framework for creating distributed recursive tree applications

Published: 14 June 2017

Abstract

An important set of applications, from diverse domains such as cosmological simulations, data mining, and computer graphics, involve repeated, depth-first traversal of trees. As these applications operate over massive data sets, it is often necessary to distribute the trees to process all of the data. In this paper, we introduce SPIRIT, a framework to ease the writing of distributed tree applications. SPIRIT automates the challenging tasks of tree distribution, optimizing communication and parallelizing independent computations. The common algorithmic pattern in tree traversals is exploited to effectively schedule parallel computations and improve locality. As a result, we identify systematic ways of exploiting pipeline parallelism in these applications and show how this parallelism can be complemented by selective application of data parallelism to provide greater speed-ups without requiring excessive data replication. SPIRIT is packaged into a set of application programming interfaces (APIs) that developers can use to create scalable applications. Evaluation of SPIRIT on various tree traversal algorithms shows a scalable system. We also find that SPIRIT implementations perform substantially less communication and achieve significant performance improvements over implementations in other distributed graph systems, and are competitive against state-of-the-art, hand-tuned, application-specific implementations.
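
The algorithmic pattern the abstract refers to, many independent depth-first traversals of one shared tree with subtrees skipped when they cannot contribute, can be illustrated with a minimal single-machine sketch. This is not SPIRIT's API and not the authors' code: the names `Node`, `build`, `count_within`, and `traverse_all` are hypothetical, and the one-dimensional range-count problem is assumed purely for illustration. The input point set is assumed to be non-empty.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Tree node covering the closed interval [lo, hi] (hypothetical type)."""
    lo: float
    hi: float
    points: List[float] = field(default_factory=list)  # filled only at leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: List[float], leaf_size: int = 2) -> Node:
    """Build a balanced binary tree over a non-empty list of 1-D points."""
    pts = sorted(points)
    if len(pts) <= leaf_size:
        return Node(pts[0], pts[-1], pts)
    mid = len(pts) // 2
    return Node(pts[0], pts[-1], [],
                build(pts[:mid], leaf_size),
                build(pts[mid:], leaf_size))


def count_within(node: Optional[Node], q: float, r: float) -> int:
    """Depth-first traversal for one query: count points within r of q."""
    if node is None:
        return 0
    # Truncate: the query ball [q - r, q + r] misses this subtree entirely.
    if q + r < node.lo or q - r > node.hi:
        return 0
    if node.left is None and node.right is None:
        return sum(1 for p in node.points if abs(p - q) <= r)
    return count_within(node.left, q, r) + count_within(node.right, q, r)


def traverse_all(root: Node, queries: List[float], r: float) -> List[int]:
    """Repeated traversal: every query walks the same tree independently."""
    return [count_within(root, q, r) for q in queries]
```

Because each query's traversal is independent of the others, the loop in `traverse_all` is the natural place where a framework could apply data parallelism, and the subtrees are the natural units to place on different machines. How SPIRIT actually partitions the tree and schedules traversals is described in the paper itself, not in this sketch.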


Published In

ICS '17: Proceedings of the International Conference on Supercomputing
June 2017
300 pages
ISBN:9781450350204
DOI:10.1145/3079079
General Chairs: William D. Gropp, Pete Beckman
Program Chairs: Zhiyuan Li, Francisco J. Cazorla
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICS '17
Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%
