DOI: 10.5555/1757112.1757137

UTS: an unbalanced tree search benchmark

Published: 02 November 2006

Abstract

This paper presents an unbalanced tree search (UTS) benchmark designed to evaluate the performance and ease of programming for parallel applications requiring dynamic load balancing. We describe algorithms for building a variety of unbalanced search trees to simulate different forms of load imbalance. We created versions of UTS in two parallel languages, OpenMP and Unified Parallel C (UPC), using work stealing as the mechanism for reducing load imbalance. We benchmarked the performance of UTS on various parallel architectures, including shared-memory systems and PC clusters. We found it simple to implement UTS in both UPC and OpenMP, due to UPC's shared-memory abstractions. Results show that both UPC and OpenMP can support efficient dynamic load balancing on shared-memory architectures. However, UPC cannot alleviate the underlying communication costs of distributed-memory systems. Since dynamic load balancing requires intensive communication, performance portability remains difficult for applications such as UTS and performance degrades on PC clusters. By varying key work stealing parameters, we expose important tradeoffs between the granularity of load balance, the degree of parallelism, and communication costs.
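The abstract does not spell out how the trees are built or how stealing is organized, so the following is only a minimal sketch under stated assumptions. It assumes a binomial tree in which each node's child count is derived from a SHA-1 hash of its identifier, and it simulates work stealing in a single thread with a hypothetical `chunk` parameter standing in for steal granularity. The names `child_id`, `num_children`, `b0`, `m`, `q`, `workers`, and `chunk` are illustrative, not taken from the paper.

```python
import hashlib
from collections import deque


def child_id(parent: bytes, index: int) -> bytes:
    """Derive a child's identifier by hashing the parent's id with the child index."""
    return hashlib.sha1(parent + index.to_bytes(4, "big")).digest()


def num_children(node: bytes, depth: int, b0: int = 50, m: int = 4, q: float = 0.2) -> int:
    """Assumed binomial rule: the root has b0 children; any other node has
    m children with probability q and none otherwise. The draw is read from
    the node's own hash, so the same tree is produced on every run."""
    if depth == 0:
        return b0
    u = int.from_bytes(node[:4], "big") / 2**32  # value in [0, 1)
    return m if u < q else 0


def expand(node: bytes, depth: int):
    """Return the (id, depth) pairs of a node's children."""
    return [(child_id(node, i), depth + 1) for i in range(num_children(node, depth))]


def count_sequential(root: bytes) -> int:
    """Count all nodes with an explicit depth-first stack."""
    stack = [(root, 0)]
    count = 0
    while stack:
        node, depth = stack.pop()
        count += 1
        stack.extend(expand(node, depth))
    return count


def count_with_stealing(root: bytes, workers: int = 4, chunk: int = 2):
    """Single-threaded simulation of work stealing. Workers take turns; each
    pops from the top of its own deque, and an idle worker steals up to
    `chunk` nodes from the bottom of the first non-empty deque it finds."""
    deques = [deque() for _ in range(workers)]
    deques[0].append((root, 0))
    counts = [0] * workers
    steals = 0
    while any(deques):
        for w in range(workers):
            if not deques[w]:
                victim = next((v for v in range(workers) if deques[v]), None)
                if victim is None:
                    break  # no work left anywhere
                for _ in range(min(chunk, len(deques[victim]))):
                    deques[w].append(deques[victim].popleft())
                steals += 1
            node, depth = deques[w].pop()
            counts[w] += 1
            deques[w].extend(expand(node, depth))
    return counts, steals


if __name__ == "__main__":
    root = hashlib.sha1(b"root").digest()
    print("nodes (sequential):", count_sequential(root))
    counts, steals = count_with_stealing(root, workers=4, chunk=2)
    print("nodes per worker:", counts, "steals:", steals)
```

Varying `chunk` in this toy model changes how many steal operations occur and how evenly nodes spread across workers, which loosely mirrors the tradeoff between load-balance granularity and communication cost that the abstract mentions. A real implementation would run the workers concurrently and pay an actual cost per steal.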

Published In

LCPC'06: Proceedings of the 19th international conference on Languages and compilers for parallel computing
November 2006, 365 pages
ISBN: 9783540725206
Editors: George Almási, Calin Cascaval, Peng Wu

Publisher

Springer-Verlag, Berlin, Heidelberg

Cited By

  • (2020) Equilibrium: an elasticity controller for parallel tree search in the cloud. The Journal of Supercomputing 76:11 (9211-9245). DOI: 10.1007/s11227-020-03197-y. Online publication date: 1-Nov-2020
  • (2020) Evaluating the Efficiency of OpenMP Tasking for Unbalanced Computation on Diverse CPU Architectures. OpenMP: Portable Multi-Level Parallelism on Modern Systems (18-33). DOI: 10.1007/978-3-030-58144-2_2. Online publication date: 22-Sep-2020
  • (2019) Fairness in responsive parallelism. Proceedings of the ACM on Programming Languages 3:ICFP (1-30). DOI: 10.1145/3341685. Online publication date: 26-Jul-2019
  • (2019) HOPE. Proceedings of the 48th International Conference on Parallel Processing (1-11). DOI: 10.1145/3337821.3337899. Online publication date: 5-Aug-2019
  • (2019) Failure Recovery in Resilient X10. ACM Transactions on Programming Languages and Systems 41:3 (1-30). DOI: 10.1145/3332372. Online publication date: 2-Jul-2019
  • (2019) Blaze-Tasks. ACM Transactions on Architecture and Code Optimization 15:4 (1-25). DOI: 10.1145/3293448. Online publication date: 8-Jan-2019
  • (2018) FULT. Proceedings of the 47th International Conference on Parallel Processing (1-10). DOI: 10.1145/3225058.3225115. Online publication date: 13-Aug-2018
  • (2018) Elastic Places. ACM Transactions on Architecture and Code Optimization 15:2 (1-26). DOI: 10.1145/3185458. Online publication date: 1-May-2018
  • (2018) An architectural framework for accelerating dynamic parallel algorithms on reconfigurable hardware. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture (55-67). DOI: 10.1109/MICRO.2018.00014. Online publication date: 20-Oct-2018
  • (2018) Hybrid work stealing of locality-flexible and cancelable tasks for the APGAS library. The Journal of Supercomputing 74:4 (1435-1448). DOI: 10.1007/s11227-018-2234-8. Online publication date: 1-Apr-2018
