Kremlin: rethinking and rebooting gprof for the multicore age

Authors:

Saturnino Garcia,

Christopher M. Louie,

Michael Bedford Taylor

PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation

Pages 458 - 469

https://doi.org/10.1145/1993498.1993553

Published: 04 June 2011

Abstract

Many recent parallelization tools lower the barrier for parallelizing a program, but overlook one of the first questions that a programmer needs to answer: which parts of the program should I spend time parallelizing?

This paper examines Kremlin, an automatic tool that, given a serial version of a program, will make recommendations to the user as to what regions (e.g. loops or functions) of the program to attack first. Kremlin introduces a novel hierarchical critical path analysis and develops a new metric for estimating the potential of parallelizing a region: self-parallelism. We further introduce the concept of a parallelism planner, which provides a ranked order of specific regions to the programmer that are likely to have the largest performance impact when parallelized. Kremlin supports multiple planner personalities, which allow the planner to more effectively target a particular programming environment or class of machine.

We demonstrate the effectiveness of one such personality, an OpenMP planner, by comparing versions of programs that are parallelized according to Kremlin's plan against third-party manually parallelized versions. The results show that Kremlin's OpenMP planner is highly effective, producing plans whose performance is typically comparable to, and sometimes much better than, manual parallelization. At the same time, these plans would require that the user parallelize significantly fewer regions of the program.
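The abstract does not define hierarchical critical path analysis or self-parallelism, so the following is only a minimal sketch of the classic, flat form of critical path analysis that such tools build on, not Kremlin's actual method. It assumes a region has already been reduced to a dependence graph of operations; the function name `critical_path_stats`, the `cost` and `deps` inputs, and the example region are all hypothetical. The sketch computes total work, the length of the longest dependence chain (the critical path), and their ratio as an estimate of available parallelism.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path_stats(cost, deps):
    """Return (work, critical_path_length, parallelism) for a dependence DAG.

    cost: dict mapping operation name -> execution cost
    deps: dict mapping operation name -> set of operations it depends on
    (Both inputs are illustrative assumptions, not Kremlin's data model.)
    """
    graph = {op: deps.get(op, set()) for op in cost}
    finish = {}
    # Visit operations so that every dependence is processed before its user.
    for op in TopologicalSorter(graph).static_order():
        start = max((finish[d] for d in graph[op]), default=0)
        finish[op] = start + cost[op]
    work = sum(cost.values())            # total serial execution cost
    cp = max(finish.values(), default=0)  # longest dependence chain
    return work, cp, (work / cp if cp else 0.0)


# Hypothetical region: four independent unit-cost iterations, then a reduction.
cost = {"i0": 1, "i1": 1, "i2": 1, "i3": 1, "sum": 1}
deps = {"sum": {"i0", "i1", "i2", "i3"}}
work, cp, parallelism = critical_path_stats(cost, deps)
print(work, cp, parallelism)  # 5 2 2.5
```

A ratio near 1 suggests a region that is essentially serial, while a large ratio suggests a region worth examining; how Kremlin refines this per region and ranks the results is described in the paper itself.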

Index Terms

1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software creation and management
    1. Designing software
      1. Software implementation planning
        Software design techniques
    2. Software development process management
  2. Software notations and tools
    1. General programming languages
      1. Language types
        Parallel programming languages

Information & Contributors

Information

Published In

PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation

June 2011

668 pages

ISBN:9781450306638

DOI:10.1145/1993498

General Chair: Mary Hall, University of Utah
Program Chair: David Padua, University of Illinois at Urbana-Champaign

ACM SIGPLAN Notices Volume 46, Issue 6
PLDI '11
June 2011
652 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1993316

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

PLDI '11

Sponsor:

SIGPLAN

PLDI '11: ACM SIGPLAN Conference on Programming Language Design and Implementation

June 4 - 8, 2011

San Jose, California, USA

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Bibliometrics & Citations

Total Citations: 90
Total Downloads: 622
Downloads (Last 12 months): 21
Downloads (Last 6 weeks): 6

Reflects downloads up to 12 Jan 2025

Cited By

Nahian A, Demsky B, Rodríguez G, Sadayappan P, Sukumaran-Rajam A (2024). FlowProf: Profiling Multi-threaded Programs using Information-Flow. Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, 137-149. Online publication date: 17-Feb-2024.
https://dl.acm.org/doi/10.1145/3640537.3641577

Shen Y, Peng M, Wu Q, Xie G (2023). Multigraph learning for parallelism discovery in sequential programs. Concurrency and Computation: Practice and Experience, 35:9. Online publication date: 13-Feb-2023.
https://doi.org/10.1002/cpe.7648

Abbas M, Soliman M, Rabia S, Kimura K, El-Mahdy A (2022). Accelerating Data Dependence Profiling Through Abstract Interpretation of Loop Instructions. IEEE Access, 10, 31626-31640. Online publication date: 2022.
https://doi.org/10.1109/ACCESS.2022.3160729

Yoga A, Nagarakatte S, McKinley K, Fisher K (2019). Parallelism-centric what-if and differential analyses. Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 485-501. Online publication date: 8-Jun-2019.
https://dl.acm.org/doi/10.1145/3314221.3314621

Zhao J, Qin Z, Yang H (2019). Distributed Parallelizability Analysis and Optimization of Legacy Code in Cloud Migration. 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), 679-684. Online publication date: Dec-2019.
https://doi.org/10.1109/ISPA-BDCloud-SustainCom-SocialCom48970.2019.00103

Eddine Debbi A, Farhat Hamida A, Bakhti H (2019). Would it be Profitable Enough to Re-adapt Algorithmic Thinking for Parallelism Paradigm. 2019 2nd International Conference on new Trends in Computing Sciences (ICTCS), 1-6. Online publication date: Oct-2019.
https://doi.org/10.1109/ICTCS.2019.8923085

Bubel R, Hähnle R, Heydari Tabar A (2019). A Program Logic for Dependence Analysis. Integrated Formal Methods, 83-100. Online publication date: 22-Nov-2019.
https://doi.org/10.1007/978-3-030-34968-4_5

Zhou F, Gan Y, Ma S, Wang Y, Arpaci-Dusseau A, Voelker G (2018). wPerf. Proceedings of the 13th USENIX conference on Operating Systems Design and Implementation, 527-543. Online publication date: 8-Oct-2018.
https://dl.acm.org/doi/10.5555/3291168.3291207

Curtsinger C, Berger E (2018). Coz. Communications of the ACM, 61:6, 91-99. Online publication date: 23-May-2018.
https://dl.acm.org/doi/10.1145/3205911

Şahin S, Gedik B (2018). C-Stream. ACM Transactions on Parallel Computing, 4:3, 1-27. Online publication date: 27-Apr-2018.
https://dl.acm.org/doi/10.1145/3184120
