DOI:10.1145/1993498.1993553

Kremlin: rethinking and rebooting gprof for the multicore age

Published: 04 June 2011

Abstract

Many recent parallelization tools lower the barrier for parallelizing a program, but overlook one of the first questions that a programmer needs to answer: which parts of the program should I spend time parallelizing?
This paper examines Kremlin, an automatic tool that, given a serial version of a program, recommends which regions (e.g., loops or functions) of the program to attack first. Kremlin introduces a novel hierarchical critical path analysis and develops a new metric for estimating the potential of parallelizing a region: self-parallelism. We further introduce the concept of a parallelism planner, which gives the programmer a ranked list of the specific regions that are likely to have the largest performance impact when parallelized. Kremlin supports multiple planner personalities, which allow the planner to more effectively target a particular programming environment or class of machine.
We demonstrate the effectiveness of one such personality, an OpenMP planner, by comparing versions of programs that are parallelized according to Kremlin's plan against third-party manually parallelized versions. The results show that Kremlin's OpenMP planner is highly effective, producing plans whose performance is typically comparable to, and sometimes much better than, manual parallelization. At the same time, these plans would require that the user parallelize significantly fewer regions of the program.
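
The abstract names the ingredients (a hierarchical critical path analysis, a per-region self-parallelism metric, and a planner that ranks regions) without giving formulas. The sketch below is only an illustration of how such pieces could fit together, not Kremlin's implementation. It assumes that every region records its own work and a measured critical path length, that self-parallelism treats each child region as a serial unit costing its own critical path, and that the planner scores a region by `work * (1 - 1/self_parallelism)`. The `Region` class, the `plan` function, the scoring rule, and all numbers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Region:
    """A loop, loop body, or function in the region hierarchy (hypothetical model)."""
    name: str
    self_work: float        # work done directly in this region, excluding children
    critical_path: float    # critical path length of the whole region
    children: List["Region"] = field(default_factory=list)

    def work(self) -> float:
        # Total serial work: own work plus all work nested below.
        return self.self_work + sum(c.work() for c in self.children)

    def total_parallelism(self) -> float:
        # Classic critical path analysis: work divided by critical path length.
        return self.work() / self.critical_path

    def self_parallelism(self) -> float:
        # Assumed definition: count each child as a serial unit whose cost is
        # its own critical path, so parallelism inside children is factored out.
        serialized = self.self_work + sum(c.critical_path for c in self.children)
        return serialized / self.critical_path


def walk(region: Region):
    # Pre-order traversal of the region tree.
    yield region
    for child in region.children:
        yield from walk(child)


def plan(root: Region):
    # Hypothetical planner: estimated time saved if the region ran with
    # its self-parallelism fully exploited, largest saving first.
    scored = [(r.name, r.work() * (1.0 - 1.0 / r.self_parallelism()))
              for r in walk(root)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def example():
    # Four independent, internally serial loop iterations inside a function.
    bodies = [Region(f"body{i}", self_work=10.0, critical_path=10.0) for i in range(4)]
    loop = Region("loop", self_work=0.0, critical_path=10.0, children=bodies)
    main = Region("main", self_work=5.0, critical_path=15.0, children=[loop])
    return main, loop


if __name__ == "__main__":
    main, loop = example()
    print(main.total_parallelism(), main.self_parallelism(), loop.self_parallelism())
    print(plan(main))
```

In this toy hierarchy, plain work-over-critical-path reports parallelism for `main` even though all of it comes from the nested loop, while the assumed self-parallelism attributes it to `loop` alone, so the planner ranks `loop` first.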

Published In

PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2011
668 pages
ISBN:9781450306638
DOI:10.1145/1993498
  • General Chair: Mary Hall
  • Program Chair: David Padua
  • ACM SIGPLAN Notices, Volume 46, Issue 6
    PLDI '11
    June 2011
    652 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/1993316
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. hierarchical critical path analysis
  2. parallel software engineering
  3. parallelism planner
  4. self-parallelism

Qualifiers

  • Research-article

Conference

PLDI '11
Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 21
  • Downloads (Last 6 weeks): 6
Reflects downloads up to 12 Jan 2025

Cited By

  • (2024) FlowProf: Profiling Multi-threaded Programs using Information-Flow. Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, 137-149. DOI:10.1145/3640537.3641577. Online publication date: 17-Feb-2024
  • (2023) Multigraph learning for parallelism discovery in sequential programs. Concurrency and Computation: Practice and Experience 35:9. DOI:10.1002/cpe.7648. Online publication date: 13-Feb-2023
  • (2022) Accelerating Data Dependence Profiling Through Abstract Interpretation of Loop Instructions. IEEE Access 10, 31626-31640. DOI:10.1109/ACCESS.2022.3160729. Online publication date: 2022
  • (2019) Parallelism-centric what-if and differential analyses. Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 485-501. DOI:10.1145/3314221.3314621. Online publication date: 8-Jun-2019
  • (2019) Distributed Parallelizability Analysis and Optimization of Legacy Code in Cloud Migration. 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), 679-684. DOI:10.1109/ISPA-BDCloud-SustainCom-SocialCom48970.2019.00103. Online publication date: Dec-2019
  • (2019) Would it be Profitable Enough to Re-adapt Algorithmic Thinking for Parallelism Paradigm. 2019 2nd International Conference on new Trends in Computing Sciences (ICTCS), 1-6. DOI:10.1109/ICTCS.2019.8923085. Online publication date: Oct-2019
  • (2019) A Program Logic for Dependence Analysis. Integrated Formal Methods, 83-100. DOI:10.1007/978-3-030-34968-4_5. Online publication date: 22-Nov-2019
  • (2018) wPerf. Proceedings of the 13th USENIX conference on Operating Systems Design and Implementation, 527-543. DOI:10.5555/3291168.3291207. Online publication date: 8-Oct-2018
  • (2018) Coz. Communications of the ACM 61:6, 91-99. DOI:10.1145/3205911. Online publication date: 23-May-2018
  • (2018) C-Stream. ACM Transactions on Parallel Computing 4:3, 1-27. DOI:10.1145/3184120. Online publication date: 27-Apr-2018
