Optimistic parallelism requires abstractions

Authors: Milind Kulkarni, Keshav Pingali, Ganesh Ramanarayanan, L. Paul Chew

Communications of the ACM, Volume 52, Issue 9

Pages 89 - 97

https://doi.org/10.1145/1562164.1562188

Published: 01 September 2009

Abstract

The problem of writing software for multicore processors would be greatly simplified if we could automatically parallelize sequential programs. Although auto-parallelization has been studied for many decades, it has succeeded only in a few application areas such as dense matrix computations. In particular, auto-parallelization of irregular programs, which are organized around large, pointer-based data structures like graphs, has seemed intractable.

The Galois project is taking a fresh look at auto-parallelization. Rather than attempt to parallelize all programs no matter how obscurely they are written, we are designing programming abstractions that permit programmers to highlight opportunities for exploiting parallelism in sequential programs, and building a runtime system that uses these hints to execute the program in parallel. In this paper, we describe the design and implementation of a system based on these ideas. Experimental results for two real-world irregular applications, a Delaunay mesh refinement application and a graphics application that performs agglomerative clustering, demonstrate that this approach is promising.

Index Terms

Optimistic parallelism requires abstractions
1. Computing methodologies
  1. Concurrent computing methodologies
    1. Concurrent programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Concurrent programming languages

Published In

Communications of the ACM Volume 52, Issue 9

The Status of the P versus NP Problem

September 2009

139 pages

ISSN:0001-0782

EISSN:1557-7317

DOI:10.1145/1562164

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Popular
Refereed

Funding Sources

National Science Foundation

Article Metrics

Total Citations: 20
Total Downloads: 5,327
Downloads (Last 12 months): 240
Downloads (Last 6 weeks): 46

Reflects downloads up to 12 Feb 2025

Cited By

Martínez M, Fraguela B, Cabaleiro J, Rivera F (2024). A new thread-level speculative automatic parallelization model and library based on duplicate code execution. The Journal of Supercomputing. DOI: 10.1007/s11227-024-05987-0. Online publication date: 11-Mar-2024.
https://doi.org/10.1007/s11227-024-05987-0
Fuchs P, Margan D, Giceva J (2023). Sortledton: a Universal Graph Data Structure. ACM SIGMOD Record 52:1 (17-25). DOI: 10.1145/3604437.3604442. Online publication date: 8-Jun-2023.
https://dl.acm.org/doi/10.1145/3604437.3604442
Fuchs P, Margan D, Giceva J (2022). Sortledton. Proceedings of the VLDB Endowment 15:6 (1173-1186). DOI: 10.14778/3514061.3514065. Online publication date: 22-Jun-2022.
https://dl.acm.org/doi/10.14778/3514061.3514065
Dickerson T, Koskinen E, Gazzillo P, Herlihy M (2019). Conflict Abstractions and Shadow Speculation for Optimistic Transactional Objects. Programming Languages and Systems (313-331). DOI: 10.1007/978-3-030-34175-6_16. Online publication date: 18-Nov-2019.
https://doi.org/10.1007/978-3-030-34175-6_16
Estebanez A, Llanos D, Gonzalez-Escribano A (2017). Using the Xeon Phi Platform to Run Speculatively-Parallelized Codes. International Journal of Parallel Programming 45:2 (225-241). DOI: 10.1007/s10766-016-0421-x. Online publication date: 1-Apr-2017.
https://dl.acm.org/doi/10.1007/s10766-016-0421-x
Estebanez A, Llanos D, Gonzalez-Escribano A (2016). A Survey on Thread-Level Speculation Techniques. ACM Computing Surveys 49:2 (1-39). DOI: 10.1145/2938369. Online publication date: 30-Jun-2016.
https://dl.acm.org/doi/10.1145/2938369
Estebanez A, Llanos D, Gonzalez-Escribano A (2016). New Data Structures to Handle Speculative Parallelization at Runtime. International Journal of Parallel Programming 44:3 (407-426). DOI: 10.1007/s10766-014-0347-0. Online publication date: 1-Jun-2016.
https://dl.acm.org/doi/10.1007/s10766-014-0347-0
McCune R, Weninger T, Madey G (2015). Thinking Like a Vertex. ACM Computing Surveys 48:2 (1-39). DOI: 10.1145/2818185. Online publication date: 12-Oct-2015.
https://dl.acm.org/doi/10.1145/2818185
Tithi J, Matani D, Menghani G, Chowdhury R (2013). Avoiding Locks and Atomic Instructions in Shared-Memory Parallel BFS Using Optimistic Parallelization. Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (1628-1637). DOI: 10.1109/IPDPSW.2013.241. Online publication date: 20-May-2013.
https://dl.acm.org/doi/10.1109/IPDPSW.2013.241
Philippsen M, Tillmann N, Brinkers D (2013). Double Inspection for Run-Time Loop Parallelization. Languages and Compilers for Parallel Computing (46-60). DOI: 10.1007/978-3-642-36036-7_4. Online publication date: 2013.
https://doi.org/10.1007/978-3-642-36036-7_4
