Optimistic parallelism benefits from data partitioning

Authors: Milind Kulkarni, Keshav Pingali, Ganesh Ramanarayanan, L. Paul Chew

ACM SIGARCH Computer Architecture News, Volume 36, Issue 1

Pages 233 - 243

https://doi.org/10.1145/1353534.1346311

Published: 01 March 2008

Abstract

Recent studies of irregular applications such as finite-element mesh generators and data-clustering codes have shown that these applications have a generalized data parallelism arising from the use of iterative algorithms that perform computations on elements of worklists. In some irregular applications, the computations on different elements are independent. In other applications, there may be complex patterns of dependences between these computations.

The Galois system was designed to exploit this kind of irregular data parallelism on multicore processors. Its main features are (i) two kinds of set iterators for expressing worklist-based data parallelism, and (ii) a runtime system that performs optimistic parallelization of these iterators, detecting conflicts and rolling back computations as needed. Detection of conflicts and rolling back iterations requires information from class implementors.
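The optimistic execution model described above can be illustrated with a small, deterministic simulation. This is only a sketch under stated assumptions, not the Galois runtime or its API: the names `Iteration`, `Conflict`, `run_optimistic`, and `transfer` are hypothetical, concurrency is simulated by running a batch of iterations per round with ownership held until the round ends, and both reads and writes take exclusive ownership of an element.

```python
from collections import deque


class Conflict(Exception):
    """Raised when an iteration touches an element owned by another iteration."""


class Iteration:
    """One speculative iteration: tracks owned elements and an undo log."""

    def __init__(self, ident, owners, data):
        self.ident = ident
        self.owners = owners  # shared map: element key -> owning iteration id
        self.data = data      # shared store being updated
        self.owned = set()
        self.undo = []        # (key, old value) pairs, newest last

    def _acquire(self, key):
        # Conflict detection: only one iteration may own an element per round.
        owner = self.owners.setdefault(key, self.ident)
        if owner != self.ident:
            raise Conflict(key)
        self.owned.add(key)

    def read(self, key):
        self._acquire(key)
        return self.data[key]

    def write(self, key, value):
        self._acquire(key)
        self.undo.append((key, self.data[key]))
        self.data[key] = value

    def release(self):
        for key in self.owned:
            del self.owners[key]
        self.owned.clear()

    def rollback(self):
        # Undo writes in reverse order, then give up ownership.
        for key, old in reversed(self.undo):
            self.data[key] = old
        self.undo.clear()
        self.release()


def run_optimistic(worklist, data, body, width=2):
    """Process an unordered worklist, `width` speculative iterations per round.

    Returns the number of aborted (rolled back and retried) iterations.
    """
    work = deque(worklist)
    aborts = 0
    while work:
        owners = {}
        batch = [work.popleft() for _ in range(min(width, len(work)))]
        committed = []
        for ident, item in enumerate(batch):
            it = Iteration(ident, owners, data)
            try:
                body(it, item)
                committed.append(it)
            except Conflict:
                it.rollback()
                work.append(item)  # retry in a later round
                aborts += 1
        for it in committed:
            it.release()
    return aborts


def transfer(it, item):
    """Example loop body: move `amount` from element `src` to element `dst`."""
    src, dst, amount = item
    it.write(src, it.read(src) - amount)
    it.write(dst, it.read(dst) + amount)


if __name__ == "__main__":
    balances = {"a": 10, "b": 0, "c": 5, "d": 0}
    items = [("a", "b", 3), ("c", "b", 1), ("c", "d", 2)]
    print(run_optimistic(items, balances, transfer, width=2), balances)
```

In this toy run the second item conflicts with the first on element `b` after it has already updated `c`, so its write to `c` is undone before it is retried. The first iteration of each round always commits, which guarantees progress.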

In this paper, we introduce mechanisms to improve the execution efficiency of Galois programs: data partitioning, data-centric work assignment, lock coarsening, and over-decomposition. These mechanisms can be used to exploit locality of reference, reduce mis-speculation, and lower synchronization overhead. We also argue that the design of the Galois system permits these mechanisms to be used with relatively little modification to the user code. Finally, we present experimental results that demonstrate the utility of these mechanisms.
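A minimal sketch may help make the four mechanisms concrete. It assumes a square grid of cells as the data structure and uses hypothetical helper names (`partition_id`, `assign_partitions`, `build_work_queues`, `locks_needed`); none of these are taken from the paper or the Galois code base. Cells are grouped into square blocks (data partitioning), more blocks are created than there are cores (over-decomposition), each work item is queued on the core that owns its block (data-centric work assignment), and an iteration locks whole blocks rather than individual cells (lock coarsening).

```python
from collections import defaultdict


def partition_id(cell, block, blocks_per_row):
    """Map a grid cell (x, y) to the id of the square block that contains it."""
    x, y = cell
    return (y // block) * blocks_per_row + (x // block)


def assign_partitions(num_partitions, num_cores):
    """Map partitions to cores in contiguous chunks.

    Using more partitions than cores is the over-decomposition assumed here.
    """
    per_core = -(-num_partitions // num_cores)  # ceiling division
    return {p: p // per_core for p in range(num_partitions)}


def build_work_queues(cells, block, blocks_per_row, owner):
    """Data-centric work assignment: queue each cell on the core owning its partition."""
    queues = defaultdict(list)
    for cell in cells:
        queues[owner[partition_id(cell, block, blocks_per_row)]].append(cell)
    return queues


def locks_needed(cell, n, block, blocks_per_row):
    """Coarsened locks: partitions touched by an iteration on `cell`.

    Assumes an iteration accesses the cell and its four grid neighbours.
    """
    x, y = cell
    touched = [(x, y), (x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return {
        partition_id(c, block, blocks_per_row)
        for c in touched
        if 0 <= c[0] < n and 0 <= c[1] < n
    }


if __name__ == "__main__":
    n, block, cores = 8, 2, 4
    blocks_per_row = n // block
    owner = assign_partitions(blocks_per_row * blocks_per_row, cores)
    cells = [(x, y) for y in range(n) for x in range(n)]
    queues = build_work_queues(cells, block, blocks_per_row, owner)
    print({core: len(q) for core, q in sorted(queues.items())})
    print(locks_needed((0, 0), n, block, blocks_per_row))
    print(locks_needed((1, 1), n, block, blocks_per_row))
```

An iteration whose neighbourhood lies inside one block needs a single partition lock instead of one lock per cell, and two iterations on different cores can only conflict when their neighbourhoods reach into the same block, which in this layout happens only near block boundaries.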

Index Terms

Optimistic parallelism benefits from data partitioning
1. Computing methodologies
  1. Concurrent computing methodologies
    1. Concurrent programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
      2. Language types
        Concurrent programming languages

Published In

ACM SIGARCH Computer Architecture News Volume 36, Issue 1

ASPLOS '08

March 2008

339 pages

ISSN:0163-5964

DOI:10.1145/1353534

ASPLOS XIII: Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
March 2008
352 pages
ISBN:9781595939586
DOI:10.1145/1346281
General Chair: Susan Eggers, University of Washington, USA
Program Chair: James Larus, Microsoft Research, USA

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 March 2008

Published in SIGARCH Volume 36, Issue 1

Qualifiers

Research-article
