research-article

Optimistic parallelism benefits from data partitioning

Authors: Milind Kulkarni, Keshav Pingali, Ganesh Ramanarayanan, L. Paul Chew

ACM SIGOPS Operating Systems Review, Volume 42, Issue 2

Pages 233 - 243

https://doi.org/10.1145/1353535.1346311

Published: 01 March 2008

Abstract

Recent studies of irregular applications such as finite-element mesh generators and data-clustering codes have shown that these applications have a generalized data parallelism arising from the use of iterative algorithms that perform computations on elements of worklists. In some irregular applications, the computations on different elements are independent. In other applications, there may be complex patterns of dependences between these computations.

The Galois system was designed to exploit this kind of irregular data parallelism on multicore processors. Its main features are (i) two kinds of set iterators for expressing worklist-based data parallelism, and (ii) a runtime system that performs optimistic parallelization of these iterators, detecting conflicts and rolling back computations as needed. Detection of conflicts and rolling back iterations requires information from class implementors.
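The optimistic execution model described above can be illustrated with a small single-threaded simulation. This is only a sketch under stated assumptions, not the Galois runtime or its API: the function `run_optimistic`, its parameters, and the round-based scheduling are hypothetical names and simplifications introduced here. Each round speculatively starts up to `workers` iterations; an iteration that touches an object already claimed in the same round is treated as a conflict, its writes are undone from an undo log, and it is returned to the worklist.

```python
from collections import deque


def run_optimistic(worklist, neighborhood, apply_update, state, workers=4):
    """Simulate rounds of speculative iterations over a worklist.

    neighborhood(item) -> objects the iteration touches (hypothetical callback)
    apply_update(item, old_value) -> new value for one touched object
    Returns (commits, aborts).
    """
    workers = max(1, workers)  # at least one iteration per round, so progress is made
    pending = deque(worklist)
    commits = aborts = 0
    while pending:
        batch = [pending.popleft() for _ in range(min(workers, len(pending)))]
        owner = {}  # object -> slot of the iteration that claimed it this round
        for slot, item in enumerate(batch):
            undo_log = []  # (object, previous value) pairs for rollback
            conflict = False
            for obj in neighborhood(item):
                if owner.get(obj, slot) != slot:
                    conflict = True  # another iteration in this round owns obj
                    break
                owner[obj] = slot
                undo_log.append((obj, state[obj]))
                state[obj] = apply_update(item, state[obj])
            if conflict:
                # Roll back in reverse order and release the claimed objects.
                for obj, old in reversed(undo_log):
                    state[obj] = old
                    owner.pop(obj, None)
                pending.append(item)  # retry in a later round
                aborts += 1
            else:
                commits += 1
    return commits, aborts
```

The first iteration of every round always commits, so the loop terminates. A real runtime would run iterations on separate threads and rely on information from class implementors to decide which method invocations conflict; the ownership map here stands in for that check.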

In this paper, we introduce mechanisms to improve the execution efficiency of Galois programs: data partitioning, data-centric work assignment, lock coarsening, and over-decomposition. These mechanisms can be used to exploit locality of reference, reduce mis-speculation, and lower synchronization overhead. We also argue that the design of the Galois system permits these mechanisms to be used with relatively little modification to the user code. Finally, we present experimental results that demonstrate the utility of these mechanisms.
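As a rough illustration of how these four mechanisms fit together, the sketch below partitions a flat array of elements, creates more partitions than threads (over-decomposition), sends each work item to the thread that owns the partition of its first element (data-centric work assignment), and takes one lock per partition rather than one per element (lock coarsening). All names here (`PartitionedData`, `overdecomposition`, `run`) are hypothetical and not taken from the paper. It also substitutes blocking locks acquired in sorted order for the abort-and-retry conflict handling of an optimistic runtime, which is a deliberate simplification.

```python
import threading


class PartitionedData:
    """Flat array split into contiguous partitions, with one lock per partition."""

    def __init__(self, num_elements, num_threads, overdecomposition=4):
        self.num_threads = num_threads
        # Over-decomposition: several partitions per thread.
        self.num_partitions = num_threads * overdecomposition
        # Ceiling division gives the number of elements per partition.
        self.block = max(1, -(-num_elements // self.num_partitions))
        self.values = [0] * num_elements
        # Lock coarsening: locks guard whole partitions, not single elements.
        self.locks = [threading.Lock() for _ in range(self.num_partitions)]

    def partition_of(self, element):
        return element // self.block

    def thread_of(self, element):
        # Partitions are dealt to threads round-robin.
        return self.partition_of(element) % self.num_threads

    def update(self, elements, delta):
        """Add delta to each element; returns how many partition locks were taken."""
        parts = sorted({self.partition_of(e) for e in elements})
        for p in parts:  # sorted acquisition order avoids deadlock
            self.locks[p].acquire()
        try:
            for e in elements:
                self.values[e] += delta
        finally:
            for p in reversed(parts):
                self.locks[p].release()
        return len(parts)


def run(data, work_items):
    """Data-centric assignment: queue each item on the thread owning its first element."""
    queues = [[] for _ in range(data.num_threads)]
    for item in work_items:
        queues[data.thread_of(item[0])].append(item)

    def worker(queue):
        for elements in queue:
            data.update(elements, 1)

    threads = [threading.Thread(target=worker, args=(q,)) for q in queues]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [len(q) for q in queues]
```

Work items whose elements fall inside a single partition take one lock, so only items that straddle a partition boundary can contend with another thread. That is the locality and synchronization benefit the paragraph above refers to.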

Supplementary Material

JPG File (1346311.jpg), 14.75 KB
index.html: slides from the presentation, 0.98 KB
ZIP File (p233-kulkarni-slides.zip): supplemental material for "Optimistic parallelism benefits from data partitioning", 13.17 MB
Audio only (1346311.mp3), 9.00 MB
Video (1346311.mp4), 124.38 MB

Cited By

Starlin Jini S, Chenthalir Indra D (2022). Understanding the Impact of Data Parallelism on Neural Network Classification. Optical Memory and Neural Networks 31:1 (107-121). Online publication date: 1-Mar-2022.
https://dl.acm.org/doi/10.3103/S1060992X22010106
Kandemir M, Tang X, Zhao H, Ryoo J, Karakoy M, Freund S, Yahav E (2021). Distance-in-time versus distance-in-space. Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (665-680). Online publication date: 19-Jun-2021.
https://dl.acm.org/doi/10.1145/3453483.3454069
González C, Fraguela B (2015). Enhancing and Evaluating the Configuration Capability of a Skeleton for Irregular Computations. Proceedings of the 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (119-127). Online publication date: 4-Mar-2015.
https://dl.acm.org/doi/10.1109/PDP.2015.41

Index Terms

Optimistic parallelism benefits from data partitioning
1. Computing methodologies
  1. Concurrent computing methodologies
    1. Concurrent programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
      2. Language types
        Concurrent programming languages

Published In

ACM SIGOPS Operating Systems Review Volume 42, Issue 2

ASPLOS '08

March 2008

339 pages

ISSN:0163-5980

DOI:10.1145/1353535

ASPLOS XIII: Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
March 2008
352 pages
ISBN:9781595939586
DOI:10.1145/1346281
General Chair: Susan Eggers, University of Washington, USA
Program Chair: James Larus, Microsoft Research, USA

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 March 2008

Published in SIGOPS Volume 42, Issue 2

Qualifiers

Research-article
