
Optimistic parallelism benefits from data partitioning

Published: 01 March 2008

Abstract

Recent studies of irregular applications such as finite-element mesh generators and data-clustering codes have shown that these applications have a generalized data parallelism arising from the use of iterative algorithms that perform computations on elements of worklists. In some irregular applications, the computations on different elements are independent. In other applications, there may be complex patterns of dependences between these computations.
The Galois system was designed to exploit this kind of irregular data parallelism on multicore processors. Its main features are (i) two kinds of set iterators for expressing worklist-based data parallelism, and (ii) a runtime system that performs optimistic parallelization of these iterators, detecting conflicts and rolling back computations as needed. Detection of conflicts and rolling back iterations requires information from class implementors.
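The optimistic scheme described above can be illustrated with a small, single-threaded simulation. This is only a sketch under stated assumptions, not the Galois runtime or its API: the names `run_optimistic`, `path_neighborhood`, `owner`, and `undo_log` are hypothetical, and concurrency is modeled by processing the worklist in batches whose members are treated as in flight at the same time. An iteration claims every element it touches; if an element is already claimed, the iteration undoes its partial updates and is put back on the worklist.

```python
from collections import deque


def run_optimistic(items, neighborhood, update, state, num_workers=3):
    """Simulate speculative execution of a worklist in rounds.

    Returns (commits, aborts). All names are hypothetical illustrations.
    """
    worklist = deque(items)
    commits = aborts = 0
    while worklist:
        # Items in one batch stand in for iterations running concurrently.
        batch = [worklist.popleft()
                 for _ in range(min(num_workers, len(worklist)))]
        owner = {}  # element -> in-flight iteration that has claimed it
        for item in batch:
            undo_log = []  # (element, old value) pairs, used for rollback
            conflict = False
            for elem in neighborhood(item):
                if elem in owner:
                    # Another in-flight iteration holds this element.
                    conflict = True
                    break
                owner[elem] = item
                undo_log.append((elem, state[elem]))
                state[elem] = update(state[elem])
            if conflict:
                # Roll back partial updates and release the claims.
                for elem, old in reversed(undo_log):
                    state[elem] = old
                    del owner[elem]
                worklist.append(item)  # retry in a later round
                aborts += 1
            else:
                commits += 1
    return commits, aborts


def path_neighborhood(n):
    """Neighborhood on a path graph: the node itself, then its neighbors."""
    def neighborhood(i):
        return [i] + [j for j in (i - 1, i + 1) if 0 <= j < n]
    return neighborhood


state = [0] * 6
commits, aborts = run_optimistic(
    range(6), path_neighborhood(6), lambda v: v + 1, state)
print(state, commits, aborts)
```

Because the first iteration of every batch never finds a claimed element, at least one iteration commits per round, so the loop terminates. The updates in this toy example commute, so the final state matches a sequential run regardless of how many iterations are rolled back.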
In this paper, we introduce mechanisms to improve the execution efficiency of Galois programs: data partitioning, data-centric work assignment, lock coarsening, and over-decomposition. These mechanisms can be used to exploit locality of reference, reduce mis-speculation, and lower synchronization overhead. We also argue that the design of the Galois system permits these mechanisms to be used with relatively little modification to the user code. Finally, we present experimental results that demonstrate the utility of these mechanisms.
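How the four mechanisms fit together can be sketched as follows. This is an illustrative assumption about one possible arrangement, not the paper's implementation: the helper names (`partition_of`, `worker_of`, `assign_work`, `partitions_touched`, `try_acquire`, `release`) are hypothetical, elements are assumed to be integers in a fixed range, and partitions are simple contiguous blocks rather than the output of a graph partitioner. One lock guards each partition instead of each element (lock coarsening), there may be more partitions than workers (over-decomposition), and each work item is queued with the worker that owns its partition (data-centric work assignment).

```python
import threading


def partition_of(elem, num_elems, num_partitions):
    """Block partitioning: contiguous elements share a partition."""
    return elem * num_partitions // num_elems


def worker_of(partition, num_partitions, num_workers):
    """Map partitions to workers in contiguous blocks.

    With more partitions than workers, each worker owns several.
    """
    return partition * num_workers // num_partitions


def assign_work(items, num_elems, num_partitions, num_workers):
    """Queue each item with the worker that owns the item's partition."""
    queues = [[] for _ in range(num_workers)]
    for item in items:
        p = partition_of(item, num_elems, num_partitions)
        queues[worker_of(p, num_partitions, num_workers)].append(item)
    return queues


def partitions_touched(elems, num_elems, num_partitions):
    """Partitions whose locks an iteration touching `elems` would need."""
    return {partition_of(e, num_elems, num_partitions) for e in elems}


def try_acquire(partitions, locks):
    """Try to take every partition lock; on failure release and report."""
    taken = []
    for p in sorted(partitions):
        if locks[p].acquire(blocking=False):
            taken.append(p)
        else:
            for q in taken:
                locks[q].release()
            return False  # conflict detected at partition granularity
    return True


def release(partitions, locks):
    """Release the partition locks held by a finished iteration."""
    for p in partitions:
        locks[p].release()


# 16 elements, 4 partitions, 2 workers: each worker owns 2 partitions.
locks = [threading.Lock() for _ in range(4)]
queues = assign_work(range(16), 16, 4, 2)
print(queues)
print(partitions_touched([5, 4, 6], 16, 4))
```

In this sketch, an iteration whose elements all fall inside one partition needs a single lock acquisition, while one that straddles a partition boundary needs two and may conflict with a neighboring worker. Coarser locks lower synchronization overhead at the cost of detecting some conflicts that element-level locking would not report.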

Published In

ACM SIGOPS Operating Systems Review, Volume 42, Issue 2
ASPLOS '08
March 2008
339 pages
ISSN: 0163-5980
DOI: 10.1145/1353535

ASPLOS XIII: Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
March 2008
352 pages
ISBN: 9781595939586
DOI: 10.1145/1346281
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 March 2008
Published in SIGOPS Volume 42, Issue 2


Author Tags

  1. data partitioning
  2. irregular programs
  3. locality
  4. lock coarsening
  5. optimistic parallelism
  6. over-decomposition

Qualifiers

  • Research-article
