research-article

Optimistic parallelism benefits from data partitioning

Published: 01 March 2008

Abstract

Recent studies of irregular applications such as finite-element mesh generators and data-clustering codes have shown that these applications have a generalized data parallelism arising from the use of iterative algorithms that perform computations on elements of worklists. In some irregular applications, the computations on different elements are independent. In other applications, there may be complex patterns of dependences between these computations.
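The worklist pattern described above can be sketched as follows. This is a minimal sequential illustration, not code from the paper: the interval-splitting task and the name `refine_intervals` are hypothetical stand-ins for a real irregular application such as mesh refinement. Processing one element may add new elements to the worklist, and here the work on different elements happens to be independent.

```python
from collections import deque


def refine_intervals(intervals, max_len):
    """Split every interval longer than max_len until none remain.

    Hypothetical stand-in for a worklist-based irregular algorithm:
    each step takes one element and may push new work.
    """
    worklist = deque(intervals)
    done = []
    while worklist:
        lo, hi = worklist.popleft()
        if hi - lo <= max_len:
            done.append((lo, hi))  # element needs no further work
            continue
        mid = (lo + hi) / 2
        # Processing an element creates new work items.
        worklist.append((lo, mid))
        worklist.append((mid, hi))
    return sorted(done)


result = refine_intervals([(0.0, 4.0)], max_len=1.0)
print(result)
```

Because no two intervals share state in this toy case, the iterations could run in any order; the harder case is when elements interact.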
The Galois system was designed to exploit this kind of irregular data parallelism on multicore processors. Its main features are (i) two kinds of set iterators for expressing worklist-based data parallelism, and (ii) a runtime system that performs optimistic parallelization of these iterators, detecting conflicts and rolling back computations as needed. Detection of conflicts and rolling back iterations requires information from class implementors.
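A rough sketch of optimistic execution with conflict detection and rollback is shown below. It is not the Galois API: it is a single-threaded simulation in which a "round" stands in for a set of concurrently running iterations, and conflicts are detected with simple per-element ownership marks rather than the class-implementor information the abstract mentions. The names `run_optimistically`, `Conflict`, and `batch_size` are assumptions made for illustration.

```python
from collections import deque


class Conflict(Exception):
    """Raised when an iteration touches an element owned by another one."""


def run_optimistically(graph, batch_size):
    """Each iteration increments a node and its neighbors (simulated rounds)."""
    values = {n: 0 for n in graph}
    worklist = deque(graph)
    aborts = 0
    while worklist:
        count = min(batch_size, len(worklist))
        batch = [worklist.popleft() for _ in range(count)]
        owner = {}  # element -> iteration that touched it in this round
        for it in batch:
            undo_log = []
            try:
                for n in [it, *graph[it]]:
                    if owner.setdefault(n, it) != it:
                        raise Conflict  # another iteration got here first
                    undo_log.append((n, values[n]))
                    values[n] += 1
            except Conflict:
                # Roll back this iteration's writes, newest first.
                for n, old in reversed(undo_log):
                    values[n] = old
                # Release the elements it had acquired and retry later.
                for n in [k for k, v in owner.items() if v == it]:
                    del owner[n]
                worklist.append(it)
                aborts += 1
    return values, aborts


# Hypothetical 4-node path graph: a - b - c - d
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
sequential, _ = run_optimistically(path, batch_size=1)
speculative, aborted = run_optimistically(path, batch_size=4)
print(sequential == speculative, aborted)
```

With a batch size of 1 nothing overlaps, so no iteration aborts. Larger batches expose more parallelism but also more mis-speculation, while the final result stays the same as the sequential one.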
In this paper, we introduce mechanisms to improve the execution efficiency of Galois programs: data partitioning, data-centric work assignment, lock coarsening, and over-decomposition. These mechanisms can be used to exploit locality of reference, reduce mis-speculation, and lower synchronization overhead. We also argue that the design of the Galois system permits these mechanisms to be used with relatively little modification to the user code. Finally, we present experimental results that demonstrate the utility of these mechanisms.
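The four mechanisms named above can be illustrated together on a toy example. The sketch below is an assumption-laden simplification, not the paper's implementation: it uses a contiguous block partitioner, a round-robin mapping of partitions to cores, and hypothetical function names (`block_partition`, `assign_work`, `partitions_to_lock`). Creating more partitions than cores models over-decomposition, routing each work item to the core that owns its partition models data-centric work assignment, and acquiring one lock per partition instead of one per element models lock coarsening.

```python
def block_partition(num_nodes, num_partitions):
    """Data partitioning: map node id -> partition id in contiguous blocks."""
    block = -(-num_nodes // num_partitions)  # ceiling division
    return [n // block for n in range(num_nodes)]


def assign_work(worklist, node_partition, num_cores):
    """Data-centric work assignment: a work item goes to the core that
    owns the partition holding its node (round-robin mapping assumed)."""
    per_core = [[] for _ in range(num_cores)]
    for node in worklist:
        per_core[node_partition[node] % num_cores].append(node)
    return per_core


def partitions_to_lock(node, neighbors, node_partition):
    """Lock coarsening: lock whole partitions, not individual elements.
    Sorted order gives a consistent acquisition order."""
    return sorted({node_partition[n] for n in [node, *neighbors[node]]})


# Hypothetical chain of 16 nodes; 4 partitions on 2 cores (over-decomposition).
num_nodes = 16
neighbors = {n: [m for m in (n - 1, n + 1) if 0 <= m < num_nodes]
             for n in range(num_nodes)}
node_partition = block_partition(num_nodes, num_partitions=4)
per_core = assign_work(range(num_nodes), node_partition, num_cores=2)

print(per_core[0])                                       # work owned by core 0
print(partitions_to_lock(1, neighbors, node_partition))  # interior node
print(partitions_to_lock(3, neighbors, node_partition))  # boundary node
```

In this sketch an interior node needs a single partition lock, while a node on a partition boundary needs two. Only boundary work can conflict across cores, which is how partitioning can reduce mis-speculation as well as synchronization overhead.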




Information & Contributors

Information

Published In

ACM SIGARCH Computer Architecture News  Volume 36, Issue 1
ASPLOS '08
March 2008
339 pages
ISSN:0163-5964
DOI:10.1145/1353534

ASPLOS XIII: Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
March 2008
352 pages
ISBN:9781595939586
DOI:10.1145/1346281

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. data partitioning
  2. irregular programs
  3. locality
  4. lock coarsening
  5. optimistic parallelism
  6. over-decomposition

Qualifiers

  • Research-article

Cited By

  • (2017) LightHouse: An Automatic Code Generator for Graph Algorithms on GPUs. Languages and Compilers for Parallel Computing, pages 235-249. DOI:10.1007/978-3-319-52709-3_18. Online publication date: 24-Jan-2017
  • (2016) EPIC: A framework to exploit parallelism in irregular codes. Concurrency and Computation: Practice and Experience 29:2. DOI:10.1002/cpe.3842. Online publication date: 12-May-2016
  • (2012) RADISH. ACM SIGARCH Computer Architecture News 40:3, pages 201-212. DOI:10.1145/2366231.2337182. Online publication date: 9-Jun-2012
  • (2012) RAIDR. ACM SIGARCH Computer Architecture News 40:3, pages 1-12. DOI:10.1145/2366231.2337161. Online publication date: 9-Jun-2012
  • (2010) Parallel inclusion-based points-to analysis. ACM SIGPLAN Notices 45:10, pages 428-443. DOI:10.1145/1932682.1869495. Online publication date: 17-Oct-2010
  • (2010) Parallel inclusion-based points-to analysis. Proceedings of the ACM international conference on Object oriented programming systems languages and applications, pages 428-443. DOI:10.1145/1869459.1869495. Online publication date: 17-Oct-2010
  • (2010) Structure-driven optimizations for amorphous data-parallel programs. ACM SIGPLAN Notices 45:5, pages 3-14. DOI:10.1145/1837853.1693457. Online publication date: 9-Jan-2010
  • (2010) Structure-driven optimizations for amorphous data-parallel programs. Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 3-14. DOI:10.1145/1693453.1693457. Online publication date: 9-Jan-2010
  • (2010) The next 700 data description languages. Journal of the ACM 57:2, pages 1-51. DOI:10.1145/1667053.1667059. Online publication date: 8-Feb-2010
  • (2010) Schema mapping discovery from data instances. Journal of the ACM 57:2, pages 1-37. DOI:10.1145/1667053.1667055. Online publication date: 8-Feb-2010
