Cache-efficient dynamic programming algorithms for multicores

Authors: Rezaul Alam Chowdhury, Vijaya Ramachandran

SPAA '08: Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures

Pages 207 - 216

https://doi.org/10.1145/1378533.1378574

Published: 14 June 2008

Abstract

We present cache-efficient chip multiprocessor (CMP) algorithms with good speed-up for some widely used dynamic programming algorithms. We consider three types of caching systems for CMPs: D-CMP, with a private cache for each core; S-CMP, with a single cache shared by all cores; and Multicore, which has private L₁ caches and a shared L₂ cache. We derive results for three classes of problems: local dependency dynamic programming (LDDP), the Gaussian Elimination Paradigm (GEP), and the parenthesis problem.
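As a rough illustration of the GEP class: GEP is commonly described as a triple-nested loop that updates `c[i][j]` from `c[i][j]`, `c[i][k]`, `c[k][j]`, and `c[k][k]`. The sketch below assumes that form and uses Floyd-Warshall all-pairs shortest paths as one instance. The names `gep`, `f`, and `in_sigma` are illustrative choices, and this plain loop is neither tiled nor parallel, so it should not be read as the CMP algorithm of the paper.

```python
import math


def gep(c, f, in_sigma):
    """Generic triple-loop form assumed for GEP (sequential, untiled).

    c        : n x n list of lists, updated in place
    f        : update function f(x, u, v, w) -> new value of c[i][j]
    in_sigma : predicate selecting which (i, j, k) triples are updated
    """
    n = len(c)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if in_sigma(i, j, k):
                    c[i][j] = f(c[i][j], c[i][k], c[k][j], c[k][k])
    return c


def floyd_warshall(dist):
    """All-pairs shortest paths as a GEP instance: f = min(x, u + v)."""
    work = [row[:] for row in dist]  # leave the caller's matrix untouched
    return gep(work, lambda x, u, v, w: min(x, u + v), lambda i, j, k: True)


if __name__ == "__main__":
    inf = math.inf
    graph = [[0, 3, inf],
             [inf, 0, 1],
             [2, inf, 0]]
    print(floyd_warshall(graph))
```

Other choices of `f` and `in_sigma` give other members of the class; Gaussian elimination without pivoting is the usual namesake example.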

For each class of problems, we develop a generic CMP algorithm with an associated tiling sequence. We then tailor this tiling sequence to each caching model and provide a parallel schedule that results in a cache-efficient parallel execution up to the critical path length of the underlying dynamic programming algorithm.

We present experimental results on an 8-core Opteron for two sequence alignment problems that are important examples of LDDP. Our experimental results show good speed-ups for simple versions of our algorithms.
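To make the idea of tiling an LDDP problem concrete, here is a minimal sketch that uses longest common subsequence (LCS) length as a stand-in for the sequence alignment problems mentioned above. It is not the authors' algorithm, tiling sequence, or schedule: the fixed tile size `tile`, the function name `lcs_length_tiled`, and the anti-diagonal ordering of tiles are assumptions made for illustration. The code runs sequentially, and the comments mark where independent tiles could be handed to different cores.

```python
def lcs_length_tiled(x, y, tile=64):
    """LCS length computed tile by tile along anti-diagonals of tiles."""
    n, m = len(x), len(y)
    # table[i][j] = LCS length of x[:i] and y[:j]; row 0 and column 0 stay 0.
    table = [[0] * (m + 1) for _ in range(n + 1)]
    tiles_i = (n + tile - 1) // tile  # number of tile rows
    tiles_j = (m + tile - 1) // tile  # number of tile columns

    def fill_tile(ti, tj):
        # Each cell depends only on its left, upper, and upper-left neighbours.
        for i in range(ti * tile + 1, min((ti + 1) * tile, n) + 1):
            for j in range(tj * tile + 1, min((tj + 1) * tile, m) + 1):
                if x[i - 1] == y[j - 1]:
                    table[i][j] = table[i - 1][j - 1] + 1
                else:
                    table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Tiles with the same ti + tj depend only on tiles from earlier
    # anti-diagonals, so the inner loop could run in parallel.
    for d in range(tiles_i + tiles_j - 1):
        for ti in range(max(0, d - tiles_j + 1), min(d, tiles_i - 1) + 1):
            fill_tile(ti, d - ti)
    return table[n][m]


if __name__ == "__main__":
    print(lcs_length_tiled("ABCBDAB", "BDCABA", tile=2))
```

In this sketch the tile size would be chosen so that one tile's working set fits in the cache level being targeted; how to choose it for each caching model is exactly what the paper addresses and is not reproduced here.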

Published In

SPAA '08: Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures

June 2008, 380 pages

ISBN: 9781595939739

DOI: 10.1145/1378533

General Chair: Friedhelm Meyer auf der Heide, University of Paderborn, Germany

Program Chair: Nir Shavit, Tel-Aviv University, Israel, and Sun Labs, USA

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Conference

SPAA08: 20th ACM Symposium on Parallelism in Algorithms and Architectures

June 14 - 16, 2008

Munich, Germany

Acceptance Rates

Overall Acceptance Rate 447 of 1,461 submissions, 31%
