DOI: 10.1145/1378533.1378574
research-article

Cache-efficient dynamic programming algorithms for multicores

Published: 14 June 2008

Abstract

We present cache-efficient chip multiprocessor (CMP) algorithms with good speed-up for some widely used dynamic programming algorithms. We consider three types of caching systems for CMPs: D-CMP, with a private cache for each core; S-CMP, with a single cache shared by all cores; and Multicore, which has private L1 caches and a shared L2 cache. We derive results for three classes of problems: local dependency dynamic programming (LDDP), the Gaussian Elimination Paradigm (GEP), and the parenthesis problem.
For each class of problems, we develop a generic CMP algorithm with an associated tiling sequence. We then tailor this tiling sequence to each caching model and provide a parallel schedule that results in a cache-efficient parallel execution up to the critical path length of the underlying dynamic programming algorithm.
We present experimental results on an 8-core Opteron for two sequence alignment problems that are important examples of LDDP. Our experimental results show good speed-ups for simple versions of our algorithms.
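
As an illustration of the tiling idea for an LDDP problem, the following is a minimal sketch for the longest common subsequence (LCS) length, a standard pairwise sequence-comparison recurrence in which each cell depends only on its left, upper, and upper-left neighbours. It is not the algorithm of the paper: the function name `lcs_length_tiled`, the fixed square tile size `tile`, and the simple anti-diagonal (wavefront) ordering of tiles are assumptions made for this example. Tiles on the same anti-diagonal do not depend on one another, so a parallel schedule could assign them to different cores; here they are executed sequentially.

```python
def lcs_length_tiled(a, b, tile=4):
    """LCS length of sequences a and b, filled tile by tile in wavefront order.

    Illustrative sketch only: the tile size and the wavefront order are
    assumptions for this example, not the tiling sequence from the paper.
    """
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of a[:i] and b[:j]; row 0 and column 0 are the base case.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    tiles_i = (n + tile - 1) // tile  # number of tile rows
    tiles_j = (m + tile - 1) // tile  # number of tile columns

    def fill_tile(ti, tj):
        # Fill one tile; it reads only cells of the tiles above, to the left,
        # and diagonally above-left, all of which lie on earlier wavefronts.
        for i in range(ti * tile + 1, min((ti + 1) * tile, n) + 1):
            for j in range(tj * tile + 1, min((tj + 1) * tile, m) + 1):
                if a[i - 1] == b[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1] + 1
                else:
                    dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Wavefront d contains every tile (ti, tj) with ti + tj == d.
    for d in range(tiles_i + tiles_j - 1):
        wave = [(ti, d - ti) for ti in range(tiles_i) if 0 <= d - ti < tiles_j]
        for ti, tj in wave:  # tiles within one wave are mutually independent
            fill_tile(ti, tj)
    return dp[n][m]
```

In this sketch the tile size would be chosen so that one tile's working set fits in the cache available to a core; how that choice differs between private and shared caches is the kind of question the tailored tiling sequences in the paper address.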

Published In

SPAA '08: Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
June 2008
380 pages
ISBN: 9781595939739
DOI: 10.1145/1378533
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cache-efficiency
  2. distributed cache
  3. multicore
  4. parallelism
  5. shared cache

Qualifiers

  • Research-article

Conference

SPAA08

Acceptance Rates

Overall Acceptance Rate 447 of 1,461 submissions, 31%

Cited By

  • (2024) Brief Announcement: Red-Blue Pebbling with Multiple Processors: Time, Communication and Memory Trade-offs. Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, pages 285-287. DOI: 10.1145/3626183.3660269. Online publication date: 17-Jun-2024.
  • (2024) Brief Announcement: Upper and Lower Bounds for Edit Distance in Space-Efficient MPC. Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, pages 293-295. DOI: 10.1145/3626183.3660265. Online publication date: 17-Jun-2024.
  • (2024) Teaching Parallel Algorithms Using the Binary-Forking Model. 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 346-351. DOI: 10.1109/IPDPSW63119.2024.00080. Online publication date: 27-May-2024.
  • (2023) Enhancing Performance in Heterogeneous Computing: A Comparative Study of CUDA on GPUs and CPUs. 2023 IEEE Fifth International Conference on Advances in Electronics, Computers and Communications (ICAECC), pages 1-6. DOI: 10.1109/ICAECC59324.2023.10560120. Online publication date: 7-Sep-2023.
  • (2022) An Algorithm for the Sequence Alignment with Gap Penalty Problem using Multiway Divide-and-Conquer and Matrix Transposition. Information Processing Letters, 173:C. DOI: 10.1016/j.ipl.2021.106166. Online publication date: 22-Apr-2022.
  • (2021) Visualizing Parallel Dynamic Programming using the Thread Safe Graphics Library. 2021 IEEE/ACM Ninth Workshop on Education for High Performance Computing (EduHPC), pages 24-31. DOI: 10.1109/EduHPC54835.2021.00009. Online publication date: Nov-2021.
  • (2019) Fairness in responsive parallelism. Proceedings of the ACM on Programming Languages, 3:ICFP, pages 1-30. DOI: 10.1145/3341685. Online publication date: 26-Jul-2019.
  • (2019) D2P. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-22. DOI: 10.1145/3295500.3356205. Online publication date: 17-Nov-2019.
  • (2019) Toward efficient architecture-independent algorithms for dynamic programs. Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming, pages 413-414. DOI: 10.1145/3293883.3299109. Online publication date: 16-Feb-2019.
  • (2019) Toward Efficient Architecture-Independent Algorithms for Dynamic Programs. High Performance Computing, pages 143-164. DOI: 10.1007/978-3-030-20656-7_8. Online publication date: 17-May-2019.
