
Autogen: Automatic Discovery of Efficient Recursive Divide-&-Conquer Algorithms for Solving Dynamic Programming Problems

Published: 05 October 2017
    Abstract

    We present Autogen—an algorithm that for a wide class of dynamic programming (DP) problems automatically discovers highly efficient cache-oblivious parallel recursive divide-and-conquer algorithms from inefficient iterative descriptions of DP recurrences. Autogen analyzes the set of DP table locations accessed by the iterative algorithm when run on a DP table of small size and automatically identifies a recursive access pattern and a corresponding provably correct recursive algorithm for solving the DP recurrence. We use Autogen to autodiscover efficient algorithms for several well-known problems. Our experimental results show that several autodiscovered algorithms significantly outperform parallel looping and tiled loop-based algorithms. Also, these algorithms are less sensitive to fluctuations of memory and bandwidth compared with their looping counterparts, and their running times and energy profiles remain relatively more stable. To the best of our knowledge, Autogen is the first algorithm that can automatically discover new nontrivial divide-and-conquer algorithms.
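
As a rough illustration of the kind of transformation the abstract describes, the sketch below contrasts a loop-based Floyd-Warshall (a standard DP for all-pairs shortest paths) with a hand-written 2-way recursive divide-and-conquer version of the same recurrence. This is not Autogen's output or implementation; the function names `fw_iterative` and `fw_recursive`, the `base` cutoff, and the choice of Floyd-Warshall as the example problem are all assumptions made for illustration. The code is serial Python, so it only shows the recursive access pattern, not the parallelism or cache behavior discussed in the abstract.

```python
INF = float("inf")


def fw_iterative(d):
    """Loop-based Floyd-Warshall on a square distance matrix (in place)."""
    n = len(d)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def halves(r):
    """Split a half-open range (lo, hi) at its midpoint."""
    mid = (r[0] + r[1]) // 2
    return (r[0], mid), (mid, r[1])


def fw_recursive(d, base=2):
    """Recursive divide-and-conquer Floyd-Warshall (in place).

    Hypothetical sketch: each call updates the block with rows I and
    columns J using intermediate vertices k in K, all half-open ranges.
    """

    def solve(I, J, K):
        if min(I[1] - I[0], J[1] - J[0], K[1] - K[0]) <= base:
            # Small block: fall back to the iterative kernel.
            for k in range(*K):
                for i in range(*I):
                    for j in range(*J):
                        if d[i][k] + d[k][j] < d[i][j]:
                            d[i][j] = d[i][k] + d[k][j]
            return
        for Kh in halves(K):
            quads = [(Ih, Jh) for Ih in halves(I) for Jh in halves(J)]
            # Quadrants that share rows or columns with the current k-range
            # are updated first, because the other quadrants read from them.
            quads.sort(key=lambda q: (q[0] == Kh) + (q[1] == Kh), reverse=True)
            for Ih, Jh in quads:
                solve(Ih, Jh, Kh)

    n = len(d)
    solve((0, n), (0, n), (0, n))
    return d


# Small usage example: both versions should agree on the same input.
w = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
print(fw_recursive([row[:] for row in w]) == fw_iterative([row[:] for row in w]))
```

In this sketch the quadrants that do not share rows or columns with the current k-range are independent of one another, which is the structural property a parallel implementation could exploit.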

          Published In

          ACM Transactions on Parallel Computing, Volume 4, Issue 1
          Special Issue: Invited papers from PPoPP 2016, Part 1
          March 2017
          170 pages
          ISSN:2329-4949
          EISSN:2329-4957
          DOI:10.1145/3131890
          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          Published: 05 October 2017
          Accepted: 01 July 2017
          Revised: 01 June 2017
          Received: 01 January 2017
          Published in TOPC Volume 4, Issue 1

          Author Tags

          1. Autogen
          2. automatic discovery
          3. cache-efficient
          4. cache-oblivious
          5. divide-and-conquer
          6. dynamic programming
          7. energy-efficient
          8. parallel
          9. recursive

          Qualifiers

          • Research-article
          • Research
          • Refereed

          Bibliometrics & Citations

          Bibliometrics

          Article Metrics

          • Downloads (Last 12 months): 198
          • Downloads (Last 6 weeks): 14
          Reflects downloads up to 11 Aug 2024

          Cited By

          • (2024) GPT-Driven Source-to-Source Transformation for Generating Compilable Parallel CUDA Code for Nussinov’s Algorithm. Electronics 13:3 (488). DOI: 10.3390/electronics13030488. Online publication date: 24-Jan-2024
          • (2023) Time and Energy Benefits of Using Automatic Optimization Compilers for NPDP Tasks. Electronics 12:17 (3579). DOI: 10.3390/electronics12173579. Online publication date: 24-Aug-2023
          • (2023) NPDP benchmark suite for the evaluation of the effectiveness of automatic optimizing compilers. Parallel Computing 116:C. DOI: 10.1016/j.parco.2023.103016. Online publication date: 1-Jul-2023
          • (2023) NPDP Benchmark Suite for Loop Tiling Effectiveness Evaluation. Parallel Processing and Applied Mathematics (51-62). DOI: 10.1007/978-3-031-30445-3_5. Online publication date: 27-Apr-2023
          • (2022) An Algorithm for the Sequence Alignment with Gap Penalty Problem using Multiway Divide-and-Conquer and Matrix Transposition. Information Processing Letters 173:C. DOI: 10.1016/j.ipl.2021.106166. Online publication date: 1-Jan-2022
          • (2021) Reverse engineering for reduction parallelization via semiring polynomials. Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (820-834). DOI: 10.1145/3453483.3454079. Online publication date: 19-Jun-2021
          • (2021) A Unified Framework to Discover Permutation Generation Algorithms. The Computer Journal 66:3 (603-614). DOI: 10.1093/comjnl/bxab181. Online publication date: 17-Nov-2021
          • (2021) Parallel Divide-and-Conquer Algorithms for Bubble Sort, Selection Sort and Insertion Sort. The Computer Journal. DOI: 10.1093/comjnl/bxab107. Online publication date: 2-Aug-2021
          • (2021) (When) Do Multiple Passes Save Energy? Embedded Computer Systems: Architectures, Modeling, and Simulation (451-466). DOI: 10.1007/978-3-031-04580-6_30. Online publication date: 4-Jul-2021
          • (2019) Optimizing RNA-RNA interaction computations. Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization (269-270). DOI: 10.5555/3314872.3314907. Online publication date: 16-Feb-2019
