DOI: 10.1145/1065944.1065981
Article

A framework for adaptive algorithm selection in STAPL

Published: 15 June 2005
  • Abstract

    Writing portable programs that perform well on multiple platforms or for varying input sizes and types can be very difficult because performance is often sensitive to the system architecture, the run-time environment, and input data characteristics. This is even more challenging on parallel and distributed systems due to the wide variety of system architectures. One way to address this problem is to adaptively select the best parallel algorithm for the current input data and system from a set of functionally equivalent algorithmic options. Toward this goal, we have developed a general framework for adaptive algorithm selection for use in the Standard Template Adaptive Parallel Library (STAPL). Our framework uses machine learning techniques to analyze data collected by STAPL installation benchmarks and to determine tests that will select among algorithmic options at run-time. We apply a prototype implementation of our framework to two important parallel operations, sorting and matrix multiplication, on multiple platforms and show that the framework determines run-time tests that correctly select the best performing algorithm from among several competing algorithmic options in 86-100% of the cases studied, depending on the operation and the system.
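
    The pipeline described in the abstract (benchmark the candidate algorithms, learn a test from the measurements, apply the test at run-time) can be illustrated with a small sketch. This is not the paper's implementation or any STAPL API: it assumes two sequential sorts as the "functionally equivalent options", uses comparison counts in place of measured run times so the result is reproducible, uses the fraction of out-of-order adjacent pairs as the only input attribute, and stands in a one-level decision tree (a single learned threshold) for the machine learning step. All function and variable names are hypothetical.

```python
import random

def insertion_sort(data):
    """Return (sorted copy, number of element comparisons)."""
    a, cost = list(data), 0
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0:
            cost += 1
            if a[j] <= key:
                break
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a, cost

def merge_sort(data):
    """Return (sorted copy, number of element comparisons)."""
    a = list(data)
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, cost_left = merge_sort(a[:mid])
    right, cost_right = merge_sort(a[mid:])
    out, i, j, cost = [], 0, 0, cost_left + cost_right
    while i < len(left) and j < len(right):
        cost += 1
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out, cost

# Hypothetical set of functionally equivalent options.
ALGORITHMS = {"insertion_sort": insertion_sort, "merge_sort": merge_sort}

def descent_fraction(data):
    """Input attribute (an assumption): share of adjacent pairs out of order."""
    if len(data) < 2:
        return 0.0
    descents = sum(1 for x, y in zip(data, data[1:]) if x > y)
    return descents / (len(data) - 1)

def run_benchmarks(n=200, seed=0):
    """'Installation benchmark': label perturbed inputs with the cheaper option."""
    rng = random.Random(seed)
    samples = []
    for swaps in (0, 1, 2, 4, 8, 16, 32, 64, 128):
        for _ in range(5):
            data = list(range(n))
            for _ in range(swaps):
                i, j = rng.randrange(n), rng.randrange(n)
                data[i], data[j] = data[j], data[i]
            costs = {name: algo(data)[1] for name, algo in ALGORITHMS.items()}
            samples.append((descent_fraction(data), min(costs, key=costs.get)))
    return samples

def learn_threshold(samples):
    """Fit the rule 'attribute <= t -> insertion_sort' with the fewest errors."""
    def errors(t):
        return sum((x <= t) != (label == "insertion_sort") for x, label in samples)
    return min(sorted({x for x, _ in samples}), key=errors)

THRESHOLD = learn_threshold(run_benchmarks())

def adaptive_sort(data):
    """Run-time test: measure the attribute, then dispatch to the chosen option."""
    name = "insertion_sort" if descent_fraction(data) <= THRESHOLD else "merge_sort"
    return name, ALGORITHMS[name](data)[0]
```

    Calling `adaptive_sort(data)` returns the name of the selected option together with the sorted result. A real framework of this kind would benchmark parallel algorithms on the target machine, use several attributes (input size, processor count, data type), and learn a richer model than one threshold.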

    Published In

    PPoPP '05: Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
    June 2005
    310 pages
    ISBN:1595930809
    DOI:10.1145/1065944
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. adaptive algorithms
    2. machine learning
    3. matrix multiplication
    4. parallel algorithms
    5. sorting

    Qualifiers

    • Article

    Conference

    PPoPP05

    Acceptance Rates

    Overall Acceptance Rate 230 of 1,014 submissions, 23%

    Article Metrics

    • Downloads (last 12 months): 25
    • Downloads (last 6 weeks): 3
    Reflects downloads up to 10 Aug 2024

    Cited By

    • (2022) Dense dynamic blocks. Proceedings of the 36th ACM International Conference on Supercomputing, pp. 1-14. DOI: 10.1145/3524059.3532369. Online publication date: 28-Jun-2022
    • (2019) Using meta-heuristics and machine learning for software optimization of parallel computing systems. Computing, 101:8, pp. 893-936. DOI: 10.1007/s00607-018-0614-9. Online publication date: 1-Aug-2019
    • (2019) Mozart: Efficient Composition of Library Functions for Heterogeneous Execution. Languages and Compilers for Parallel Computing, pp. 182-202. DOI: 10.1007/978-3-030-35225-7_13. Online publication date: 15-Nov-2019
    • (2018) A Machine Learning-Based Approach for Selecting SpMV Kernels and Matrix Storage Formats. IEICE Transactions on Information and Systems, E101.D:9, pp. 2307-2314. DOI: 10.1587/transinf.2017EDP7176. Online publication date: 1-Sep-2018
    • (2018) A Review of Machine Learning and Meta-heuristic Methods for Scheduling Parallel Computing Systems. Proceedings of the International Conference on Learning and Optimization Algorithms: Theory and Applications, pp. 1-6. DOI: 10.1145/3230905.3230906. Online publication date: 2-May-2018
    • (2016) A self-adaptive approach to efficiently manage energy and performance in tomorrow's heterogeneous computing systems. Proceedings of the 2016 Conference on Design, Automation & Test in Europe, pp. 906-911. DOI: 10.5555/2971808.2972017. Online publication date: 14-Mar-2016
    • (2015) Autotuning algorithmic choice for input sensitivity. ACM SIGPLAN Notices, 50:6, pp. 379-390. DOI: 10.1145/2813885.2737969. Online publication date: 3-Jun-2015
    • (2015) Automated multi-objective control for self-adaptive software design. Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, pp. 13-24. DOI: 10.1145/2786805.2786833. Online publication date: 30-Aug-2015
    • (2015) FAST. Proceedings of the 29th ACM on International Conference on Supercomputing, pp. 187-196. DOI: 10.1145/2751205.2751214. Online publication date: 8-Jun-2015
    • (2015) Autotuning algorithmic choice for input sensitivity. Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 379-390. DOI: 10.1145/2737924.2737969. Online publication date: 3-Jun-2015
