Research article, DOI: 10.5555/2388996.2389010

A multi-objective auto-tuning framework for parallel codes

Published: 10 November 2012

Abstract

In this paper we introduce a multi-objective auto-tuning framework comprising compiler and runtime components. Focusing on individual code regions, our compiler uses a novel search technique to compute a set of optimal solutions, which are encoded into a multi-versioned executable. This enables the runtime system to choose specifically tuned code versions when dynamically adjusting to changing circumstances.
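
The split between compile-time search and runtime selection can be pictured with a minimal Python sketch. It is not the authors' implementation: the `Version` record, the sample measurements, and the `pick_version` policy (fastest stored version that fits a thread budget) are hypothetical, and efficiency is assumed to be a higher-is-better score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """One tuned code version; all values below are made-up illustration data."""
    tile_size: int
    threads: int
    runtime_s: float    # lower is better
    efficiency: float   # higher is better (assumed definition)


def dominates(a: Version, b: Version) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    no_worse = a.runtime_s <= b.runtime_s and a.efficiency >= b.efficiency
    better = a.runtime_s < b.runtime_s or a.efficiency > b.efficiency
    return no_worse and better


def pareto_set(candidates):
    """Compile-time step: keep only candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


def pick_version(versions, available_threads):
    """Runtime step (hypothetical policy): fastest version within the thread budget."""
    usable = [v for v in versions if v.threads <= available_threads]
    if not usable:
        raise ValueError("no stored version fits the thread budget")
    return min(usable, key=lambda v: v.runtime_s)


candidates = [
    Version(16, 1, 10.0, 1.00),
    Version(32, 4, 3.0, 0.83),
    Version(64, 4, 3.5, 0.71),  # dominated by the (32, 4) version
    Version(32, 8, 2.0, 0.63),
    Version(8, 8, 2.5, 0.50),   # dominated by the (32, 8) version
]

# The surviving versions are what a multi-versioned executable would carry.
versions = pareto_set(candidates)
chosen = pick_version(versions, available_threads=4)
```

If the thread budget changes while the program runs, calling `pick_version` again with the new budget switches to a different stored version without any re-tuning.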
We demonstrate our method by tuning loop tiling in cache-sensitive parallel programs, optimizing for both runtime and efficiency. Our static optimizer finds solutions matching or surpassing those determined by exhaustively sampling the search space on a regular grid, while using less than 4% of the computational effort on average. Additionally, we show that parallelism-aware multi-versioning approaches like our own gain a performance improvement of up to 70% over solutions tuned for only one specific number of threads.
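
For readers unfamiliar with loop tiling, the sketch below shows the transformation on a plain matrix multiplication, with the tile size exposed as the tunable parameter. It is an illustration and not code from the paper: `matmul_naive`, `matmul_tiled`, and the `tile` argument are hypothetical names, and pure Python will not show the cache effects that make tile size matter in compiled code.

```python
def matmul_naive(a, b):
    """Reference triple loop: c = a * b for list-of-lists matrices."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            for j in range(p):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, tile):
    """Same computation, iterating over tile x tile blocks of the index space."""
    if tile < 1:
        raise ValueError("tile must be a positive integer")
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    # Outer loops walk block origins; inner loops stay inside one block,
    # so the working set per block is bounded by the tile size.
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, m)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + tile, p)):
                            c[i][j] += aik * b[k][j]
    return c


# Small integer matrices so results can be compared exactly.
a = [[i + j for j in range(5)] for i in range(4)]   # 4 x 5
b = [[i * j + 1 for j in range(3)] for i in range(5)]  # 5 x 3
reference = matmul_naive(a, b)
```

An auto-tuner would time `matmul_tiled` (or its compiled equivalent) for different `tile` values and thread counts; the result is the same for every tile size, only the memory access order changes.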


Published In

SC '12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2012
1161 pages
ISBN: 9781467308045

Publisher

IEEE Computer Society Press, Washington, DC, United States

Acceptance Rates

SC '12 paper acceptance rate: 100 of 461 submissions (22%)
Overall acceptance rate: 1,516 of 6,373 submissions (24%)
