DOI: 10.5555/2190025.2190059

Predictive modeling in a polyhedral optimization space

Published: 02 April 2011

Abstract

Significant advances in compiler optimization have been made in recent years, enabling many transformations such as tiling, fusion, parallelization and vectorization on imperfectly nested loops. Nevertheless, finding the best combination of loop transformations remains a major challenge. Polyhedral models for compiler optimization have demonstrated strong potential for enhancing program performance, in particular for compute-intensive applications. However, existing static cost models for polyhedral transformations have significant limitations, and iterative compilation has become a very promising alternative for finding the most effective transformations. Because the number of polyhedral optimization alternatives can be enormous, it is often impractical to iterate over a significant fraction of the entire space of polyhedrally transformed variants. Recent research has focused on iterating over this search space either with manually constructed heuristics or with automatic but very expensive search algorithms (e.g., genetic algorithms) that can eventually find good points in the polyhedral space. In this paper, we propose the use of machine learning to address the problem of selecting the best polyhedral optimizations. We show that machine learning models can quickly find high-performance program variants in the polyhedral space without resorting to extensive empirical search. We introduce models that take as input a characterization of a program based on its dynamic behavior and predict the performance of aggressive high-level polyhedral transformations that include tiling, parallelization and vectorization. We allow for a minimal empirical search on the target machine, discovering on average 83% of the search-space-optimal combinations in at most 5 runs. Our end-to-end framework is validated using numerous benchmarks on two multi-core platforms.
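The "predict, then run only a few variants" idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `search_with_model`, the candidate labels, and all speedup numbers below are hypothetical, and a plain lookup table stands in for a trained model. It only shows the general shape of a model-guided search with a fixed budget of empirical runs.

```python
from typing import Callable, Hashable, Sequence, Tuple


def search_with_model(
    candidates: Sequence[Hashable],
    predict: Callable[[Hashable], float],
    measure: Callable[[Hashable], float],
    budget: int = 5,
) -> Tuple[Hashable, float, int]:
    """Rank candidates by predicted speedup, then time only the top `budget`.

    Returns (best candidate, its measured speedup, number of runs used).
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    # Rank every candidate by the model's prediction (cheap, no execution).
    ranked = sorted(candidates, key=predict, reverse=True)
    shortlist = ranked[:budget]
    if not shortlist:
        raise ValueError("no candidates to evaluate")
    # Spend the empirical budget only on the shortlisted candidates.
    measured = [(measure(c), c) for c in shortlist]
    best_speedup, best = max(measured, key=lambda pair: pair[0])
    return best, best_speedup, len(measured)


# Made-up ground truth: speedup each variant would really achieve.
TRUE_SPEEDUP = {
    "original": 1.0,
    "fuse": 1.1,
    "tile": 1.8,
    "tile+vectorize": 2.4,
    "fuse+parallel": 2.9,
    "tile+parallel": 3.9,
    "tile+parallel+vectorize": 4.6,
}

# Made-up, deliberately imperfect predictions: the model ranks
# "tile+parallel" first although it is not the true optimum.
PREDICTED = {
    "original": 1.0,
    "fuse": 1.3,
    "tile": 1.5,
    "tile+vectorize": 2.0,
    "fuse+parallel": 3.0,
    "tile+parallel": 4.2,
    "tile+parallel+vectorize": 4.0,
}

best, speedup, runs = search_with_model(
    list(TRUE_SPEEDUP),
    PREDICTED.__getitem__,
    TRUE_SPEEDUP.__getitem__,
    budget=3,
)
print(best, speedup, runs)
```

In this toy setup the model's top prediction is wrong, but the true best variant is still inside the three-candidate shortlist, so three runs are enough to find it. With `budget=1` the search would settle for the mispredicted variant, which is why a small empirical budget is kept rather than trusting the model's first choice.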



    Published In

    CGO '11: Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
    April 2011
    324 pages
    ISBN:9781612843568

    Publisher

    IEEE Computer Society

    United States

    Qualifiers

    • Article

    Acceptance Rates

    CGO '11 Paper Acceptance Rate 28 of 105 submissions, 27%;
    Overall Acceptance Rate 312 of 1,061 submissions, 29%

    Cited By

    • (2021) CoSA. Proceedings of the 48th Annual International Symposium on Computer Architecture, pages 554-566. DOI: 10.1109/ISCA52012.2021.00050. Online publication date: 14-Jun-2021.
    • (2018) Speeding up Iterative Polyhedral Schedule Optimization with Surrogate Performance Models. ACM Transactions on Architecture and Code Optimization 15:4, pages 1-27. DOI: 10.1145/3291773. Online publication date: 19-Dec-2018.
    • (2018) Bridging the gap between deep learning and sparse matrix format selection. ACM SIGPLAN Notices 53:1, pages 94-108. DOI: 10.1145/3200691.3178495. Online publication date: 10-Feb-2018.
    • (2018) Bridging the gap between deep learning and sparse matrix format selection. Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 94-108. DOI: 10.1145/3178487.3178495. Online publication date: 10-Feb-2018.
    • (2017) Exploring performance and energy tradeoffs for irregular applications. Journal of Parallel and Distributed Computing 104:C, pages 234-251. DOI: 10.1016/j.jpdc.2016.06.006. Online publication date: 1-Jun-2017.
    • (2015) Autotuning algorithmic choice for input sensitivity. ACM SIGPLAN Notices 50:6, pages 379-390. DOI: 10.1145/2813885.2737969. Online publication date: 3-Jun-2015.
    • (2015) Autotuning algorithmic choice for input sensitivity. Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 379-390. DOI: 10.1145/2737924.2737969. Online publication date: 3-Jun-2015.
    • (2014) Bones. ACM Transactions on Architecture and Code Optimization 11:4, pages 1-25. DOI: 10.1145/2665079. Online publication date: 8-Dec-2014.
    • (2014) OpenTuner. Proceedings of the 23rd international conference on Parallel architectures and compilation, pages 303-316. DOI: 10.1145/2628071.2628092. Online publication date: 24-Aug-2014.
    • (2013) Portable performance on heterogeneous architectures. ACM SIGPLAN Notices 48:4, pages 431-444. DOI: 10.1145/2499368.2451162. Online publication date: 16-Mar-2013.
