Predictive Modeling in a Polyhedral Optimization Space

International Journal of Parallel Programming

Abstract

High-level program optimizations, such as loop transformations, are critical for high performance on multi-core targets. However, complex sequences of loop transformations are often required to expose parallelism (both coarse-grain and fine-grain) and improve data locality. The polyhedral compilation framework has proved to be very effective at representing these complex sequences and restructuring compute-intensive applications, seamlessly handling perfectly and imperfectly nested loops. It models arbitrarily complex sequences of loop transformations in a unified mathematical framework, dramatically increasing the expressiveness (and expected effectiveness) of the loop optimization stage. Nevertheless, identifying the most effective loop transformations remains a major challenge: current state-of-the-art heuristics in polyhedral frameworks fail to deliver good performance over a wide range of numerical applications. Their lack of effectiveness is mainly due to simplistic performance models that do not reflect the complexity of today’s processors (CPU, cache behavior, etc.). We address the problem of selecting the best polyhedral optimizations with dedicated machine learning models, trained specifically on the target machine. We show that these models can quickly select high-performance optimizations with very limited iterative search. We decouple the problem of selecting good complex sequences of optimizations into two stages: (1) we narrow the set of candidate optimizations using static cost models to select the loop transformations that implement specific high-level optimizations (e.g., tiling, parallelism, etc.); (2) we predict the performance of each high-level complex optimization sequence with trained models that take as input a performance-counter characterization of the original program. Our end-to-end framework is validated using numerous benchmarks on two modern multi-core platforms. We investigate a variety of machine learning algorithms and hardware counters, and we obtain performance improvements over production compilers ranging on average from \(3.2\times \) to \(8.7\times \), by running no more than \(6\) program variants from a polyhedral optimization space.
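
As a rough illustration of stage (2) and the limited iterative search mentioned in the abstract, the sketch below ranks candidate optimization sequences for a new program from its performance-counter vector and then runs only the top few. It is not the authors' implementation: it assumes a plain 1-nearest-neighbor predictor (one simple form of instance-based learning), and the names `rank_sequences`, `select_best`, `run_variant`, the counter vectors, and the sequence labels are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length counter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_sequences(training, new_counters, k=6):
    """Return up to k optimization sequences ranked by predicted speedup.

    training: list of (counters, {sequence_name: measured_speedup}) pairs,
              one pair per previously profiled program (hypothetical data).
    new_counters: counter vector of the unoptimized program to optimize.
    """
    # 1-nearest-neighbor assumption: reuse the speedups measured on the
    # training program whose counter profile is closest to the new program.
    _, speedups = min(training, key=lambda item: euclidean(item[0], new_counters))
    ranked = sorted(speedups, key=speedups.get, reverse=True)
    return ranked[:k]


def select_best(candidates, run_variant):
    """Run each candidate once and keep the one with the lowest run time.

    run_variant is a caller-supplied function (hypothetical) that compiles
    and times one program variant, returning its execution time.
    """
    return min(candidates, key=run_variant)


# Toy data: two profiled programs, three candidate sequences each.
training = [
    ([0.9, 0.1], {"tile+parallel": 4.0, "fuse": 1.5, "baseline": 1.0}),
    ([0.1, 0.8], {"tile+parallel": 1.2, "fuse": 3.0, "baseline": 1.0}),
]
top = rank_sequences(training, [0.2, 0.7], k=2)
```

With `k=6`, `select_best` would execute at most six variants, which mirrors the search budget quoted in the abstract; the predictor itself can be swapped for any model that maps a counter vector to per-sequence speedups.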

Notes

  1. http://pocc.sourceforge.net

  2. Note that multiplexing reduces the accuracy of the performance counter information collected.

References

  1. Agakov, F., Bonilla, E., Cavazos, J., Franke, B., Fursin, G., O’Boyle, M., Thomson, J., Toussaint, M., Williams, C.: Using machine learning to focus iterative optimization. In: Proceedings of the International Symposium on Code Generation and Optimization (CGO) (2006)

  2. Aha, D.W., Kibler, D., Albert, M.K.: Instance-based learning algorithms. Int. J. Mach. Learn. 6, 37–66 (1991)

  3. Almagor, L., Cooper, K., Grosul, A., Harvey, T., Reeves, S., Subramanian, D., Torczon, L., Waterman, T.: Finding effective compilation sequences. In: Proceedings of the International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), pp. 231–239. New York (2004)

  4. Anderson, E., Bai, Z., Dongarra, J., Greenbaum, A., McKenney, A., Du Croz, J., Hammerling, S., Demmel, J., Bischof, C., Sorensen, D.: Lapack: a portable linear algebra library for high-performance computers. In: Proceedings of the 1990 ACM/IEEE conference on Supercomputing, Supercomputing ’90, pp. 2–11. IEEE Computer Society Press, Los Alamitos, CA, USA (1990) http://dl.acm.org/citation.cfm?id=110382.110385

  5. Bastoul, C.: Code generation in the polyhedral model is easier than you think. In: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT) (2004)

  6. Baumgartner, G., Bernholdt, D., Cociorva, D., Harrison, R., Hirata, S., Lam, C.C., Nooijen, M., Pitzer, R., Ramanujam, J., Sadayappan, P.: A high-level approach to synthesis of high-performance codes for quantum chemistry. In: Supercomputing (2002)

  7. Benabderrahmane, M.W., Pouchet, L.N., Cohen, A., Bastoul, C.: The polyhedral model is more widely applicable than you think. In: Proceedings of the International Conference on Compiler Construction (ETAPS CC), LNCS 6011, pp. 283–303 (2010)

  8. Bondhugula, U., Baskaran, M., Krishnamoorthy, S., Ramanujam, J., Rountev, A., Sadayappan, P.: Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model. In: Proceedings of the International Conference on Compiler Construction (ETAPS CC) (2008)

  9. Bondhugula, U., Hartono, A., Ramanujam, J., Sadayappan, P.: A practical automatic polyhedral program optimization system. In: Proceedings of the International Conference on Programming Language Design and Implementation (PLDI) (2008)

  10. Bouckaert, R.R., Frank, E., Hall, M.A., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: WEKA-experiences with a java open-source project. J. Mach. Learn. Res. 11, 2533–2541 (2010)

  11. Cavazos, J., Dubach, C., Agakov, F., Bonilla, E., O’Boyle, M.F., Fursin, G., Temam, O.: Automatic performance model construction for the fast software exploration of new hardware designs. In: International Conference on Compilers, Architectures and Synthesis of Embedded Systems (CASES) (2006)

  12. Cavazos, J., Fursin, G., Agakov, F.V., Bonilla, E.V., O’Boyle, M.F.P., Temam, O.: Rapidly selecting good compiler optimizations using performance counters. In: Proceedings of the International Symposium on Code Generation and Optimization (CGO) (2007)

  13. Chen, C., Chame, J., Hall, M.: CHiLL: A framework for composing high-level loop transformations. Tech. Rep. 08–897, U. of Southern California (2008)

  14. Chen, Y., Huang, Y., Eeckhout, L., Fursin, G., Peng, L., Temam, O., Wu, C.: Evaluating iterative optimization across 1000 datasets. In: Proceedings of the 2010 ACM SIGPLAN Conference on Programming language design and implementation, PLDI ’10, pp. 448–459. ACM, New York, NY, USA (2010). doi:10.1145/1806596.1806647

  15. Cleary, J.G., Trigg, L.E.: K*: an instance-based learner using an entropic distance measure. In: Proceedings of the 12th International Conference on Machine Learning, pp. 108–114. Morgan Kaufmann (1995)

  16. Cooper, K.D., Grosul, A., Harvey, T.J., Reeves, S., Subramanian, D., Torczon, L., Waterman, T.: Acme: adaptive compilation made efficient. In: Proceedings of the International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), pp. 69–77. ACM Press, New York, NY, USA (2005). doi:10.1145/1065910.1065921

  17. Cooper, K.D., Schielke, P.J., Subramanian, D.: Optimizing for reduced code space using genetic algorithms. In: Proceedings of the International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), pp. 1–9. ACM Press (1999)

  18. Cooper, K.D., Subramanian, D., Torczon, L.: Adaptive optimizing compilers for the 21st century. J. Supercomput. 23(1), 7–22 (2002)

  19. Datta, K., Kamil, S., Williams, S., Oliker, L., Shalf, J., Yelick, K.: Optimization and performance modeling of stencil computations on modern microprocessors. SIAM Review 51(1) (2009) doi:10.1137/070693199. http://link.aip.org/link/?SIR/51/129/1

  20. Dubach, C., Cavazos, J., Franke, B., O’Boyle, M., Fursin, G., Temam, O.: Fast compiler optimisation evaluation using code-feature based performance prediction. In: Proceedings of the International Conference on Computing Frontiers (CF) (2007)

  21. Dubach, C., Jones, T.M., Bonilla, E.V., Fursin, G., O’Boyle, M.F.: Portable compiler optimization across embedded programs and microarchitectures using machine learning. In: Proceedings of the International Symposium on Microarchitecture (MICRO) (2009)

  22. Feautrier, P.: Some efficient solutions to the affine scheduling problem, part I: one dimensional time. Int. J. Parallel Program. (IJPP) 21(5), 313–348 (1992)

  23. Feautrier, P.: Some efficient solutions to the affine scheduling problem, part II: multidimensional time. Int. J. Parallel Program. (IJPP) 21(6), 389–420 (1992)

  24. Franke, B., O’Boyle, M., Thomson, J., Fursin, G.: Probabilistic source-level optimisation of embedded programs. In: Proceedings of the International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), pp. 78–86. ACM Press (2005)

  25. Frigo, M., Johnson, S.G.: The design and implementation of FFTW3. In: Proceedings of the IEEE 93(2), 216–231 (2005) Special issue on “Program Generation, Optimization, and Platform Adaptation”

  26. Fursin, G., Cavazos, J., Temam, O.: Midatasets: creating the conditions for a more realistic evaluation of iterative optimization. In: Proceedings of the International Conference on High Performance Embedded Architectures and Compilers (HiPEAC), pp. 245–260. Springer LNCS (2007)

  27. Fursin, G., Miranda, C., Temam, O., Namolaru, M., Yom-Tov, E., Zaks, A., Mendelson, B., Barnard, P., Ashton, E., Courtois, E., Bodin, F., Bonilla, E., Thomson, J., Leather, H., Williams, C., O’Boyle, M.: MILEPOST GCC: machine learning based research compiler. In: Proceedings of the GCC Developers’ Summit (2008)

  28. Girbal, S., Vasilache, N., Bastoul, C., Cohen, A., Parello, D., Sigler, M., Temam, O.: Semi-automatic composition of loop transformations. Int. J. Parallel Program. (IJPP) 34(3), 261–317 (2006)

  29. Haneda, M., Knijnenburg, P.M.W., Wijshoff, H.A.G.: Automatic selection of compiler options using non-parametric inferential statistics. In: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 123–132 (2005)

  30. INRIA, The Ohio State University: Polybench, the polyhedral benchmark suite. http://polybench.sourceforge.net

  31. Irigoin, F., Triolet, R.: Supernode partitioning. In: ACM SIGPLAN Principles of Programming Languages, pp. 319–329 (1988)

  32. Kelly, W., Pugh, W.: A unifying framework for iteration reordering transformations. In: IEEE International Conference on Algorithms and Architectures for Parallel Processing (ICAPP’95), pp. 153–162 (1995)

  33. Kisuki, T., Knijnenburg, P.M.W., O’Boyle, M.F.P.: Combined selection of tile sizes and unroll factors using iterative compilation. In: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), p. 237 (2000)

  34. Kulkarni, P., Hines, S., Hiser, J., Whalley, D., Davidson, J., Jones, D.: Fast searches for effective optimization phase sequences. In: Proceedings of the International Conference on Programming Language Design and Implementation (PLDI), pp. 171–182. ACM Press (2004)

  35. Lim, A.W., Lam, M.S.: Maximizing parallelism and minimizing synchronization with affine transforms. In: Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), pp. 201–214. ACM Press (1997)

  36. Long, S., Fursin, G.: A heuristic search algorithm based on unified transformation framework. In: Proceedings of the International Conference on Parallel Processing Workshops (ICPPW), pp. 137–144 (2005)

  37. Monsifrot, A., Bodin, F., Quiniou, R.: A machine learning approach to automatic production of compiler heuristics. In: Proceedings of the International Conference on Artificial Intelligence: Methodology, Systems, and Applications (AIMSA), pp. 41–50. Springer, Berlin (2002)

  38. Mucci, P.: Papi—the performance application programming interface. http://icl.cs.utk.edu/papi/index.html (2000)

  39. Namolaru, M., Cohen, A., Fursin, G., Zaks, A., Freund, A.: Practical aggregation of semantical program properties for machine learning based optimization. In: International Conference on Compilers, Architectures and Synthesis of Embedded Systems (CASES) (2010)

  40. Orozco, D., Gao, G.R.: Mapping the FDTD application to many-core chip architectures. In: ICPP (2009)

  41. Parello, D., Temam, O., Cohen, A., Verdun, J.M.: Towards a systematic, pragmatic and architecture-aware program optimization process for complex processors. In: Proceedings of the ACM/IEEE conference on Supercomputing (SC), p. 15. IEEE Computer Society (2004)

  42. Park, E., Cavazos, J., Alvarez, M.A.: Using graph-based program characterization for predictive modeling. In: 10th IEEE/ACM International Symposium on Code Generation and Optimization (CGO’12). IEEE Computer Society press, San Jose (2012)

  43. Park, E., Pouchet, L.N., Cavazos, J., Cohen, A., Sadayappan, P.: Predictive modeling in a polyhedral optimization space. In: 9th IEEE/ACM International Symposium on Code Generation and Optimization (CGO’11), pp. 119–129. IEEE Computer Society press, Chamonix, France (2011)

  44. Pouchet, L.N., Bastoul, C., Cohen, A., Cavazos, J.: Iterative optimization in the polyhedral model: Part II, multidimensional time. In: Proceedings of the International Conference on Programming Language Design and Implementation (PLDI), pp. 90–100. ACM Press (2008)

  45. Pouchet, L.N., Bastoul, C., Cohen, A., Vasilache, N.: Iterative optimization in the polyhedral model: Part I, one-dimensional time. In: Proceedings of the International Symposium on Code Generation and Optimization (CGO), pp. 144–156. IEEE Computer Society Press (2007)

  46. Pouchet, L.N., Bondhugula, U., Bastoul, C., Cohen, A., Ramanujam, J., Sadayappan, P.: Combined iterative and model-driven optimization in an automatic parallelization framework. In: Proceedings of the ACM/IEEE Conference on Supercomputing (SC), p. 11 (2010)

  47. Pouchet, L.N., Bondhugula, U., Bastoul, C., Cohen, A., Ramanujam, J., Sadayappan, P., Vasilache, N.: Loop transformations: Convexity, pruning and optimization. In: 38th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL’11), pp. 549–562. ACM Press, Austin (2011)

  48. Püschel, M., Moura, J., Johnson, J., Padua, D., Veloso, M., Singer, B., Xiong, J., Franchetti, F., Gacic, A., Voronenko, Y., Chen, K., Johnson, R.W., Rizzolo, N.: Spiral: code generation for dsp transforms. In: Proceedings of the IEEE 93(2), 232–275 (2005) Special issue on “Program Generation, Optimization, and Platform Adaptation”

  49. Ramanujam, J., Sadayappan, P.: Tiling multidimensional iteration spaces for multicomputers. J. Parallel Distrib. Comput. 16(2), 108–230 (1992)

  50. Smith, G.: Numerical Solution of Partial Differential Equations: Finite Difference Methods. Oxford University Press, Oxford (2004)

  51. Stephenson, M., Amarasinghe, S.: Predicting unroll factors using supervised classification. In: CGO ’05: Proceedings of the International Symposium on Code Generation and Optimization, pp. 123–134. IEEE Computer Society, Washington (2005). doi:10.1109/CGO.2005.29

  52. Stephenson, M., Amarasinghe, S., Martin, M., O’Reilly, U.M.: Meta optimization: improving compiler heuristics with machine learning. SIGPLAN Not. 38(5), 77–90 (2003). doi:10.1145/780822.781141

  53. Tiwari, A., Chen, C., Chame, J., Hall, M., Hollingsworth, J.K.: A scalable auto-tuning framework for compiler optimization. In: Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS), pp. 1–12. IEEE Computer Society (2009)

  54. Trifunovic, K., Nuzman, D., Cohen, A., Zaks, A., Rosen, I.: Polyhedral-model guided loop-nest auto-vectorization. In: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT) (2009)

  55. Voronenko, Y., de Mesmay, F., Püschel, M.: Computer generation of general size linear transform libraries. In: Proceedings of the International Symposium on Code Generation and Optimization (CGO), pp. 102–113 (2009)

  56. Vuduc, R., Demmel, J.W., Bilmes, J.A.: Statistical models for empirical search-based performance tuning. Int. J. High Perform. Comput. Appl. 18(1), 65–94 (2004)

  57. Whaley, R.C., Dongarra, J.J.: Automatically tuned linear algebra software. In: Proceedings of the ACM/IEEE Conference on Supercomputing (SC), pp. 1–27. IEEE Computer Society (1998)

  58. Whaley, R.C., Petitet, A., Dongarra, J.J.: Automated empirical optimizations of software and the atlas project. Parallel Comput. (2000)

  59. Wolf, M., Lam, M.: A data locality optimizing algorithm. In: ACM SIGPLAN’91 Conference on Programming Language Design and Implementation, pp. 30–44. New York (1991)

  60. Yotov, K., Li, X., Ren, G., Cibulskis, M., DeJong, G., Garzaran, M., Padua, D., Pingali, K., Stodghill, P., Wu, P.: A comparison of empirical and model-driven optimization. In: Proceedings of the International Conference on Programming Language Design and Implementation (PLDI) (2003)

  61. Yotov, K., Pingali, K., Stodghill, P.: Think globally, search locally. In: ICS ’05: Proceedings of the 19th Annual International Conference on Supercomputing, pp. 141–150. ACM Press, New York (2005). doi:10.1145/1088149.1088168

Acknowledgments

This work was funded in part by the U.S. National Science Foundation through awards 0926688, 0811781, 0811457, 0926687 and 0926127, the Defense Advanced Research Projects Agency through AFRL Contract FA8650-09-C-7915, the DARPA Computer Science Study Group (CSSG), the U.S. Department of Energy through award DE-FC02-06ER25755, and NSF Career award 0953667.

Author information

Corresponding author

Correspondence to Louis-Noël Pouchet.

Additional information

This article is an extended version of our work published at CGO’11 [43].

About this article

Cite this article

Park, E., Cavazos, J., Pouchet, LN. et al. Predictive Modeling in a Polyhedral Optimization Space. Int J Parallel Prog 41, 704–750 (2013). https://doi.org/10.1007/s10766-013-0241-1

  • DOI: https://doi.org/10.1007/s10766-013-0241-1
