
A Survey on Compiler Autotuning using Machine Learning

Published: 18 September 2018

Abstract

Since the mid-1990s, researchers have been trying to use machine-learning-based approaches to solve a number of different compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing which optimizations to apply) and phase-ordering (choosing the order of applying optimizations). The compiler optimization space continues to grow due to the advancement of applications, increasing number of compiler optimizations, and new target architectures. Generic optimization passes in compilers cannot fully leverage newly introduced optimizations and, therefore, cannot keep up with the pace of increasing options. This survey summarizes and classifies the recent advances in using machine learning for the compiler optimization field, particularly on the two major problems of (1) selecting the best optimizations, and (2) the phase-ordering of optimizations. The survey highlights the approaches taken so far, the obtained results, the fine-grain classification among different approaches, and finally, the influential papers of the field.

References

[1]
Bas Aarts, Michel Barreteau, François Bodin, Peter Brinkhaus, Zbigniew Chamski, Henri-Pierre Charles, Christine Eisenbeis, John Gurd, Jan Hoogerbrugge, Ping Hu et al. 1997. OCEANS: Optimizing compilers for embedded applications. In Proceedings of the European Conference on Parallel Processing (Euro-Par’97). 1351--1356.
[2]
Ali-Reza Adl-Tabatabai, Michał Cierniak, Guei-Yuan Lueh, Vishesh M. Parikh, and James M. Stichnoth. 1998. Fast, effective code generation in a just-in-time Java compiler. In ACM SIGPLAN Notices, Vol. 33. ACM, 280--290.
[3]
Felix Agakov, Edwin Bonilla, John Cavazos, Björn Franke, Grigori Fursin, Michael F. P. O’Boyle, John Thomson, Marc Toussaint, and Christopher K. I. Williams. 2006. Using machine learning to focus iterative optimization. In Proceedings of the International Symposium on Code Generation and Optimization. IEEE, 295--305.
[4]
Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. 1986. Compilers: Principles, Techniques, and Tools. Addison-Wesley.
[5]
Frances E. Allen. 1970. Control flow analysis. In ACM SIGPLAN Notices, Vol. 5. ACM, 1--19.
[6]
L. Almagor and K. D. Cooper. 2004. Finding effective compilation sequences. ACM SIGPLAN Notices 39, 7 (2004), 231--239. Retrieved from http://www.anc.ed.ac.uk/machine-learning/colo/repository/LCTES04.pdf.
[7]
George Almasi and David A. Padua. 2000. MaJIC: A MATLAB just-in-time compiler. In Proceedings of the International Workshop on Languages and Compilers for Parallel Computing. Springer, 68--81.
[8]
Ethem Alpaydin. 2014. Introduction to Machine Learning. MIT Press.
[9]
Martin Alt, Uwe Aßmann, and Hans Van Someren. 1994. Cosy compiler phase embedding with the cosy compiler model. In Proceedings of the International Conference on Compiler Construction. Springer, 278--293.
[10]
Jason Ansel, Cy Chan, Yee Lok Wong, Marek Olszewski, Qin Zhao, Alan Edelman, and Saman Amarasinghe. 2009. PetaBricks: A language and compiler for algorithmic choice. In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’09). ACM, New York, NY, 38--49.
[11]
J. Ansel and S. Kamil. 2014. OpenTuner: An extensible framework for program autotuning. In Proceedings of the 23rd International Conference on Parallel Architectures and Compilation. 303--316.
[12]
Karl-Erik Årzén and Anton Cervin. 2005. Control and embedded computing: Survey of research directions. IFAC Proc. Vol. 38, 1 (2005), 191--202.
[13]
G. Ascia, V. Catania, M. Palesi, and D. Patti. 2005. A system-level framework for evaluating area/performance/power trade-offs of VLIW-based embedded systems. In Proceedings of the Design Automation Conference (ASP-DAC’05), Vol. 2. 940--943.
[14]
Yosi Ben Asher, Gadi Haber, and Esti Stein. 2017. A study of conflicting pairs of compiler optimizations. In Proceedings of the IEEE 11th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC’17). IEEE, 52--58.
[15]
A. H. Ashouri, G. Mariani, G. Palermo, and C. Silvano. 2014. A Bayesian network approach for compiler auto-tuning for embedded processors. In Proceedings of the IEEE Embedded Systems for Real-Time Multimedia (ESTIMedia). 90--97.
[16]
Amir Hossein Ashouri. 2012. Design space exploration methodology for compiler parameters in VLIW processors. Master’s thesis. Politecnico di Milano, Italy. Retrieved from http://hdl.handle.net/10589/72083.
[17]
Amir Hossein Ashouri. 2016. Compiler Autotuning Using Machine Learning Techniques. Ph.D. Dissertation. Politecnico di Milano, Italy. Retrieved from http://hdl.handle.net/10589/129561.
[18]
Amir Hossein Ashouri, Andrea Bignoli, Gianluca Palermo, and Cristina Silvano. 2016. Predictive modeling methodology for compiler phase-ordering. In Proceedings of the 7th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and the 5th Workshop on Design Tools and Architectures For Multicore Embedded Computing Platforms (PARMA-DITAM’16). ACM, New York, NY, 7--12.
[19]
Amir H. Ashouri, Andrea Bignoli, Gianluca Palermo, Cristina Silvano, Sameer Kulkarni, and John Cavazos. 2017. MiCOMP: Mitigating the compiler phase-ordering problem using optimization sub-sequences and machine learning. ACM Trans. Archit. Code Optim. 14, 3, Article 29 (Sept. 2017).
[20]
Amir H. Ashouri, William Killian, John Cavazos, Gianluca Palermo, and Cristina Silvano. 2018. A survey on compiler autotuning using machine learning. arXiv preprint arXiv:1801.04405 (2018).
[21]
Amir Hossein Ashouri, Giovanni Mariani, Gianluca Palermo, Eunjung Park, John Cavazos, and Cristina Silvano. 2016. COBAYN: Compiler autotuning framework using Bayesian networks. ACM Trans. Archit. Code Optim. 13, 2, Article 21 (June 2016).
[22]
Amir H. Ashouri, Gianluca Palermo, John Cavazos, and Cristina Silvano. 2018. Automatic Tuning of Compilers Using Machine Learning. Springer International Publishing.
[23]
Amir H. Ashouri, Gianluca Palermo, John Cavazos, and Cristina Silvano. 2018. Background. Springer International Publishing, Cham, 1--22.
[24]
Amir H. Ashouri, Gianluca Palermo, John Cavazos, and Cristina Silvano. 2018. Design Space Exploration of Compiler Passes: A Co-Exploration Approach for the Embedded Domain. Springer International Publishing, Cham, 23--39.
[25]
Amir H. Ashouri, Gianluca Palermo, John Cavazos, and Cristina Silvano. 2018. The Phase-Ordering Problem: A Complete Sequence Prediction Approach. Springer International Publishing, Cham, 85--113.
[26]
Amir H. Ashouri, Gianluca Palermo, John Cavazos, and Cristina Silvano. 2018. The Phase-Ordering Problem: An Intermediate Speedup Prediction Approach. Springer International Publishing, Cham, 71--83.
[27]
Amir H. Ashouri, Gianluca Palermo, John Cavazos, and Cristina Silvano. 2018. Selecting the Best Compiler Optimizations: A Bayesian Network Approach. Springer International Publishing, Cham, 41--70.
[28]
Amir Hossein Ashouri, Gianluca Palermo, and Cristina Silvano. 2016. An evaluation of autotuning techniques for the compiler optimization problems. In Proceedings of the Workshop on Resource Awareness and Application Autotuning in Adaptive and Heterogeneous Computing (RES4ANT’16), colocated with the Design Automation and Test in Europe Conference and Expo (DATE’16). 23--27. Retrieved from http://ceur-ws.org/Vol-1643/#paper-05.
[29]
Amir Hossein Ashouri, Vittorio Zaccaria, Sotirios Xydis, Gianluca Palermo, and Cristina Silvano. 2013. A framework for compiler level statistical analysis over customized VLIW architecture. In Proceedings of the International Conference on Very Large Scale Integration (VLSI-SoC’13). 124--129.
[30]
Jose L. Ayala, Marisa López-Vallejo, David Atienza, Praveen Raghavan, Francky Catthoor, and Diederik Verkest. 2007. Energy-aware compilation and hardware design for VLIW embedded systems. Int. J. Embed. Syst. 3, 1--2 (2007), 73--82.
[31]
John Aycock. 2003. A brief history of just-in-time. ACM Comput. Surveys 35, 2 (2003), 97--113.
[32]
R. Babuška, P. J. Van der Veen, and U. Kaymak. 2002. Improved covariance estimation for Gustafson-Kessel clustering. In Proceedings of the 2002 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE’02), Vol. 2. IEEE, 1081--1085.
[33]
David F. Bacon, Susan L. Graham, and Oliver J. Sharp. 1994. Compiler transformations for high-performance computing. ACM Comput. Surveys 26, 4 (Dec. 1994), 345--420.
[34]
Victor R. Basili and Albert J. Turner. 1975. Iterative enhancement: A practical technique for software development. IEEE Trans. Softw. Eng. 4 (1975), 390--396.
[35]
Protonu Basu, Mary Hall, Malik Khan, Suchit Maindola, Saurav Muralidharan, Shreyas Ramalingam, Axel Rivera, Manu Shantharam, and Anand Venkat. 2013. Towards making autotuning mainstream. Int. J. High Perform. Comput. Appl. 27, 4 (2013), 379--393.
[36]
Protonu Basu, Samuel Williams, Brian Van Straalen, Leonid Oliker, Phillip Colella, and Mary Hall. 2017. Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers. Parallel Comput. 64 (2017), 50--64.
[37]
Mohamed-Walid Benabderrahmane, Louis-Noël Pouchet, Albert Cohen, and Cédric Bastoul. 2010. The polyhedral model is more widely applicable than you think. In Compiler Construction. Springer, 283--303.
[38]
Craig Blackmore, Oliver Ray, and Kerstin Eder. 2015. A logic programming approach to predict effective compiler settings for embedded software. Theory Pract. Logic Program. 15, 4--5 (2015), 481--494.
[39]
Craig Blackmore, Oliver Ray, and Kerstin Eder. 2017. Automatically tuning the GCC compiler to optimize the performance of applications running on the ARM cortex-M3. arXiv preprint arXiv:1703.08228 (2017).
[40]
Craig Blackmore, Oliver Ray, and Kerstin Eder. 2017. Automatically tuning the GCC compiler to optimize the performance of applications running on the ARM cortex-M3. CoRR abs/1703.08228 (2017). arxiv:1703.08228, retrieved from http://arxiv.org/abs/1703.08228.
[41]
Bruno Bodin, Luigi Nardi, M. Zeeshan Zia, Harry Wagstaff, Govind Sreekar Shenoy, Murali Emani, John Mawer, Christos Kotselidis, Andy Nisbet, Mikel Lujan et al. 2016. Integrating algorithmic parameters into benchmarking and design space exploration in 3D scene understanding. In Proceedings of the 2016 International Conference on Parallel Architectures and Compilation. ACM, 57--69.
[42]
François Bodin, Toru Kisuki, Peter Knijnenburg, Mike O’Boyle, and Erven Rohou. 1998. Iterative compilation in a non-linear optimisation space. In Proceedings of the Workshop on Profile and Feedback-Directed Compilation.
[43]
U. Bondhugula and M. Baskaran. 2008. Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model. In Proceedings of the International Conference on Compiler Construction. 132--146. Retrieved from http://link.springer.com/chapter/10.1007/978-3-540-78791-4.
[44]
U. Bondhugula and A. Hartono. 2008. A practical automatic polyhedral parallelizer and locality optimizer. (2008). Retrieved from http://dl.acm.org/citation.cfm?id=1375595.
[45]
Uday Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan. 2008. PLuTo: A practical and fully automatic polyhedral program optimization system. In Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (PLDI’08). Citeseer.
[46]
Karsten M. Borgwardt and Hans-Peter Kriegel. 2005. Shortest-path kernels on graphs. In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM’05). IEEE, 8--pp.
[47]
Rajkumar Buyya, Chee Shin Yeo, and Srikumar Venugopal. 2008. Market-oriented cloud computing: Vision, hype, and reality for delivering it services as computing utilities. In Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications (HPCC’08). IEEE, 5--13.
[48]
Gustavo Camps-Valls, Tatyana V. Bandos Marsheva, and Dengyong Zhou. 2007. Semi-supervised graph-based hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 45, 10 (2007), 3044--3054.
[49]
João Manuel Paiva Cardoso, José Gabriel de Figueiredo Coutinho, and Pedro C. Diniz. 2017. Embedded Computing for High Performance: Efficient Mapping of Computations Using Customization, Code Transformations and Compilation. Morgan Kaufmann.
[50]
J. Cavazos, C. Dubach, and F. Agakov. 2006. Automatic performance model construction for the fast software exploration of new hardware designs. In Proceedings of the 2006 International Conference on Compilers, Architecture and Synthesis for Embedded Systems. 24--34. Retrieved from http://dl.acm.org/citation.cfm?id=1176765.
[51]
J. Cavazos, G. Fursin, and F. Agakov. 2007. Rapidly selecting good compiler optimizations using performance counters. Proceedings of the International Symposium on Code Generation and Optimization (CGO’07). Retrieved from http://ieeexplore.ieee.org/xpls/abs.
[52]
J. Cavazos and J. E. B. Moss. 2004. Inducing heuristics to decide whether to schedule. ACM SIGPLAN Notices (2004). Retrieved from http://dl.acm.org/citation.cfm?id=996864.
[53]
J. Cavazos, J. E. B. Moss, and M. F. P. O’Boyle. 2006. Hybrid optimizations: Which optimization algorithm to use? Compiler Construction (2006). Retrieved from http://link.springer.com/chapter/10.1007/11688839.
[54]
J. Cavazos and M. F. P. O’Boyle. 2005. Automatic tuning of inlining heuristics. Proceedings of the ACM/IEEE SC 2005 Conference on Supercomputing. 14--14. Retrieved from http://ieeexplore.ieee.org/xpls/abs.
[55]
J. Cavazos and M. F. P. O’Boyle. 2006. Method-specific dynamic compilation using logistic regression. ACM SIGPLAN Notices (2006). Retrieved from http://dl.acm.org/citation.cfm?id=1167492.
[56]
Gregory J. Chaitin, Marc A. Auslander, Ashok K. Chandra, John Cocke, Martin E. Hopkins, and Peter W. Markstein. 1981. Register allocation via coloring. Comput. Lang. 6, 1 (1981), 47--57.
[57]
Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. 2009. Semi-supervised learning (O. Chapelle et al., eds.). IEEE Trans. Neural Netw. 20, 3 (2009), 542--542.
[58]
C. Chen, J. Chame, and M. Hall. 2005. Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy. Proceedings of the International Symposium on Code Generation and Optimization. 111--122. Retrieved from http://ieeexplore.ieee.org/xpls/abs.
[59]
Chun Chen, Jacqueline Chame, and Mary Hall. 2008. CHiLL: A Framework for Composing High-level Loop Transformations. Technical report. Citeseer.
[60]
Yang Chen, Shuangde Fang, Yuanjie Huang, Lieven Eeckhout, Grigori Fursin, Olivier Temam, and Chengyong Wu. 2012. Deconstructing iterative optimization. ACM Trans. Architect. Code Optim. 9, 3 (2012), 21.
[61]
Yang Chen, Yuanjie Huang, Lieven Eeckhout, Grigori Fursin, Liang Peng, Olivier Temam, and Chengyong Wu. 2010. Evaluating iterative optimization across 1000 datasets. In Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’10). ACM, New York, NY, 448--459.
[62]
B. R. Childers and M. L. Soffa. 2005. A model-based framework: An approach for profit-driven optimization. In Proceedings of the International Symposium on Code Generation and Optimization. IEEE, 317--327.
[63]
Alton Chiu, Joseph Garvey, and Tarek S. Abdelrahman. 2015. Genesis: A language for generating synthetic training programs for machine learning. In Proceedings of the 12th ACM International Conference on Computing Frontiers. ACM, 8.
[64]
Cliff Click and Keith D. Cooper. 1995. Combining analyses, combining optimizations. ACM Trans. Program. Lang. Syst. 17, 2 (1995), 181--196.
[65]
Katherine Compton and Scott Hauck. 2002. Reconfigurable computing: A survey of systems and software. ACM Comput. Surveys 34, 2 (2002), 171--210.
[66]
Katherine E. Coons, Behnam Robatmili, Matthew E. Taylor, Bertrand A. Maher, Doug Burger, and Kathryn S. McKinley. 2008. Feature selection and policy optimization for distributed instruction placement using reinforcement learning. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques. ACM, 32--42.
[67]
K. D. Cooper, A. Grosul, and T. J. Harvey. 2005. ACME: Adaptive compilation made efficient. In ACM SIGPLAN Notices 40, 7 (2005), 69--77. Retrieved from http://dl.acm.org/citation.cfm?id=1065921.
[68]
K. Cooper, Timothy J. Harvey, Devika Subramanian, and Linda Torczon. 2002. Compilation Order Matters. Technical report.
[69]
K. D. Cooper, P. J. Schielke, and D. Subramanian. 1999. Optimizing for reduced code space using genetic algorithms. ACM SIGPLAN Notices. Retrieved from http://dl.acm.org/citation.cfm?id=314414.
[70]
K. D. Cooper, D. Subramanian, and L. Torczon. 2002. Adaptive optimizing compilers for the 21st Century. J. Supercomput. Retrieved from http://link.springer.com/article/10.1023/A:1015729001611.
[71]
Biagio Cosenza, Juan J. Durillo, Stefano Ermon, and Ben Juurlink. 2017. Stencil autotuning with ordinal regression: Extended abstract. In Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems (SCOPES’17). ACM, New York, NY, 72--75.
[72]
Chris Cummins, Pavlos Petoumenos, Michel Steuwer, and Hugh Leather. 2015. Autotuning OpenCL workgroup size for stencil patterns. arXiv preprint arXiv:1511.02490.
[73]
C. Cummins, P. Petoumenos, Z. Wang, and H. Leather. 2017. End-to-end deep learning of optimization heuristics. In Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques (PACT’17). 219--232.
[74]
Chris Cummins, Pavlos Petoumenos, Zheng Wang, and Hugh Leather. 2017. Synthesizing benchmarks for predictive modeling. In Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization (CGO’17). IEEE, 86--99.
[75]
Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and T. A. M. T. Meyarivan. 2002. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6, 2 (2002), 182--197.
[76]
Thomas G. Dietterich. 2000. Ensemble methods in machine learning. In International Workshop on Multiple Classifier Systems. Springer, 1--15.
[77]
Y. Ding, J. Ansel, and K. Veeramachaneni. 2015. Autotuning algorithmic choice for input sensitivity. ACM SIGPLAN Notices 50, 6 (2015), 379--390. Retrieved from http://dl.acm.org/citation.cfm?id=2737969.
[78]
C. Dubach, J. Cavazos, and B. Franke. 2007. Fast compiler optimisation evaluation using code-feature based performance prediction. In Proceedings of the 4th International Conference on Computing Frontiers. 131--142. Retrieved from http://dl.acm.org/citation.cfm?id=1242553.
[79]
C. Dubach, T. M. Jones, and E. V. Bonilla. 2009. Portable compiler optimisation across embedded programs and microarchitectures using machine learning. In Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 78--88. Retrieved from http://dl.acm.org/citation.cfm?id=1669124.
[80]
Chris Eagle. 2011. The IDA Pro Book: The Unofficial Guide to the World’s Most Popular Disassembler. No Starch Press.
[81]
Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and Doug Burger. 2011. Dark silicon and the end of multicore scaling. In Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA’11). IEEE, 365--376.
[82]
Thomas L. Falch and Anne C. Elster. 2015. Machine learning based auto-tuning for enhanced OpenCL performance portability. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW’15). IEEE, 1231--1240.
[83]
S. Fang, W. Xu, Y. Chen, and L. Eeckhout. 2015. Practical iterative optimization for the data center. ACM Trans. Archit. Code Optim. 12, 2 (2015), 15. Retrieved from http://dl.acm.org/citation.cfm?id=2739048.
[84]
Paolo Faraboschi, Geoffrey Brown, Joseph A. Fisher, Giuseppe Desoli, and Fred Homewood. 2000. Lx: A technology platform for customizable VLIW embedded processing. In ACM SIGARCH Computer Architecture News, Vol. 28. ACM, 203--213.
[85]
Paul Feautrier. 1988. Parametric integer programming. RAIRO Rech. Opération. 22, 3 (1988), 243--268.
[86]
Jeanne Ferrante, Karl J. Ottenstein, and Joe D. Warren. 1987. The program dependence graph and its use in optimization. ACM Trans. Program. Lang. Syst. 9, 3 (July 1987), 319--349.
[87]
J. A. Fisher, P. Faraboschi, and C. Young. 2009. VLIW processors: Once blue sky, now commonplace. IEEE Solid-State Circ. Mag. 1, 2 (2009), 10--17.
[88]
Joseph A. Fisher. 1981. Microcode compaction. IEEE Trans. Comput. 30, 7 (1981).
[89]
Joseph A. Fisher, Paolo Faraboschi, and Cliff Young. 2004. Embedded Computing: A VLIW Approach to Architecture, Compilers and Tools. Morgan Kaufmann.
[90]
Joseph A. Fisher, Paolo Faraboschi, and Cliff Young. 2005. Embedded Computing: A VLIW Approach to Architecture, Compilers and Tools. Elsevier.
[91]
B. Franke, M. O’Boyle, J. Thomson, and G. Fursin. 2005. Probabilistic source-level optimisation of embedded programs. ACM SIGPLAN Notices (2005). Retrieved from http://dl.acm.org/citation.cfm?id=1065922.
[92]
Christopher W. Fraser. 1999. Automatic inference of models for statistical code compression. ACM SIGPLAN Notices 34, 5 (May 1999), 242--246.
[93]
Stefan M. Freudenberger and John C. Ruttenberg. 1992. Phase ordering of register allocation and instruction scheduling. In Code Generation - Concepts, Tools, Techniques. Springer, 146--170.
[94]
Jerome Friedman, Trevor Hastie, and Robert Tibshirani. 2001. The Elements of Statistical Learning. Vol. 1. Springer, Berlin.
[95]
Nir Friedman, Dan Geiger, and Moises Goldszmidt. 1997. Bayesian network classifiers. Mach. Learn. 29, 2--3 (1997), 131--163.
[96]
G. G. Fursin. 2004. Iterative compilation and performance prediction for numerical applications. Retrieved from https://www.era.lib.ed.ac.uk/handle/1842/565.
[97]
Grigori Fursin. 2010. Collective benchmark (cbench), a collection of open-source programs with multiple datasets assembled by the community to enable realistic benchmarking and research on program and architecture optimization. Retrieved from http://ctuning.org/wiki/index.php/CTools:CBench.
[98]
G. Fursin, J. Cavazos, M. O’Boyle, and O. Temam. 2007. Midatasets: Creating the conditions for a more realistic evaluation of iterative optimization. Proceedings of the International Conference on High-Performance Embedded Architectures and Compilers. 245--260. Retrieved from http://link.springer.com/chapter/10.1007/978-3-540-69338-3.
[99]
G. Fursin and A. Cohen. 2007. Building a practical iterative interactive compiler. Workshop Proceedings. Retrieved from https://www.researchgate.net/profile/Chuck.
[100]
G. Fursin, A. Cohen, M. O’Boyle, and O. Temam. 2005. A practical method for quickly evaluating program optimizations. Proceedings of the International Conference on High-Performance Embedded Architectures and Compilers. 29--46. Retrieved from http://link.springer.com/chapter/10.1007/11587514.
[101]
G. Fursin, Y. Kashnikov, and A. W. Memon. 2011. Milepost GCC: Machine learning enabled self-tuning compiler. Int. J. Parallel Program. 39, 3 (2011), 296--327. Retrieved from http://link.springer.com/article/10.1007/s10766-010-0161-2.
[102]
Grigori Fursin, Anton Lokhmotov, and Ed Plowman. 2016. Collective knowledge: Towards R&D sustainability. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE’16). IEEE, 864--869.
[103]
Grigori Fursin, Anton Lokhmotov, Dmitry Savenko, and Eben Upton. 2018. A collective knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques. arXiv preprint arXiv:1801.08024 (2018).
[104]
Grigori Fursin, Abdul Memon, Christophe Guillon, and Anton Lokhmotov. 2015. Collective mind, part II: Towards performance-and cost-aware software engineering as a natural science. arXiv preprint arXiv:1506.06256 (2015).
[105]
G. Fursin, C. Miranda, and O. Temam. 2008. MILEPOST GCC: Machine learning based research compiler. Proceedings of the GCC Summit. Retrieved from https://hal.inria.fr/inria-00294704/.
[106]
G. G. Fursin, M. F. P. O’Boyle, and P. M. W. Knijnenburg. 2002. Evaluating iterative compilation. Proceedings of the International Workshop on Languages and Compilers for Parallel Computing. 362--376. Retrieved from http://link.springer.com/chapter/10.1007/11596110.
[107]
Grigori Fursin and Olivier Temam. 2009. Collective optimization. In Proceedings of the International Conference on High-Performance Embedded Architectures and Compilers. Springer, 34--49.
[108]
G. Fursin and O. Temam. 2010. Collective optimization: A practical collaborative approach. ACM Trans. Architect. Code Optim. 7, 4 (2010), 20. Retrieved from http://dl.acm.org/citation.cfm?id=1880047.
[109]
Davide Gadioli, Ricardo Nobre, Pedro Pinto, Emanuele Vitali, Amir H. Ashouri, Gianluca Palermo, Cristina Silvano, and Joao Cardoso. 2018. SOCRATES - A seamless online compiler and system runtime autotuning framework for energy-aware applications. In Proceedings of the Design, Automation and Test in Europe Conference & Exhibition (DATE’18). 1143--1146.
[110]
Unai Garciarena and Roberto Santana. 2016. Evolutionary optimization of compiler flag selection by learning and exploiting flags interactions. In Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion (GECCO’16). ACM, New York, NY, 1159--1166.
[111]
Kyriakos Georgiou, Craig Blackmore, Samuel Xavier-de Souza, and Kerstin Eder. 2018. Less is more: Exploiting the standard compiler optimization levels for better performance and energy consumption. arXiv preprint arXiv:1802.09845 (2018).
[112]
Zhangxiaowen Gong, Zhi Chen, Justin Josef Szaday, David C. Wong, Zehra Sura, Neftali Watkinson, Saeed Maleki, David Padua, Alexandru Nicolau, Alexander V. Veidenbaum et al. 2018. An empirical study of the effect of source-level transformations on compiler stability. In Proceedings of the Workshop on Compilers for Parallel Computing (CPC’18).
[113]
Richard L. Gorsuch. 1988. Exploratory factor analysis. In Handbook of Multivariate Experimental Psychology. Springer, 231--258.
[114]
Scott Grauer-Gray, Lifan Xu, Robert Searles, Sudhee Ayalasomayajula, and John Cavazos. 2012. Auto-tuning a high-level language targeted to GPU codes. In Proceedings of the Conference on Innovative Parallel Computing (InPar’12). IEEE, 1--10.
[115]
M. Hall, D. Padua, and K. Pingali. 2009. Compiler research: The next 50 years. Commun. ACM (2009). Retrieved from http://dl.acm.org/citation.cfm?id=1461946.
[116]
M. Haneda. 2005. Optimizing general purpose compiler optimization. Proceedings of the 2nd Conference on Computing frontiers (2005), 180--188. Retrieved from http://dl.acm.org/citation.cfm?id=1062293.
[117]
Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2009. Unsupervised learning. In The Elements of Statistical Learning. Springer, 485--585.
[118]
Jeffrey Hightower and Gaetano Borriello. 2001. A survey and taxonomy of location systems for ubiquitous computing. IEEE Comput. 34, 8 (2001), 57--66.
[119]
Geoffrey Hinton, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath et al. 2012. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag. 29, 6 (2012), 82--97.
[120]
Torsten Hoefler and Roberto Belli. 2015. Scientific benchmarking of parallel computing systems: Twelve ways to tell the masses when reporting performance results. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. ACM, 73.
[121]
Kenneth Hoste and Lieven Eeckhout. 2007. Microarchitecture-independent workload characterization. IEEE Micro 27, 3 (2007), 63--72.
[122]
K. Hoste and L. Eeckhout. 2008. Cole: Compiler optimization level exploration. In Proceedings of the 6th Annual IEEE/ACM International Symposium on Code Generation and Optimization. 165--174. Retrieved from http://dl.acm.org/citation.cfm?id=1356080.
[123]
K. Hoste, A. Georges, and L. Eeckhout. 2010. Automated just-in-time compiler tuning. Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization (2010), 62--72. Retrieved from http://dl.acm.org/citation.cfm?id=1772965.
[124]
Ronald A. Howard. 1966. Dynamic programming. Management Science 12, 5 (1966), 317--348.
[125]
Texas Instruments. 2012. Pandaboard. OMAP4430 SoC Dev. Board 2 (2012).
[126]
Kazuaki Ishizaki, Akihiro Hayashi, Gita Koblents, and Vivek Sarkar. 2015. Compiling and optimizing Java 8 programs for GPU execution. In Proceedings of the International Conference on Parallel Architecture and Compilation (PACT’15). IEEE, 419--431.
[127]
Michael R. Jantz and Prasad A. Kulkarni. 2013. Exploiting phase inter-dependencies for faster iterative compiler optimization phase order searches. In Proceedings of the International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES’13). IEEE, 1--10.
[128]
Sverre Jarp. 2002. A Methodology for Using the Itanium 2 Performance Counters for Bottleneck Analysis. Technical report, HP Labs.
[129]
Brian Jeff. 2012. Big.LITTLE system architecture from ARM: Saving power through heterogeneous multiprocessing and task context migration. In Proceedings of the 49th Annual Design Automation Conference. ACM, 1143--1146.
[130]
Richard Arnold Johnson and Dean W. Wichern. 2002. Applied Multivariate Statistical Analysis. Vol. 5. Prentice Hall, Upper Saddle River, NJ.
[131]
Bill Joy, Guy Steele, James Gosling, and Gilad Bracha. 2000. Java (TM) Language Specification. Addison-Wesley.
[132]
Agnieszka Kamińska and Włodzimierz Bielecki. 2016. Statistical models to accelerate software development by means of iterative compilation. Comput. Sci. 17, 3 (2016), 407. Retrieved from https://journals.agh.edu.pl/csci/article/view/1800.
[133]
Christos Kartsaklis, Oscar Hernandez, Chung-Hsing Hsu, Thomas Ilsche, Wayne Joubert, and Richard L. Graham. 2012. HERCULES: A pattern driven code transformation system. In Proceedings of the IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW’12). IEEE, 574--583.
[134]
Christos Kartsaklis, Eunjung Park, and John Cavazos. 2014. HSLOT: The HERCULES scriptable loop transformations engine. In Proceedings of the 4th International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing. IEEE Press, 31--41.
[135]
Vasilios Kelefouras. 2017. A methodology pruning the search space of six compiler transformations by addressing them together as one problem and by exploiting the hardware architecture details. Computing (2017), 1--24.
[136]
William Killian, Renato Miceli, Eunjung Park, Marco Alvarez, and John Cavazos. 2014. Performance improvement in kernels by guiding compiler auto-vectorization heuristics. PRACE-RI.EU. Retrieved from http://www.prace-ri.eu/IMG/pdf/WP183.pdf.
[137]
Kyoung-jae Kim. 2003. Financial time series forecasting using support vector machines. Neurocomputing 55, 1 (2003), 307--319.
[138]
Anton Kindestam. 2017. Graph-based features for machine learning driven code optimization. Master of Science dissertation, KTH, urn:nbn:se:kth:diva-211444.
[139]
Toru Kisuki, P. Knijnenburg, M. O’Boyle, and H. Wijshoff. 2000. Iterative compilation in program optimization. In Proceedings of the Conference on Compilers for Parallel Computers (CPC’00). Citeseer, 35--44.
[140]
Toru Kisuki, Peter M. W. Knijnenburg, Mike F. P. O’Boyle, François Bodin, and Harry A. G. Wijshoff. 1999. A feasibility study in iterative compilation. In High Performance Computing. Springer, 121--132.
[141]
Toru Kisuki, Peter M. W. Knijnenburg, and Michael F. P. O’Boyle. 2000. Combined selection of tile sizes and unroll factors using iterative compilation. In Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques (PACT’00). 237--248.
[142]
P. M. W. Knijnenburg, T. Kisuki, and M. F. P. O ’boyle. 2003. Combined selection of tile sizes and unroll factors using iterative compilation. J. Supercomput. 24 (2003), 43--67.
[143]
A. Koseki. 1997. A method for estimating optimal unrolling times for nested loops. In Proceedings of the 3rd International Symposium on Parallel Architectures, Algorithms, and Networks (I-SPAN’97). 376--382. Retrieved from http://ieeexplore.ieee.org/xpls/abs.
[144]
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. Curran Associates, inc., 1097--1105.
[145]
David C. Ku and Giovanni De Micheli. 1992. Design space exploration. In High Level Synthesis of ASICs under Timing and Synchronization Constraints. Springer, 83--111.
[146]
P. Kulkarni, S. Hines, and J. Hiser. 2004. Fast searches for effective optimization phase sequences. ACM SIGPLAN Notices 39, 6 (2004), 171--182. Retrieved from http://dl.acm.org/citation.cfm?id=996863.
[147]
Prasad A. Kulkarni, Michael R. Jantz, and David B. Whalley. 2010. Improving both the performance benefits and speed of optimization phase sequence searches. In ACM Sigplan Notices, Vol. 45. ACM, 95--104.
[148]
Prasad A. Kulkarni, David B. Whalley, and Gary S. Tyson. 2007. Evaluating heuristic optimization phase order search algorithms. In Proceedings of the International Symposium on Code Generation and Optimization (CGO’07). IEEE, 157--169.
[149]
Prasad A. Kulkarni, David B. Whalley, Gary S. Tyson, and Jack W. Davidson. 2006. Exhaustive optimization phase order space exploration. In Proceedings of the International Symposium on Code Generation and Optimization (CGO’06). IEEE, 13.
[150]
Prasad A. Kulkarni, David B. Whalley, Gary S. Tyson, and Jack W. Davidson. 2009. Practical exhaustive optimization phase order exploration and evaluation. ACM Trans. Archit. Code Optim. 6, 1, Article 1 (Apr. 2009).
[151]
S. Kulkarni and J. Cavazos. 2012. Mitigating the compiler optimization phase-ordering problem using machine learning. ACM SIGPLAN Notices (2012). Retrieved from http://dl.acm.org/citation.cfm?id=2384628.
[152]
S. Kulkarni and J. Cavazos. 2013. Automatic construction of inlining heuristics using machine learning. Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization (CGO’13). 1--12. Retrieved from http://ieeexplore.ieee.org/xpls/abs.
[153]
T. Satish Kumar, S. Sakthivel, S. Sushil Kumar, and N. Arun. 2014. Compiler phase ordering and optimizing MPI runtime parameters using heuristic algorithms on SMPs. Int. J. Appl. Eng. Res. 9, 24 (2014), 30831--30851.
[154]
Chris Lattner and Vikram Adve. 2004. LLVM: A compilation framework for lifelong program analysis 8 transformation. In Proceedings of the International Symposium on Code Generation and Optimization (CGO’04). IEEE, 75--86.
[155]
H. Leather, E. Bonilla, and M. O’Boyle. 2009. Automatic feature generation for machine learning based optimizing compilation. Proceedings of the International Symposium on Code Generation and Optimization (CGO’09). 81--91. Retrieved from http://ieeexplore.ieee.org/xpls/abs.
[156]
Bruce W. Leverett, Roderic Geoffrey Galton Cattell, Steven O. Hobbs, Joseph M. Newcomer, Andrew H. Reiner, Bruce R. Schatz, and William A. Wulf. 1979. An Overview of the Production Quality Compiler-Compiler Project. Carnegie Mellon University, Department of Computer Science.
[157]
Fengqian Li, Feilong Tang, and Yao Shen. 2014. Feature mining for machine learning based compilation optimization. Proceedings of the 8th International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS’14). 207--214.
[158]
Sheng Li, Jung Ho Ahn, Richard D. Strong, Jay B. Brockman, Dean M. Tullsen, and Norman P. Jouppi. 2009. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures. In Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’09). IEEE, 469--480.
[159]
Y. Li, J. Dongarra, and S. Tomov. 2009. A note on auto-tuning GEMM for GPUs. Proceedings of the Conference on Computational Science (ICCS’09). Retrieved from http://link.springer.com/chapter/10.1007/978-3-642-01970-8.
[160]
Hui Liu, Rongcai Zhao, Qi Wang, and Yingying Li. 2018. ALIC: A low overhead compiler optimization prediction model. Wireless Personal Commun. Springer.
[161]
Vincent Loechner. 1999. PolyLib: A library for manipulating parameterized polyhedra. Retrieved from online at http://icps.u-strasbg.fr/PolyLib/.
[162]
P. Lokuciejewski and F. Gedikli. 2009. Automatic WCET reduction by machine learning based heuristics for function inlining. Proceedings of the 3rd Workshop on Statistical and Machine Learning Approaches to Architectures and Compilation (SMART’09). 1--15. Retrieved from https://www.researchgate.net/profile/Peter.
[163]
Paul Lokuciejewski, Sascha Plazar, Heiko Falk, Peter Marwedel, and Lothar Thiele. 2010. Multi-objective exploration of compiler optimizations for real-time systems. Proceedings of the 13th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing (ISORC’10), vol. 1, 115--122.
[164]
David B. Loveman. 1977. Program improvement by source-to-source transformation. J. ACM 24, 1 (1977), 121--145.
[165]
Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff Lowney, Steven Wallace, Vijay Janapa Reddi, and Kim Hazelwood. 2005. Pin: Building customized program analysis tools with dynamic instrumentation. In Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’05). ACM, New York, NY, 190--200.
[166]
L. Luo, Y. Chen, C. Wu, S. Long, and G. Fursin. 2014. Finding representative sets of optimizations for adaptive multiversioning applications. arXiv preprint arXiv:1407.4075 (2014). Retrieved from http://arxiv.org/abs/1407.4075.
[167]
Scott A. Mahlke, David C. Lin, William Y. Chen, Richard E. Hank, and Roger A. Bringmann. 1992. Effective compiler support for predicated execution using the hyperblock. In ACM SIGMICRO Newsletter, Vol. 23. IEEE Computer Society Press, 45--54.
[168]
Giovanni Mariani, Aleksandar Brankovic, Gianluca Palermo, Jovana Jovic, Vittorio Zaccaria, and Cristina Silvano. 2010. A correlation-based design space exploration methodology for multi-processor systems-on-chip. In Proceedings of the 47th ACM/IEEE Design Automation Conference (DAC’10). IEEE, 120--125.
[169]
J. Mars and R. Hundt. 2009. Scenario based optimization: A framework for statically enabling online optimizations. Proceedings of the 7th Annual IEEE/ACM International Symposium on Code Generation and Optimization. Retrieved from http://dl.acm.org/citation.cfm?id=1545068.
[170]
L. G. A. Martins and R. Nobre. 2016. Clustering-based selection for the exploration of compiler optimization sequences. ACM Trans. Architect. Code Optim. 13, 1 (2016), 28. Retrieved from http://dl.acm.org/citation.cfm?id=2883614.
[171]
Luiz G. A. Martins, Ricardo Nobre, Alexandre C. B. Delbem, Eduardo Marques, and João M. P. Cardoso. 2014. Exploration of compiler optimization sequences using clustering-based selection. In ACM SIGPLAN Notices, Vol. 49. ACM, 63--72.
[172]
Amy McGovern, Eliot Moss, and Andrew G. Barto. 1999. Scheduling straight-line code using reinforcement learning and rollouts. Technical report no. 99-23 (1999).
[173]
Amy McGovern, Eliot Moss, and Andrew G. Barto. 2002. Building a basic block instruction scheduler with reinforcement learning and rollouts. Mach. Learn. 49, 2--3 (2002), 141--160.
[174]
Abdul Wahid Memon and Grigori Fursin. 2013. Crowdtuning: Systematizing auto-tuning using predictive modeling and crowdsourcing. In Proceedings of the PARCO Mini-Symposium on Application Autotuning for HPC (Architectures).
[175]
R. Miceli, G. Civario, A. Sikora, and E. César. 2012. Autotune: A plugin-driven approach to the automatic tuning of parallel applications. Proceedings of the International Workshop on Applied Parallel Computing. 328--342. Retrieved from http://link.springer.com/chapter/10.1007/978-3-642-36803-5.
[176]
MinIR 2011. MINimal IR space. Retrieved from http://www.assembla.com/wiki/show/minir-dev.
[177]
Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. 2012. Foundations of Machine Learning. MIT Press.
[178]
A. Monsifrot, F. Bodin, and R. Quiniou. 2002. A machine learning approach to automatic production of compiler heuristics. Proceedings of the International Conference on Artificial Intelligence: Methodology, Systems, and Applications (2002), 41--50. Retrieved from http://link.springer.com/chapter/10.1007/3-540-46148-5.
[179]
Thierry Moreau, Anton Lokhmotov, and Grigori Fursin. 2018. Introducing ReQuEST: An open platform for reproducible and quality-efficient systems-ML tournaments. CoRR abs/1801.06378. arxiv:1801.06378. Retrieved from http://arxiv.org/abs/1801.06378.
[180]
Eliot Moss, Paul Utgoff, John Cavazos, Doina Precup, D Stefanovic, Carla Brodley, and David Scheeff. 1998. Learning to schedule straight-line code. Adv. Neural Info. Process. Syst. 10 (1998), 929--935. Retrieved from http://books.nips.cc/papers/files/nips10/0929.pdf.
[181]
Paschalis Mpeis, Pavlos Petoumenos, and Hugh Leather. 2015. Iterative compilation on mobile devices. CoRR abs/1511.02603. Retrieved from http://arxiv.org/abs/1511.02603.
[182]
Philip J. Mucci, Shirley Browne, Christine Deane, and George Ho. 1999. PAPI: A portable interface to hardware performance counters. In Proceedings of the Department of Defense HPCMP Users Group Conference. 7--10.
[183]
M. Namolaru, A. Cohen, and G. Fursin. 2010. Practical aggregation of semantical program properties for machine learning based optimization. In Proceedings of the 2010 International Conference on Compilers, Architectures and Synthesis for Embedded Systems. Retrieved from http://dl.acm.org/citation.cfm?id=1878951.
[184]
Ricardo Nobre, Reis Luis, and M. P. Cardoso Joao. 2016. Compiler phase ordering as an orthogonal approach for reducing energy consumption. In Proceedings of the 19th Workshop on Compilers for Parallel Computing (CPC’16).
[185]
Ricardo Nobre, Luiz G. A. Martins, and Joao M. P. Cardoso. 2015. Use of previously acquired positioning of optimizations for phase ordering exploration. In Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems. ACM, 58--67.
[186]
Ricardo Nobre, Luiz G. A. Martins, and João M. P. Cardoso. 2016. A graph-based iterative compiler pass selection and phase ordering approach. ACM SIGPLAN Notices. 21--30.
[187]
Ricardo Nobre, Luís Reis, and João M. P. Cardoso. 2018. Impact of compiler phase ordering when targeting GPUs. In Proceedings of the Parallel Processing Workshops (Euro-Par’17), Dora B. Heras and Luc Bougé (Eds.). Springer International Publishing, Cham, 427--438.
[188]
Hirotaka Ogawa, Kouya Shimura, Satoshi Matsuoka, Fuyuhiko Maruyama, Yukihiko Sohda, and Yasunori Kimura. 2000. OpenJIT: An open-ended, reflective JIT compiler framework for Java. In Proceedings of the European Conference on Object-Oriented Programming. Springer, 362--387.
[189]
William F. Ogilvie, Pavlos Petoumenos, Zheng Wang, and Hugh Leather. 2017. Minimizing the cost of iterative compilation with active learning. In Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization (CGO’17). IEEE, 245--256.
[190]
Karl Joseph Ottenstein. 1978. Data-flow graphs as an intermediate program form. Ph.D. Dissertation. Purdue University.
[191]
David A. Padua and Michael J. Wolfe. 1986. Advanced compiler optimizations for supercomputers. Commun. ACM 29, 12 (1986), 1184--1201.
[192]
G. Palermo, C. Silvano, S. Valsecchi, and V. Zaccaria. 2003. A system-level methodology for fast multi-objective design space exploration. In Proceedings of the 13th ACM Great Lakes Symposium on VLSI. ACM, 92--95.
[193]
Gianluca Palermo, Cristina Silvano, and Vittorio Zaccaria. 2005. Multi-objective design space exploration of embedded systems. J. Embed. Comput. 1, 3 (2005), 305--316.
[194]
Gianluca Palermo, Cristina Silvano, and Vittorio Zaccaria. 2005. Multi-objective design space exploration of embedded systems. J. Embed. Comput. 1, 3 (2005), 305--316.
[195]
James Pallister, Simon J. Hollis, and Jeremy Bennett. 2013. Identifying compiler options to minimize energy consumption for embedded platforms. Comput. J. 58, 1 (2013), 95--109.
[196]
Z. Pan and R. Eigenmann. 2004. Rating compiler optimizations for automatic performance tuning. Proceedings of the 2004 ACM/IEEE Conference on Supercomputing. 14. Retrieved from http://dl.acm.org/citation.cfm?id=1049958.
[197]
Zhelong Pan and Rudolf Eigenmann. 2006. Fast and effective orchestration of compiler optimizations for automatic performance tuning. In Proceedings of the International Symposium on Code Generation and Optimization (CGO’06). IEEE, 12.
[198]
E. Park, J. Cavazos, and M. A. Alvarez. 2012. Using graph-based program characterization for predictive modeling. Proceedings of the International Symposium on Code Generation and Optimization. 295--305. Retrieved from http://dl.acm.org/citation.cfm?id=2259042.
[199]
E. Park, J. Cavazos, and L. N. Pouchet. 2013. Predictive modeling in a polyhedral optimization space. International J. Parallel Program. 41, 5 (2013), 704--750. Retrieved from http://link.springer.com/article/10.1007/s10766-013-0241-1.
[200]
Eunjung Park, Christos Kartsaklis, and John Cavazos. 2014. HERCULES: Strong patterns towards more intelligent predictive modeling. Proceedings of the 43rd International Conference on Parallel Processing. 172--181.
[201]
E. Park, S. Kulkarni, and J. Cavazos. 2011. An evaluation of different modeling techniques for iterative compilation. In Proceedings of the 14th International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES'11). ACM, 65--74. Retrieved from http://dl.acm.org/citation.cfm?id=2038711.
[202]
Eun Jung Park. 2015. Automatic selection of compiler optimizations using program characterization and machine learning title. Ph.D. Dissertation, University of Delaware, USA.
[203]
David A. Patterson and John L. Hennessy. 2013. Computer Organization and Design: The Hardware/Software Interface. Newnes.
[204]
Judea Pearl. 1985. Bayesian Networks: A Model of Self-activated Memory for Evidential Reasoning. UCLA Technical report no. CSD-850017); Proceedings of the 7th Conference of the Cognitive Science Society, vol. 3, 329--334.
[205]
Leslie Pérez Cáceres, Federico Pagnozzi, Alberto Franzin, and Thomas Stützle. 2018. Automatic configuration of GCC using irace. In Artificial Evolution, Evelyne Lutton, Pierrick Legrand, Pierre Parrend, Nicolas Monmarché, and Marc Schoenauer (Eds.). Springer International Publishing, Cham, 202--216.
[206]
R. P. J. Pinkers, P. M. W. Knijnenburg, M. Haneda, and H. A. G. Wijshoff. 2004. Statistical selection of compiler options. Proceedings of the IEEE Computer Society’s Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems (MASCOTS’04). 494--501.
[207]
L. L. Pollock and M. L. Soffa. 1990. Incremental global optimization for faster recompilations. In Proceedings of the International Conference on Computer Languages. 281--290.
[208]
L. N. Pouchet and C. Bastoul. 2007. Iterative optimization in the polyhedral model. Part I: One-dimensional time. Proceedings of the International Symposium on Code Generation and Optimization (CGO’07). 144--156. Retrieved from http://ieeexplore.ieee.org/xpls/abs.
[209]
L. N. Pouchet, C. Bastoul, A. Cohen, and J. Cavazos. 2008. Iterative optimization in the polyhedral model. Part II: Multidimensional time. ACM SIGPLAN Notices. Retrieved from http://dl.acm.org/citation.cfm?id=1375594.
[210]
L. N. Pouchet and U. Bondhugula. 2010. Combined iterative and model-driven optimization in an automatic parallelization framework. Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis. 1--11. Retrieved from http://dl.acm.org/citation.cfm?id=1884672.
[211]
Louis-Noël Pouchet. 2012. Polybench: The polyhedral benchmark suite. Retrieved from http://www.cs.ucla.edu/pöouchet/software/polybench/.
[212]
S. Purini and L. Jain. 2013. Finding good optimization sequences covering program space. ACM Trans. Architect. Code Optim. 9, 4 (2013), 56. Retrieved from http://dl.acm.org/citation.cfm?id=2400715.
[213]
Matthieu Stéphane Benoit Queva. 2007. Phase-ordering in optimizing compilers. MS thesis. Technical University of Denmark, DTU, DK-2800 Kgs. Lyngby, Denmark.
[214]
Francesco Ricci, Lior Rokach, and Bracha Shapira. 2011. Introduction to recommender systems handbook. In Recommender Systems Handbook. Springer, 1--35.
[215]
Ranjit K. Roy. 2001. Design of Experiments Using the Taguchi Approach: 16 Steps to Product and Process Improvement. Wiley-Interscience.
[216]
T. Rusira, M. Hall, and P. Basu. 2017. Automating compiler-directed autotuning for phased performance behavior. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW’17). 1362--1371.
[217]
Ricardo Nabinger Sanchez, Jose Nelson Amaral, Duane Szafron, Marius Pirvu, and Mark Stoodley. 2011. Using machines to learn method-specific compilation strategies. In Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization. 257--266. Retrieved from http://dl.acm.org/citation.cfm?id=2190072.
[218]
Vivek Sarkar. 1997. Automatic selection of high-order transformations in the IBM XL FORTRAN compilers. IBM J. Res. Dev. 41, 3 (1997), 233--264.
[219]
V. Sarkar. 2000. Optimized unrolling of nested loops. Proceedings of the 14th International Conference on Supercomputing. 153--166. Retrieved from http://dl.acm.org/citation.cfm?id=335246.
[220]
Robert R. Schaller. 1997. Moore’s law: Past, present, and future. IEEE Spectrum 34, 6 (1997), 52--59.
[221]
E. Schkufza, R. Sharma, and A. Aiken. 2014. Stochastic optimization of floating-point programs with tunable precision. ACM SIGPLAN Notices. Retrieved from http://dl.acm.org/citation.cfm?id=2594302.
[222]
Jürgen Schmidhuber. 2015. Deep learning in neural networks: An overview. Neural Netw. 61 (2015), 85--117.
[223]
Paul B. Schneck. 1973. A survey of compiler optimization techniques. In Proceedings of the ACM Annual Conference. ACM, 106--113.
[224]
Bernhard Schölkopf. 2001. The kernel trick for distances. In Advances in Neural Information Processing Systems. Curran Associates, inc., 301--307.
[225]
Cristina Silvano, Giovanni Agosta, Andrea Bartolini, Andrea Beccari, Luca Benini, Joao M. P. Cardoso, Carlo Cavazzoni, Radim Cmar, Jan Martinovic, Gianluca Palermo et al. 2015. ANTAREX--AutoTuning and adaptivity approach for energy efficient eXascale HPC systems. In Proceedings of the IEEE 18th International Conference on Computational Science and Engineering (CSE’15). IEEE, 343--346.
[226]
Cristina Silvano, Giovanni Agosta, Andrea Bartolini, Andrea R Beccari, Luca Benini, João Bispo, Radim Cmar, João MP Cardoso, Carlo Cavazzoni, Jan Martinovič et al. 2016. AutoTuning and adaptivity appRoach for energy efficient eXascale HPC systems: The ANTAREX approach. In Proceedings of the Design, Automation 8 Test in Europe Conference 8 Exhibition (DATE’16). IEEE, 708--713.
[227]
Cristina Silvano, Giovanni Agosta, Stefano Cherubin, Davide Gadioli, Gianluca Palermo, Andrea Bartolini, Luca Benini, Jan Martinovič, Martin Palkovič, Kateřina Slaninová et al. 2016. The ANTAREX approach to autotuning and adaptivity for energy efficient hpc systems. In Proceedings of the ACM International Conference on Computing Frontiers. ACM, 288--293.
[228]
Cristina Silvano, Andrea Bartolini, Andrea Beccari, Candida Manelfi, Carlo Cavazzoni, Davide Gadioli, Erven Rohou, Gianluca Palermo, Giovanni Agosta, Jan Martinovič et al. 2017. The ANTAREX tool flow for monitoring and autotuning energy efficient HPC systems. In Proceedings of the International Conference on Embedded Computer Systems: Architecture, Modeling, and Simulation (SAMOS’17).
[229]
Cristina Silvano, William Fornaciari, Gianluca Palermo, Vittorio Zaccaria, Fabrizio Castro, Marcos Martinez, Sara Bocchio, Roberto Zafalon, Prabhat Avasare, Geert Vanmeerbeeck et al. 2011. Multicube: Multi-objective design space exploration of multi-core architectures. In Proceedings of the VLSI 2010 Annual Symposium. Springer, 47--63.
[230]
Cristina Silvano, Gianluca Palermo, Giovanni Agosta, Amir H. Ashouri, Davide Gadioli, Stefano Cherubin, Emanuele Vitali, Luca Benini, Andrea Bartolini, Daniele Cesarini, Joao Cardoso, Joao Bispo, Pedro Pinto, Riccardo Nobre, Erven Rohou, Loïc Besnard, Imane Lasri, Nico Sanna, Carlo Cavazzoni, Radim Cmar, Jan Martinovič, Kateřina Slaninová, Martin Golasowski, Andrea R. Beccari, and Candida Manelfi. 2018. Autotuning and adaptivity in energy efficient HPC systems: The ANTAREX toolbox. In Proceedings of the Computing Frontiers Conference. ACM.
[231]
Bernard W. Silverman. 1986. Density estimation for statistics and data analysis. Vol. 26. CRC press.
[232]
Richard Stallman. 2001. Using and porting the GNU compiler collection. In MIT Artificial Intelligence Laboratory. Citeseer.
[233]
Richard M. Stallman et al. 2003. Using GCC: the GNU compiler collection reference manual. GNU Press.
[234]
Kenneth O. Stanley. 2002. Efficient reinforcement learning through evolving neural network topologies. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO’02). Citeseer.
[235]
M. W. Stephenson. 2006. Automating the construction of compiler heuristics using machine learning. Retrieved from http://groups.csail.mit.edu/commit/papers/2006/stephenson_phdthesis.pdf.
[236]
M. Stephenson and S. Amarasinghe. 2003. Meta optimization: Improving compiler heuristics with machine learning. 38, 5 (2003), 77--90. Retrieved from http://dl.acm.org/citation.cfm?id=781141.
[237]
M. Stephenson and S. Amarasinghe. 2005. Predicting unroll factors using supervised classification. In Proceedings of the International Symposium on Code Generation and Optimization.
[238]
M. Stephenson and U. M. O’Reilly. 2003. Genetic programming applied to compiler heuristic optimization. Proceedings of the European Conference on Genetic Programming. 238--253. Retrieved from http://link.springer.com/chapter/10.1007/3-540-36599-0.
[239]
Ralph E. Steuer. 1986. Multiple Criteria Optimization: Theory, Computation, and Applications. Wiley.
[240]
K. Stock, L. N. Pouchet, and P. Sadayappan. 2012. Using machine learning to improve automatic vectorization. ACM Trans. Architect. Code Optim. 8, 4 (2012), 50. Retrieved from http://dl.acm.org/citation.cfm?id=2086729.
[241]
Toshio Suganuma, Takeshi Ogasawara, Mikio Takeuchi, Toshiaki Yasue, Motohiro Kawahito, Kazuaki Ishizaki, Hideaki Komatsu, and Toshio Nakatani. 2000. Overview of the IBM Java just-in-time compiler. IBM Syst. J. 39, 1 (2000), 175--193.
[242]
Cristian Ţăpuş, I-Hsin Chung, Jeffrey K. Hollingsworth et al. 2002. Active harmony: Towards automated performance tuning. In Proceedings of the 2002 ACM/IEEE Conference on Supercomputing. IEEE Computer Society Press, 1--11.
[243]
Michele Tartara and Stefano Crespi Reghizzi. 2012. Parallel iterative compilation: Using MapReduce to speedup machine learning in compilers. In Proceedings of 3rd International Workshop on MapReduce and Its Applications Date. ACM, 33--40.
[244]
Michele Tartara and Stefano Crespi Reghizzi. 2013. Continuous learning of compiler heuristics. ACM Trans. Architect. Code Optim. 9, 4 (2013), 46.
[245]
Michele Tartara and Stefano Crespi Reghizzi. 2013. Continuous learning of compiler heuristics. ACM Trans. Archit. Code Optim. 9, 4, Article 46 (Jan. 2013).
[246]
Gerald Tesauro and Gregory R. Galperin. 1996. On-line policy improvement using monte-carlo search. In Proceedings of the Conference on Neural Information Processing Systems (NIPS’96), Vol. 96. 1068--1074.
[247]
Bruce Thompson. 2002. Statistical, practical, and clinical: How many kinds of significance do counselors need to consider? J. Counsel. Dev. 80, 1 (2002), 64--71.
[248]
J. Thomson, M. O’Boyle, G. Fursin, and B. Franke. 2009. Reducing training time in a one-shot machine learning-based compiler. Proceedings of the International Workshop on Languages and Compilers for Parallel Computing. 399--407. Retrieved from http://link.springer.com/10.1007.
[249]
A. Tiwari, C. Chen, and J. Chame. 2009. A scalable auto-tuning framework for compiler optimization. In Proceedings of the IEEE International Symposium on Parallel 8 Distributed Processin (IPDPS’09). 1--12. Retrieved from http://ieeexplore.ieee.org/xpls/abs.
[250]
Georgios Tournavitis, Zheng Wang, Björn Franke, and Michael F. P. O’Boyle. 2009. Towards a holistic approach to auto-parallelization: Integrating profile-driven parallelism detection and machine-learning based mapping. ACM Sigplan Not. 44, 6 (2009), 177--187.
[251]
S. Triantafyllis, M. Vachharajani, N. Vachharajani, and D. I. August. 2003. Compiler optimization-space exploration. In Proceedings of the International Symposium on Code Generation and Optimization (CGO’03). IEEE, 204--215.
[252]
Eben Upton and Gareth Halfacree. 2014. Raspberry Pi User Guide. John Wiley 8 Sons.
[253]
K. Vaswani. 2007. Microarchitecture sensitive empirical models for compiler optimizations. International Symposium on Code Generation and Optimization (CGO’07) (2007), 131--143. Retrieved from http://ieeexplore.ieee.org/xpls/abs.
[254]
Steven R. Vegdahl. 1982. Phase coupling and constant generation in an optimizing microcode compiler. ACM SIGMICRO Newslett. 13, 4 (1982), 125--133.
[255]
Richard Vuduc, James W. Demmel, and Jeff A. Bilmes. 2004. Statistical models for empirical search-based performance tuning. Int. J. High Perform. Comput. Appl. 18, 1 (2004), 65--94.
[256]
Richard W. Vuduc. 2011. Autotuning. Springer, Boston, MA, 102--105.
[257]
David W. Wall. 1991. Limits of Instruction-level Parallelism. Vol. 19. ACM.
[258]
Wei Wang, John Cavazos, and Allan Porterfield. 2014. Energy auto-tuning using the polyhedral approach. In Proceedings of the Workshop on Polyhedral Compilation Techniques.
[259]
Z. Wang and M. F. P. O’Boyle. 2009. Mapping parallelism to multi-cores: A machine learning based approach. ACM Sigplan Not. 44, 4 (2009), 75--84. Retrieved from http://dl.acm.org/citation.cfm?id=1504189.
[260]
Todd Waterman. 2006. Adaptive Compilation and Inlining. Ph.D. Dissertation, Rice University.
[261]
Deborah Whitfield and Mary Lou Soffa. 1991. Automatic generation of global optimizers. In ACM SIGPLAN Notices, Vol. 26. ACM, 120--129.
[262]
D. Whitfield, M. L. Soffa, D. Whitfield, and M. L. Soffa. 1990. An approach to ordering optimizing transformations. In Proceedings of the 2nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP’90), Vol. 25. ACM Press, New York, New York, 137--146.
[263]
Deborah L. Whitfield and Mary Lou Soffa. 1997. An approach for exploring code improving transformations. ACM Trans. Program. Lang. Syst. 19, 6 (Nov. 1997), 1053--1084.
[264]
Doran K. Wilde. 1993. A Library for Doing Polyhedral Operations. Technical report no. 785. IRISA.
[265]
Samuel Williams, Andrew Waterman, and David Patterson. 2009. Roofline: An insightful visual performance model for multicore architectures. Commun. ACM 52, 4 (2009), 65--76.
[266]
Robert P. Wilson, Robert S. French, Christopher S. Wilson, Saman P. Amarasinghe, Jennifer M. Anderson, Steve W. K. Tjiang, Shih-Wei Liao, Chau-Wen Tseng, Mary W. Hall, Monica S. Lam et al. 1994. SUIF: An infrastructure for research on parallelizing and optimizing compilers. ACM Sigplan Not. 29, 12 (1994), 31--37.
[267]
M. I. Wolczko and D. M. Ungar. 2000. Method and apparatus for improving compiler performance during subsequent compilations of a source program. U.S. Patent No. 6,078,744. Retrieved from https://www.google.com/patents/US6078744.
[268]
Stephan Wong, Thijs Van As, and Geoffrey Brown. 2008. -VEX: A reconfigurable and extensible softcore VLIW processor. In Proceedings of the International Conference on ICECE Technology (FPT’08). IEEE, 369--372.
[269]
William Allan Wulf, Richard K. Johnsson, Charles B. Weinstock, Steven O. Hobbs, and Charles M. Geschke. 1975. The Design of an Optimizing Compiler. Elsevier Science Inc.
[270]
T. Yuki, V. Basupalli, G. Gupta, G. Iooss, and D. Kim. 2012. Alphaz: A system for analysis, transformation, and code generation in the polyhedral equational model. Retrieved from http://www.cs.colostate.edu/TechReports/Reports/2012/tr12-101.pdf.
[271]
T. Yuki, G. Gupta, D. G. Kim, T. Pathan, and S. Rajopadhye. 2012. AlphaZ: A system for design space exploration in the polyhedral model, In Proceedings of the International Workshop on Languages and Compilers for Parallel Computing. 17--31. Retrieved from http://people.rennes.inria.fr/Tomofumi.Yuki/papers/yuki-lcpc2012.pdf.
[272]
Vittorio Zaccaria, Gianluca Palermo, Fabrizio Castro, Cristina Silvano, and Giovanni Mariani. 2010. Multicube explorer: An open source framework for design space exploration of chip multi-processors. In Proceedings of the 23rd International Conference on Architecture of Computing Systems (ARCS’10). VDE, 1--7.
[273]
Min Zhao, Bruce Childers, Mary Lou Soffa, Min Zhao, Bruce Childers, and Mary Lou Soffa. 2003. Predicting the impact of optimizations for embedded systems. In Proceedings of the 2003 ACM SIGPLAN Conference on Language, Compiler, and Tool for Embedded Systems (LCTES’03), Vol. 38. ACM Press, New York, New York, 1.
[274]
Min Zhao, Bruce R. Childers, and Mary Lou Soffa. 2005. A model-based framework: An approach for profit-driven optimization. In Proceedings of the International Symposium on Code Generation and Optimization. IEEE, 317--327.

Published In

ACM Computing Surveys, Volume 51, Issue 5
September 2019
791 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3271482
Editor: Sartaj Sahni

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 September 2018
Accepted: 01 March 2018
Revised: 01 February 2018
Received: 01 November 2016
Published in CSUR Volume 51, Issue 5

Author Tags

  1. Autotuning
  2. compilers
  3. machine learning
  4. optimizations
  5. phase ordering

Qualifiers

  • Survey
  • Research
  • Refereed

Funding Sources

  • EU Commission H2020-FET-HPC program

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)512
  • Downloads (Last 6 weeks)69
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2025) Measuring code efficiency optimization capabilities with ACEOB. Journal of Systems and Software 219 (112250). DOI: 10.1016/j.jss.2024.112250. Online publication date: Jan-2025
  • (2024) Aerobic exercise versus acupuncture on the quality of life in women suffering from irritable bowel syndrome. Fizjoterapia Polska 24:2 (259-265). DOI: 10.56984/8ZG56086EL. Online publication date: 20-Jun-2024
  • (2024) MQT Predictor: Automatic Device Selection with Device-Specific Circuit Compilation for Quantum Computing. ACM Transactions on Quantum Computing. DOI: 10.1145/3673241. Online publication date: 17-Jun-2024
  • (2024) AutoSched: An Adaptive Self-configured Framework for Scheduling Deep Learning Training Workloads. Proceedings of the 38th ACM International Conference on Supercomputing (473-484). DOI: 10.1145/3650200.3656598. Online publication date: 30-May-2024
  • (2024) The Droplet Search Algorithm for Kernel Scheduling. ACM Transactions on Architecture and Code Optimization 21:2 (1-28). DOI: 10.1145/3650109. Online publication date: 21-May-2024
  • (2024) Optimization Space Learning: A Lightweight, Noniterative Technique for Compiler Autotuning. Proceedings of the 28th ACM International Systems and Software Product Line Conference (36-46). DOI: 10.1145/3646548.3672588. Online publication date: 2-Sep-2024
  • (2024) Compiler Autotuning through Multiple-phase Learning. ACM Transactions on Software Engineering and Methodology 33:4 (1-38). DOI: 10.1145/3640330. Online publication date: 11-Jan-2024
  • (2024) Proactive Resume and Pause of Resources for Microsoft Azure SQL Database Serverless. Companion of the 2024 International Conference on Management of Data (227-240). DOI: 10.1145/3626246.3653371. Online publication date: 9-Jun-2024
  • (2024) Further Optimizations and Analysis of Smith-Waterman with Vector Extensions. 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (561-570). DOI: 10.1109/IPDPSW63119.2024.00113. Online publication date: 27-May-2024
