
Mapping Computations in Heterogeneous Multicore Systems with Statistical Regression on Program Inputs

Published: 18 October 2021

Abstract

A hardware configuration is a set of processors and their frequency levels in a multicore heterogeneous system. This article presents a compiler-based technique to match functions with hardware configurations. The technique uses multivariate linear regression to associate function arguments with particular hardware configurations. By showing that this classification space tends to be convex in practice, this article demonstrates that linear regression is not only an efficient tool to map computations to heterogeneous hardware, but also an effective one. To demonstrate the viability of multivariate linear regression as a way to perform adaptive compilation for heterogeneous architectures, we have implemented our ideas on top of the Soot Java bytecode analyzer. The code that we produce can predict the best configuration for a large class of Java and Scala benchmarks running on an Odroid XU4 big.LITTLE board, and it outperforms prior techniques such as ARM’s GTS and CHOAMP, a recently released static program scheduler.
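To make the idea of associating function arguments with hardware configurations more concrete, the sketch below shows one possible reading of it: least-squares regression onto one-hot configuration labels, followed by an argmax over the fitted linear scores. It is an illustration only, not the implementation described in the article. The helper names (`solve`, `fit`, `predict`), the two configuration labels (`"little"`, `"big"`), the two numeric arguments, and the training samples are all hypothetical.

```python
# Illustrative sketch only: names, labels, and data below are assumptions,
# not taken from the article.

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(samples, labels, configs):
    """Fit one linear model per configuration via the normal equations."""
    rows = [[1.0] + [float(v) for v in s] for s in samples]  # bias + arguments
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    weights = {}
    for c in configs:
        # One-hot target: 1 where configuration c was the best one observed.
        y = [1.0 if label == c else 0.0 for label in labels]
        xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
        weights[c] = solve(xtx, xty)
    return weights


def predict(weights, args):
    """Return the configuration whose linear score is highest for args."""
    x = [1.0] + [float(v) for v in args]
    return max(weights, key=lambda c: sum(w * v for w, v in zip(weights[c], x)))


# Hypothetical profile: (input length, iteration count) -> best configuration.
samples = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
labels = ["little"] * 4 + ["big"] * 4
weights = fit(samples, labels, ["little", "big"])

print(predict(weights, (3, 2)))
print(predict(weights, (7, 8)))
```

Because each prediction is an argmax over linear scores, the set of arguments mapped to a given configuration is an intersection of half-spaces and therefore convex, which is the kind of region the abstract says tends to occur in practice. If the best configurations were not separable in this way, a linear model like this one would misclassify some inputs.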

Published In

ACM Transactions on Embedded Computing Systems, Volume 20, Issue 6
November 2021
256 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3485150
Editor: Tulika Mitra

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 18 October 2021
Accepted: 01 July 2021
Revised: 01 July 2021
Received: 01 March 2021

Author Tags

  1. Regression
  2. function
  3. heterogeneous architecture
  4. scheduling
  5. big.LITTLE

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • ANR
  • CNPq
  • FAPEMIG
  • CAPES
