research-article

Mapping Computations in Heterogeneous Multicore Systems with Statistical Regression on Program Inputs

Authors:

Junio Cezar Ribeiro Da Silva,

Vinicius Petrucci,

Abdoulaye Gamatié,

Fernando Magno Quintão Pereira

ACM Transactions on Embedded Computing Systems (TECS), Volume 20, Issue 6

Article No.: 112, Pages 1 - 35

https://doi.org/10.1145/3478288

Published: 18 October 2021

Abstract

A hardware configuration is a set of processors and their frequency levels in a multicore heterogeneous system. This article presents a compiler-based technique to match functions with hardware configurations. The technique uses multivariate linear regression to associate function arguments with particular hardware configurations. By showing that this classification space tends to be convex in practice, this article demonstrates that linear regression is not only an efficient tool to map computations to heterogeneous hardware, but also an effective one. To demonstrate the viability of multivariate linear regression as a way to perform adaptive compilation for heterogeneous architectures, we have implemented our ideas on top of the Soot Java bytecode analyzer. The code that we produce can predict the best configuration for a large class of Java and Scala benchmarks running on an Odroid XU4 big.LITTLE board, thereby outperforming prior techniques such as ARM’s GTS and CHOAMP, a recently released static program scheduler.
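The core idea of the abstract, associating function arguments with a hardware configuration through multivariate linear regression, can be illustrated with a minimal sketch. This is not the authors' Soot-based implementation. It is a plain least-squares fit of a linear map from argument vectors to one-hot configuration indicators, written in Python for brevity. The configuration labels (`LITTLE`, `big`), the two numeric arguments, the training samples, and the helper names `solve`, `fit`, and `predict` are all hypothetical, and the sketch assumes the training inputs give a full-rank design matrix.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes `a` is square and nonsingular.
    """
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit(xs: Sequence[Sequence[float]], labels: Sequence[str],
        configs: Sequence[str]) -> Matrix:
    """Least-squares fit from function arguments to one-hot configuration targets.

    Solves the normal equations (X^T X) W = X^T Y, where each row of X is
    [1, arg_1, ..., arg_n] and each row of Y marks the best configuration.
    """
    rows = [[1.0, *x] for x in xs]  # leading 1.0 is the bias term
    targets = [[1.0 if lab == c else 0.0 for c in configs] for lab in labels]
    d, k = len(rows[0]), len(configs)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [[sum(r[i] * t[j] for r, t in zip(rows, targets)) for j in range(k)]
           for i in range(d)]
    return solve(xtx, xty)  # d x k weight matrix


def predict(weights: Matrix, configs: Sequence[str], x: Sequence[float]) -> str:
    """Return the configuration whose linear score is highest for arguments x."""
    row = [1.0, *x]
    scores = [sum(row[i] * weights[i][j] for i in range(len(row)))
              for j in range(len(configs))]
    return configs[scores.index(max(scores))]


# Hypothetical training data: two numeric arguments of one function, and the
# configuration assumed to have been the best one observed for each call.
CONFIGS = ["LITTLE", "big"]
ARGS = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0),
        (8.0, 8.0), (9.0, 8.0), (8.0, 9.0), (9.0, 9.0)]
BEST = ["LITTLE"] * 4 + ["big"] * 4

weights = fit(ARGS, BEST, CONFIGS)
print(predict(weights, CONFIGS, (3.0, 3.0)))  # small arguments
print(predict(weights, CONFIGS, (7.0, 7.0)))  # large arguments
```

Because each score is linear in the arguments, the set of inputs mapped to a given configuration is an intersection of half-spaces and therefore convex, which is why the convexity observation in the abstract matters for this kind of model.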


Cited By

Canesche M, Rosário V, Borin E, Quintão Pereira F. (2024). The Droplet Search Algorithm for Kernel Scheduling. ACM Transactions on Architecture and Code Optimization 21:2 (1-28). DOI: 10.1145/3650109. Online publication date: 21-May-2024
https://dl.acm.org/doi/10.1145/3650109

Index Terms

Mapping Computations in Heterogeneous Multicore Systems with Statistical Regression on Program Inputs
1. Computing methodologies
  1. Machine learning
  2. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. Compilers



Published In


ACM Transactions on Embedded Computing Systems Volume 20, Issue 6

November 2021

256 pages

ISSN:1539-9087

EISSN:1558-3465

DOI:10.1145/3485150

Editor:
Tulika Mitra
National University of Singapore, Singapore


Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 18 October 2021

Accepted: 01 July 2021

Revised: 01 July 2021

Received: 01 March 2021

Published in TECS Volume 20, Issue 6


Qualifiers

Research-article
Refereed

Funding Sources

ANR
CNPq
FAPEMIG
CAPES


