research-article

A black-box approach to energy-aware scheduling on integrated CPU-GPU systems

Authors: Rajkishore Barik, Naila Farooqui, Brian T. Lewis, Tatiana Shpeisman

CGO '16: Proceedings of the 2016 International Symposium on Code Generation and Optimization

Pages 70–81

https://doi.org/10.1145/2854038.2854052

Published: 29 February 2016

Abstract

Energy efficiency is now a top design goal for all computing systems, from fitness trackers and tablets, where it affects battery life, to cloud computing centers, where it directly impacts operational cost, maintainability, and environmental impact. Today's widespread integrated CPU-GPU processors combine a CPU and a GPU compute device with different power-performance characteristics. For these integrated processors, hardware vendors implement automatic power management policies that are typically not exposed to the end user. Furthermore, these policies often vary between processor generations and SKUs. As a result, it is challenging to design a generally applicable energy-aware runtime that schedules work onto both the CPU and the GPU of such processors to optimize energy consumption. We propose a new black-box scheduling technique that reduces energy use by effectively partitioning work across the CPU and GPU cores of integrated CPU-GPU processors. Our energy-aware scheduler combines a power model with information about the runtime behavior of a specific workload. The power model is computed once per processor and characterizes its power consumption for different kinds of workloads. On two widely different platforms, a high-end desktop system and a low-power tablet, our energy-aware runtime yields an energy-delay product that is 96% and 93%, respectively, of the near-ideal Oracle energy-delay product on a diverse set of workloads.
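The abstract reports results as an energy-delay product (EDP), i.e. total energy multiplied by execution time, so a split that saves joules but finishes much later is not rewarded. The sketch below illustrates, under stated assumptions, how a scheduler might choose the fraction `alpha` of a data-parallel region to offload to the GPU by minimizing EDP. It is not the paper's scheduler: the abstract does not give the form of its power model, so the `Device` characterization, the `base_power` term, and the assumption that CPU and GPU run their shares concurrently at fixed throughput and power are all hypothetical simplifications. On real integrated processors, the vendor's unexposed power-management policy can change both devices' power and throughput at run time, which this fixed model ignores.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Device:
    """Hypothetical, fixed characterization of one compute device."""
    throughput: float    # work items completed per second (assumed constant)
    active_power: float  # watts drawn above the base power while busy


def energy_delay(alpha: float, work: float, cpu: Device, gpu: Device,
                 base_power: float) -> Tuple[float, float, float]:
    """Return (energy in J, delay in s, EDP in J*s) when `alpha` of `work` runs on the GPU.

    Assumed model: both devices start together and run their shares
    concurrently, so the region ends when the slower one finishes.
    `base_power` (uncore, memory, idle draw) is paid for the whole run;
    each device's `active_power` is paid only while that device is busy.
    """
    t_gpu = alpha * work / gpu.throughput
    t_cpu = (1.0 - alpha) * work / cpu.throughput
    delay = max(t_cpu, t_gpu)
    energy = (base_power * delay
              + cpu.active_power * t_cpu
              + gpu.active_power * t_gpu)
    return energy, delay, energy * delay


def best_split(work: float, cpu: Device, gpu: Device, base_power: float,
               steps: int = 100) -> float:
    """Try alpha = 0, 1/steps, ..., 1 exhaustively and return the lowest-EDP split."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates,
               key=lambda a: energy_delay(a, work, cpu, gpu, base_power)[2])


if __name__ == "__main__":
    # Made-up numbers for illustration only, not measurements from the paper.
    cpu = Device(throughput=2e6, active_power=15.0)
    gpu = Device(throughput=3e6, active_power=15.0)
    print(best_split(work=1.2e7, cpu=cpu, gpu=gpu, base_power=10.0))  # 0.6
```

With these made-up numbers the GPU spends less active energy per item, yet the minimum EDP falls at the load-balanced split (`alpha = 0.6`, where both devices finish after 2.4 s): past that point the longer run costs more base-power energy than moving items to the GPU saves, and the delay grows as well.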

Cited By

Jayaweera M, Kong M, Wang Y, Kaeli D, Grosser T, Dubach C, Steuwer M, Xue J, Ottoni G, Quintão Pereira F (2024). Energy-Aware Tile Size Selection for Affine Programs on GPUs. Proceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization, pp. 13-27. Online publication date: 2-Mar-2024.
https://dl.acm.org/doi/10.1109/CGO57630.2024.10444795

Kocot B, Czarnul P, Proficz J (2023). Energy-Aware Scheduling for High-Performance Computing Systems: A Survey. Energies 16(2):890. Online publication date: 12-Jan-2023.
https://doi.org/10.3390/en16020890

Da Silva J, Leão L, Petrucci V, Gamatié A, Pereira F (2021). Mapping Computations in Heterogeneous Multicore Systems with Statistical Regression on Program Inputs. ACM Transactions on Embedded Computing Systems 20(6):1-35. Online publication date: 18-Oct-2021.
https://dl.acm.org/doi/10.1145/3478288

Index Terms

Software and its engineering → Software notations and tools → Compilers



Published In


CGO '16: Proceedings of the 2016 International Symposium on Code Generation and Optimization

February 2016

283 pages

ISBN: 9781450337786

DOI: 10.1145/2854038

General Chair: Bjoern Franke (University of Edinburgh, UK)

Program Chairs: Youfeng Wu (Intel, USA), Fabrice Rastello (Inria, France)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

In-Cooperation

IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States



Conference

CGO '16

Sponsor:

CGO '16: 14th Annual IEEE/ACM International Symposium on Code Generation and Optimization

March 12–18, 2016

Barcelona, Spain

Acceptance Rates

CGO '16 paper acceptance rate: 25 of 108 submissions (23%)

Overall acceptance rate: 312 of 1,061 submissions (29%)


Bibliometrics

Total citations: 41
Total downloads: 741 (last 12 months: 39; last 6 weeks: 7)
Reflects downloads up to 03 Oct 2024

Citations

Other citing works:

Xu Y, Belviranli M, Shen X, Vetter J (2021). PCCS: Processor-Centric Contention-aware Slowdown Model for Heterogeneous System-on-Chips. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1282-1295. Online publication date: 18-Oct-2021.
https://dl.acm.org/doi/10.1145/3466752.3480101

Zhu Z, Liu A, Zhang F, Chen F (2021). FPGA Resource Pooling in Cloud Computing. IEEE Transactions on Cloud Computing 9(2):610-626. Online publication date: 1-Apr-2021.
https://doi.org/10.1109/TCC.2018.2874011

da Silva A, Kind B, de Souza Magalhães J, Rocha J, Guimarães B, Pereira F, Lee J (2021). AnghaBench. Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization, pp. 378-390. Online publication date: 27-Feb-2021.
https://dl.acm.org/doi/10.1109/CGO51591.2021.9370322

Yesil S, Ozturk O (2021). Scheduling for heterogeneous systems in accelerator-rich environments. The Journal of Supercomputing. Online publication date: 25-May-2021.
https://doi.org/10.1007/s11227-021-03883-5

Zanella A, da Silva A, Quintão F, Cavalcante E, Dantas F, Batista T (2020). YACOS. Proceedings of the 24th Brazilian Symposium on Context-Oriented Programming and Advanced Modularity, pp. 56-63. Online publication date: 19-Oct-2020.
https://dl.acm.org/doi/10.1145/3427081.3427089

Monil M, Belviranli M, Lee S, Vetter J, Malony A, Sarkar V, Kim H (2020). MEPHESTO. Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, pp. 413-425. Online publication date: 30-Sep-2020.
https://dl.acm.org/doi/10.1145/3410463.3414671

Li J, Li J, Li M, Wang G, Zhou J, Lu Y, Li D, Huang Y (2020). Minimizing Energy of Heterogeneous Computing Systems by Task Scheduling Approach. Journal of Circuits, Systems and Computers. Online publication date: 29-Jan-2020.
https://doi.org/10.1142/S0218126620501947
