research-article

Open access

JOSS: Joint Exploration of CPU-Memory DVFS and Task Scheduling for Energy Efficiency

Authors:

Madhavan Manivannan,

Bhavishya Goel,

Miquel Pericàs

ICPP '23: Proceedings of the 52nd International Conference on Parallel Processing

Pages 828 - 838

https://doi.org/10.1145/3605573.3605586

Published: 13 September 2023

Abstract

Energy-efficient execution of task-based parallel applications is crucial, as tasking is a widely supported feature in many parallel programming libraries and runtimes. Current state-of-the-art proposals rely primarily on core asymmetry and CPU DVFS. These proposals also mostly use heuristics and cannot explore the trade-offs between energy usage and performance. Our findings demonstrate that focusing solely on CPU energy consumption for energy-efficient scheduling, while neglecting memory energy consumption, leaves room for further energy savings. We propose JOSS, a runtime scheduling framework that leverages both CPU DVFS and memory DVFS in conjunction with core asymmetry and task characteristics to enable energy-efficient execution of task-based applications. JOSS also enables the exploration of energy and performance trade-offs by supporting user-defined performance constraints. JOSS uses a set of models to predict task execution time and CPU and memory power consumption, and then selects the configuration of the tunable knobs that achieves the desired energy-performance trade-off. Our evaluation shows that JOSS achieves a 21.2% energy reduction, on average, compared to the state of the art. Moreover, we demonstrate that even in the absence of a memory DVFS knob, taking the energy consumption of both CPU and memory into account achieves better energy savings than accounting for CPU energy alone. Furthermore, JOSS is able to adapt scheduling to reduce energy consumption while satisfying the desired performance constraints.
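The selection step described in the abstract can be illustrated with a minimal sketch. It is not the JOSS implementation: the three predictor functions are toy placeholder models, and the names `Config`, `candidate_configs`, `total_energy`, `select_config`, and the `max_slowdown` parameter are all hypothetical and chosen only for illustration. The sketch assumes energy is estimated as predicted power times predicted execution time, and that the performance constraint is expressed as an allowed slowdown relative to the fastest candidate configuration.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional


@dataclass(frozen=True)
class Config:
    core_type: str        # "big" or "little" (hypothetical labels)
    cpu_freq_ghz: float   # CPU DVFS knob
    mem_freq_ghz: float   # memory DVFS knob


def predict_time(cfg: Config, compute_work: float, memory_work: float) -> float:
    # Toy model (assumption): the compute part scales with CPU speed,
    # the memory part scales with memory frequency.
    speed = 2.0 if cfg.core_type == "big" else 1.0
    return compute_work / (speed * cfg.cpu_freq_ghz) + memory_work / cfg.mem_freq_ghz


def predict_cpu_power(cfg: Config) -> float:
    # Toy model (assumption): static part plus a cubic dynamic part.
    scale = 1.5 if cfg.core_type == "big" else 0.4
    return scale * (0.2 + cfg.cpu_freq_ghz ** 3)


def predict_mem_power(cfg: Config) -> float:
    # Toy model (assumption): background power plus a frequency-dependent part.
    return 0.3 + 0.5 * cfg.mem_freq_ghz


def total_energy(cfg: Config, compute_work: float, memory_work: float,
                 include_memory: bool = True) -> float:
    # Energy = predicted power * predicted time; memory power is optional so
    # that a CPU-only objective can be compared against the joint objective.
    power = predict_cpu_power(cfg)
    if include_memory:
        power += predict_mem_power(cfg)
    return power * predict_time(cfg, compute_work, memory_work)


def candidate_configs() -> List[Config]:
    # Hypothetical knob values, not taken from any real platform.
    return [Config(k, f, m)
            for k, f, m in product(("big", "little"), (0.6, 1.2, 2.0), (0.8, 1.6))]


def select_config(configs: List[Config], compute_work: float, memory_work: float,
                  max_slowdown: Optional[float] = None,
                  include_memory: bool = True) -> Config:
    # Keep only configurations that meet the performance constraint, then
    # pick the one with the lowest predicted energy.
    times = [predict_time(c, compute_work, memory_work) for c in configs]
    fastest = min(times)
    feasible = [c for c, t in zip(configs, times)
                if max_slowdown is None or t <= fastest * max_slowdown]
    return min(feasible,
               key=lambda c: total_energy(c, compute_work, memory_work, include_memory))
```

Calling `select_config(candidate_configs(), 4.0, 2.0, max_slowdown=1.2)` returns the lowest-energy configuration whose predicted time is within 20% of the fastest one. Passing `include_memory=False` switches to a CPU-only objective, which makes it possible to compare the two objectives on the same candidate set.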

Cited By

Chen J, Manivannan M, Goel B, Pericàs M (2024). SWEEP: Adaptive Task Scheduling for Exploring Energy Performance Trade-offs. 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 325-336. Online publication date: 27-May-2024.
https://doi.org/10.1109/IPDPS57955.2024.00036
Allaqband S, Nazish M, Allaqband S, Bashir J, Banday M (2024). An efficient machine learning based CPU scheduler for heterogeneous multicore processors. International Journal of Information Technology. Online publication date: 24-May-2024.
https://doi.org/10.1007/s41870-024-01936-5
Lysenko S, Kachur A (2023). Challenges Towards VR Technology: VR Architecture Optimization. 2023 13th International Conference on Dependable Systems, Services and Technologies (DESSERT), 1-9. Online publication date: 13-Oct-2023.
https://doi.org/10.1109/DESSERT61349.2023.10416538

Index Terms

JOSS: Joint Exploration of CPU-Memory DVFS and Task Scheduling for Energy Efficiency
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel algorithms

Published In

ICPP '23: Proceedings of the 52nd International Conference on Parallel Processing

August 2023

858 pages

ISBN: 9798400708435

DOI: 10.1145/3605573

Copyright © 2023 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

European High-Performance Computing Joint Undertaking (JU)

Conference

ICPP 2023

ICPP 2023: 52nd International Conference on Parallel Processing

August 7 - 10, 2023

Salt Lake City, UT, USA

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
