DOI: 10.1145/3605573.3605586
research-article
Open access

JOSS: Joint Exploration of CPU-Memory DVFS and Task Scheduling for Energy Efficiency

Published: 13 September 2023

Abstract

Energy-efficient execution of task-based parallel applications is crucial, as tasking is a widely supported feature in many parallel programming libraries and runtimes. Currently, state-of-the-art proposals primarily rely on leveraging core asymmetry and CPU DVFS. Additionally, these proposals mostly use heuristics and lack the ability to explore the trade-offs between energy usage and performance. However, our findings demonstrate that focusing solely on CPU energy consumption for energy-efficient scheduling while neglecting memory energy consumption leaves room for further energy savings. We propose JOSS, a runtime scheduling framework that leverages both CPU DVFS and memory DVFS in conjunction with core asymmetry and task characteristics to enable energy-efficient execution of task-based applications. JOSS also enables the exploration of energy and performance trade-offs by supporting user-defined performance constraints. JOSS uses a set of models to predict task execution time and CPU and memory power consumption, and then selects the configuration for the tunable knobs that achieves the desired energy-performance trade-off. Our evaluation shows that JOSS achieves an average energy reduction of 21.2% compared to the state of the art. Moreover, we demonstrate that even in the absence of a memory DVFS knob, taking the energy consumption of both CPU and memory into account achieves better energy savings than accounting for CPU energy alone. Furthermore, JOSS is able to adapt scheduling to reduce energy consumption while satisfying the desired performance constraints.
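
The model-then-select step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Config` fields, the `select_config` function, the `max_slowdown` parameter, and all numbers in `CANDIDATES` are hypothetical placeholders chosen for illustration. The sketch assumes that predictions for time and power already exist for each knob setting, estimates energy as predicted time multiplied by the sum of predicted CPU and memory power, and picks the lowest-energy configuration whose predicted time stays within a user-defined slowdown bound relative to the fastest candidate.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Config:
    """One candidate knob setting with hypothetical model predictions."""
    core_type: str           # e.g. "big" or "little" (core asymmetry)
    cpu_freq_ghz: float      # CPU DVFS knob
    mem_freq_ghz: float      # memory DVFS knob
    pred_time_s: float       # predicted task execution time
    pred_cpu_power_w: float  # predicted CPU power
    pred_mem_power_w: float  # predicted memory power

    @property
    def pred_energy_j(self) -> float:
        # Energy estimate covers both CPU and memory, not CPU alone.
        return self.pred_time_s * (self.pred_cpu_power_w + self.pred_mem_power_w)


def select_config(configs: Iterable[Config],
                  max_slowdown: Optional[float] = None) -> Config:
    """Return the lowest-energy candidate within the performance bound.

    max_slowdown is an assumed way to express a performance constraint:
    2.0 means "at most twice the predicted time of the fastest candidate".
    None means no constraint (pure energy minimization).
    """
    candidates = list(configs)
    if not candidates:
        raise ValueError("at least one candidate configuration is required")
    if max_slowdown is not None and max_slowdown < 1.0:
        raise ValueError("max_slowdown must be >= 1.0")

    fastest = min(c.pred_time_s for c in candidates)
    if max_slowdown is None:
        feasible = candidates
    else:
        feasible = [c for c in candidates
                    if c.pred_time_s <= fastest * max_slowdown]
    return min(feasible, key=lambda c: c.pred_energy_j)


# Illustrative placeholder predictions; these are not measurements from the paper.
CANDIDATES = [
    Config("big", 2.0, 1.8, pred_time_s=1.0, pred_cpu_power_w=4.0, pred_mem_power_w=1.0),
    Config("big", 1.0, 1.8, pred_time_s=1.8, pred_cpu_power_w=1.5, pred_mem_power_w=1.0),
    Config("big", 2.0, 0.8, pred_time_s=1.2, pred_cpu_power_w=3.5, pred_mem_power_w=0.5),
    Config("little", 1.4, 0.8, pred_time_s=3.0, pred_cpu_power_w=0.5, pred_mem_power_w=0.5),
]

if __name__ == "__main__":
    for bound in (None, 2.0, 1.25):
        print(bound, select_config(CANDIDATES, bound))
```

With these placeholder numbers, tightening `max_slowdown` moves the choice from the slow, low-power candidate towards faster ones, which is the kind of energy-performance trade-off the abstract refers to.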

Cited By

  • (2024) SWEEP: Adaptive Task Scheduling for Exploring Energy Performance Trade-offs. 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 325-336. DOI: 10.1109/IPDPS57955.2024.00036. Online publication date: 27-May-2024
  • (2024) An efficient machine learning based CPU scheduler for heterogeneous multicore processors. International Journal of Information Technology. DOI: 10.1007/s41870-024-01936-5. Online publication date: 24-May-2024
  • (2023) Challenges Towards VR Technology: VR Architecture Optimization. 2023 13th International Conference on Dependable Systems, Services and Technologies (DESSERT), 1-9. DOI: 10.1109/DESSERT61349.2023.10416538. Online publication date: 13-Oct-2023

    Published In

    ICPP '23: Proceedings of the 52nd International Conference on Parallel Processing
    August 2023
    858 pages
    ISBN:9798400708435
    DOI:10.1145/3605573
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. DVFS
    2. energy efficiency
    3. performance modeling
    4. power modeling
    5. task scheduling

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • European High-Performance Computing Joint Undertaking (JU)

    Conference

    ICPP 2023
    ICPP 2023: 52nd International Conference on Parallel Processing
    August 7 - 10, 2023
    Salt Lake City, UT, USA

    Acceptance Rates

    Overall Acceptance Rate 91 of 313 submissions, 29%

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 332
    • Downloads (Last 6 weeks): 56
    Reflects downloads up to 04 Oct 2024
