Static and Dynamic Frequency Scaling on Multicore CPUs

Authors:

Sudheer Chunduri,

Sriram Krishnamoorthy,

Louis-Noël Pouchet,

Fabrice Rastello,

P. Sadayappan

ACM Transactions on Architecture and Code Optimization (TACO), Volume 13, Issue 4

Article No.: 51, Pages 1 - 26

https://doi.org/10.1145/3011017

Published: 28 December 2016

Abstract

Dynamic Voltage and Frequency Scaling (DVFS) typically adapts CPU power consumption by modifying a processor’s operating frequency (and the associated voltage). Typical DVFS approaches include using default strategies such as running at the lowest or the highest frequency or reacting to the CPU’s runtime load to reduce or increase frequency based on the CPU usage. In this article, we argue that a compile-time approach to CPU frequency selection is achievable for affine program regions and can significantly outperform runtime-based approaches. We first propose a lightweight runtime approach that can exploit the properties of the power profile specific to a processor, outperforming classical Linux governors such as powersave or on-demand for computational kernels. We then demonstrate that, for affine kernels in the application, a purely compile-time approach to CPU frequency and core count selection is achievable, providing significant additional benefits over the runtime approach. Our framework relies on a one-time profiling of the target CPU, along with a compile-time categorization of loop-based code segments in the application. These are combined to determine at compile time the frequency and the number of cores to use to execute each affine region to optimize energy or energy-delay product. Extensive evaluation on 60 benchmarks and 5 multi-core CPUs shows that our approach systematically outperforms the powersave Linux governor while also improving overall performance.
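
To make the selection step described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the one-time profiling yields, for each code category, an execution time and an average power for every (frequency, core count) configuration, and it picks the configuration that minimizes either energy (power × time) or energy-delay product (energy × time). The category names, the table layout, the function names, and every number in `PROFILE` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Tuple

# A configuration is (frequency in GHz, number of cores).
Config = Tuple[float, int]
# A measurement is (execution time in seconds, average power in watts).
Measurement = Tuple[float, float]

# Hypothetical one-time profile of a target CPU, indexed by code category.
# All values are invented for this sketch; they are not from the article.
PROFILE: Dict[str, Dict[Config, Measurement]] = {
    "compute_bound": {
        (1.2, 4): (10.0, 30.0),
        (2.4, 4): (5.0, 70.0),
        (1.2, 8): (5.0, 50.0),
        (2.4, 8): (2.5, 120.0),
    },
    "memory_bound": {
        (1.2, 4): (8.0, 28.0),
        (2.4, 4): (7.0, 60.0),
        (1.2, 8): (7.5, 45.0),
        (2.4, 8): (7.0, 100.0),
    },
}


def energy(time_s: float, power_w: float) -> float:
    """Energy in joules: average power multiplied by execution time."""
    return power_w * time_s


def energy_delay_product(time_s: float, power_w: float) -> float:
    """Energy-delay product: energy multiplied by execution time."""
    return energy(time_s, power_w) * time_s


METRICS: Dict[str, Callable[[float, float], float]] = {
    "energy": energy,
    "edp": energy_delay_product,
}


def select_config(category: str, metric: str = "energy") -> Config:
    """Return the profiled configuration with the lowest cost for a category."""
    table = PROFILE[category]
    cost = METRICS[metric]
    return min(table, key=lambda cfg: cost(*table[cfg]))


if __name__ == "__main__":
    for category in PROFILE:
        for metric in METRICS:
            print(category, metric, select_config(category, metric))
```

With these invented numbers, the compute-bound category favors the low frequency on all cores when minimizing energy but the high frequency when minimizing energy-delay product, while the memory-bound category stays at the lowest frequency and core count under both metrics. In a compile-time setting, a decision of this kind would be made once per affine region rather than by a governor reacting to load at runtime.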

Index Terms

Static and Dynamic Frequency Scaling on Multicore CPUs
1. Hardware
  1. Power and energy
    1. Power estimation and optimization
      1. Chip-level power issues
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Power management

Published In

ACM Transactions on Architecture and Code Optimization Volume 13, Issue 4

December 2016

648 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/3012405

Editor:
Koen De Bosschere
Ghent University

Copyright © 2016 ACM.

© 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 December 2016

Accepted: 01 October 2016

Revised: 01 September 2016

Received: 01 June 2016

Published in TACO Volume 13, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

U.S. Department of Energy's (DOE) Office of Science
U.S. National Science Foundation
Office of Advanced Scientific Computing Research
Pacific Northwest National Laboratory
Battelle for DOE
