research-article
Open access

Static and Dynamic Frequency Scaling on Multicore CPUs

Published: 28 December 2016

Abstract

Dynamic Voltage and Frequency Scaling (DVFS) typically adapts CPU power consumption by modifying a processor’s operating frequency (and the associated voltage). Typical DVFS approaches either apply a fixed default strategy, such as always running at the lowest or the highest frequency, or react to the CPU’s runtime load, lowering or raising the frequency based on CPU usage. In this article, we argue that a compile-time approach to CPU frequency selection is achievable for affine program regions and can significantly outperform runtime-based approaches. We first propose a lightweight runtime approach that exploits the properties of the processor-specific power profile, outperforming classical Linux governors such as powersave or ondemand for computational kernels. We then demonstrate that, for the affine kernels in an application, a purely compile-time approach to CPU frequency and core count selection is achievable, providing significant additional benefits over the runtime approach. Our framework relies on a one-time profiling of the target CPU, along with a compile-time categorization of the loop-based code segments in the application. These are combined to determine, at compile time, the frequency and the number of cores used to execute each affine region so as to optimize either energy or the energy-delay product. Extensive evaluation on 60 benchmarks and 5 multi-core CPUs shows that our approach systematically outperforms the powersave Linux governor while also improving overall performance.
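The selection step described above combines a one-time per-CPU profile with a per-region category to choose the frequency and core count that minimize energy (power × time) or the energy-delay product (EDP, energy × time). The following is a minimal Python sketch of that lookup, assuming the profile is stored as a table of average power and runtime for each (frequency, core count) configuration and region category. The names `Measurement`, `categorize`, `select_config` and `PROFILES`, the arithmetic-intensity threshold used as a stand-in for the categorization, and every profile number are hypothetical illustrations, not the authors' framework or measurements.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Measurement:
    """One profiled run of a region category at a given configuration."""
    power_watts: float  # average power drawn during the run
    time_s: float       # wall-clock execution time in seconds


# Maps (frequency in GHz, active core count) -> measured power and time.
Profile = Dict[Tuple[float, int], Measurement]


def energy(m: Measurement) -> float:
    """Energy in joules: average power times execution time."""
    return m.power_watts * m.time_s


def edp(m: Measurement) -> float:
    """Energy-delay product: energy times execution time (J*s)."""
    return energy(m) * m.time_s


def select_config(profile: Profile, objective: str = "energy") -> Tuple[float, int]:
    """Return the (frequency, cores) key of the profile with the lowest cost."""
    costs = {"energy": energy, "edp": edp}
    if objective not in costs:
        raise ValueError(f"unknown objective: {objective!r}")
    cost = costs[objective]
    return min(profile, key=lambda cfg: cost(profile[cfg]))


def categorize(flops: float, bytes_moved: float, threshold: float = 1.0) -> str:
    """Hypothetical stand-in for compile-time region categorization:
    classify by arithmetic intensity (flops per byte of memory traffic)."""
    intensity = flops / bytes_moved
    return "compute" if intensity >= threshold else "memory"


# Hypothetical one-time profile of a target CPU, one table per category.
# Compute-bound: runtime shrinks roughly in proportion to frequency.
COMPUTE_BOUND: Profile = {
    (1.2, 4): Measurement(30.0, 10.0),
    (2.0, 4): Measurement(40.0, 6.0),
    (3.0, 4): Measurement(62.0, 4.0),
    (3.0, 2): Measurement(40.0, 7.5),
}
# Memory-bound: runtime improves only slightly with frequency or core count.
MEMORY_BOUND: Profile = {
    (1.2, 2): Measurement(20.0, 11.0),
    (1.2, 4): Measurement(30.0, 10.0),
    (2.0, 4): Measurement(40.0, 9.5),
    (3.0, 4): Measurement(60.0, 9.2),
}
PROFILES = {"compute": COMPUTE_BOUND, "memory": MEMORY_BOUND}

if __name__ == "__main__":
    # A GEMM-like region (high intensity) and a STREAM-like region (low intensity).
    for name, flops, nbytes in [("gemm_like", 2e9, 1e8), ("stream_like", 1e8, 1.2e9)]:
        category = categorize(flops, nbytes)
        profile = PROFILES[category]
        print(name, category,
              "energy ->", select_config(profile, "energy"),
              "edp ->", select_config(profile, "edp"))
```

With these made-up numbers, minimizing energy picks 2.0 GHz on 4 cores for the compute-bound category while minimizing EDP picks 3.0 GHz, because EDP weights runtime twice; the memory-bound category favors the lowest frequency with fewer cores under both objectives. Since the profile is collected once and the category is known statically, the choice itself reduces to a table lookup that can be resolved before the program runs.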

Published In

ACM Transactions on Architecture and Code Optimization  Volume 13, Issue 4
December 2016
648 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/3012405
© 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 December 2016
Accepted: 01 October 2016
Revised: 01 September 2016
Received: 01 June 2016
Published in TACO Volume 13, Issue 4

Author Tags

  1. Affine Programs
  2. CPU Energy
  3. Static Analysis
  4. Voltage and Frequency Scaling

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • U.S. Department of Energy’s (DOE) Office of Science
  • U.S. National Science Foundation
  • Office of Advanced Scientific Computing Research
  • Pacific Northwest National Laboratory
  • Battelle for DOE

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 226
  • Downloads (last 6 weeks): 30
Reflects downloads up to 01 Sep 2024

Cited By

  • (2024) CPU Clock Rate Control Based on Method Invocation in Foreground Application in Android Smartphone. Journal of Information Processing 32, 275-286. DOI: 10.2197/ipsjjip.32.275. Online publication date: 2024.
  • (2024) Energy-Aware Tile Size Selection for Affine Programs on GPUs. 2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 13-27. DOI: 10.1109/CGO57630.2024.10444795. Online publication date: 2-Mar-2024.
  • (2023) SYnergy: Fine-grained Energy-Efficient Heterogeneous Computing for Scalable Energy Saving. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-13. DOI: 10.1145/3581784.3607055. Online publication date: 12-Nov-2023.
  • (2023) Analysis of Acceleration Structure Parameters and Hybrid Autotuning for Ray Tracing. IEEE Transactions on Visualization and Computer Graphics 29, 2, 1345-1356. DOI: 10.1109/TVCG.2021.3113499. Online publication date: 1-Feb-2023.
  • (2023) Going Dark: A Software “Light Switch” for Internet Servers. 2023 IEEE 29th International Symposium on Local and Metropolitan Area Networks (LANMAN), 1-6. DOI: 10.1109/LANMAN58293.2023.10189419. Online publication date: 10-Jul-2023.
  • (2023) Power optimization of a single-core processor using LSTM based encoder–decoder model for online DVFS. Sādhanā 48, 2. DOI: 10.1007/s12046-023-02086-3. Online publication date: 25-Mar-2023.
  • (2022) Level-Crossing Sampling with Multiple Temporal Resolutions for Speech Signals. Journal of Circuits, Systems and Computers 31, 08. DOI: 10.1142/S0218126622501493. Online publication date: 11-Feb-2022.
  • (2022) FOURST: A code generator for FFT-based fast stencil computations. 2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 99-108. DOI: 10.1109/ISPASS55109.2022.00010. Online publication date: May-2022.
  • (2022) Stack-based Method Invocation and Return Monitoring in ART for CPU Clock Rate Adjustment. 2022 IEEE International Conference on Consumer Electronics - Taiwan, 563-564. DOI: 10.1109/ICCE-Taiwan55306.2022.9869064. Online publication date: 6-Jul-2022.
  • (2022) CPU Usage Trends in Android Applications. 2022 IEEE International Conference on Big Data (Big Data), 6730-6732. DOI: 10.1109/BigData55660.2022.10020953. Online publication date: 17-Dec-2022.