research-article

Post-silicon CPU adaptation made practical using machine learning

Authors:

Stephen J. Tarsa,

Rangeen Basu Roy Chowdhury,

Gautham Chinya,

Karthik Sankaranarayanan,

Robert Chappell,

Ronak Singhal, and

Hong Wang

ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture

June 2019

Pages 14 - 26

https://doi.org/10.1145/3307650.3322267

Published: 22 June 2019

Abstract

Processors that adapt their architecture to workloads at runtime promise compelling performance-per-watt (PPW) gains, offering one way to mitigate the diminishing returns of pipeline scaling. State-of-the-art adaptive CPUs deploy machine learning (ML) models on-chip that optimize hardware by recognizing workload patterns in event-counter data. Despite breakthrough PPW gains, however, such designs have not been widely adopted because of the potential for systematic adaptation errors in the field.

This paper presents an adaptive CPU based on Intel Skylake that (1) closes the loop to deployment and (2) provides a novel mechanism for post-silicon customization. Our CPU performs predictive cluster gating: it dynamically sets the issue width of a clustered architecture and clock-gates unused resources. Gating decisions are driven by ML adaptation models that execute on an existing microcontroller, which minimizes design complexity and allows performance characteristics to be adjusted with the ease of a firmware update. Crucially, we show that although adaptation models can suffer from statistical blind spots that risk degrading performance on new workloads, careful design and training can reduce their impact to a minimum.
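To make the gating loop concrete, here is a minimal Python sketch of a single control-interval decision. It rests entirely on assumptions that are not taken from the paper: a two-cluster core, hypothetical event-counter names (`cycles`, `uops_issued`, `llc_misses`), and a logistic model with made-up weights. It only illustrates the shape of the decision (counter rates in, number of active clusters out, with the remaining clusters assumed to be clock-gated), not the authors' model or firmware.

```python
"""Hypothetical sketch of one predictive cluster-gating decision.

Assumptions (not taken from the paper): a two-cluster core, made-up
event-counter names, and a logistic model with made-up weights.
"""
import math
from dataclasses import dataclass


@dataclass
class GatingModel:
    weights: dict[str, float]  # hypothetical per-feature weights
    bias: float
    threshold: float = 0.5  # minimum score required to gate a cluster

    def score(self, features: dict[str, float]) -> float:
        """Logistic score in (0, 1): estimated likelihood that running on
        a single cluster costs little performance in the next interval."""
        z = self.bias + sum(w * features.get(name, 0.0)
                            for name, w in self.weights.items())
        return 1.0 / (1.0 + math.exp(-z))


def counters_to_features(counters: dict[str, int]) -> dict[str, float]:
    """Convert raw event counts into per-cycle rates so that intervals of
    different lengths are comparable."""
    cycles = max(counters["cycles"], 1)
    return {name: count / cycles
            for name, count in counters.items() if name != "cycles"}


def choose_active_clusters(model: GatingModel, counters: dict[str, int],
                           total_clusters: int = 2) -> int:
    """Number of clusters to keep active for the next interval; the
    remaining clusters would be clock-gated."""
    if model.score(counters_to_features(counters)) >= model.threshold:
        return 1
    return total_clusters


if __name__ == "__main__":
    # Made-up weights: a high issue rate argues for full width, while
    # frequent cache misses (memory-bound phases) argue for gating.
    model = GatingModel(weights={"uops_issued": -2.0, "llc_misses": 300.0},
                        bias=1.0)
    memory_bound = {"cycles": 10_000, "uops_issued": 8_000, "llc_misses": 100}
    compute_bound = {"cycles": 10_000, "uops_issued": 30_000, "llc_misses": 5}
    print(choose_active_clusters(model, memory_bound))   # -> 1 (gate one cluster)
    print(choose_active_clusters(model, compute_bound))  # -> 2 (full width)
```

A linear model is used here only because it needs a handful of multiply-adds per decision, which is the kind of cost an existing microcontroller could plausibly absorb; substituting a different classifier would change only `score`.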

Our adaptive CPU improves PPW by 31.4% over a comparable non-adaptive CPU on SPEC2017 and exhibits two orders of magnitude fewer Service Level Agreement (SLA) violations than the state of the art. We show how to optimize PPW using models trained to different SLAs or to specific applications, e.g., to improve datacenter hardware in situ. The resulting CPU meets real-world deployment criteria for the first time and provides a new means to tailor hardware to individual customers, even as their needs change.
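The two headline metrics, and the idea of training models to different SLAs, can be illustrated with a small Python sketch. The definitions are assumptions chosen for illustration, not the paper's evaluation methodology: performance is instructions per second, power is average watts, and an SLA is taken to be a maximum per-interval slowdown relative to the non-adaptive baseline. The helper `gating_labels` is hypothetical and shows one way an SLA target could shape training labels.

```python
"""Hypothetical sketch of the metrics named in the abstract.

Assumptions (not the paper's definitions): performance is instructions
per second, power is average watts, and an SLA is a maximum allowed
per-interval slowdown relative to the non-adaptive baseline.
"""
from typing import Sequence


def ppw(instructions: float, seconds: float, watts: float) -> float:
    """Performance per watt: (instructions / second) / watts."""
    return (instructions / seconds) / watts


def ppw_gain(adaptive_ppw: float, baseline_ppw: float) -> float:
    """Relative PPW improvement; 0.314 would correspond to +31.4%."""
    return adaptive_ppw / baseline_ppw - 1.0


def sla_violations(adaptive_perf: Sequence[float],
                   baseline_perf: Sequence[float],
                   max_slowdown: float) -> int:
    """Count intervals where the adaptive CPU is more than max_slowdown
    (e.g. 0.01 for 1%) slower than the baseline on the same interval."""
    if len(adaptive_perf) != len(baseline_perf):
        raise ValueError("interval traces must be aligned")
    return sum(1 for a, b in zip(adaptive_perf, baseline_perf)
               if a < (1.0 - max_slowdown) * b)


def gating_labels(gated_perf: Sequence[float],
                  full_perf: Sequence[float],
                  max_slowdown: float) -> list[int]:
    """Hypothetical per-interval training labels for one SLA target:
    1 if gating would have stayed within the allowed slowdown, else 0."""
    return [int(g >= (1.0 - max_slowdown) * f)
            for g, f in zip(gated_perf, full_perf)]


if __name__ == "__main__":
    base = ppw(1e9, 1.0, 10.0)     # same work, baseline draws 10 W
    adaptive = ppw(1e9, 1.0, 8.0)  # same work, adaptive draws 8 W
    print(ppw_gain(adaptive, base))                                # -> 0.25
    print(sla_violations([2.0, 1.99, 1.9, 1.5], [2.0] * 4, 0.01))  # -> 2
    print(gating_labels([1.99, 1.5], [2.0, 2.0], 0.01))            # -> [1, 0]
    print(gating_labels([1.99, 1.5], [2.0, 2.0], 0.30))            # -> [1, 1]
```

Under these assumptions, a tighter `max_slowdown` turns more intervals into 0-labels, so a model trained on those labels would be expected to gate less often, giving up some PPW in exchange for fewer violations.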



Index Terms

Post-silicon CPU adaptation made practical using machine learning
1. Computer systems organization
  1. Architectures
    1. Serial architectures
      1. Superscalar architectures
2. Computing methodologies
  1. Artificial intelligence
    1. Control methods
      1. Computational control theory
  2. Machine learning
    1. Learning settings
      1. Batch learning


Information

Published In

ACM Conferences

ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture

June 2019

849 pages

ISBN:9781450366694

DOI:10.1145/3307650

General Chair:
Srilatha (Bobbie) Manne
Microsoft
,
Program Chairs:
Hillery Hunter
IBM
,
Erik Altman
IBM Research

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

In-Cooperation

IEEE-CS\DATC: IEEE Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 June 2019


Qualifiers

Research-article

Conference

ISCA '19

Sponsor:

SIGARCH

ISCA '19: The 46th Annual International Symposium on Computer Architecture

June 22 - 26, 2019

Phoenix, Arizona

Acceptance Rates

ISCA '19 Paper Acceptance Rate 62 of 365 submissions, 17%;

Overall Acceptance Rate 543 of 3,203 submissions, 17%


Bibliometrics

Article Metrics

Total Citations: 9

Total Downloads: 1,425

Downloads (last 12 months): 46

Downloads (last 6 weeks): 4

Cited By

Weston K, Mahmud F, Janfaza V, Muzahid A (2023). SmartIndex: Learning to Index Caches to Improve Performance. IEEE Computer Architecture Letters 22:1 (33-36). doi:10.1109/LCA.2023.3264478. Online publication date: 1-Jan-2023.
https://dl.acm.org/doi/10.1109/LCA.2023.3264478

Weston K, Janfaza V, Taur A, Mungra D, Kansal A, Zahran M, Muzahid A (2023). Post-Silicon Customization Using Deep Neural Networks. Architecture of Computing Systems (120-136). doi:10.1007/978-3-031-42785-5_9. Online publication date: 13-Jun-2023.
https://dl.acm.org/doi/10.1007/978-3-031-42785-5_9

Farooq M, Akram Z, Alvi A, Omer U (2022). Role of Logistic Regression in Malware Detection: A Systematic Literature Review. VFAST Transactions on Software Engineering 10:2 (36-46). doi:10.21015/vtse.v10i2.963. Online publication date: 15-May-2022.
https://doi.org/10.21015/vtse.v10i2.963

Wu N, Xie Y (2022). A Survey of Machine Learning for Computer Architecture and Systems. ACM Computing Surveys 55:3 (1-39). doi:10.1145/3494523. Online publication date: 3-Feb-2022.
https://dl.acm.org/doi/10.1145/3494523

Banerjee S, Jha S, Kalbarczyk Z, Iyer R, Sherwood T, Berger E, Kozyrakis C (2021). BayesPerf: minimizing performance monitoring errors using Bayesian statistics. Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (832-844). doi:10.1145/3445814.3446739. Online publication date: 19-Apr-2021.
https://dl.acm.org/doi/10.1145/3445814.3446739

Reza M, Le T (2021). Reinforcement Learning Enabled Routing for High-Performance Networks-on-Chip. 2021 IEEE International Symposium on Circuits and Systems (ISCAS) (1-5). doi:10.1109/ISCAS51556.2021.9401790. Online publication date: May-2021.
https://doi.org/10.1109/ISCAS51556.2021.9401790

Mirbagher-Ajorpaz S, Pokam G, Mohammadian-Koruyeh E, Garza E, Abu-Ghazaleh N, Jimenez D (2020). PerSpectron: Detecting Invariant Footprints of Microarchitectural Attacks with Perceptron. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (1124-1137). doi:10.1109/MICRO50266.2020.00093. Online publication date: Oct-2020.
https://doi.org/10.1109/MICRO50266.2020.00093

Kulkarni N, Gonzalez-Pumariega G, Khurana A, Shoemaker C, Delimitrou C, Albonesi D (2020). CuttleSys: Data-Driven Resource Management for Interactive Services on Reconfigurable Multicores. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (650-664). doi:10.1109/MICRO50266.2020.00060. Online publication date: Oct-2020.
https://doi.org/10.1109/MICRO50266.2020.00060

Lin C, Tarsa S (2019). Branch Prediction Is Not A Solved Problem: Measurements, Opportunities, and Future Directions. 2019 IEEE International Symposium on Workload Characterization (IISWC) (228-238). doi:10.1109/IISWC47752.2019.9042108. Online publication date: Nov-2019.
https://doi.org/10.1109/IISWC47752.2019.9042108
