DOI: 10.1145/3307650.3322267
research-article

Post-silicon CPU adaptation made practical using machine learning

Published: 22 June 2019
    Abstract

    Processors that adapt architecture to workloads at runtime promise compelling performance per watt (PPW) gains, offering one way to mitigate diminishing returns from pipeline scaling. State-of-the-art adaptive CPUs deploy machine learning (ML) models on-chip to optimize hardware by recognizing workload patterns in event counter data. However, despite breakthrough PPW gains, such designs are not yet widely adopted due to the potential for systematic adaptation errors in the field.
    This paper presents an adaptive CPU based on Intel Skylake that (1) closes the loop to deployment, and (2) provides a novel mechanism for post-silicon customization. Our CPU performs predictive cluster gating, dynamically setting the issue width of a clustered architecture while clock-gating unused resources. Gating decisions are driven by ML adaptation models that execute on an existing microcontroller, minimizing design complexity and allowing performance characteristics to be adjusted with the ease of a firmware update. Crucially, we show that although adaptation models can suffer from statistical blind spots that risk degrading performance on new workloads, these can be reduced to minimal impact with careful design and training.
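The gating loop described above (an ML model reads event counters and selects an issue width, clock-gating the unused clusters) can be illustrated with a minimal sketch. It assumes a logistic-regression classifier over three hypothetical normalized counter features (`ipc`, `l2_miss_rate`, `branch_mispred`), made-up weights, and a hypothetical 8-wide/4-wide configuration. None of these names or values come from the paper; this is not its model, feature set, or microcontroller firmware.

```python
import math
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Hypothetical per-interval event-counter features, pre-normalized."""
    ipc: float             # instructions retired per cycle
    l2_miss_rate: float    # L2 misses per kilo-instruction, scaled to [0, 1]
    branch_mispred: float  # branch mispredictions per kilo-instruction, scaled


# Hypothetical trained parameters; a real model would be fit offline on
# counter traces and shipped to the microcontroller as a firmware update.
WEIGHTS = (-4.0, 5.0, 1.0)
BIAS = 1.0


def gate_probability(sample: CounterSample, weights=WEIGHTS, bias=BIAS) -> float:
    """Logistic-regression score: estimated probability that running narrow
    for the next interval stays within the slowdown bound."""
    z = bias + (weights[0] * sample.ipc
                + weights[1] * sample.l2_miss_rate
                + weights[2] * sample.branch_mispred)
    return 1.0 / (1.0 + math.exp(-z))


def choose_issue_width(sample: CounterSample, threshold: float = 0.9,
                       wide: int = 8, narrow: int = 4) -> int:
    """Gate clusters (narrow issue width) only when the model is confident;
    otherwise keep every cluster active. Raising `threshold` gives up some
    PPW in exchange for fewer harmful gating decisions."""
    return narrow if gate_probability(sample) >= threshold else wide


memory_bound = CounterSample(ipc=0.4, l2_miss_rate=0.8, branch_mispred=0.1)
compute_bound = CounterSample(ipc=2.5, l2_miss_rate=0.05, branch_mispred=0.02)
print(choose_issue_width(memory_bound))   # 4: narrow, unused clusters gated
print(choose_issue_width(compute_bound))  # 8: all clusters stay active
```

The confidence threshold is one simple way to bias a classifier toward the safe (wide) configuration; a model trained to a tighter SLA would, in effect, gate less often.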
    Our adaptive CPU improves PPW by 31.4% over a comparable non-adaptive CPU on SPEC2017, and exhibits two orders of magnitude fewer Service Level Agreement (SLA) violations than the state of the art. We show how to optimize PPW using models trained to different SLAs or to specific applications, e.g., to improve datacenter hardware in situ. The resulting CPU meets real-world deployment criteria for the first time and provides a new means to tailor hardware to individual customers, even as their needs change.
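The two headline metrics can be made concrete with a small sketch. It assumes PPW is computed as performance divided by average power (which reduces to instructions per joule) and that an SLA violation is any interval slowed by more than a fixed bound relative to the non-adaptive CPU. The 5% bound, the function names, and every number below are illustrative assumptions, not the paper's definitions or data.

```python
def ppw(instructions: float, seconds: float, joules: float) -> float:
    """Performance per watt: (instructions/s) / (joules/s) = instructions/joule."""
    performance = instructions / seconds
    power = joules / seconds
    return performance / power


def ppw_gain(base: float, adaptive: float) -> float:
    """Fractional PPW improvement of the adaptive CPU over the baseline."""
    return adaptive / base - 1.0


def sla_violations(base_times, adaptive_times, max_slowdown=0.05) -> int:
    """Count intervals whose slowdown versus the non-adaptive baseline
    exceeds the (hypothetical) SLA bound."""
    return sum(1 for b, a in zip(base_times, adaptive_times)
               if a / b - 1.0 > max_slowdown)


# Illustrative run: the adaptive CPU is 2% slower but uses 25% less energy.
base = ppw(instructions=1e9, seconds=1.00, joules=10.0)
adaptive = ppw(instructions=1e9, seconds=1.02, joules=7.5)
print(f"PPW gain: {ppw_gain(base, adaptive):.1%}")              # PPW gain: 33.3%
print(sla_violations([1.0, 1.0, 1.0], [1.01, 1.08, 1.04]))      # 1
```

Because PPW rewards energy savings even when runtime grows slightly, a PPW gain alone does not guarantee SLA compliance; counting violations separately, as the abstract does, captures the risk that adaptation slows some workloads too much.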

    Information & Contributors

    Information

    Published In

    ACM Conferences
    ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture
    June 2019
    849 pages
    ISBN:9781450366694
    DOI:10.1145/3307650
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    In-Cooperation

    • IEEE-CS\DATC: IEEE Computer Society

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. adaptive hardware
    2. clustered architectures
    3. machine learning
    4. runtime optimization

    Qualifiers

    • Research-article

    Conference

    ISCA '19
    Sponsor:

    Acceptance Rates

    ISCA '19 Paper Acceptance Rate 62 of 365 submissions, 17%;
    Overall Acceptance Rate 543 of 3,203 submissions, 17%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 46
    • Downloads (last 6 weeks): 4

    Citations

    Cited By

    • (2023) SmartIndex: Learning to Index Caches to Improve Performance. IEEE Computer Architecture Letters 22:1, pp. 33-36. DOI: 10.1109/LCA.2023.3264478. Online publication date: 1-Jan-2023.
    • (2023) Post-Silicon Customization Using Deep Neural Networks. Architecture of Computing Systems, pp. 120-136. DOI: 10.1007/978-3-031-42785-5_9. Online publication date: 13-Jun-2023.
    • (2022) Role of Logistic Regression in Malware Detection: A Systematic Literature Review. VFAST Transactions on Software Engineering 10:2, pp. 36-46. DOI: 10.21015/vtse.v10i2.963. Online publication date: 15-May-2022.
    • (2022) A Survey of Machine Learning for Computer Architecture and Systems. ACM Computing Surveys 55:3, pp. 1-39. DOI: 10.1145/3494523. Online publication date: 3-Feb-2022.
    • (2021) BayesPerf: minimizing performance monitoring errors using Bayesian statistics. Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 832-844. DOI: 10.1145/3445814.3446739. Online publication date: 19-Apr-2021.
    • (2021) Reinforcement Learning Enabled Routing for High-Performance Networks-on-Chip. 2021 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-5. DOI: 10.1109/ISCAS51556.2021.9401790. Online publication date: May-2021.
    • (2020) PerSpectron: Detecting Invariant Footprints of Microarchitectural Attacks with Perceptron. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 1124-1137. DOI: 10.1109/MICRO50266.2020.00093. Online publication date: Oct-2020.
    • (2020) CuttleSys: Data-Driven Resource Management for Interactive Services on Reconfigurable Multicores. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 650-664. DOI: 10.1109/MICRO50266.2020.00060. Online publication date: Oct-2020.
    • (2019) Branch Prediction Is Not A Solved Problem: Measurements, Opportunities, and Future Directions. 2019 IEEE International Symposium on Workload Characterization (IISWC), pp. 228-238. DOI: 10.1109/IISWC47752.2019.9042108. Online publication date: Nov-2019.
