DOI: 10.1145/3061639.3062257
research-article

HALWPE: Hardware-Assisted Light Weight Performance Estimation for GPUs

Published: 18 June 2017

Abstract

This paper presents a predictive modeling framework for GPU performance. The key innovation underlying this approach is that performance statistics collected from representative workloads running on current-generation GPUs can effectively predict the performance of next-generation GPUs. This is useful when simulators are available for the next-generation device but simulation times are exorbitant, which makes early design space exploration of microarchitectural parameters and other features infeasible. When predicting performance across three Intel GPU generations (Haswell, Broadwell, Skylake), our models achieved low out-of-sample errors ranging from 7.45% to 8.91%, while running 30,000 to 45,000 times faster than cycle-accurate simulation.
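The idea of cross-generation prediction and out-of-sample error can be illustrated with a minimal sketch. The abstract does not say which model family or features the authors use, so the one-feature least-squares fit below is only a stand-in, and every number in it is synthetic. The names `fit_line`, `mean_abs_pct_error`, `train`, and `held_out` are hypothetical and introduced here for illustration only.

```python
# Illustrative sketch only: synthetic numbers, not data or models from the paper.

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def mean_abs_pct_error(predicted, actual):
    """Mean absolute relative error, expressed in percent."""
    total = sum(abs(p - t) / t for p, t in zip(predicted, actual))
    return 100.0 * total / len(actual)


# Hypothetical workloads. Each pair is (frame time measured on current-generation
# hardware, frame time reported by a next-generation simulator), both in ms.
train = [(10.0, 8.1), (20.0, 15.8), (30.0, 24.3), (40.0, 31.9)]
held_out = [(15.0, 12.0), (35.0, 28.0)]  # never seen during fitting

a, b = fit_line([x for x, _ in train], [y for _, y in train])

# Out-of-sample error: evaluate the fitted model on workloads it was not fitted to.
predictions = [a * x + b for x, _ in held_out]
error = mean_abs_pct_error(predictions, [y for _, y in held_out])
print(f"slope={a:.3f} intercept={b:.3f} out-of-sample error={error:.2f}%")
```

The point of the sketch is the evaluation protocol rather than the model: the error is computed on workloads withheld from fitting, which is what "out-of-sample" refers to in the abstract. Once fitted, such a model needs only measurements from existing hardware, which is why it can run far faster than simulating each new workload.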




    Published In

    DAC '17: Proceedings of the 54th Annual Design Automation Conference 2017
    June 2017
    533 pages
    ISBN: 9781450349277
    DOI: 10.1145/3061639


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. DirectX
    2. GPU
    3. Predictive Modeling
    4. Render Pipeline

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    DAC '17

    Acceptance Rates

    Overall Acceptance Rate 1,770 of 5,499 submissions, 32%



    Cited By

    • (2024) Many-BSP: an analytical performance model for CUDA kernels. Computing 106:5 (1519-1555). DOI: 10.1007/s00607-023-01255-w. Online publication date: 1-May-2024
    • (2023) Flydeling: Streamlined Performance Models for Hardware Acceleration of CNNs through System Identification. ACM Transactions on Modeling and Performance Evaluation of Computing Systems 8:3 (1-33). DOI: 10.1145/3594870. Online publication date: 18-Jul-2023
    • (2022) Prediction Modeling for Application-Specific Communication Architecture Design of Optical NoC. ACM Transactions on Embedded Computing Systems 21:4 (1-29). DOI: 10.1145/3520241. Online publication date: 23-Aug-2022
    • (2022) A Survey of Machine Learning for Computer Architecture and Systems. ACM Computing Surveys 55:3 (1-39). DOI: 10.1145/3494523. Online publication date: 3-Feb-2022
    • (2022) Power-Aware Computing on GPGPU Systems Using ML Classification Techniques. 2022 IEEE International Symposium on Circuits and Systems (ISCAS) (1487-1491). DOI: 10.1109/ISCAS48785.2022.9937872. Online publication date: 28-May-2022
    • (2020) Predictive Compositional Method to Design and Reoptimize Complex Behavioral Dataflows. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 39:10 (2615-2627). DOI: 10.1109/TCAD.2020.2966447. Online publication date: Oct-2020
    • (2020) Comparison of analytical and ML-based models for predicting CPU–GPU data transfer time. Computing. DOI: 10.1007/s00607-019-00780-x. Online publication date: 8-Jan-2020
    • (2019) Hardware-Assisted Cross-Generation Prediction of GPUs Under Design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 38:6 (1133-1146). DOI: 10.1109/TCAD.2018.2834398. Online publication date: 1-Jun-2019
    • (2018) Predictive Modeling for CPU, GPU, and FPGA Performance and Power Consumption: A Survey. 2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) (763-768). DOI: 10.1109/ISVLSI.2018.00143. Online publication date: Jul-2018
