ADP: automated diagnosis of performance pathologies using hardware events

DOI: 10.1145/2254756.2254791
Published: 11 June 2012

Abstract

Characterizing how applications behave on hardware is essential for making the best use of available hardware resources. Modern architectures expose many hardware events that can reveal architectural performance bottlenecks throughout the core and memory hierarchy. These events can give programmers unique and powerful insights into the causes of resource bottlenecks in their applications. However, interpreting these events has been a significant challenge. We present an automated system that uses machine learning to identify an application's performance problems. Our system gives programmers insight into the performance of their applications while shielding them from the onerous task of digesting hardware events. It trains random forests, a decision-tree-based algorithm, on our micro-benchmarks to fingerprint performance problems. Our system divides a profiled application into functions and automatically classifies each function by its dominant hardware resource bottlenecks. Using the classifications of the hotspot functions, we achieved an average speedup of 1.73x across three applications in the PARSEC benchmark suite. Our system tells programmers where, what, and how to fix the detected performance problems in their applications, a task that would otherwise require considerable architectural knowledge.
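To make the fingerprinting idea concrete, the following is a minimal sketch of the general technique, not the authors' implementation. It trains a small random-forest-style ensemble (bootstrap resampling plus one randomly chosen feature per split, a simplification of the usual random feature subset) on toy "micro-benchmark" rows, then labels each function of a profiled application by majority vote. The event names, bottleneck labels, function names, and all numbers are assumptions invented for illustration.

```python
import random
from collections import Counter

# Hypothetical per-instruction event rates; names, labels and numbers are
# invented for this sketch and are not taken from the paper.
FEATURES = ["l1d_miss_rate", "llc_miss_rate", "branch_mispredict_rate"]
LABELS = ["l1_cache", "last_level_cache", "branch"]


def make_training_set():
    """Toy stand-in for micro-benchmark runs: one event rate dominates per label."""
    rows = []
    for dominant, label in enumerate(LABELS):
        for high in (0.6, 0.7, 0.8, 0.9):
            rates = [0.1, 0.1, 0.1]
            rates[dominant] = high
            rows.append((rates, label))
    return rows


def gini(rows):
    """Gini impurity of the labels in rows."""
    counts = Counter(label for _, label in rows)
    return 1.0 - sum((n / len(rows)) ** 2 for n in counts.values())


def best_threshold(rows, feature):
    """Lowest-impurity threshold on one feature, or None if it cannot split."""
    values = sorted({rates[feature] for rates, _ in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for r in rows if r[0][feature] <= threshold]
        right = [r for r in rows if r[0][feature] > threshold]
        score = len(left) * gini(left) + len(right) * gini(right)
        if best is None or score < best[0]:
            best = (score, threshold)
    return None if best is None else best[1]


def grow_tree(rows, rng):
    """Grow an unpruned tree; each split uses one randomly chosen usable feature."""
    labels = Counter(label for _, label in rows)
    if len(labels) == 1:
        return next(iter(labels))  # pure leaf
    for feature in rng.sample(range(len(FEATURES)), len(FEATURES)):
        threshold = best_threshold(rows, feature)
        if threshold is not None:
            left = [r for r in rows if r[0][feature] <= threshold]
            right = [r for r in rows if r[0][feature] > threshold]
            return (feature, threshold, grow_tree(left, rng), grow_tree(right, rng))
    return labels.most_common(1)[0][0]  # identical rows, mixed labels


def train_forest(rows, n_trees=25, seed=0):
    """Bagging: every tree sees its own bootstrap resample of the training rows."""
    rng = random.Random(seed)
    return [grow_tree([rng.choice(rows) for _ in rows], rng) for _ in range(n_trees)]


def classify(forest, rates):
    """Majority vote over the trees; internal nodes are tuples, leaves are labels."""
    votes = Counter()
    for node in forest:
        while isinstance(node, tuple):
            feature, threshold, left, right = node
            node = left if rates[feature] <= threshold else right
        votes[node] += 1
    return votes.most_common(1)[0][0]


# Hypothetical per-function profile of an application.
profile = {
    "hot_loop": [0.75, 0.1, 0.1],
    "lookup_table": [0.1, 0.8, 0.1],
    "parse_input": [0.1, 0.1, 0.65],
}
forest = train_forest(make_training_set())
diagnosis = {name: classify(forest, rates) for name, rates in profile.items()}
for name, label in diagnosis.items():
    print(f"{name}: dominant bottleneck = {label}")
```

In a real setting the rows would come from hardware event counters sampled while running micro-benchmarks that each stress a single resource, and the per-function vectors from profiling the target application. How the events are selected and normalized is not specified in the abstract, so the per-instruction rates used here are only one plausible choice.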

Published In

SIGMETRICS '12: Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
June 2012
450 pages
ISBN: 9781450310970
DOI: 10.1145/2254756
  • ACM SIGMETRICS Performance Evaluation Review, Volume 40, Issue 1
    June 2012
    433 pages
    ISSN: 0163-5999
    DOI: 10.1145/2318857
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. fingerprint
  2. hardware event
  3. machine learning
  4. micro-benchmark
  5. performance analysis
  6. resource bottleneck

Qualifiers

  • Research-article

Conference

SIGMETRICS '12

Acceptance Rates

Overall Acceptance Rate 459 of 2,691 submissions, 17%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 9
  • Downloads (last 6 weeks): 0
Reflects downloads up to 18 Aug 2024

Citations

Cited By

  • (2021) BayesPerf: minimizing performance monitoring errors using Bayesian statistics. Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pages 832-844. DOI: 10.1145/3445814.3446739. Online publication date: 19-Apr-2021
  • (2018) Detecting Data Exploits Using Low-level Hardware Information. Proceedings of the First Workshop on Radical and Experiential Security, pages 41-47. DOI: 10.1145/3203422.3203433. Online publication date: 24-May-2018
  • (2018) Spatio-Temporal Analysis of HPC I/O and Connection Data. 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), pages 1585-1588. DOI: 10.1109/ICDCS.2018.00176. Online publication date: Jul-2018
  • (2017) Detecting Memory-Boundedness with Hardware Performance Counters. Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering, pages 27-38. DOI: 10.1145/3030207.3030223. Online publication date: 17-Apr-2017
  • (2017) Application Execution Time Prediction for Effective CPU Provisioning in Virtualization Environment. IEEE Transactions on Parallel and Distributed Systems, 28:11, pages 3074-3088. DOI: 10.1109/TPDS.2017.2707543. Online publication date: 1-Nov-2017
  • (2016) Ernest. Proceedings of the 13th Usenix Conference on Networked Systems Design and Implementation, pages 363-378. DOI: 10.5555/2930611.2930635. Online publication date: 16-Mar-2016
  • (2016) Can Data-Only Exploits be Detected at Runtime Using Hardware Events? Proceedings of the Hardware and Architectural Support for Security and Privacy 2016, pages 1-7. DOI: 10.1145/2948618.2948620. Online publication date: 18-Jun-2016
  • (2016) Machine learning based job status prediction in scientific clusters. 2016 SAI Computing Conference (SAI), pages 44-53. DOI: 10.1109/SAI.2016.7555961. Online publication date: Jul-2016
  • (2016) Measurement Bias from Address Aliasing. 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 1506-1515. DOI: 10.1109/IPDPSW.2016.126. Online publication date: May-2016
  • (2016) Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters. Conquering Big Data with High Performance Computing, pages 139-161. DOI: 10.1007/978-3-319-33742-5_7. Online publication date: 17-Sep-2016
