DOI: 10.1145/1555754.1555800
research-article

End-to-end performance forecasting: finding bottlenecks before they happen

Published: 20 June 2009

    Abstract

    Many important workloads today, such as web-hosted services, are limited not by processor core performance but by interactions among the cores, the memory system, I/O devices, and the complex software layers that tie these components together. Architects designing future systems for these workloads are challenged to identify performance bottlenecks because, as in any concurrent system, overheads in one component may be hidden due to overlap with other operations. These overlaps span the user/kernel and software/hardware boundaries, making traditional performance analysis techniques inadequate.
    We present a methodology for identifying end-to-end critical paths across software and simulated hardware in complex networked systems. By modeling systems as collections of state machines interacting via queues, we can trace critical paths through multiplexed processing engines, identify when resources create bottlenecks (including abstract resources such as flow-control credits), and predict the benefit of eliminating bottlenecks by increasing hardware speeds or expanding available resources.
    We implement our technique in a full-system simulator and analyze a TCP microbenchmark, a web server, the Linux TCP/IP stack, and an Ethernet controller. From a single run of the microbenchmark, our tool, within minutes, correctly identifies a series of bottlenecks and predicts the performance of hypothetical systems in which these bottlenecks are successively eliminated, culminating in a total speedup of 3X. We then validate these predictions through hours of additional simulation and find them to be accurate to within 1-17%. We also analyze the web server, find it to be CPU-bound, and predict the performance of a system with an additional core to within 6%.
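
    As a rough illustration of the idea of critical paths and "what-if" predictions described above, the following minimal sketch computes the longest path through a small dependency graph of events and then re-evaluates it with one resource sped up. It is not the paper's state-machine-and-queue model: it ignores queueing, contention, and flow-control credits, and every event name, resource name, and latency below is a hypothetical value chosen for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(latency, deps):
    """Return (length, path) of the longest path through a dependency DAG.

    latency: dict mapping event name -> time the event takes
    deps:    dict mapping event name -> iterable of events it must wait for
    """
    graph = {ev: tuple(deps.get(ev, ())) for ev in latency}
    finish = {}  # earliest finish time of each event
    last = {}    # predecessor that determined that finish time
    for ev in TopologicalSorter(graph).static_order():
        preds = graph[ev]
        start = max((finish[p] for p in preds), default=0)
        last[ev] = max(preds, key=lambda p: finish[p], default=None)
        finish[ev] = start + latency[ev]
    # Walk back from the event that finishes last.
    node = max(finish, key=finish.get)
    length = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = last[node]
    return length, path[::-1]


def what_if(events, deps, resource, speedup):
    """Recompute the critical path with one resource made `speedup` times faster."""
    scaled = {
        ev: (t / speedup if res == resource else t)
        for ev, (res, t) in events.items()
    }
    return critical_path(scaled, deps)


# Hypothetical events: name -> (resource that executes it, latency in microseconds).
EVENTS = {
    "app_write":   ("cpu", 4),
    "tcp_send":    ("cpu", 6),
    "dma_read":    ("nic", 10),
    "wire_tx":     ("nic", 5),
    "irq_handler": ("cpu", 3),  # overlaps with the NIC work
}
# Hypothetical dependencies: event -> events it waits for.
DEPS = {
    "tcp_send":    ["app_write"],
    "dma_read":    ["tcp_send"],
    "wire_tx":     ["dma_read"],
    "irq_handler": ["tcp_send"],
}

BASE = {ev: t for ev, (_, t) in EVENTS.items()}
base_len, base_path = critical_path(BASE, DEPS)
print("baseline:", base_len, base_path)
for res in ("cpu", "nic"):
    new_len, _ = what_if(EVENTS, DEPS, res, 2)
    print(f"2x faster {res}: {new_len} (speedup {base_len / new_len:.2f}x)")
```

    In this toy graph the interrupt handler is hidden behind the NIC work, so it never appears on the critical path, and doubling NIC speed helps more than doubling CPU speed. A fixed graph like this cannot capture how the critical path shifts when resources are multiplexed, which is the harder problem the abstract describes.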


    Published In

    ISCA '09: Proceedings of the 36th Annual International Symposium on Computer Architecture
    June 2009
    510 pages
    ISBN: 9781605585260
    DOI: 10.1145/1555754

    ACM SIGARCH Computer Architecture News, Volume 37, Issue 3
    June 2009
    495 pages
    ISSN: 0163-5964
    DOI: 10.1145/1555815

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. critical path analysis
    2. performance analysis

    Acceptance Rates

    Overall Acceptance Rate 543 of 3,203 submissions, 17%
