DOI: 10.1145/1882291.1882297
research-article

Finding latent performance bugs in systems implementations

Published: 07 November 2010

Abstract

Robust distributed systems commonly employ high-level recovery mechanisms enabling the system to recover from a wide variety of problematic environmental conditions such as node failures, packet drops and link disconnections. Unfortunately, these recovery mechanisms also effectively mask additional serious design and implementation errors, disguising them as latent performance bugs that severely degrade end-to-end system performance. These bugs typically go unnoticed due to the challenge of distinguishing between a bug and an intermittent environmental condition that must be tolerated by the system. We present techniques that can automatically pinpoint latent performance bugs in systems implementations, in the spirit of recent advances in model checking by systematic state space exploration. The techniques proceed by automating the process of conducting random simulations, identifying performance anomalies, and analyzing anomalous executions to pinpoint the circumstances leading to performance degradation.
Because our tool, MACEPC, is built for the MACE toolkit, it can test our implementations directly, without modification. We have applied MACEPC to five thoroughly tested and trusted distributed systems implementations. MACEPC found significant, previously unknown, long-standing performance bugs in each of the systems, and led to fixes that significantly improved their end-to-end performance.
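
The three steps named in the abstract (random simulation, anomaly identification, analysis of anomalous executions) can be illustrated with a small sketch. This is not MACEPC's algorithm, which the abstract does not specify. The `simulate` function, the injected 5% slow path, the median/MAD outlier test, and the threshold of 10 are all assumptions chosen for illustration.

```python
import random
import statistics

def simulate(seed: int) -> float:
    """Hypothetical stand-in for one randomized simulation run.

    Returns a completion time in arbitrary units. A real harness would
    execute the system under test with this seed instead.
    """
    rng = random.Random(seed)          # seeded, so each run can be replayed
    elapsed = rng.gauss(100.0, 5.0)    # ordinary run-to-run variation
    if rng.random() < 0.05:            # toy latent bug: rare slow recovery path
        elapsed += 400.0
    return elapsed

def find_anomalies(seeds, threshold: float = 10.0):
    """Flag runs whose time is far above the median, measured in MAD units."""
    times = {seed: simulate(seed) for seed in seeds}
    median = statistics.median(times.values())
    mad = statistics.median(abs(t - median) for t in times.values())
    scale = 1.4826 * mad or 1.0        # fall back to 1.0 if the MAD is zero
    flagged = [s for s, t in times.items() if (t - median) / scale > threshold]
    return times, flagged

if __name__ == "__main__":
    times, flagged = find_anomalies(range(500))
    print(f"{len(flagged)} of {len(times)} runs flagged for replay")
```

Because every run is driven by its seed, a flagged seed can be replayed deterministically, which is what makes the later analysis step possible. The median and MAD are used here instead of the mean and standard deviation so that the slow runs themselves do not inflate the baseline they are compared against.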


Published In

FSE '10: Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering
November 2010
302 pages
ISBN:9781605587912
DOI:10.1145/1882291

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. debugging
  2. distributed systems
  3. mace
  4. macepc
  5. performance

Qualifiers

  • Research-article

Conference

SIGSOFT/FSE'10

Acceptance Rates

Overall Acceptance Rate 17 of 128 submissions, 13%

Cited By

  • (2024) Enhancing Performance Bug Prediction Using Performance Code Metrics. Proceedings of the 21st International Conference on Mining Software Repositories, pp. 50-62. DOI: 10.1145/3643991.3644920. Online publication date: 15-Apr-2024
  • (2024) Revealing inputs causing web API performance latency using response-time-guided genetic algorithm fuzzing. Artificial Life and Robotics 29:4, pp. 459-472. DOI: 10.1007/s10015-024-00957-4. Online publication date: 2-Aug-2024
  • (2023) Performal: Formal Verification of Latency Properties for Distributed Systems. Proceedings of the ACM on Programming Languages 7:PLDI, pp. 368-393. DOI: 10.1145/3591235. Online publication date: 6-Jun-2023
  • (2023) Performance Bug Analysis and Detection for Distributed Storage and Computing Systems. ACM Transactions on Storage 19:3, pp. 1-33. DOI: 10.1145/3580281. Online publication date: 19-Jun-2023
  • (2022) Explaining and debugging pathological program behavior. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 1795-1799. DOI: 10.1145/3540250.3558910. Online publication date: 7-Nov-2022
  • (2022) Algorithmic Profiling for Real-World Complexity Problems. IEEE Transactions on Software Engineering 48:7, pp. 2680-2694. DOI: 10.1109/TSE.2021.3067652. Online publication date: 1-Jul-2022
  • (2021) Probabilistic profiling of stateful data planes for adversarial testing. Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 286-301. DOI: 10.1145/3445814.3446764. Online publication date: 19-Apr-2021
  • (2021) How Developers Optimize Virtual Reality Applications. Proceedings of the 43rd International Conference on Software Engineering, pp. 473-485. DOI: 10.1109/ICSE43902.2021.00052. Online publication date: 22-May-2021
  • (2021) Automatic Microprocessor Performance Bug Detection. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 545-556. DOI: 10.1109/HPCA51647.2021.00053. Online publication date: Feb-2021
  • (2021) Understanding and Detecting Performance Bugs in Markdown Compilers. 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 892-904. DOI: 10.1109/ASE51524.2021.9678611. Online publication date: Nov-2021
