Abstract
This paper describes AntSM, a system that exploits the inherent parallelism of multi-threaded programs to reduce the overhead of debugging tools based on statistical analysis and invariant-violation detection. The runtime monitoring these tools require incurs high overhead. The key insight of AntSM is that, in parallel programs, this overhead can be reduced by sampling the monitoring across parallel regions of the program that perform similar actions. AntSM implements this sampling using a combination of static and dynamic analyses that determine which similar parts of the program execute in parallel and how many threads execute them. Experimental results using the C-DIDUCE debugging tool (a variant of DIDUCE for C) on eleven Pthreads benchmarks from the PARSEC suite show that monitoring overhead is reduced by up to 18.14 times (8.73 times on average) on an eight-core machine relative to a naive port that performs no sampling.
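To make the sampling idea concrete, the following is a minimal sketch and not AntSM's actual implementation. It assumes a simple round-robin policy in which, among N threads executing the same region, each thread checks only every N-th monitoring event. The function name `run_region`, the policy itself, and the counter standing in for an invariant check are all hypothetical illustrations; the abstract does not specify how AntSM chooses which events to monitor.

```python
import threading


def run_region(num_threads, num_events, sampled):
    """Run the same region in num_threads threads.

    Returns the number of (stand-in) invariant checks each thread performed.
    """
    checks = [0] * num_threads

    def worker(tid):
        for event in range(num_events):
            # Naive port: every thread checks every event.
            # Sampled (assumed round-robin policy): thread `tid` checks only
            # events whose index is congruent to `tid` modulo num_threads, so
            # every monitoring point is still checked by exactly one thread.
            if not sampled or event % num_threads == tid:
                checks[tid] += 1  # stand-in for a real invariant check

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return checks


if __name__ == "__main__":
    naive = run_region(num_threads=8, num_events=800, sampled=False)
    sampled = run_region(num_threads=8, num_events=800, sampled=True)
    print("naive checks per thread:  ", naive)
    print("sampled checks per thread:", sampled)
```

Under this assumed policy the total number of checks drops by a factor equal to the thread count. The measured reductions reported in the abstract depend on the real analyses and workloads and need not match this idealized ratio.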
Notes
1. AccMon uses special hardware.
2. The differences between DIDUCE and C-DIDUCE come from the former targeting Java and the latter C. These differences are explained in [3].
3. No significant technical challenge prevents us from using OpenMP.
References
1. Software errors cost U.S. economy $59.5 billion annually. NIST News Release 2002–10
2. Hangal, S., Lam, M.S.: Tracking down software bugs using automatic anomaly detection. In: Proceedings of the 24th International Conference on Software Engineering, pp. 291–301 (2002)
3. Fei, L., Midkiff, S.P.: Artemis: practical runtime monitoring of applications for execution anomalies. In: PLDI ’06, pp. 84–95, New York, NY, USA (2006)
4. Zhou, P., Liu, W., Fei, L., Lu, S., Qin, F., Zhou, Y., Midkiff, S.P., Torrellas, J.: AccMon: automatically detecting memory-related bugs via program counter-based invariants. In: Proceedings of MICRO’04 (2004)
5. Liblit, B., Naik, M., Zheng, A.X., Aiken, A., Jordan, M.I.: Scalable statistical bug isolation. In: PLDI ’05 (2005)
6. Liblit, B., Aiken, A., Zheng, A.X., Jordan, M.I.: Bug isolation via remote program sampling. In: PLDI ’03, pp. 141–154 (2003)
7. Liu, C., Yan, X., Fei, L., Han, J., Midkiff, S.P.: Sober: statistical model-based bug localization. In: ESEC/FSE-13: 10th European Software Engineering Conference Held Jointly with 13th International Symposium on Foundations of Software Engineering (2005)
8. The PARSEC Benchmark Suite. http://parsec.cs.princeton.edu
9. Hutchins, M., Foster, H., Goradia, T., Ostrand, T.: Experiments of the effectiveness of dataflow- and controlflow-based test adequacy criteria. In: International Conference on Software Engineering, ICSE ’94, pp. 191–200, Los Alamitos, CA, USA (1994)
10. Ernst, M.D., Czeisler, A., Griswold, W.G., Notkin, D.: Quickly detecting relevant program invariants. In: Proceedings of the 22nd International Conference on Software Engineering, pp. 449–458 (2000)
11. The LLVM Compiler Infrastructure. http://llvm.org
12. Lee, J.-W., Bachega, L.R., Midkiff, S.P., Hu, Y.C.: Ant: a debugging framework for MPI parallel programs. In: Kasahara, H., Kimura, K. (eds.) LCPC 2012. LNCS, vol. 7760, pp. 220–233. Springer, Heidelberg (2013)
13. Totalview user guide. Accessed 28 Sept 2012
14. Lumetta, S.S., Culler, D.E.: The mantis parallel debugger. In: SPDT ’96: Proceedings of the SIGMETRICS Symposium on Parallel and Distributed Tools, pp. 118–126, New York, NY, USA (1996)
15. Sistare, S., Dorenkamp, E., Nevin, N., Loh, E.: MPI support in the Prism programming environment. In: Supercomputing ’99, p. 22 (1999)
16. Wismuller, R., Oberhuber, M., Krammer, J., Hansen, O.: Interactive debugging and performance analysis of massively parallel applications. Parallel Comput. 22(3), 415–442 (1996)
17. Stringhini, D., Navaux, P., de Kergommeaux, J.C.: A selection mechanism to group processes in a parallel debugger. In: Proceedings of 2000 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’00), June 2000
18. Cheng, D., Hood, R.: A portable debugger for parallel and distributed programs. In: Supercomputing ’94, pp. 723–732, November 1994
19. Mirgorodskiy, A.V., Maruyama, N., Miller, B.P.: Problem diagnosis in large-scale computing environments. In: SC ’06, p. 88. ACM (2006)
20. Gao, Q., Qin, F., Panda, D.K.: DMTracker: finding bugs in large-scale parallel programs by detecting anomaly in data movements. In: SC ’07. ACM (2007)
21. Arnold, D.C., Ahn, D.H., de Supinski, B.R., Lee, G.L., Miller, B.P., Schulz, M.: Stack trace analysis for large scale debugging. Parallel and Distributed Processing Symposium, p. 64 (2007)
22. Lee, G.L., Ahn, D.H., Arnold, D.C., de Supinski, B.R., Legendre, M., Miller, B.P., Schulz, M., Liblit, B.: Lessons learned at 208k: towards debugging millions of cores. In: SC ’08, pp. 1–9, Piscataway, NJ, USA (2008)
23. Strom, R.E., Bacon, D.F., Goldberg, A.P., Lowry, A., Yellin, D.M., Yemini, S.A.: Hermes: A Language for Distributed Computing. Prentice-Hall Inc., Upper Saddle River (1991)
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Lee, JW., Midkiff, S.P. (2014). AntSM: Efficient Debugging for Shared Memory Parallel Programs. In: Cașcaval, C., Montesinos, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2013. Lecture Notes in Computer Science, vol 8664. Springer, Cham. https://doi.org/10.1007/978-3-319-09967-5_12
DOI: https://doi.org/10.1007/978-3-319-09967-5_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09966-8
Online ISBN: 978-3-319-09967-5
eBook Packages: Computer Science (R0)