Abstract
Debugging is a crucial part of the software development process. Massively parallel programs in particular make program analysis and debugging very difficult, because they are more complex than sequential programs. Several tools exist for debugging and analysing parallel programs, but many of them fail when applied to massively parallel programs with potentially thousands of processes.
In this work we introduce the single process debugging strategy, a scalable debugging strategy for massively parallel programs. The goal of this strategy is to make debugging large-scale programs as simple and straightforward as debugging sequential programs. This is achieved by adapting and combining several techniques that are well known from sequential debugging. In combination, these techniques allow the user to execute and investigate small fractions of a possibly huge parallel program without having to (re-)execute the entire program.
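The abstract does not name the individual techniques, so the following is only an illustrative sketch of one way a single process could be re-executed in isolation: the messages delivered to that process are recorded during a full run and later fed back to it, so no other process needs to run again. All names here (`worker`, `RecordingChannel`, `ReplayChannel`, `run_recorded`, `replay_single`) are hypothetical and are not taken from the paper; the lists stand in for a real message-passing layer.

```python
from collections import deque


def worker(rank, recv, send):
    """Hypothetical per-process program: add two incoming values, forward the sum."""
    a = recv()
    b = recv()
    result = a + b
    send(rank + 1, result)  # forward to the next rank
    return result


class RecordingChannel:
    """Delivers messages from an inbox and logs each one in arrival order."""

    def __init__(self, inbox):
        self.inbox = deque(inbox)
        self.trace = []

    def recv(self):
        msg = self.inbox.popleft()
        self.trace.append(msg)
        return msg


class ReplayChannel:
    """Delivers previously recorded messages; no sender process is required."""

    def __init__(self, trace):
        self.pending = deque(trace)

    def recv(self):
        return self.pending.popleft()


def run_recorded(rank, inbox):
    """Stand-in for the full parallel run: executes one rank and records its inputs."""
    chan = RecordingChannel(inbox)
    sent = []
    result = worker(rank, chan.recv, lambda dest, value: sent.append((dest, value)))
    return result, chan.trace


def replay_single(rank, trace):
    """Re-executes only the given rank from its trace; outgoing messages are dropped."""
    chan = ReplayChannel(trace)
    return worker(rank, chan.recv, lambda dest, value: None)


if __name__ == "__main__":
    result, trace = run_recorded(3, [10, 32])
    # The replayed process sees the same inputs and reaches the same result.
    print(result, replay_single(3, trace))
```

Under these assumptions the replayed process can be stepped through with an ordinary sequential debugger, because its communication partners have been replaced by the recorded trace.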
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schaubschläger, C., Kranzlmüller, D., Volkert, J. (2006). Using Sequential Debugging Techniques with Massively Parallel Programs. In: Alexandrov, V.N., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds) Computational Science – ICCS 2006. ICCS 2006. Lecture Notes in Computer Science, vol 3992. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11758525_75
DOI: https://doi.org/10.1007/11758525_75
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34381-3
Online ISBN: 978-3-540-34382-0
eBook Packages: Computer Science, Computer Science (R0)