Abstract
Cyclic debugging of nondeterministic parallel programs requires some kind of record and replay technique, because successive executions may produce different results even if the same input is supplied. The NOndeterministic Program Evaluator (NOPE) is an implementation of record and replay for message-passing systems. During an initial record phase, ordering information about occurring events is stored in traces, which are then used to enforce an equivalent execution during subsequent replay phases. Compared with other tools, NOPE incurs less overhead in time and space by relying on certain properties of MPI and PVM. The key factor is the non-overtaking rule, which simplifies not only tracing and replay but also race condition detection. In addition, an automatic approach to event manipulation allows extensive investigation of nondeterministic behavior.
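The record and replay idea can be illustrated with a small toy model. The sketch below is not NOPE's implementation and does not call MPI or PVM. It assumes one FIFO queue per sender (standing in for the non-overtaking rule) and a random choice among ready senders (standing in for a racing wildcard receive). The names make_channels, record, and replay are hypothetical and chosen only for this illustration. Under these assumptions, the trace only needs the sender of each receive, since FIFO order then determines which message is delivered.

```python
import random
from collections import deque


def make_channels(messages_per_sender):
    """Build one FIFO queue per sender (toy model of non-overtaking)."""
    return {sender: deque(msgs) for sender, msgs in messages_per_sender.items()}


def record(messages_per_sender, rng):
    """Record phase: receive in a nondeterministic order, log only the sender."""
    channels = make_channels(messages_per_sender)
    trace, received = [], []
    while any(channels.values()):
        ready = [s for s, q in channels.items() if q]
        sender = rng.choice(ready)  # assumption: models a racing wildcard receive
        trace.append(sender)
        received.append((sender, channels[sender].popleft()))
    return trace, received


def replay(messages_per_sender, trace):
    """Replay phase: force each receive to take from the recorded sender."""
    channels = make_channels(messages_per_sender)
    return [(sender, channels[sender].popleft()) for sender in trace]


if __name__ == "__main__":
    messages = {0: ["a0", "a1"], 1: ["b0", "b1", "b2"]}
    trace, first_run = record(messages, random.Random())
    # The replayed run delivers the same messages in the same order.
    print(trace, replay(messages, trace) == first_run)
```

In this model the trace holds one sender identifier per receive rather than message contents, which hints at why relying on the non-overtaking property can keep traces small.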
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kranzlmüller, D., Volkert, J. (1998). Debugging point-to-point communication in MPI and PVM. In: Alexandrov, V., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 1998. Lecture Notes in Computer Science, vol 1497. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0056584
DOI: https://doi.org/10.1007/BFb0056584
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65041-6
Online ISBN: 978-3-540-49705-9
eBook Packages: Springer Book Archive