Abstract
Event traces help explain the performance behavior of parallel applications because they allow in-depth analysis of communication and synchronization patterns. However, most cluster systems lack synchronized clocks, which can render the analysis ineffective: inaccurate relative event timings may misrepresent the logical event order, introduce errors when quantifying the impact of certain behaviors, and confuse users of time-line visualization tools by showing messages flowing backward in time. In earlier work, we developed a scalable algorithm called the controlled logical clock, which eliminates inconsistent inter-process timings postmortem in traces of pure MPI applications, potentially running on large processor configurations. In this paper, we first demonstrate that the algorithm also proves beneficial in computational grids, where a single application is executed using the combined computational power of several geographically dispersed clusters. Second, we present an extended version of the algorithm that, in addition to message-passing event semantics, also preserves and restores shared-memory event semantics, enabling the correction of traces from hybrid applications.
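To make the notion of a message "flowing backward in time" concrete, the following minimal Python sketch checks a set of recorded messages for the inconsistency described above: a receive timestamp that is earlier than the matching send timestamp plus a minimum latency. It is not the controlled logical clock itself. The `Message` type, the function names, and the `min_latency` parameter are assumptions introduced only for illustration, and the correction shown simply moves a violating receive forward without adjusting any neighboring events.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """One send/receive pair from a trace (hypothetical record layout)."""
    send_time: float  # timestamp taken on the sender's local clock
    recv_time: float  # timestamp taken on the receiver's local clock


def find_violations(messages, min_latency=0.0):
    """Return messages whose receive appears earlier than send + min_latency."""
    return [m for m in messages if m.recv_time < m.send_time + min_latency]


def corrected_recv_time(message, min_latency=0.0):
    """Move a violating receive forward to the earliest consistent time.

    Consistent receives are returned unchanged. This is a naive forward
    shift (an assumption of this sketch), not the published algorithm.
    """
    return max(message.recv_time, message.send_time + min_latency)


# Example trace: the second and third messages violate the condition.
msgs = [Message(1.0, 2.0), Message(2.0, 1.5), Message(3.0, 3.25)]
violations = find_violations(msgs, min_latency=0.5)
fixed = [corrected_recv_time(m, min_latency=0.5) for m in msgs]
```

In this example, `violations` holds the two messages whose receive falls before the earliest possible arrival, and `fixed` holds the receive timestamps after the naive forward shift. A real correction would also have to keep the later events on the receiving process consistent with the shifted receive.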
Cite this article
Becker, D., Geimer, M., Rabenseifner, R. et al. Extending the scope of the controlled logical clock. Cluster Comput 16, 171–189 (2013). https://doi.org/10.1007/s10586-011-0181-8
DOI: https://doi.org/10.1007/s10586-011-0181-8