Extending the scope of the controlled logical clock

Becker, Daniel; Geimer, Markus; Rabenseifner, Rolf; Wolf, Felix

doi:10.1007/s10586-011-0181-8

Published: 23 September 2011

Volume 16, pages 171–189, (2013)

Cluster Computing

Daniel Becker¹, Markus Geimer², Rolf Rabenseifner³ & Felix Wolf¹,²,⁴

Abstract

Event traces help in understanding the performance behavior of parallel applications because they allow in-depth analysis of communication and synchronization patterns. However, most cluster systems lack synchronized clocks, which may render the analysis ineffective: inaccurate relative event timings may misrepresent the logical event order, lead to errors when quantifying the impact of certain behaviors, or confuse users of time-line visualization tools by showing messages flowing backward in time. In our earlier work, we developed a scalable algorithm called the controlled logical clock that eliminates inconsistent inter-process timings postmortem in traces of pure MPI applications, potentially running on large processor configurations. In this paper, we first demonstrate that our algorithm also proves beneficial in computational grids, where a single application is executed using the combined computational power of several geographically dispersed clusters. Second, we present an extended version of the algorithm that, in addition to message-passing event semantics, also preserves and restores shared-memory event semantics, enabling the correction of traces from hybrid applications.
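
To make the notion of inconsistent inter-process timings concrete, the following minimal sketch detects messages that appear to flow backward in time and repairs them with a plain forward shift. It is not the controlled logical clock itself: the event layout (`Event`), the function names, and the `min_latency` parameter are assumptions made for illustration, and the sketch handles only point-to-point message semantics. It keeps the full shift for all later events on the receiving process, with none of the refinements the paper's algorithm applies.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    process: int       # rank that recorded the event
    kind: str          # "send", "recv", or "local"
    time: float        # timestamp from the process-local clock
    msg_id: int = -1   # pairs a send with its recv; -1 for local events


def find_violations(events, min_latency=0.0):
    """Return msg_ids whose receive is stamped before send + min_latency."""
    sends = {e.msg_id: e.time for e in events if e.kind == "send"}
    return [e.msg_id for e in events
            if e.kind == "recv" and e.time < sends[e.msg_id] + min_latency]


def forward_correct(events, min_latency=0.0):
    """Replay the trace and shift events forward until no receive precedes
    its send. Local intervals after a corrected receive are preserved by
    carrying the shift along on that process (hypothetical simplification).
    Events blocked behind an unmatched receive are dropped from the result."""
    queues = defaultdict(list)
    for e in sorted(events, key=lambda ev: ev.time):
        queues[e.process].append(e)
    pos = {p: 0 for p in queues}
    shift = {p: 0.0 for p in queues}
    sent = {}          # msg_id -> corrected send time
    corrected = []
    progress = True
    while progress:
        progress = False
        for p, queue in queues.items():
            while pos[p] < len(queue):
                e = queue[pos[p]]
                if e.kind == "recv" and e.msg_id not in sent:
                    break  # matching send has not been replayed yet
                t = e.time + shift[p]
                if e.kind == "recv":
                    t = max(t, sent[e.msg_id] + min_latency)
                    shift[p] = t - e.time
                elif e.kind == "send":
                    sent[e.msg_id] = t
                corrected.append(Event(e.process, e.kind, t, e.msg_id))
                pos[p] += 1
                progress = True
    return corrected


# Two processes whose clocks disagree: both receives are stamped too early.
trace = [
    Event(0, "send", 10.0, 1),
    Event(1, "recv", 9.5, 1),
    Event(1, "local", 9.75),
    Event(1, "send", 10.25, 2),
    Event(0, "recv", 10.125, 2),
]
fixed = forward_correct(trace, min_latency=0.125)
```

In this example, correcting the first receive on process 1 also moves its later send forward, which in turn pushes the second receive on process 0 forward. This shows why corrections have to be propagated along message chains rather than applied to each message in isolation.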

Author information

Authors and Affiliations

German Research School for Simulation Sciences, 52062, Aachen, Germany
Daniel Becker & Felix Wolf
Jülich Supercomputing Centre, 52425, Jülich, Germany
Markus Geimer & Felix Wolf
University of Stuttgart, 70550, Stuttgart, Germany
Rolf Rabenseifner
RWTH Aachen University, 52056, Aachen, Germany
Felix Wolf

Corresponding author

Correspondence to Daniel Becker.

About this article

Cite this article

Becker, D., Geimer, M., Rabenseifner, R. et al. Extending the scope of the controlled logical clock. Cluster Comput 16, 171–189 (2013). https://doi.org/10.1007/s10586-011-0181-8

Received: 21 July 2011
Accepted: 11 August 2011
Published: 23 September 2011
Issue Date: March 2013
DOI: https://doi.org/10.1007/s10586-011-0181-8
