Extending the scope of the controlled logical clock

Cluster Computing

Abstract

Event traces are helpful in understanding the performance behavior of parallel applications since they allow the in-depth analysis of communication and synchronization patterns. However, the absence of synchronized clocks on most cluster systems may render the analysis ineffective because inaccurate relative event timings may misrepresent the logical event order. Such misrepresentation can lead to errors when quantifying the impact of certain behaviors, and it can confuse the users of time-line visualization tools by showing messages flowing backward in time. In our earlier work, we developed a scalable algorithm called the controlled logical clock that eliminates inconsistent inter-process timings postmortem in traces of pure MPI applications, potentially running on large processor configurations. In this paper, we first demonstrate that our algorithm also proves beneficial in computational grids, where a single application is executed using the combined computational power of several geographically dispersed clusters. Second, we present an extended version of the algorithm that, in addition to message-passing event semantics, also preserves and restores shared-memory event semantics, enabling the correction of traces from hybrid applications.
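
To make the notion of "messages flowing backward in time" concrete, the following minimal sketch detects messages whose receive timestamp precedes the send timestamp plus a minimum latency, and then repairs them with a naive forward shift. It is not the controlled logical clock described in the article, which this page does not specify; it is a simplified Lamport-style illustration. The names `Message`, `find_violations`, `forward_correct`, and `min_latency` are hypothetical, as is the trace layout (one list of timestamps per process).

```python
from dataclasses import dataclass


@dataclass
class Message:
    """One point-to-point message, identified by event indices (hypothetical layout)."""
    sender: int     # index of the sending process
    send_idx: int   # position of the send event in the sender's trace
    receiver: int   # index of the receiving process
    recv_idx: int   # position of the receive event in the receiver's trace


def find_violations(traces, messages, min_latency=0.0):
    """Return messages that appear to arrive before they could have been sent."""
    return [
        m for m in messages
        if traces[m.receiver][m.recv_idx]
        < traces[m.sender][m.send_idx] + min_latency
    ]


def forward_correct(traces, messages, min_latency=0.0):
    """Naive repair: move each violating receive to send + min_latency and
    shift all later events on the receiving process by the same amount.

    Assumes the messages describe a valid (acyclic) execution; otherwise
    the loop would not terminate.
    """
    corrected = [list(t) for t in traces]  # leave the input untouched
    changed = True
    while changed:
        changed = False
        for m in messages:
            earliest = corrected[m.sender][m.send_idx] + min_latency
            recv_t = corrected[m.receiver][m.recv_idx]
            if recv_t < earliest:
                shift = earliest - recv_t
                row = corrected[m.receiver]
                row[m.recv_idx] = earliest
                for i in range(m.recv_idx + 1, len(row)):
                    row[i] += shift
                changed = True
    return corrected


if __name__ == "__main__":
    # Process 0 sends at t=1.0; process 1 records the receive at t=0.5.
    traces = [[0.0, 1.0, 2.0], [0.0, 0.5, 3.0]]
    messages = [Message(sender=0, send_idx=1, receiver=1, recv_idx=1)]
    print(find_violations(traces, messages, min_latency=0.25))
    print(forward_correct(traces, messages, min_latency=0.25))
```

A shift of this kind restores the logical event order but distorts the length of local intervals around the corrected event, which is one reason a more careful correction scheme is of interest.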

Author information

Correspondence to Daniel Becker.

About this article

Cite this article

Becker, D., Geimer, M., Rabenseifner, R. et al. Extending the scope of the controlled logical clock. Cluster Comput 16, 171–189 (2013). https://doi.org/10.1007/s10586-011-0181-8

  • DOI: https://doi.org/10.1007/s10586-011-0181-8
