Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
article

Detecting causal relationships in distributed computations: in search of the holy grail

Published: 01 March 1994 Publication History

Abstract

The paper shows that characterizing the causal relationship between significant events is an important but non-trivial aspect for understanding the behavior of distributed programs. An introduction to the notion of causality and its relation to logical time is given; some fundamental results concerning the characterization of causality are presented. Recent work on the detection of causal relationships in distributed computations is surveyed. The issue of observing distributed computations in a causally consistent way and the basic problems of detecting global predicates are discussed. To illustrate the major difficulties, some typical monitoring and debugging approaches are assessed, and it is demonstrated how their feasibility is severely limited by the fundamental problem to master the complexity of causal relationships.

References

[1]
1. Acharya A, Badrinath BR: Recording distributed snapshots based on causal order of message delivery. Inf Process Lett 44: 317-321 (1992).]]
[2]
2. Ahamad M, Hutto PW, John R: Implementing and programming causal distributed shared memory. Proc 11th Int Conference on Distributed Computing Systems, Arlington, Texas, pp 274-281, 1991.]]
[3]
3. Ahuja M, Carlson T, Gahlot A, Shands D: Timestamping events for inferring 'affects' relation and potential causality. Proc 15th IEEE Int Computer Software and Application Conference COMPSAC 91, Tokyo, pp 606-611, 1991.]]
[4]
4. Baldy P, Dicky H, Medina R, Morvan M, Vilarem JF: Efficient reconstruction of the causal relationship in distributed computations. Technical Report 92-013, Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, Montpellier 1992.]]
[5]
5. Bates PC, Wileden JC: High-level debugging of distributed systems: the behavioral abstraction approach. J Syst Softw 4(3): 255-264 (1983).]]
[6]
6. Birman K, Joseph T: Exploiting virtual synchrony in distributed systems. Operating Syst Rev 22(1): 123-138 (1987).]]
[7]
7. Birman K: The process group approach to reliable distributed computing. Technical Report, Computer Science Department, Cornell University, Ithaca, New York 1991.]]
[8]
8. Birman K, Schiper A, Stephenson P: Lightweight causal and atomic group multicast. ACM Trans Comput Syst 9(3): 272-314 (1991).]]
[9]
9. Bruegge B, Hibbard P: Generalized Path Expressions. J Syst Softw 2(2): 265-276 (1983).]]
[10]
10. Chandy KM, Lamport L: Distributed snapshots: determining global states of distributed systems. ACM Trans Comput Syst 3(1): 63-75 (1985).]]
[11]
11. Charron-Bost B: Combinatorics and geometry of consistent cuts: application to concurrency theory. In: Bermond JC, Raynal M (eds) Distributed algorithms. LNCS, vol 392. Springer, Berlin Heidelberg New York 1989, pp 45-56.]]
[12]
12. Charron-Bost B: Concerning the size of logical clocks in distributed systems. Inf Process Lett 39: 11-16 (1991).]]
[13]
13. Charron-Bost B, Delporte-Gallet C, Fauconnier H: Local and temporal predicates in distributed systems. Technical Report, LITP, IBP, Université Paris 7, Paris 1992.]]
[14]
14. Charron-Bost B, Mattern F, Tel G: Synchronous, asynchronous, and causally ordered communication. Distrib Comput (to appear).]]
[15]
15. Cooper R, Marzullo K: Consistent detection of global predicates. Proc ACM/ONR Workshop on Parallel and Distributed Debugging, Santa Cruz, California 1991, pp 163-173.]]
[16]
16. Diehl C, Jard C: Interval approximations and message causality in distributed systems. In: Finkel A, Jantzen M (eds) Proc of the 9th Annual Symposium on Theoretical Aspects of Computer Science STACS '92, LNCS, vol 577. Springer, Berlin Heidelberg New York 1992, pp 363-374.]]
[17]
17. Fidge CJ: Timestamps in message-passing systems that preserve the partial ordering. Proc 11th Australian Computer Science Conference, University of Queensland, pp 55-66, 1988.]]
[18]
18. Fidge CJ: Partial orders for parallel debugging. ACM SIGPLAN Notices 24(1): 183-194 (1989).]]
[19]
19. Fidge CJ: Dynamic analysis of event orderings in message passing systems. PhD Thesis, Department of Computer Science, Australian National University, Canberra 1989.]]
[20]
20. Fidge CJ: Logical time in distributed computing systems. IEEE Comput 24(8): 28-33 (1991).]]
[21]
21. Fischer MJ, Michael A: Sacrificing serializability to attain high availability of data in an unreliable network. Proc ACM SIGACT-SIGOPS Symposium on Principles of Database Systems, pp 70-75, 1982.]]
[22]
22. Fishburn PC: Interval orders and interval graphs. Wiley, New York 1985.]]
[23]
23. Floyd RW: Algorithm 97, shortest path. Commun ACM 5: 345 (1962).]]
[24]
24. Fowler J, Zwaenepoel W: Causal distributed breakpoints. Proc 10th Int Conference on Distributed Computing Systems, Paris, pp 134-141, 1990.]]
[25]
25. Gait J: A probe effect in concurrent programs. Softw Pract Exper 16(3): 225-233 (1986).]]
[26]
26. Garg VK, Waldecker B: Detection of unstable predicates in distributed programs. In: Shyamasundra R (ed) Proc 12th Conference on the Foundation of Software Technology and Theoretical Computer Science, LNCS, vol 652. Springer, Berlin Heidelberg New York 1992, pp 253-264.]]
[27]
27. Haban D, Weigel W: Global events and global breakpoints in distributed systems. Proc 21st Annual Hawaii Int Conference on System Sciences, pp 166-175, 1988.]]
[28]
28. Haban D, Zhou S, Maurer D, Wilhelm R: Specification and detection of global breakpoints in distributed systems. Technical Report SFB124-08/1991, Universität des Saarlandes, Saarbrücken 1991.]]
[29]
29. Harel D, Pnueli A: On the development of reactive systems. In: Apt K (ed) Logics and models of concurrent systems, NATO ASI Series F, vol 13. Springer, Berlin Heidelberg New York 1985, pp 477-498.]]
[30]
30. Hseush W, Kaiser GE: Modeling concurrency in parallel debugging. ACM SIGPLAN Notices 25(3): 11-20 (1990).]]
[31]
31. Hutto P, Ahamad M: Slow memory: weakening consistency to enhance concurrency in distributed shared memories. Proc 10th Int Conference on Distributed Computing Systems, Paris, pp 302-309, 1990.]]
[32]
32. Johnson DB, Zwaenepoel W: Recovery in distributed systems using optimistic message logging and checkpointing. J Algorithms 11(3): 462-491 (1990).]]
[33]
33. Katz S, Peled D: Interleaving set temporal logic. Theor Comput Sci 75: 263-287 (1990).]]
[34]
34. Katz S, Peled D: Verification of distributed programs using representative interleaving sequences. Distrib Comput 6: 107-120 (1992).]]
[35]
35. Kearns JP, Koodalattupuram B: Immediate ordered service in distributed systems. Proc 9th Int Conference on Distributed Computing Systems, Newport Beach, California, 611-618, 1989.]]
[36]
36. Lamport L: Time, clocks, and the ordering of events in a distributed system. Commun ACM 21(7): 558-565 (1978).]]
[37]
37. LeBlanc TJ, Mellor-Crummey JM: Debugging parallel programs with instant replay. IEEE Trans Comput 36(4): 471-482 (1987).]]
[38]
38. Leu E, Schiper A, Zramdini A: Efficient execution replay for distributed memory architectures. In: Bode A (ed) Proc 2nd European Distributed Memory Computing Conference, Munich, Germany, LNCS, vol 487. Springer, Berlin Heidelberg New York 1991, pp 315-324.]]
[39]
39. Manabe Y, Imase M: Global conditions in debugging distributed programs. J Parallel Distrib Comput 15: 62-69 (1992).]]
[40]
40. Manna Z, Pnueli A: The temporal logic of reactive and concurrent systems. Springer, Berlin Heidelberg New York 1992.]]
[41]
41. Marzullo K, Neiger G: Detection of global state predicates. In: Toueg S, Spirakis PG, Kirousis L (eds) Proc 5th Workshop on Distributed Algorithms (WDAG-91), Delphi, Greece, LNCS, vol 579. Springer, Berlin Heidelberg New York 1991, pp 254-272.]]
[42]
42. Masuzawa T, Tokura N: A causal distributed breakpoint algorithm for distributed debugger. Proc ICICE Fall Conf (SD-1-8) 6: 373-374 (1992).]]
[43]
43. Mattern F: Algorithms for distributed termination detection. Distrib Comput 2: 161-175 (1987).]]
[44]
44. Mattern F: Virtual time and global states in distributed systems. In: Cosnard M et al. (eds) Proc Workshop on Parallel and Distributed Algorithms, Chateau de Bonas, Oct. 1988, Elsevier, North Holland, 1989, pp 215-226.]]
[45]
45. Mattern F: Efficient algorithms for distributed snapshots and global virtual time approximation. J Parallel Distrib Comput 18: 423-434 (1993).]]
[46]
46. Medina R: Incremental garbage collection of causal relationship computation in distributed systems. Proc 5th IEEE Symposium on Parallel and Distributed Processing, Irving, Texas, 1993.]]
[47]
47. Meldal S, Sankar S, Vera J: Exploiting locality in maintaining potential causality. Proc 10th Annual ACM Symposium on Principles of Distributed Computing, Montreal, Canada, pp 231-239, 1991.]]
[48]
48. Miller BP, Choi JD: Breakpoints and halting in distributed programs. Proc 8th Int Conference on Distributed Computing Systems, pp 316-323, 1988.]]
[49]
49. Minkowski H: Raum und Zeit. In: Lorentz HA, Einstein A, Minkowski H: Das Relativitätsprinzip. Eine Sammlung von Abhandlungen. Teubner, Leipzig 1915, pp 56-68.]]
[50]
50. Netzer RHB, Miller BP: Optimal tracing and replay for debugging message-passing parallel programs. Proc Supercomputing '92, Minneapolis, pp 502-511, 1992.]]
[51]
51. Nielsen M, Plotkin G, Winskel G: Petri nets, event structures and domains, part I. Theor Comput Sci 13: 85-108 (1981).]]
[52]
52. Ochmanski E: Inevitability in concurrent systems. Inf Process Lett 25: 221-225 (1987).]]
[53]
53. Panangaden P, Taylor K: Concurrent common knowledge: defining agreement for asynchronous systems. Distrib Comput 6(2): 73-93 (1992).]]
[54]
54. Ponamgi MK, Hseush W, Kaiser GE: Debugging multithreaded programs with MPD. IEEE Software, 8(3): 37-43 (1991).]]
[55]
55. Pratt V: Modeling concurrency with partial orders. Int J Parallel Program 15(1): 33-71 (1986).]]
[56]
56. Pratt V: Modeling concurrency with geometry. Proc 18th Annual Symposium on Principles of Programming Languages (POPL-91), pp 311-322, 1991.]]
[57]
57. Pratt V: Arithmetic + Logic + Geometry = Concurrency. In: Simon I (ed) Proc LATIN '92, LNCS, vol 583. Springer, Berlin Heidelberg New York 1992, pp 430-447.]]
[58]
58. Raynal M, Schiper A, Toueg S: The causal ordering abstraction and a simple way to implement it. Inf Process Lett 39: 343-350 (1991).]]
[59]
59. Reisig W: A strong part of concurrency. In: Rozenberg G (ed) Advances in Petri nets, LNCS, vol 266. Springer, Berlin Heidelberg New York 1987, pp 238-272.]]
[60]
60. Reisig W: Temporal logic and causality in concurrent systems. In: Vogt FH (ed) Proc Concurrency '88, LNCS, vol 335. Springer, Berlin Heidelberg New York 1988, pp 121-139.]]
[61]
61. Reisig W: Parallel composition of liveness. Technical Report SFB342/30/91A, Technische Universität München, München 1991.]]
[62]
62. van Renesse R: Causal controversy at Le Mont St.-Michel. ACM Operating Syst Rev 27(2): 44-53 (1993).]]
[63]
63. Sandoz A, Schiper A: A characterization of consistent distributed snapshots using causal order. Technical Report 92-14, Département d'Informatique, Ecole Polytechnique Fédérale de Lausanne, Lausanne 1992.]]
[64]
64. Schiper A, Eggli J, Sandoz A: A new algorithm to implement causal ordering. In: Bermond JC, Raynal M (eds) Proc Workshop on Distributed Algorithms, Nice, France, LNCS, vol 392. Springer, Berlin Heidelberg New York 1989, pp 219-232.]]
[65]
65. Singhal M, Kshemkalyani A: An efficient implementation of vector clocks. Inf Process Lett 43: 47-52 (1992).]]
[66]
66. Spezialetti M, Kearns JP: Simultaneous regions: a framework for the consistent monitoring of distributed systems. Proc 9th Int Conference on Distributed Computing Systems, Newport Beach, California, pp 61-68, 1989.]]
[67]
67. Stone JM: Visualizing concurrent processes. Technical Report RC 12973, IBM TJ Watson Research Center 1987.]]
[68]
68. Stone JM: Debugging concurrent processes: a case study. Proc SIGPLAN Conf on Programming Language Design and Implementation, Atlanta, Georgia, pp 145-153, 1988.]]
[69]
69. Stone JM: A graphical representation of concurrent processes. ACM SIGPLAN Notices 24(1): 226-235 (1989).]]
[70]
70. Strom R, Yemini S: Optimistic recovery in distributed systems. ACM Trans Comput Syst 3(3): 204-226 (1985).]]
[71]
71. Szpilrajn E: Sur l'extension de l'ordre partiel. Fund Math 16: 386-389 (1930).]]
[72]
72. Warshall S: A theorem on Boolean matrices. J ACM 9: 11-12 (1962).]]
[73]
73. Winskel G: An introduction to event structures. In: de Bakker JW, de Roever WP, Rozenberg G (eds) Proc Workshop on Linear Time, Branching Time and Partial Order in Logics and Models for Concurrency, Noordwijkerhout, The Netherlands, LNCS, vol 354. Springer, Berlin Heidelberg New York 1988, pp 364-397.]]
[74]
74. Wuu GTJ, Bernstein AJ: Efficient solutions to the replicated log and dictionary problems. Proc ACM Symposium on Principles of Distributed Computing, pp 233-242, 1984.]]

Cited By

View all
  • (2024)RMASanitizer: Generalized Runtime Detection of Data Races in Remote Memory Access ApplicationsProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673109(833-844)Online publication date: 12-Aug-2024
  • (2024)Logical Clocks and Monotonicity for Byzantine-Tolerant Replicated Data TypesProceedings of the 11th Workshop on Principles and Practice of Consistency for Distributed Data10.1145/3642976.3653034(37-43)Online publication date: 22-Apr-2024
  • (2023)Message Chains for Distributed System VerificationProceedings of the ACM on Programming Languages10.1145/36228767:OOPSLA2(2224-2250)Online publication date: 16-Oct-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Distributed Computing
Distributed Computing  Volume 7, Issue 3
March 1994
57 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 March 1994

Author Tags

  1. causal ordering
  2. causality
  3. distributed computation
  4. distributed debugging
  5. distributed system
  6. global predicate detection
  7. logical time
  8. timestamps
  9. vector time

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 15 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2024)RMASanitizer: Generalized Runtime Detection of Data Races in Remote Memory Access ApplicationsProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673109(833-844)Online publication date: 12-Aug-2024
  • (2024)Logical Clocks and Monotonicity for Byzantine-Tolerant Replicated Data TypesProceedings of the 11th Workshop on Principles and Practice of Consistency for Distributed Data10.1145/3642976.3653034(37-43)Online publication date: 22-Apr-2024
  • (2023)Message Chains for Distributed System VerificationProceedings of the ACM on Programming Languages10.1145/36228767:OOPSLA2(2224-2250)Online publication date: 16-Oct-2023
  • (2023)Consistency vs. Availability in Distributed Cyber-Physical SystemsACM Transactions on Embedded Computing Systems10.1145/360911922:5s(1-24)Online publication date: 31-Oct-2023
  • (2023)Brief Announcement: Byzantine-Tolerant Detection of Causality in Synchronous SystemsStabilization, Safety, and Security of Distributed Systems10.1007/978-3-031-44274-2_5(57-61)Online publication date: 2-Oct-2023
  • (2022)A tree clock data structure for causal orderings in concurrent executionsProceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3503222.3507734(710-725)Online publication date: 28-Feb-2022
  • (2021)variED: an editor for collaborative, real-time feature modelingEmpirical Software Engineering10.1007/s10664-020-09892-x26:2Online publication date: 1-Mar-2021
  • (2021)The Bloom Clock for Causality TestingDistributed Computing and Internet Technology10.1007/978-3-030-65621-8_1(3-23)Online publication date: 7-Jan-2021
  • (2020)Efficient reordering and replay of execution traces of distributed reactive systems in the context of model-driven developmentProceedings of the 23rd ACM/IEEE International Conference on Model Driven Engineering Languages and Systems10.1145/3365438.3410939(285-296)Online publication date: 16-Oct-2020
  • (2019)Foundations of Collaborative, Real-Time Feature ModelingProceedings of the 23rd International Systems and Software Product Line Conference - Volume A10.1145/3336294.3336308(257-264)Online publication date: 9-Sep-2019
  • Show More Cited By

View Options

View options

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media