Specifying and using a partitionable group communication service

Published: 01 May 2001

    Abstract

    Group communication services are becoming accepted as effective building blocks for the construction of fault-tolerant distributed applications. Many specifications for group communication services have been proposed. However, there is still no agreement about what these specifications should say, especially in cases where the services are partitionable, i.e., where communication failures may lead to simultaneous creation of groups with disjoint memberships, such that each group is unaware of the existence of any other group. In this paper, we present a new, succinct specification for a view-oriented partitionable group communication service. The service associates each message with a particular view of the group membership. All send and receive events for a message occur within the associated view. The service provides a total order on the messages within each view, and each processor receives a prefix of this order. Our specification separates safety requirements from performance and fault-tolerance requirements. The safety requirements are expressed by an abstract, global state machine. To present the performance and fault-tolerance requirements, we include failure-status input actions in the specification; we then give properties saying that consensus on the view and timely message delivery are guaranteed in an execution provided that the execution stabilizes to a situation in which the failure status stops changing and corresponds to a consistently partitioned system. Because consensus is not required in every execution, the specification is not subject to the existing impossibility results for partitionable systems. Our specification has a simple implementation, based on the membership algorithm of Cristian and Schmuck.

    We show the utility of the specification by constructing an ordered-broadcast application, using an algorithm (based on algorithms of Amir, Dolev, Keidar, and others) that reconciles information derived from different instantiations of the group. The application manages the view-change activity to build a shared sequence of messages, i.e., the per-view total orders of the group service are combined to give a universal total order. We prove the correctness and analyze the performance and fault-tolerance of the resulting application.
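
    The safety idea in the abstract (one total order per view, with each processor receiving a prefix of it) can be illustrated with a small sketch. This is not the paper's automaton: the class name `ViewService`, its methods, and the shortcut of ordering a message at the moment it is sent are all assumptions made for illustration only.

```python
from collections import defaultdict


class ViewService:
    """Toy global state machine (hypothetical, not the paper's specification).

    Each view has one ordered message queue; each processor reads that
    queue strictly from the front, so it always holds a prefix of it.
    """

    def __init__(self, processors, initial_view):
        view_id, members = initial_view
        members = frozenset(members)
        self.queue = defaultdict(list)        # view_id -> ordered (sender, msg) pairs
        self.members = {view_id: members}     # view_id -> membership of that view
        self.current = {p: (view_id if p in members else None) for p in processors}
        self.next_index = defaultdict(int)    # (processor, view_id) -> next position

    def new_view(self, p, view_id, members):
        """Install a view at p; view ids only increase at each processor."""
        members = frozenset(members)
        assert p in members, "a processor only installs views it belongs to"
        cur = self.current[p]
        assert cur is None or view_id > cur, "views are installed in increasing order"
        # Every processor that installs view_id must agree on its membership.
        assert self.members.setdefault(view_id, members) == members
        self.current[p] = view_id

    def send(self, p, msg):
        """Associate msg with p's current view (simplification: order it immediately)."""
        v = self.current[p]
        assert v is not None, "p has no view yet"
        self.queue[v].append((p, msg))

    def receive(self, p):
        """Deliver the next message of p's current view, or None if there is none."""
        v = self.current[p]
        if v is None:
            return None
        i = self.next_index[(p, v)]
        if i >= len(self.queue[v]):
            return None
        self.next_index[(p, v)] = i + 1
        return self.queue[v][i]

    def delivered(self, p, view_id):
        """Messages p has received in view_id: always a prefix of that view's order."""
        return self.queue[view_id][: self.next_index[(p, view_id)]]


if __name__ == "__main__":
    svc = ViewService(["a", "b", "c"], (1, {"a", "b", "c"}))
    svc.send("a", "m1")
    svc.send("b", "m2")
    print(svc.receive("a"), svc.receive("c"))
    svc.new_view("a", 2, {"a", "b"})          # a moves on; c stays in view 1
    svc.send("a", "m3")
    print(svc.delivered("a", 1), svc.delivered("c", 1))
```

    In this sketch a processor that moves to a new view simply stops reading the old view's queue, which is why different processors may hold different prefixes of the same per-view order.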

    References

    [1]
    AMIR, Y., CHOCKLER, G. V., DOLEV, D., AND VITENBERG, R. 1997. Efficient state transfer in partitionable environments. In Proceedings of 2nd European Research Seminar on Advances in Distributed Systems (ERSADS'97, Zinal, Switzerland, Mar.). 183-192.
    [2]
    AMIR, Y., DOLEV, D., KRAMER, S., AND MALKI, D. 1992. Transis: A communication subsystem for high availability. In Proceedings of the 22nd IEEE Symposium on Fault-Tolerant Computing (FTCS, Boston, MA, July). IEEE Press, Piscataway, NJ, 76-84.
    [3]
    AMIR, Y., DOLEV, D., MELLIAR-SMITH, P., AND MOSER, L. 1994. Robust and efficient replication using group communication. 94-20.
    [4]
    AMIR, Y., MOSER, L., MELLIAR-SMITH, P., AGARWAL, D., AND CIARFELLA, P. 1993. Fast message ordering and membership using a logical token-passing ring. In Proceedings of 13th IEEE International Conference on Distributed Computing Systems (May). IEEE Press, Piscataway, NJ, 551-560.
    [5]
    AMIR, Y., MOSER, L., MELLIAR-SMITH, P., AGARWAL, D., AND CIARFELLA, P. 1995. The Totem single-ring ordering and membership protocol. ACM Trans. Comput. Syst. 13, 4 (Nov.), 311-342.
    [6]
    BABAOGLU, O., DAVOLI, R., GIACHINI, L., AND BAKER, M. 1995a. Relacs: A communication infrastructure for constructing reliable applications in large-scale distributed systems. In Proceedings of Hawaii International Conference on Computer and System Science. 612-621.
    [7]
    BABAOGLU, O., DAVOLI, R., AND MONTRESOR, A. 1995b. Failure detectors, group membership and view-synchronous communication in partitionable asynchronous systems. UBLCS-95-18.
    [8]
    BABAOGLU, O., DAVOLI, R., GIACHINI, L., AND SABATTINI, P. 1995c. The inherent cost of strong-partial view synchronous communication. In Proceedings of the Workshop on Distributed Algorithms on Graphs. 72-86.
    [9]
    BABAOGLU, O., DAVOLI, R., AND MONTRESOR, A. 1998. Group communication in partitionable systems: Specification and algorithms. UBLCS 98-01.
    [10]
    BIRMAN, K. P. 1996. Building Reliable and Secure Network Applications. Prentice-Hall, New York, NY.
    [11]
    BIRMAN, K. P. 1999. A review of experiences with reliable multicast. Softw. Pract. Exper. 29, 9.
    [12]
    BIRMAN, K. P. AND VAN RENESSE, R. 1994. Reliable Distributed Computing with the Isis Toolkit. IEEE Computer Society Press, Los Alamitos, CA.
    [13]
    BIRMAN, K., SCHIPER, A., AND STEPHENSON, P. 1991. Lightweight causal and atomic group multicast. ACM Trans. Comput. Syst. 9, 3 (Aug.), 272-314.
    [14]
    CHANDRA, T. D., HADZILACOS, V., TOUEG, S., AND CHARRON-BOST, B. 1996. On the impossibility of group membership. In Proceedings of the 15th Annual ACM Symposium on Principles of Distributed Computing (PODC '96, Philadelphia, PA, May 23-26), J. E. Burns and Y. Moses, Chairs. ACM Press, New York, NY, 322-330.
    [15]
    CHEINER, O. AND SHVARTSMAN, A. A. 1999. Implementing and evaluating an eventually-serializable data service as a distributed system building block. In Networks in Distributed Computing. DIMACS Series on Discrete Mathematics and Theoretical Computer Science, vol. 45. 43-71.
    [16]
    COHEN, J., ED. 1996. Commun. ACM 39, 4 (Apr.).
    [17]
    CRISTIAN, F. 1996a. Synchronous and asynchronous group communication. Commun. ACM 39, 4 (Apr.), 88-97.
    [18]
    CRISTIAN, F. 1996b. Group, majority and strict agreement in timed asynchronous distributed systems. In Proceedings of the 26th Conference on Fault-Tolerant Computer Systems. 178-187.
    [19]
    CRISTIAN, F. AND SCHMUCK, F. 1995. Agreeing on processor group membership in asynchronous distributed systems. CSE95-428.
    [20]
    DE PRISCO, R., FEKETE, A., LYNCH, N., AND SHVARTSMAN, A. 1998. A dynamic view-oriented group communication service. In Proceedings of the 17th ACM Symposium on Principles of Distributed Computing (PODC, Puerto Vallarta, Mexico). 227-236.
    [21]
    DE PRISCO, R., FEKETE, A., LYNCH, N., AND SHVARTSMAN, A. A. 1999. Dynamic primary configuration group communication service. In Proceedings of the 13th International Conference on Distributed Computing (DISC).
    [22]
    DOLEV, D. AND MALKI, D. 1996. The Transis approach to high availability cluster communication. Commun. ACM 39, 4 (Apr.), 63-70.
    [23]
    DOLEV, D., MALKI, D., AND STRONG, R. 1994. A framework for partitionable membership service. TR94-6.
    [24]
    DOLEV, S., SEGALA, R., AND SHVARTSMAN, A. 1999. Dynamic load balancing with group communication. In Proceedings of the Sixth International Colloquium on Structural Information and Communication Complexity.
    [25]
    EZHILCHELVAN, P. D., MACEDO, R. A., AND SHRIVASTAVA, S. K. 1995. Newtop: A fault-tolerant group communication protocol. In Proceedings of the 15th IEEE International Conference on Distributed Computing Systems (Vancouver, Canada, May/June). IEEE Computer Society Press, Los Alamitos, CA, 296-306.
    [26]
    FEKETE, A., KAASHOEK, F., AND LYNCH, N. 1995. Providing sequentially-consistent shared objects using group and point-to-point communication. In Proceedings of the IEEE International Conference on Distributed Computer Systems. 439-449.
    [27]
    FEKETE, A., KHAZAN, R., AND LYNCH, N. 1998. Group communication as a base for a load-balancing, replicated data service. In Proceedings of the 12th International Symposium on Distributed Computing (DISC, Sept.).
    [28]
    FRIEDMAN, R. AND VAN RENESSE, R. 1995. Strong and weak virtual synchrony in Horus. Tech. Rep. TR-95-1537. Department of Computer Science, Cornell University, Ithaca, NY.
    [29]
    FRIEDMAN, R. AND VAYSBURD, A. 1997. Fast replicated state machines over partitionable networks. In Proceedings of the 16th Symposium on Reliable Distributed Systems (SRDS, Durham, NC, Oct.).
    [30]
    HAYDEN, M. 1998. The ensemble system. Ph.D. Dissertation.
    [31]
    HAYDEN, M. AND VAN RENESSE, R. 1996. Optimizing layered communication protocols. TR96-1613.
    [32]
    HICKEY, J., LYNCH, N., AND VAN RENESSE, R. 1999. Specifications and proofs of ensemble layers. In Proceedings of the Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS'99).
    [33]
    HILTUNEN, M. AND SCHLICHTING, R. 1995. Properties of membership services. In Proceedings of the 2nd IEEE Symposium on Autonomous Decentralized Systems (Phoenix, AZ, Apr.). IEEE Press, Piscataway, NJ, 200-207.
    [34]
    JAHANIAN, F., FAKHOURI, S., AND RAJKUMAR, R. 1993. Processor group membership protocols: Specification, design and implementation. In Proceedings of the 12th IEEE Symposium on Reliable Distributed Systems (Princeton, NJ, Oct.). IEEE Press, Piscataway, NJ, 2-11.
    [35]
    KEIDAR, I. 1994. A highly available paradigm for consistent object replication. Master's Thesis. See also TR CS95-5 available at http://www.cs.huji.ac.il/ztransis/publications.html.
    [36]
    KEIDAR, I. AND DOLEV, D. 1996. Efficient message ordering in dynamic networks. In Proceedings of the 15th Annual ACM Symposium on Principles of Distributed Computing (PODC '96, Philadelphia, PA, May 23-26), J. E. Burns and Y. Moses, Chairs. ACM Press, New York, NY, 68-76.
    [37]
    LAMPORT, L. 1978. Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21, 7, 558-565.
    [38]
    LYNCH, N. 1996. Distributed Algorithms. Morgan Kaufmann, San Mateo, CA.
    [39]
    LYNCH, N. A. AND TUTTLE, M. R. 1989. An introduction to input/output automata. CWI Q. 2, 3, 219-246.
    [40]
    LYNCH, N. AND VAANDRAGER, F. 1995. Forward and backward simulations I: Untimed systems. Inf. Comput. 121, 2 (Sept.), 214-233.
    [41]
    LYNCH, N. AND VAANDRAGER, F. 1996. Forward and backward simulations II: Timing-based systems. Inf. Comput. 128, 1, 1-25.
    [42]
    MALLOTH, C. AND SCHIPER, A. 1995. View synchronous communication in large scale networks. In Proceedings of the 2nd Open Workshop on ESPRIT Project BROADCAST (July).
    [43]
    MISHRA, S., PETERSON, L. L., AND SCHLICHTING, R. L. 1991. Consul: A communication substrate for fault-tolerant distributed programs. TR 91-32.
    [44]
    MONTRESOR, A., DAVOLI, R., AND BABAOGLU, O. 1999. Group-enhanced remote method invocations. UBLCS 99-05.
    [45]
    MOSER, L. E., AMIR, Y., MELLIAR-SMITH, P. M., AND AGARWAL, D. A. 1994. Extended virtual synchrony. In Proceedings of the 14th IEEE International Conference on Distributed Computing Systems (ICDCS '94, Poznan, Poland, June). IEEE Computer Society Press, Los Alamitos, CA, 56-65.
    [46]
    MOSER, L. E., MELLIAR-SMITH, P. M., AGARWAL, D. A., BUDHIA, R. K., AND LINGLEY-PAPADOPOULOS, C. A. 1996. Totem: A fault-tolerant multicast group communication system. Commun. ACM 39, 4 (Apr.), 54-63.
    [47]
    NEIGER, G. 1996. A new look at membership services (extended abstract). In Proceedings of the 15th Annual ACM Symposium on Principles of Distributed Computing (PODC '96, Philadelphia, PA, May 23-26), J. E. Burns and Y. Moses, Chairs. ACM Press, New York, NY, 331-340.
    [48]
    RICCIARDI, A. 1992. The group membership problem in asynchronous systems. TR92-1313.
    [49]
    RICCIARDI, A., SCHIPER, A., AND BIRMAN, K. 1993. Understanding partitions and the no partitions assumption. TR93-1355.
    [50]
    SCHNEIDER, F. B. 1990. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Comput. Surv. 22, 4 (Dec.), 299-319.
    [51]
    VAN RENESSE, R., BIRMAN, K., HAYDEN, M., VAYSBURD, A., AND KARR, D. 1998. Building adaptive systems using ensemble. Softw. Pract. Exper. 28, 9, 963-979.
    [52]
    VAN RENESSE, R., BIRMAN, K. P., AND MAFFEIS, S. 1996. Horus: A flexible group communication system. Commun. ACM 39, 4 (Apr.), 76-83.
    [53]
    VITENBERG, R., KEIDAR, I., CHOCKLER, G. V., AND DOLEV, D. 1999. Group communication specifications: A comprehensive study. Tech. Rep. MIT-LCS-TR-790. MIT Laboratory for Computer Science, Cambridge, MA. http://theory.lcs.mit.edu/idish/ftp/gcs-survey-tr.ps.
    [54]
    WHETTEN, B., MONTGOMERY, T., AND KAPLAN, S. 1995. A high performance totally ordered multicast protocol. In Theory and Practice in Distributed Systems, K. Birman, F. Mattern, and A. Schiper, Eds. Springer-Verlag, Berlin, Germany, 33-57.

    Cited By

    • (2021) Consistent Distributed Storage. Synthesis Lectures on Distributed Computing Theory 20:1 (1-192). DOI: 10.2200/S01069ED1V01Y202012DCT017. Online publication date: 28-Jun-2021
    • (2019) Dione: A Protocol Verification System Built with Dafny for I/O Automata. Integrated Formal Methods (227-245). DOI: 10.1007/978-3-030-34968-4_13. Online publication date: 22-Nov-2019
    • (2018) Self-Stabilizing Supervised Publish-Subscribe Systems. 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (1050-1059). DOI: 10.1109/IPDPS.2018.00114. Online publication date: May-2018

    Reviews

    Ashoke Deb

    A group communication service is a building block for the development of practical distributed systems in which processes located at different nodes operate collectively as a group, using a communication service to multicast messages to all members of the group. Such services are said to be partitionable if communication failures may lead to the simultaneous creation of groups with disjoint memberships, such that each group is unaware of the existence of any other group. In such a partitionable system, each process has a unique view of the membership of the group, and this view can change from time to time. In a view-synchronous system, the processes that proceed together through two consecutive views deliver the same set of messages between these two views, and if a particular message is delivered to several processes, all have the same view of the membership when the message is delivered. In this paper, the authors present VS, a new and succinct way of specifying view-synchronous partitionable group communication services, and demonstrate its effectiveness using an example of a totally ordered broadcast communication service achieving a sequentially consistent memory. Although their specification does not describe all the possible properties that may be useful in a particular implementation, it does deal with the properties that are needed for ordered broadcast applications. VS deals with, but separates, the issues of safety requirements and of performance and fault-tolerance requirements. The safety requirements are formulated in terms of an abstract global input-output state machine; the performance and fault-tolerance requirements are expressed as a collection of properties that must hold during execution of the service, stated in precise natural language requiring operational reasoning. VS has a number of interesting properties, including the fact that it is not subject to the impossibility results that afflict some other existing schemes.
Online Computing Reviews Service
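
    The two view-synchrony conditions described in the review can be checked mechanically on a recorded delivery trace. The following is a minimal sketch under stated assumptions: the trace format (a list of `(process, view_id, message_id)` tuples) and both function names are hypothetical and do not come from the paper.

```python
def same_view_delivery(trace):
    """True if every message is delivered in a single view by all its receivers.

    trace: iterable of (process, view_id, message_id) delivery events.
    """
    view_of = {}
    for _process, view_id, msg_id in trace:
        # Remember the first view seen for msg_id; any later mismatch is a violation.
        if view_of.setdefault(msg_id, view_id) != view_id:
            return False
    return True


def same_set_in_view(trace, p, q, view_id):
    """True if p and q delivered the same set of messages while in view_id.

    Intended for two processes that both move from view_id to the same next view.
    """
    by_p = {m for proc, v, m in trace if proc == p and v == view_id}
    by_q = {m for proc, v, m in trace if proc == q and v == view_id}
    return by_p == by_q


if __name__ == "__main__":
    trace = [("p", 1, "m1"), ("q", 1, "m1"), ("p", 2, "m2")]
    print(same_view_delivery(trace))
    print(same_set_in_view(trace, "p", "q", 1))
```

    A checker of this kind only examines finished traces; it says nothing about the liveness side of the specification, which depends on the failure status stabilizing.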

    Published In

    ACM Transactions on Computer Systems, Volume 19, Issue 2
    May 2001
    171 pages
    ISSN:0734-2071
    EISSN:1557-7333
    DOI:10.1145/377769

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. composable building blocks
    2. conditional performance analysis
    3. distributed algorithms
    4. group communication protocols
    5. message-passing protocols
    6. ordered broadcast
    7. service specification
    8. total-order broadcast
